The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

VT MPI within EGI:Nagios

From EGIWiki

Revision as of 12:07, 27 February 2012

MPI VT Nagios Specifications

The present SAM MPI testing infrastructure depends entirely on the information published by each individual site. If a site publishes the MPI-START tag, the resource is tested; otherwise it is not. Because of this dependency on the information system, sites that offer MPI functionality but do not advertise it cannot be tested, and neither can sites that advertise their MPI/Parallel support incorrectly. Introducing a new service type in GOCDB (MPI or Parallel) would break this dependency and allow the definition of an MPI test profile to probe:

  • The information published by the (MPI or Parallel) service.
    • org.sam.mpi.EnvSanityCheck
  • The (MPI or Parallel) functionality offered by the site.
    • org.sam.mpi.SimpleJob
    • org.sam.mpi.ComplexJob

org.sam.mpi.EnvSanityCheck

  • Name: org.sam.mpi.EnvSanityCheck
  • Requirements: The service should be registered in GOCDB as an MPI (or Parallel) service type.
  • Purpose: Test the information published by the (MPI or Parallel) service.
  • Description: The probe should test if the service:
    • Publishes the MPI-START tag under GlueHostApplicationSoftwareRunTimeEnvironment.
    • Publishes an MPI flavour tag under GlueHostApplicationSoftwareRunTimeEnvironment in one of the following formats:
      • <MPI flavour>
      • <MPI flavour>-<MPI version>
      • <MPI flavour>-<MPI version>-<Compiler>
    • Has the GlueCEPolicyMaxSlotsPerJob attribute set to a reasonable value (not 0, 1 or 999999999) for the queue where the MPI job is supposed to run.
    • Publishes reasonable GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime values (neither 0 nor 999999999), with GlueCEPolicyMaxCPUTime large enough for a parallel application requesting 4 slots to run for the full GlueCEPolicyMaxWallClockTime.
  • Dependencies: None.
  • Frequency: Hourly.
  • Timeout: 120s (Do you think this is too much for an LDAP query?)
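The checks listed above could be prototyped along the following lines. This is a minimal Python sketch, not the probe's actual implementation, and it rests on several assumptions: the attribute values for the target queue have already been retrieved from the information system (the LDAP query itself is left out), the list of recognised flavour names is hypothetical, and the CPU-time condition is read as GlueCEPolicyMaxCPUTime ≥ 4 × GlueCEPolicyMaxWallClockTime with both values in the same unit. The names `env_sanity_problems`, `nagios_status`, `KNOWN_FLAVOURS` and `FLAVOUR_TAG` are illustrative only; the exit codes 0 and 2 are the standard Nagios OK and CRITICAL states.

```python
import re
from typing import List, Sequence

# Placeholder value that the specification explicitly rejects as "not reasonable".
UNSET = 999999999

# Hypothetical list of recognised flavour names: an assumption of this sketch.
KNOWN_FLAVOURS = ("OPENMPI", "MPICH2", "MPICH", "LAM")

# Accepts <flavour>, <flavour>-<version> and <flavour>-<version>-<compiler>,
# where <version> is dotted digits (e.g. 1.4.3); matching is case-sensitive.
FLAVOUR_TAG = re.compile(
    r"^(?:" + "|".join(KNOWN_FLAVOURS) + r")(?:-\d+(?:\.\d+)*(?:-[A-Za-z0-9_.+]+)?)?$"
)

# Standard Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def env_sanity_problems(rte_tags: Sequence[str], max_slots: int,
                        max_cpu_time: int, max_wall_time: int,
                        slots_requested: int = 4) -> List[str]:
    """Return the failed checks for one queue; an empty list means it passes."""
    problems: List[str] = []
    if "MPI-START" not in rte_tags:
        problems.append("MPI-START tag not published")
    if not any(FLAVOUR_TAG.match(tag) for tag in rte_tags):
        problems.append("no MPI flavour tag in a recognised format")
    # Rejects 0, 1 and the 999999999 placeholder (and, as a generalisation, negatives).
    if max_slots <= 1 or max_slots == UNSET:
        problems.append(f"GlueCEPolicyMaxSlotsPerJob={max_slots} is not reasonable")
    for name, value in (("GlueCEPolicyMaxCPUTime", max_cpu_time),
                        ("GlueCEPolicyMaxWallClockTime", max_wall_time)):
        if value <= 0 or value == UNSET:
            problems.append(f"{name}={value} is not reasonable")
    # Assumed reading: the CPU limit must cover slots_requested processes
    # running for the whole wall-clock limit (both values in the same unit).
    if max_cpu_time < slots_requested * max_wall_time:
        problems.append(
            f"GlueCEPolicyMaxCPUTime={max_cpu_time} is below "
            f"{slots_requested} x GlueCEPolicyMaxWallClockTime={max_wall_time}"
        )
    return problems


def nagios_status(problems: List[str]) -> int:
    # Reporting every failed check as CRITICAL is an assumption of this sketch.
    return NAGIOS_CRITICAL if problems else NAGIOS_OK
```

With made-up values, `env_sanity_problems(["MPI-START", "OPENMPI-1.4.3-GCC"], 64, 11520, 2880)` returns an empty list because 11520 = 4 × 2880, whereas `env_sanity_problems(["OPENMPI"], 1, 100, 2880)` reports three problems: the missing MPI-START tag, a GlueCEPolicyMaxSlotsPerJob of 1, and a CPU-time limit below 4 × the wall-clock limit. Collecting every failure rather than stopping at the first one lets a single probe run tell the site administrator everything that needs fixing.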