The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "VT MPI within EGI:Nagios"

From EGIWiki

Revision as of 14:38, 27 February 2012

MPI VT Nagios Specifications

The present SAM MPI testing infrastructure depends entirely on the information published by each individual site: if a site publishes the MPI-START tag, the resource is tested; otherwise it is not. This dependency on the information system makes it impossible to test sites that offer MPI functionality but do not publish it, or sites that publish their MPI/Parallel support incorrectly. Introducing a new service type in GOCDB (MPI or Parallel) would break this dependency and allow the definition of an MPI test profile to probe:

  • The information published by the (MPI or Parallel) service.
    • org.sam.mpi.EnvSanityCheck
  • The (MPI or Parallel) functionality offered by the site.
    • org.sam.mpi.SimpleJob
    • org.sam.mpi.ComplexJob

org.sam.mpi.EnvSanityCheck

  • Name: org.sam.mpi.EnvSanityCheck
  • Requirements: The service should be registered in GOCDB as an MPI (or Parallel) Service Type.
  • Purpose: Test the information published by the (MPI or Parallel) service.
  • Description: The probe should check whether the service:
    • Publishes the MPI-START tag under GlueHostApplicationSoftwareRunTimeEnvironment
    • Publishes an MPI flavour tag under GlueHostApplicationSoftwareRunTimeEnvironment in one of the following formats:
      • <MPI flavour>
      • <MPI flavour>-<MPI version>
      • <MPI flavour>-<MPI version>-<Compiler>
    • Has the GlueCEPolicyMaxSlotsPerJob variable set to a reasonable value (not 0, 1, or 999999999) for the queue where the MPI job is supposed to run.
    • Publishes reasonable GlueCEPolicyMaxCPUTime and GlueCEPolicyMaxWallClockTime values (neither 0 nor 999999999), with GlueCEPolicyMaxCPUTime large enough for a parallel application requesting 4 slots to run for the full GlueCEPolicyMaxWallClockTime, i.e. GlueCEPolicyMaxCPUTime / GlueCEPolicyMaxWallClockTime >= 4.
  • Dependencies: None.
  • Frequency: Every hour.
  • Timeout: 120 s (Do you think this is too much for an LDAP query?)
  • Expected behaviour:
Use Case | Probe Result
MPI-START tag is not present under GlueHostApplicationSoftwareRunTimeEnvironment | CRITICAL
No MPI flavour tag (in any of the proposed formats) is present under GlueHostApplicationSoftwareRunTimeEnvironment | CRITICAL
The probe reaches the timeout and its execution is canceled | CRITICAL
GlueCEPolicyMaxSlotsPerJob is 0, 1, or 999999999 | WARNING
GlueCEPolicyMaxCPUTime is 0 or 999999999, or GlueCEPolicyMaxWallClockTime is 0 or 999999999, or GlueCEPolicyMaxCPUTime / GlueCEPolicyMaxWallClockTime < 4 | WARNING
The MPI-START tag AND an MPI flavour tag are present under GlueHostApplicationSoftwareRunTimeEnvironment, AND GlueCEPolicyMaxSlotsPerJob is not 0, 1, or 999999999, AND GlueCEPolicyMaxCPUTime / GlueCEPolicyMaxWallClockTime >= 4 | OK
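The expected-behaviour table can be read as a small decision procedure. The Python sketch below illustrates one way to implement it, assuming the four Glue attribute values have already been retrieved for the queue under test. The LDAP query against the information system and the 120 s timeout (which should map to CRITICAL) are deliberately left out. The names `CEInfo`, `check` and the flavour list in `KNOWN_FLAVOURS` are hypothetical, chosen for illustration only; they are not part of the SAM framework.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

# Placeholder values the specification treats as unreasonable.
BAD_SLOTS = {0, 1, 999999999}
BAD_TIMES = {0, 999999999}

# Hypothetical flavour names: the specification does not enumerate them.
KNOWN_FLAVOURS = ("OPENMPI", "MPICH2", "MPICH")

# Accepts <flavour>, <flavour>-<version> and <flavour>-<version>-<compiler>.
FLAVOUR_TAG = re.compile(
    r"(?P<flavour>%s)(?:-(?P<version>[^-]+)(?:-(?P<compiler>[^-]+))?)?"
    % "|".join(map(re.escape, KNOWN_FLAVOURS))
)


@dataclass
class CEInfo:
    """Glue attribute values already fetched for the queue under test (hypothetical)."""
    runtime_env: List[str]     # GlueHostApplicationSoftwareRunTimeEnvironment
    max_slots_per_job: int     # GlueCEPolicyMaxSlotsPerJob
    max_cpu_time: int          # GlueCEPolicyMaxCPUTime
    max_wall_clock_time: int   # GlueCEPolicyMaxWallClockTime (same unit as CPU time)


def check(ce: CEInfo) -> Tuple[int, str]:
    """Map published values to a Nagios status following the expected-behaviour table."""
    tags = set(ce.runtime_env)
    if "MPI-START" not in tags:
        return CRITICAL, "MPI-START tag not published"
    if not any(FLAVOUR_TAG.fullmatch(tag) for tag in tags):
        return CRITICAL, "no MPI flavour tag published"

    problems = []
    if ce.max_slots_per_job in BAD_SLOTS:
        problems.append("GlueCEPolicyMaxSlotsPerJob=%d" % ce.max_slots_per_job)
    # `or` short-circuits, so the ratio is only computed when neither value
    # is 0 or 999999999 (no division by zero).
    if (ce.max_cpu_time in BAD_TIMES
            or ce.max_wall_clock_time in BAD_TIMES
            or ce.max_cpu_time / ce.max_wall_clock_time < 4):
        problems.append("MaxCPUTime=%d, MaxWallClockTime=%d"
                        % (ce.max_cpu_time, ce.max_wall_clock_time))
    if problems:
        return WARNING, "unreasonable values: " + "; ".join(problems)
    return OK, "MPI environment published correctly"


if __name__ == "__main__":
    # Example queue: 4 slots can each use the full wall-clock limit (ratio = 4).
    sample = CEInfo(["MPI-START", "OPENMPI-1.4.3"], 64, 4 * 1440, 1440)
    status, message = check(sample)
    print("%s - %s" % (STATUS_NAMES[status], message))
```

The CRITICAL conditions are checked first and return immediately, following the usual Nagios convention that the most severe condition determines the result; the WARNING conditions are then collected together so that a single run reports every unreasonable policy value.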