SAM Nagios probes refactoring TF

From EGIWiki



This article is deprecated and should no longer be used; it remains available for reference only.

Mandate

  • assess the support status of the available Nagios probes
    • recommend removal or replacement of unsupported probes from the SAM Nagios framework
    • request improvement/correction of supported ones
    • ease their maintenance and support by:
      • reducing the number of probes
      • reducing the dependencies between services (a reported service error should not be caused by the problems of another service in the same site)
  • improve documentation:
    • availability of references to the individual nagios probes/tests descriptions in a central place
    • update known documentation web pages with proper references (avoid broken links)
    • improve developers guides
    • request a change to SAM to remove hardcoded documentation URLs from the metrics configuration

Reference: OMB, June 26, 2014 - Re-factoring SAM probes

Tools

Tasks

Documentation

Generic improvements

  • eliminate broken links
  • collect all documentation links in a central place
  • create a single page collecting descriptions of all available tests/probes

Developers Documentation

  • collect requirements and suggestions from probe developers

Probes

Identified Issues

  • NGI SAM Nagioses have the documentation URLs hardcoded in the metric configuration - GGUS #108242
    • changing the URLs requires a SAM update
    • Solution:
      • Plan the Change
  • org.gstat.SanityCheck - no longer maintained - GGUS #108243
    • checks only a small subset of the BDII GLUE 1.3 data
    • (add reference to profiles where it is enabled)
    • Solution:
      • replace with org.bdii.GLUE2-Validate - thorough validation of GLUE 2 data
  • SAM requires unsupported software
    • UMD 2 middleware:
    • CentOS/SL5
      • Solution: - migration to CentOS/SL6 planned within EGI InSPIRE JRA2 activity
  • org.sam.SRM-GetTURLs fails if webdav is published
    • the probe takes from the BDII the list of protocols published in the GlueSEAccessProtocol object and tries to access them using SRM. This fails for webdav.
    • Solution:
      • CERN DPM PT is developing a webdav SAM probe - GGUS #108571
      • improve org.sam.SRM-GetTURLs, under dCache PT maintenance, to not ask/test SRM for a webdav TURL
  • MPI Nagios probe issues for GE batch system - GGUS #108443
    • GGUS #101406 - WARNING: Publishes GlueCEPolicyMaxCPUTime (30) / GlueCEPolicyMaxWallClockTime (30) < 4
    • Solution:
      • rewrite the test to take into account that the GE provider now also publishes the correct per-core CPU limit in the BDII, and include it in the next SAM update
  • remove/replace FTS(2) probes - GGUS #108458
    • FTS2 has been decommissioned since the 1st of August;
    • the Nagios FTS probes are not suited for FTS3 and so they can be removed.
    • Solution:
      • follow up with TP (OliverK & MaiteBL) on the development of FTS3 probes and their integration in SAM
  • SAM CE Nagios framework is unsupported
    • used for WN* tests
    • Solution: - replace them
  • LFC decommissioning - with implications in tests depending on it
    • org.sam.WN-Rep* (CREAM-CE)
    • Possible solutions (under evaluation):
      • deploy dedicated LFC and reconfigure all NGI SAM Nagioses
  • SRM tests (add reference) cause false alarms on new DPM versions: they test the SRM API on all the interfaces published in the Top-BDII (add link to GGUS ticket)
    • Solution: - ??
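
For the org.sam.SRM-GetTURLs issue listed above, the proposed improvement amounts to filtering the protocol list read from the BDII before asking SRM for TURLs. The following is only a minimal sketch of that idea, not code from the actual probe: the function name `srm_testable_protocols` and the `skip` parameter are hypothetical, and the only protocol assumed to be excluded is webdav, as stated in the issue.

```python
# Hypothetical helper; it is NOT part of the real org.sam.SRM-GetTURLs probe.
# It takes the protocol names published for a storage element (for example the
# values found in GlueSEAccessProtocol objects) and drops those that should
# not be requested through SRM.

def srm_testable_protocols(published, skip=("webdav",)):
    """Return the published protocols that should still be tested via SRM.

    published: iterable of protocol names as read from the BDII.
    skip: protocol names to exclude (assumption: only webdav by default).
    """
    skip_set = {name.lower() for name in skip}
    result = []
    for proto in published:
        name = proto.strip().lower()
        # Ignore empty entries, excluded protocols and duplicates.
        if name and name not in skip_set and name not in result:
            result.append(name)
    return result


if __name__ == "__main__":
    print(srm_testable_protocols(["gsiftp", "WebDAV", "rfio", "gsiftp"]))
```

Keeping the excluded protocols in a parameter rather than hardcoding them would let further non-SRM protocols be skipped without changing the probe logic.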

TF Activity


Description | Status
Replace all references to "grid-monitoring.egi.eu" with "mon.egi.eu" | DONE
Provide lists of all ROC SAM tests, in ROC_SAM_Tests | DONE
Update description of ROC SAM tests, in ROC_SAM_Tests | In Progress
Update all broken links in SAM | In Progress
Include info from SAM_Tests in SAM, and obsolete it | In Progress
etc | ToDo
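
The host-name replacement in the first activity above can be done mechanically on page text. The sketch below is one possible way to do it and is not the procedure the task force actually used: the function name `replace_host` is hypothetical, and the matching rule (do not touch longer host names that merely contain the old one) is an assumption.

```python
import re

OLD_HOST = "grid-monitoring.egi.eu"
NEW_HOST = "mon.egi.eu"

# Match the old host only when it is not part of a longer host name:
# it must not be preceded by a word character, dot or hyphen, and must
# not be followed by a word character or hyphen.
_HOST_RE = re.compile(r"(?<![\w.-])" + re.escape(OLD_HOST) + r"(?![\w-])")


def replace_host(text):
    """Return (new_text, number_of_replacements) for the given page text."""
    return _HOST_RE.subn(NEW_HOST, text)


if __name__ == "__main__":
    new_text, count = replace_host("Docs: https://grid-monitoring.egi.eu/")
    print(count, new_text)
```

Returning the replacement count makes it easy to list which pages were actually changed.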

SAM/ARGO Roadmap

People

  • Cristina Aiftimiei
  • Emir Imamagic
  • Peter Solagna
  • David Crooks
  • Tiziana Ferrari
  • Paloma Fuente
  • Kashif Mohammad
  • Stuart Pullinger
  • Marcin Radecki
  • Ievgen Sliusar
  • Ulf Tigerstedt
  • Petter Urkedal