
SAM Nagios probes refactoring TF

Revision as of 15:28, 23 July 2014



Mandate

  • assess the support status of the various Nagios probes available
    • recommend removal or replacement of unsupported probes from the SAM Nagios framework
  • improve documentation:
    • make references to the individual Nagios probe/test descriptions available in a central place
    • update known documentation web pages with proper references (avoid broken links)
    • improve developer guides
    • require a change to SAM to remove hardcoded documentation URLs in metric configuration

Reference: OMB, June 26, 2014 - Re-factoring SAM probes

Tasks

Documentation

Generic improvements

  • Eliminate Broken Links
    • collect all documentation links in a central place
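
Once the documentation links are collected in one place, checking them for breakage can be automated. The following is a minimal sketch, not part of SAM: the function names, the name-to-URL mapping, and the rule that any 2xx or 3xx answer counts as healthy are all assumptions made for illustration.

```python
from typing import Callable, Dict, List
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return 0


def find_broken_links(
    links: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the names (sorted) whose URL does not answer with a 2xx or 3xx status."""
    broken = []
    for name, url in sorted(links.items()):
        status = fetch(url)
        if not 200 <= status < 400:
            broken.append(name)
    return broken
```

The `fetch` parameter is injectable so the check can be exercised without network access; some servers reject HEAD requests, so a real deployment might need to fall back to GET.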

Developers Documentation

  • (add link)
  • collect requirements and suggestions from developers

Probes

  • Identified issues:
    • NGI SAM Nagioses have documentation URL hardcoded in metric configuration
      • changing the URLs requires SAM update
      • Solution: - Plan the Change
    • SAM CE Nagios framework is unsupported
      • used for WN* tests
      • Solution: - replace them
    • SRM tests (add reference) cause false alarms on new DPM versions (add link to GGUS ticket)
      • Solution: - ??
    • org.gstat.SanityCheck - not maintained anymore
      • checks small subset of BDII GLUE 1.2 data
      • (add reference to profiles where it is enabled)
      • Solution: replace with org.bdii.GLUE2-Validate - thorough validation of GLUE 2 data
    • LFC decommissioning - with implications in tests depending on it
      • org.sam.WN-Rep* (CREAM-CE)
      • Solutions:
        • remove all LFC-dependent tests, or find replacement
        • deploy dedicated LFC and reconfigure all NGI SAM Nagioses
    • SAM requires unsupported software
      • UMD 2 middleware:
        • Solution: migration to UMD-3 planned in September - follow up
      • CentOS/SL5
        • Solution: - migration to CentOS/SL6 planned within EGI InSPIRE JRA2 activity
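
For the first issue above (documentation URLs hardcoded in metric configuration), one possible direction is to derive each URL from a single base location, so that moving the documentation means changing one value rather than every metric entry. The sketch below is only an illustration under that assumption: `DOC_BASE_URL`, `documentation_url`, and the example.org address are hypothetical and do not describe how SAM stores or resolves these URLs.

```python
from urllib.parse import quote

# Hypothetical base URL; the real documentation location is not given here.
DOC_BASE_URL = "https://docs.example.org/probes/"


def documentation_url(metric: str, base_url: str = DOC_BASE_URL) -> str:
    """Build the documentation URL for a metric from one central base URL."""
    if not base_url.endswith("/"):
        base_url += "/"
    # Percent-encode the metric name so it is safe as a single path segment.
    return base_url + quote(metric, safe="")
```

With this layout, a call such as `documentation_url("org.bdii.GLUE2-Validate")` resolves against whatever base is configured at the time, instead of a URL frozen into the metric definition.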

People

  • Cristina Aiftimiei
  • Emir Imamagic
  • Peter Solagna