The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "PROC06 Setting Nagios test status to operations"

From EGIWiki

Revision as of 15:48, 21 September 2010

Procedure for setting Nagios tests critical DRAFT

The purpose of this document is to clearly describe the actions and the related steps to be undertaken for setting Nagios tests critical.

Revision history

Version | Authors | Date | Comments
0.1 | Małgorzata Krakowian | | First draft

Setting Nagios tests critical request

The request should be submitted to the Chief Operations Officer. The request should be approved by OMB and COD.

TBD

How to start the process

  • The Chief Operations Officer opens a GGUS ticket to COD to start the process.
  • The Central Operator on Duty (COD) team, in charge of EGI oversight, is responsible for processing the request ticket.

Prerequisites

Before opening the GGUS ticket, the test should be implemented and approved by the Nagios team.

TBD

Setting Nagios tests critical steps

The general idea is that the ticket for each step must be closed before moving on to the next step.

Steps:

Step | Action on | Action
1 | Nagios | Add test to official Nagios package.
2 | NGIs | Nagios update.
3 | NGIs | Ask the ROD teams to verify whether the test is acceptable to them (75% of affected nodes should be OK).
4 | COD | The information is broadcast by COD. (This broadcast should be sent to VO managers and NOC/ROC managers.) See the template below for an indication of the message content.
5 | who? | Add test to the critical tests list wiki page: https://twiki.cern.ch/twiki/bin/view/LCG/SAMCriticalTestsForCODs (Request from Cyril to Nagios for a more dynamic list that could be exploited directly.)
6 | Operational Dashboard | Add new test as critical.
7 | COD | Final check. Close parent ticket.

Broadcast template (step 4):

Subject:

Dear All,

We would like to announce that test XXX will become critical XXX

Best regards,
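
The rule that a step's ticket must be closed before the next step can start can be sketched as follows. This is only an illustrative sketch, not part of the procedure: the `Step` class and the `next_open_step` and `close_step` helpers are hypothetical names, and the real tracking happens in GGUS tickets rather than in code.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int
    action_on: str
    action: str
    ticket_closed: bool = False  # assumption: one ticket per step


def next_open_step(steps):
    """Return the lowest-numbered step whose ticket is still open, or None."""
    for step in sorted(steps, key=lambda s: s.number):
        if not step.ticket_closed:
            return step
    return None


def close_step(steps, number):
    """Close the ticket of step `number`, refusing to skip an earlier open step."""
    current = next_open_step(steps)
    if current is None:
        raise ValueError("all steps are already closed")
    if current.number != number:
        raise ValueError(
            f"step {current.number} must be closed before step {number}"
        )
    current.ticket_closed = True


# First three steps of the procedure, used here only as sample data.
steps = [
    Step(1, "Nagios", "Add test to official Nagios package"),
    Step(2, "NGIs", "Nagios update"),
    Step(3, "NGIs", "ROD teams verify the test is acceptable"),
]

close_step(steps, 1)
print(next_open_step(steps).number)  # prints 2
```

Calling `close_step(steps, 3)` at this point raises `ValueError`, because step 2 is still open.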


Editorial notes:
1. Setting Nagios tests to critical. COD should be authorized to manage the critical test list. Action on M K: update a table with the steps to make a test critical, as a kind of best practice. Luuk's remark: the test should be running properly on all sites. We are talking about globally critical tests. A new test is developed, then distributed to the regions. Anyone can ask the COO to make a test critical. COD has to agree (checking that the test does not spoil the infrastructure) and OMB is informed; they can protest if needed? The 75% OK rule still applies. COD does not have access to the NGI Nagios instances, so communication with NGIs about the percentage of service instances passing the test is by GGUS tickets. Any fundamental change to a test requires test certification. COD will inform the OTAG about this requirement.
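
As an illustration of the 75% OK rule, the figures that NGIs report through GGUS tickets could be combined as in the sketch below. It rests on assumptions the notes do not settle: that the percentage is computed over all reported service instances rather than per NGI, and that each NGI reports a pair (instances passing, instances tested). The function names, NGI names and numbers are hypothetical.

```python
def pass_fraction(results):
    """Fraction of service instances passing the test.

    `results` maps an NGI name to (instances_ok, instances_total).
    Assumption: the rule is evaluated over all instances together.
    """
    ok = sum(instances_ok for instances_ok, _ in results.values())
    total = sum(instances_total for _, instances_total in results.values())
    if total == 0:
        raise ValueError("no service instances reported")
    return ok / total


def meets_threshold(results, threshold=0.75):
    """True if at least `threshold` of the reported instances pass the test."""
    return pass_fraction(results) >= threshold


# Hypothetical figures collected from GGUS ticket replies.
reported = {"NGI_A": (18, 20), "NGI_B": (7, 12), "NGI_C": (9, 10)}

print(meets_threshold(reported))  # prints True (34 of 42 instances pass)
```

If the rule is meant to hold for each NGI separately, the same check can be applied to one entry at a time; in the sample data NGI_B (7 of 12) would then fall below the threshold.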