PROC06 Setting Nagios test status to operations
Revision as of 12:07, 26 November 2010
- Title: Procedure for setting Nagios test critical for COD
- Document link: https://wiki.egi.eu/wiki/Operations:Setting_Nagios_tests_critical_procedure
- Last modified: 23.11.2010
- Version: 1.0
- Policy Group Acronym: GOO/COD
- Policy Group Name: Grid Operations Oversight/Central Operator on Duty
- Contact Person: Małgorzata Krakowian, Marcin Radecki
- Document Status: APPROVED
- Approved Date: 23.11.2010
- Procedure Statement: The purpose of this document is to clearly describe the actions and the related steps to be undertaken for setting Nagios tests critical. A Nagios test is set to critical so that the operations dashboard displays an alarm if the test fails.
Procedure for setting Nagios test critical for COD DRAFT
The purpose of this document is to clearly describe the actions and the related steps to be undertaken for setting Nagios tests critical for COD. As a result, the operations dashboard displays an alarm if the test fails.
This procedure applies only to tests run under the OPS VO. Its scope is global: it applies to all Operations Centres in the EGI project.
Revision history
Version | Authors | Date | Comments |
---|---|---|---|
1.0 | Małgorzata Krakowian, Marcin Radecki | 23.11.2010 | Approved by OMB |
Setting Nagios tests critical request
Request
- Anyone may submit a request to make a test critical for COD.
- The request should be submitted to COD.
Prerequisites
- The Nagios test needs to satisfy quality criteria in agreement with the UMD roadmap.
- The test needs to be properly documented, and its correct functionality needs to be proven (one month of successful running in the production infrastructure).
- The test needs to run for at least 1 month on all regional instances (it must be present in the official Nagios package to provide data for validation).
The quality criteria need to be provided by SA1.
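The "at least 1 month on all regional instances" prerequisite can be checked mechanically once the start dates are known. The following is a minimal sketch, not part of the procedure itself: the instance names, the `first_seen` mapping and the function name are hypothetical, and "1 month" is assumed to mean 30 days because the procedure does not define it more precisely.

```python
from datetime import date, timedelta

# Assumption: "1 month" is approximated as 30 days; the procedure does not define it.
MIN_RUNNING_PERIOD = timedelta(days=30)


def instances_below_minimum(first_seen, today):
    """Return regional instances where the test has run for less than the minimum period.

    first_seen maps a (hypothetical) regional instance name to the date the
    test first ran there, or to None if the test has never run on it.
    """
    return sorted(
        name
        for name, started in first_seen.items()
        if started is None or today - started < MIN_RUNNING_PERIOD
    )


# Illustrative data only; these instance names and dates are made up.
first_seen = {
    "ngi-a": date(2010, 9, 1),
    "ngi-b": date(2010, 11, 10),
    "ngi-c": None,
}

# An empty result would mean the prerequisite is met on every instance.
print(instances_below_minimum(first_seen, date(2010, 11, 23)))
```

With the sample data the check reports `ngi-b` and `ngi-c`, so the prerequisite would not yet be satisfied.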
Validation
As a general rule, a ticket must be closed before the process can move on to the next step.
Steps:
Step | Action on | Action |
---|---|---|
1 | Applicant | Opens a GGUS ticket to COD to start the process.
Subject: blebleblebel |
2 | COD | Checks the status of the Nagios probe to see if it meets the specified quality criteria (to be defined). |
3 | COD | Contacts the OMB to request approval of the new critical test. A date is specified (at least 1 month in the future). |
4 | NGIs | Ask the ROD teams to try to bring the test to OK status. 75% OK in total (across the entire EGI) is the threshold for passing to the next step. If it is not possible to proceed, report the problems to the OMB. |
5 | COD | The announcement about the new critical test is broadcast by COD.
(This broadcast should be sent to VO managers and NOC/ROC managers.) See the template below for an indication of the message content.
Subject:
Dear All,
We would like to announce that test XXX will become critical XXX
Best regards, |
6 | COD | Add the test to the critical tests list. https://wiki.egi.eu/wiki/Operations:Operations_tests |
7 | Operational Portal | Mark the test as critical in the Operational Portal.
This step will be removed when COD gets access to manage the operations tests list in the Operational Portal. See Requirements to be implemented. |
8 | COD | Final check. Closes the parent ticket. |
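The two numeric conditions in the steps above (the notice period of at least 1 month in step 3 and the 75% OK threshold in step 4) can be expressed as simple checks. This is only a sketch under stated assumptions: the procedure does not say how the 75% is computed, so the code pools OK and total counts across all NGIs, treats exactly 75% as passing, and takes "1 month" as 30 days. All names and figures are hypothetical.

```python
from datetime import date, timedelta

OK_THRESHOLD = 0.75              # step 4: 75% OK in total (entire EGI)
MIN_NOTICE = timedelta(days=30)  # assumption: "1 month" taken as 30 days


def ok_fraction(results_per_ngi):
    """Overall OK fraction, pooling counts over all NGIs.

    results_per_ngi maps a (hypothetical) NGI name to a tuple
    (ok_count, total_count) of test results.
    """
    ok = sum(ok_count for ok_count, _ in results_per_ngi.values())
    total = sum(total_count for _, total_count in results_per_ngi.values())
    if total == 0:
        raise ValueError("no test results to evaluate")
    return ok / total


def can_proceed(results_per_ngi):
    """True if the pooled OK fraction reaches the threshold (assumed inclusive)."""
    return ok_fraction(results_per_ngi) >= OK_THRESHOLD


def has_enough_notice(request_day, critical_day):
    """True if the proposed date is at least MIN_NOTICE after the request."""
    return critical_day - request_day >= MIN_NOTICE


# Illustrative figures only.
results = {"ngi-a": (40, 50), "ngi-b": (20, 30), "ngi-c": (18, 20)}
print(ok_fraction(results), can_proceed(results))
print(has_enough_notice(date(2010, 11, 23), date(2010, 12, 23)))
```

Pooling the counts means a large NGI with many failures weighs more than a small one. If the intended reading is a per-NGI threshold instead, the check would need to be applied to each entry separately.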
Requirements to be implemented
- COD is responsible for managing the OPS critical test list from the Operational Portal side. https://rt.egi.eu/rt/Ticket/Display.html?id=482