PROC06 Setting Nagios test status to operations

Revision as of 11:57, 3 December 2010

  • Title: Procedure for setting Nagios test critical for COD
  • Document link: https://wiki.egi.eu/wiki/Operations:Setting_Nagios_tests_critical_procedure
  • Last modified: 23.11.2010
  • Version: 1.0
  • Policy Group Acronym: GOO/COD
  • Policy Group Name: Grid Operations Oversight/Central Operator on Duty
  • Contact Person: Małgorzata Krakowian, Marcin Radecki
  • Document Status: APPROVED
  • Approved Date: 23.11.2010
  • Procedure Statement: The purpose of this document is to clearly describe the actions and steps to be undertaken for setting Nagios tests critical. A Nagios test is set to critical so that the operations dashboard displays an alarm if the test fails.

Procedure for setting Nagios test critical for COD DRAFT

The purpose of this document is to clearly describe the actions and steps to be undertaken for setting Nagios tests critical for COD. As a result, the operations dashboard displays an alarm if the test fails.

This procedure applies only to tests run under the OPS VO. Its scope is global: it applies to all Operations Centres in the EGI project.

Revision history

Version | Authors | Date | Comments
1.0 | Małgorzata Krakowian, Marcin Radecki | 23.11.2010 | Approved by OMB

Setting Nagios tests critical request

Request

  • Anyone may submit a request to make a test critical for COD.
  • The request should be submitted to COD.

Prerequisites

A SAM test needs to

✔ satisfy the quality criteria, in agreement with the UMD operational capabilities quality criteria: https://documents.egi.eu/document/240

✔ be properly documented

✔ be included in the official Nagios package and have been running for at least 1 month

✔ have caused no observed issues in the production infrastructure

✔ be available for validation by COD
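
The prerequisites above can be read as a simple checklist. The following Python sketch is illustrative only: the ProbeCandidate record, its field names, the probe name and the documentation URL are hypothetical, and "at least 1 month" is approximated as 30 days. It returns the prerequisites a candidate test does not yet meet.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "at least 1 month" is approximated as 30 days.
MIN_RUN_PERIOD = timedelta(days=30)


@dataclass
class ProbeCandidate:
    """Hypothetical record describing a test proposed as critical for COD."""
    name: str
    meets_quality_criteria: bool               # judged against EGI document 240
    documentation_url: str                     # empty string if undocumented
    in_official_package_since: Optional[date]  # None if not yet packaged
    production_issues_observed: bool
    available_for_cod_validation: bool


def unmet_prerequisites(probe: ProbeCandidate, today: date) -> List[str]:
    """Return the prerequisites the candidate does not yet satisfy."""
    problems = []
    if not probe.meets_quality_criteria:
        problems.append("quality criteria not satisfied")
    if not probe.documentation_url:
        problems.append("not properly documented")
    since = probe.in_official_package_since
    if since is None or today - since < MIN_RUN_PERIOD:
        problems.append("not in the official package for at least 1 month")
    if probe.production_issues_observed:
        problems.append("issues observed in production")
    if not probe.available_for_cod_validation:
        problems.append("not available for validation by COD")
    return problems


# Usage with made-up values (hypothetical probe and URL):
probe = ProbeCandidate(
    name="org.example.HypotheticalProbe",
    meets_quality_criteria=True,
    documentation_url="https://wiki.example.org/probe-doc",
    in_official_package_since=date(2010, 11, 20),
    production_issues_observed=False,
    available_for_cod_validation=True,
)
print(unmet_prerequisites(probe, date(2010, 12, 3)))
# ['not in the official package for at least 1 month']
```

A list of unmet items, rather than a single yes/no, gives the applicant something concrete to fix before opening the request.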


Validation

The general rule is that tickets must be closed before moving on to the next step.
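
The rule above can be sketched as a simple gate. This is a minimal Python illustration assuming one ticket per step; the StepTicket model is hypothetical and does not represent the GGUS interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepTicket:
    """Hypothetical record of the ticket tracking one validation step."""
    step: int
    closed: bool = False


def can_start(step: int, tickets: List[StepTicket]) -> bool:
    """A step may start only when the tickets of all earlier steps are closed."""
    return all(t.closed for t in tickets if t.step < step)


def current_step(tickets: List[StepTicket]) -> Optional[int]:
    """Lowest step whose ticket is still open, or None when all are closed."""
    open_steps = [t.step for t in tickets if not t.closed]
    return min(open_steps) if open_steps else None


tickets = [StepTicket(1, closed=True), StepTicket(2, closed=True), StepTicket(3)]
print(current_step(tickets))  # 3
print(can_start(3, tickets))  # True
print(can_start(4, tickets))  # False, because step 3 is still open
```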

Steps:


Step Action on Action
1 Applicant Opens a GGUS ticket addressed to COD to start the process.
Subject: Request for setting XXX test critical for COD

Dear COD,

We would like to request that test XXX be set critical for COD.

Prerequisite data:
* name of Nagios probe:
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved by making test XXX
 critical, or a description of users' problems which will be avoided in future;
 provide a list of GGUS tickets if possible)

Best Regards
XXX
2 COD Checks the status of the Nagios probe to see if it meets the specified quality criteria.
3 COD Contacts the OMB to request approval of the new critical test. A date is specified (at least 1 month in the future).
4 NGIs Request the ROD teams to try to make the test OK. A total of 75% OK across the entire EGI infrastructure is the threshold for passing to the next step. If it is not possible to proceed, the problems are reported to the OMB.
5 COD Broadcasts the announcement of the new critical test.

(The broadcast should be sent to site managers, NOC/ROC managers and ROD teams.) See the template below for an indication of the message content.

Subject:   

Dear All,

We would like to announce that test XXX will become critical on XXX.

Short description of the test:

The documentation can be found at:

Best regards,
6 COD Adds the test to the critical tests list: https://wiki.egi.eu/wiki/Operations:Operations_tests
7 Operational Portal Marks the test as critical in the Operational Portal.

This step will be removed once COD is given access to manage the operations tests list in the Operational Portal. See Requirements to be implemented.

8 COD Performs a final check and closes the parent ticket.
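
Steps 3 and 4 contain the two quantitative rules of the procedure. The following Python sketch checks them under stated assumptions: "at least 1 month" is approximated as 30 days, and the 75% threshold is read as the share of tested endpoints whose latest Nagios result is OK across EGI. The endpoint names and status data are hypothetical; only the Nagios state names (OK, WARNING, CRITICAL, UNKNOWN) are standard.

```python
from datetime import date, timedelta
from typing import Dict

# Assumption: "at least 1 month in the future" is approximated as 30 days.
MIN_NOTICE = timedelta(days=30)
# Threshold from step 4: 75% OK across the entire EGI infrastructure.
OK_THRESHOLD = 0.75


def critical_date_is_valid(proposed: date, requested_on: date) -> bool:
    """Step 3: the date agreed with the OMB must be at least 1 month ahead."""
    return proposed - requested_on >= MIN_NOTICE


def ok_fraction(statuses: Dict[str, str]) -> float:
    """Share of endpoints whose latest result is "OK".

    `statuses` maps a hypothetical endpoint identifier to a Nagios state
    such as "OK", "WARNING", "CRITICAL" or "UNKNOWN".
    """
    if not statuses:
        return 0.0
    ok = sum(1 for state in statuses.values() if state == "OK")
    return ok / len(statuses)


def may_proceed_to_step_5(statuses: Dict[str, str]) -> bool:
    """Step 4: proceed only when at least 75% of the results are OK."""
    return ok_fraction(statuses) >= OK_THRESHOLD


# Hypothetical data: three of four endpoints pass.
statuses = {"site-a": "OK", "site-b": "OK", "site-c": "OK", "site-d": "CRITICAL"}
print(ok_fraction(statuses))            # 0.75
print(may_proceed_to_step_5(statuses))  # True
print(critical_date_is_valid(date(2011, 1, 10), date(2010, 12, 3)))  # True
```

Counting WARNING and UNKNOWN as "not OK" is a design choice of this sketch; the procedure itself does not say how those states are treated.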

Requirements to be implemented