PROC06 Setting Nagios test status to operations

From EGIWiki
Revision as of 15:21, 21 October 2011 by Mkrakowi (talk | contribs)

Procedure for setting Nagios test status to operations

  • Title: Procedure for setting Nagios test status to operations
  • Document link: https://wiki.egi.eu/wiki/PROC06
  • Last modified: 23.11.2010
  • Version: 1.0
  • Policy Group Acronym: GOO/COD
  • Policy Group Name: Grid Operations Oversight/Central Operator on Duty
  • Contact Person: Małgorzata Krakowian, Marcin Radecki
  • Document Status: APPROVED
  • Approved Date: 23.11.2010
  • Procedure Statement: The purpose of this document is to describe the actions and steps required to set a Nagios test as an operations test. A Nagios test is set as an operations test so that the operations dashboard displays an alarm when the test fails.

Overview

The purpose of this document is to describe the actions and steps required to set a Nagios test as an operations test. A Nagios test is set as an operations test so that the operations dashboard displays an alarm when the test fails.

This procedure applies only to tests run under the OPS VO. Its scope is global: it applies to all Operations Centres in the EGI project.

The current list of operations tests is maintained by COD and is available at https://wiki.egi.eu/wiki/Operations:Operations_tests

Steps

Prerequisites

The SAM test needs to meet the following requirements.

  1. It satisfies quality criteria in agreement with the UMD operational capabilities quality criteria: https://documents.egi.eu/document/240.
  2. It is properly documented.
  3. It must be part of an official Nagios release.
  4. It must have been deployed in production for at least one month without problems.
  5. It must be available for validation by COD.
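
The prerequisites above can be tracked as a simple checklist before a request is submitted. The following is a minimal sketch, not part of the official procedure: the class, field names and the 30-day approximation of "one month" are illustrative assumptions, and each flag still has to be set by a person who has actually verified the corresponding requirement.

```python
from dataclasses import dataclass, fields

# Assumption: "at least one month" is approximated as 30 days.
MIN_DAYS_IN_PRODUCTION = 30


@dataclass
class Prerequisites:
    """One entry per prerequisite in the list above (names are hypothetical)."""
    meets_quality_criteria: bool  # 1. UMD operational capabilities quality criteria
    documented: bool              # 2. properly documented
    in_official_release: bool     # 3. part of an official Nagios release
    days_in_production: int       # 4. days in production without problems
    available_to_cod: bool        # 5. available for validation by COD


def unmet(p: Prerequisites) -> list[str]:
    """Return the names of the prerequisites that are not yet satisfied."""
    missing = []
    for f in fields(p):
        value = getattr(p, f.name)
        if f.name == "days_in_production":
            if value < MIN_DAYS_IN_PRODUCTION:
                missing.append(f.name)
        elif not value:
            missing.append(f.name)
    return missing


candidate = Prerequisites(True, True, True, 12, True)
print(unmet(candidate))  # ['days_in_production']
```

An empty list means the test is ready to be proposed; any other result names what still has to be addressed first.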

Sending a request

  • Anybody can submit a request to make a test an operations test.
  • The request should be submitted to COD via a GGUS ticket.
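
To avoid incomplete requests, the ticket text can be generated from the prerequisite data fields listed in the request template in step 1 of the Validation table. The sketch below is only an illustration: the function name, its parameters and the example values are hypothetical, and the resulting text is still pasted into a GGUS ticket by hand, since no GGUS interface is assumed here.

```python
def render_request(test: str, probe: str, service: str,
                   doc_link: str, motivation: str, sender: str) -> str:
    """Build the request text; every field must be supplied by the applicant."""
    supplied = {
        "test": test, "probe": probe, "service": service,
        "doc_link": doc_link, "motivation": motivation, "sender": sender,
    }
    # Refuse to produce a ticket with blank prerequisite data.
    for name, value in supplied.items():
        if not value.strip():
            raise ValueError(f"missing field: {name}")
    return "\n".join([
        f"Subject: Request to set {test} test as an operations test",
        "",
        "Dear COD,",
        "",
        f"We would like to request that the {test} test be set as an operations test.",
        "",
        "Prerequisite data:",
        f"* name of Nagios probe: {probe}",
        f"* name of service on which the test runs: {service}",
        f"* link to documentation page: {doc_link}",
        f"* motivation: {motivation}",
        "",
        "Best Regards",
        sender,
    ])


# Placeholder values only; none of these names refer to a real probe or service.
print(render_request(
    test="XXX",
    probe="example-probe",
    service="example-service",
    doc_link="https://example.org/docs/example-probe",
    motivation="would detect a failure mode currently reported only by users",
    sender="Applicant",
))
```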

Validation

As a general rule, the tickets specified in the table below must be closed before moving on to the next step in the procedure.

Step Action on Action
1 Applicant Opens a GGUS ticket to COD to start the process.
Subject: Request to set XXX test as an operations test

Dear COD,

We would like to request that the XXX test be set as an operations test.

Prerequisite data:
* name of Nagios probe:
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved by making XXX test
 an operations test, or a description of users' problems which will be avoided in
 future; provide a list of GGUS tickets if possible)

Best Regards
XXX
2 COD Checks the status of the Nagios probe to see if it meets the specified quality criteria.
3 COD Contacts the OMB to request approval of the new operations test. A date is specified (at least 1 month in the future).
4 NGIs Ask the ROD teams to try to bring the test to OK status. An overall OK rate of 75% across the entire EGI is the threshold for passing to the next step. If it is not possible to proceed, the problems are reported to the OMB.
5 COD Broadcasts the announcement about the new operations test.

The broadcast should be sent to site managers, NOC/ROC managers and ROD teams. See the template below for an indication of the message content.

Subject:   

Dear All,

We would like to announce that the XXX test will become operational on XXX.

Short description of the test:

The documentation can be found at:

Best regards,
6 COD Adds the test to the operations tests list: https://wiki.egi.eu/wiki/Operations:Operations_tests
7 COD Marks the test as an operations test in the Operational Portal.
8 SAM Adds the new test to the ROC_OPERATORS profile in MyEGI.
9 COD Performs a final check and closes the parent ticket.
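
The 75% threshold in step 4 can be checked with a small calculation. This is a hedged sketch under stated assumptions: the procedure does not say whether the rate is counted per site or per service instance, nor whether exactly 75% is sufficient, so the code below counts one latest result per service instance and treats the threshold as inclusive. The status strings follow the usual Nagios state names; the host names are placeholders.

```python
OK_THRESHOLD = 0.75  # step 4: 75% OK in total across the entire EGI


def ok_fraction(results: dict[str, str]) -> float:
    """Fraction of entries whose latest status is OK.

    `results` maps a service instance (hypothetical key) to its latest
    Nagios status: OK, WARNING, CRITICAL or UNKNOWN.
    """
    if not results:
        raise ValueError("no test results supplied")
    ok = sum(1 for status in results.values() if status == "OK")
    return ok / len(results)


def may_proceed(results: dict[str, str]) -> bool:
    """True if the overall OK rate reaches the threshold (assumed inclusive)."""
    return ok_fraction(results) >= OK_THRESHOLD


latest = {
    "ce01.example.org": "OK",
    "ce02.example.org": "OK",
    "se01.example.org": "CRITICAL",
    "se02.example.org": "OK",
}
print(ok_fraction(latest), may_proceed(latest))  # 0.75 True
```

If the check fails, the procedure calls for the problems to be reported to the OMB rather than for the threshold to be adjusted.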


Revision history

  • 21/10/2011: New step (8) added for adding the new test to the ROC_OPERATORS profile in MyEGI
  • 17/03/2011: Original title Operations:Procedure_for_setting_Nagios_test_an_operations_test updated and associated with a procedure number. TFERRARI