The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations@egi.eu.

Difference between revisions of "PROC05 Validation of Operations Centre Nagios"

== Registration Steps ==
{| class="wikitable"
|-
! Step
! Responsible
! Action
|- valign="top"
| 1
| NGI
| Add the instance to [https://goc.egi.eu/ GOC DB] and to [[NGI_services_in_GOCDB|service group NGI_XX_SERVICES]].
|- valign="top"
| 2
| NGI
| Open a GGUS ticket in the [http://ggus.eu/ GGUS system] asking the SAM/Nagios SU to add your Nagios instance to http://gridops.cern.ch/config/nagios-roles.conf
|- valign="top"
| 3
| NGI
| Open a GGUS ticket in the [http://ggus.eu/ GGUS system] asking the Operations Portal SU to enable receiving alarms from your Nagios instance.
|}

{| class="wikitable"
|-
! Step
! Responsible
! Action
|- valign="top"
| 1
| NGI
| Check that your Nagios instance passes all tests on the [https://ops-monitor.cern.ch/nagios/ Ops-Monitor instance] for at least 7 days.
|- valign="top"
| 2
| NGI
| Check that all internal tests (those assigned to the Nagios server) are OK for at least 7 days.
|- valign="top"
| 3
| NGI
| The ROD team should monitor the status of the sites' services for at least 7 days, looking for errors that cannot be attributed to an actual site problem.
|}

[[Category:Operations_Procedures]]

Revision as of 14:03, 21 May 2013

Title: Validation of an Operations Centre Nagios
Document link: https://wiki.egi.eu/wiki/PROC05
Last modified: 1.0
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Group: Emir Imamagic
Document Status: Approved
Approved Date: 10 August 2010
Procedure Statement: The purpose of this document is to define the validation procedure for an Operations Centre Nagios instance.
Owner: Owner of procedure


Overview

This document describes how to register and validate an Operations Centre Nagios instance.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Registration Steps


Step  Responsible  Action
1     NGI          Add the instance to GOC DB and to service group NGI_XX_SERVICES.
2     NGI          Open a GGUS ticket in the GGUS system asking the SAM/Nagios SU to add your Nagios instance to http://gridops.cern.ch/config/nagios-roles.conf
3     NGI          Open a GGUS ticket in the GGUS system asking the Operations Portal SU to enable receiving alarms from your Nagios instance.

Validation Steps

Validation should take at least 7 days.


Step  Responsible  Action
1     NGI          Check that your Nagios instance passes all tests on the Ops-Monitor instance for at least 7 days.
2     NGI          Check that all internal tests (those assigned to the Nagios server) are OK for at least 7 days.
3     NGI          The ROD team should monitor the status of the sites' services for at least 7 days, looking for errors that cannot be attributed to an actual site problem.
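
The 7-day requirement in the validation steps lends itself to a mechanical check. The following is a minimal Python sketch, not part of the procedure: it assumes the test history has already been exported as (timestamp, status) pairs, and the function names ok_streak_start and validation_window_met, the record layout, and the status string "OK" are all hypothetical choices made for illustration. It does not query Nagios, Ops-Monitor, or any other EGI service.

```python
from datetime import datetime, timedelta

# Minimum validation period stated in the procedure.
REQUIRED_WINDOW = timedelta(days=7)


def ok_streak_start(results):
    """Return the timestamp at which the current unbroken run of OK results began.

    `results` is an iterable of (timestamp, status) pairs; this layout is an
    assumption made for illustration. Returns None if there are no results or
    if the most recent result is not OK.
    """
    streak_start = None
    # Walk from the newest result backwards until the first non-OK status.
    for timestamp, status in sorted(results, key=lambda r: r[0], reverse=True):
        if status != "OK":
            break
        streak_start = timestamp
    return streak_start


def validation_window_met(results, now, window=REQUIRED_WINDOW):
    """Return True if every result has been OK for at least `window` up to `now`."""
    start = ok_streak_start(results)
    return start is not None and now - start >= window


if __name__ == "__main__":
    # Hypothetical history: one OK result per day for the last nine days.
    now = datetime(2013, 5, 21, 14, 3)
    history = [(now - timedelta(days=d), "OK") for d in range(9)]
    print(validation_window_met(history, now))
```

A single non-OK result restarts the count, which matches the reading that tests must be OK for the whole period. The sketch does not detect gaps in the history (for example, a day with no results at all), so a real check would probably also need to verify that results were recorded throughout the window.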