The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "PROC06 Setting Nagios test status to operations"

Line 1: Line 1:
'''draft'''
- ticket to COD
- wait until the NGIs have had time to update their Nagios boxes.
- send an email to the ROD teams (or NGI) to inform them and ask them to
verify whether the test is acceptable to them
(I believe that at the end of EGEE it was proposed that more than 75% of
affected nodes should be OK.)
- mention the activation at the operations meeting.
----
= Procedure for setting Nagios tests critical DRAFT =


The purpose of this document is to describe clearly the actions and steps required to set Nagios tests to critical.


= Revision history =
Line 34: Line 17:
|}
|}


= EGEE ROC decommission request validation =
= Setting Nagios tests critical request =
 
* The EGEE ROC decommission request should be submitted to COD by the EGEE ROC that wants to start the decommission procedure.
* The Central Operator on Duty team - in charge of EGI oversight - is responsible for validating the request and performing the final check of the process.
 
= How to start the decommission process =  


* The decommission of an EGEE ROC starts when the ROC opens a ROC decommission ticket to COD (via GGUS).
The request should be submitted to the Chief Operations Officer. The request should be approved by OMB and COD.
* Once the ticket is filed, COD can start the validation of the request.
* Once COD validates the request, the ROC triggers the actions described in this document by creating a set of child tickets, each assigned to the partner responsible for the corresponding step. This keeps the process as transparent as possible to all parties involved. The required actions are described below.


An example/template for the EGEE ROC decommission ticket is provided here:
'''TBD'''


<pre>Subject: Decommission of EGEE ROC XXX
= How to start the process =


Dear COD,
* The Chief Operations Officer opens a GGUS ticket to COD to start the process.
* The Central Operator on Duty team - in charge of EGI oversight - is responsible for processing the request ticket.


According to procedure https://wiki.egi.eu/wiki/Operations:EGEE_ROC_decommission we would like to start decommission procedure for ROC XXX.
== Prerequisites ==


Best regards,
Before opening the GGUS ticket, the test should be implemented and approved by the Nagios team.
</pre>


 
'''TBD'''
== EGEE ROC prerequisites ==

Before opening a ROC decommission GGUS ticket, the EGEE ROC should meet the following prerequisites:
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Concerns
! Prerequisites
|-
| GSTAT
| All sites should be reconfigured according to the instructions at: http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information
<pre>
change the old information from:
  SITE_OTHER_EGEE_ROC="XX"
  SITE_OTHER_GRID="EGEE"
to:
  SITE_OTHER_EGI_NGI="NGI_XXX"
  SITE_OTHER_GRID="EGEE|EGI"
</pre>
|-
| GGUS
| All the tickets assigned to the ROC SU should be closed.
|-
| Operational Dashboard
| All alarms and operational tickets should be closed
|-
| GOC DB
| No sites defined in GOCDB for the ROC, other than in Closed status.
|-
|}


= Setting Nagios tests critical steps =
Line 90: Line 38:
The general idea is that tickets must be closed before being able to move on to the next step.


Decommission steps:  
Steps:  


{| border="1" cellspacing="0" cellpadding="5" align="center"
{| border="1" cellspacing="0" cellpadding="5" align="center"
Line 98: Line 46:
|-v
|-v
| 1
| 1
| ROC
| Nagios
| The information that the EGEE ROC started decommission process is broadcast by ROC officials.
| Add test to official Nagios package.
 
(This broadcast should be sent to VO managers and NOC/ROC managers)
 
See the template below for an indication of the message content.
 
<pre>
Subject: Decommission of ROC XXX has started
 
Dear All,
 
We would like to announce that EGEE ROC XXX started decommission procedure.
 
Best regards,
</pre>
|-
|-
| 2
| 2
| GGUS
| NGIs
| Request to close ROC SU.
| Nagios update.
|-
|-
| 3
| 3
| GOC DB
| NGIs
| Request to deactivate the ROC in the GOC DB.
| Request to the ROD teams to verify whether the test is acceptable
to them (75% of affected nodes should be OK).
|-
|-
| 4
| 4
| COD
| COD
| Request to :
| The information is broadcast by COD.  
* remove ROD mailing list from all-operator-on-duty@mailman.egi.eu mailing list
(This broadcast should be sent to VO managers and NOC/ROC managers)  
* remove ROD mailing list from operational manual
See the template below for an indication of the message content.  
|-
| 5
| Operational portal
| Request to remove/move data related to the ROC.
|-
| 6
| Nagios
| Request to stop monitoring ROC Nagios instance.
|-
| 7
| COD
| Final checks by the IPC.
(Were all steps taken and finished properly? Close the parent ticket.)
|-
| 8
| ROC
|
The information that the EGEE ROC was decommissioned is broadcast by ex-ROC officials.
 
(This broadcast should be sent to VO managers and NOC/ROC managers)
 
See the template below for an indication of the message content.
<pre>
<pre>
Subject: ROC XXX was decommissioned 
Subject:  


Dear All,
Dear All,


We would like to announce that ROC XXX is now closed.
We would like to announce that test XXX will become critical XXX
All references to ROC XXX were removed from operational tools.


Best regards,
Best regards,
</pre>
</pre>
|-
| 5
| who?
| Add test to critical tests list wiki page.
|-
| 6
| Operational Dashboard
| Add new test as critical.
|-
| 7
| COD
| Final check. Close parent ticket.
|-
|-
|}
|}

Revision as of 14:56, 8 September 2010

Procedure for setting Nagios tests critical DRAFT

The purpose of this document is to describe clearly the actions and steps required to set Nagios tests to critical.

Revision history

{| border="1" cellspacing="0" cellpadding="5" align="center"
! Version
! Authors
! Date
! Comments
|-
| 0.1
| Małgorzata Krakowian, Marcin Radecki
|
| First draft
|}

Setting Nagios tests critical request

The request should be submitted to the Chief Operations Officer. The request should be approved by OMB and COD.

TBD

How to start the process

  • The Chief Operations Officer opens a GGUS ticket to COD to start the process.
  • The Central Operator on Duty team - in charge of EGI oversight - is responsible for processing the request ticket.

Prerequisites

Before opening the GGUS ticket, the test should be implemented and approved by the Nagios team.

TBD

Setting Nagios tests critical steps

The general idea is that tickets must be closed before being able to move on to the next step.
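The "close before moving on" rule can be read as a simple ordering constraint on the child tickets. The sketch below is only an illustration of that reading: the `StepTicket` class and the helper names are hypothetical and do not correspond to any GGUS interface, and the ticket states shown are made-up sample data.

```python
from dataclasses import dataclass


@dataclass
class StepTicket:
    """Hypothetical record for one child ticket of the procedure."""
    step: int
    action_on: str
    action: str
    closed: bool = False


def can_start(tickets, step):
    """A step may start only when every earlier step's ticket is closed."""
    return all(t.closed for t in tickets if t.step < step)


def next_open_step(tickets):
    """Return the lowest-numbered ticket that is still open, or None."""
    for ticket in sorted(tickets, key=lambda t: t.step):
        if not ticket.closed:
            return ticket
    return None


# Sample state (illustrative only): steps 1 and 2 are done, step 3 is pending.
tickets = [
    StepTicket(1, "Nagios", "Add test to official Nagios package.", closed=True),
    StepTicket(2, "NGIs", "Nagios update.", closed=True),
    StepTicket(3, "NGIs", "ROD teams verify that the test is acceptable."),
    StepTicket(4, "COD", "Broadcast the information."),
]

print(can_start(tickets, 3))         # step 3 may start
print(can_start(tickets, 4))         # step 4 must wait for step 3
print(next_open_step(tickets).step)  # the step currently being worked on
```

With this sample state, step 3 is allowed to start while step 4 is still blocked, which matches the intent that the parent ticket advances one closed child ticket at a time.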

Steps:

{| border="1" cellspacing="0" cellpadding="5" align="center"
! Step
! Action on
! Action
|-
| 1
| Nagios
| Add test to official Nagios package.
|-
| 2
| NGIs
| Nagios update.
|-
| 3
| NGIs
| Request to the ROD teams to verify whether the test is acceptable to them (75% of affected nodes should be OK).
|-
| 4
| COD
| The information is broadcast by COD.
(This broadcast should be sent to VO managers and NOC/ROC managers.)
See the template below for an indication of the message content.
<pre>
Subject:

Dear All,

We would like to announce that test XXX will become critical XXX

Best regards,
</pre>
|-
| 5
| who?
| Add test to critical tests list wiki page.
|-
| 6
| Operational Dashboard
| Add new test as critical.
|-
| 7
| COD
| Final check. Close parent ticket.
|}
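
The acceptance criterion in step 3 (75% of affected nodes should be OK) can be checked with a few lines of arithmetic. The following is a minimal sketch under stated assumptions: the function name `enough_nodes_ok` is hypothetical, the input is assumed to be one Nagios status string per affected node, and treating exactly 75% as acceptable is an assumption, since the draft does not say whether the boundary is inclusive.

```python
def enough_nodes_ok(statuses, threshold_percent=75):
    """Return True when at least threshold_percent of affected nodes report OK.

    statuses: one Nagios status string per affected node, e.g. "OK",
    "WARNING", "CRITICAL" or "UNKNOWN".
    The inclusive boundary (>=) is an assumption, not part of the procedure.
    """
    if not statuses:
        raise ValueError("no affected nodes to evaluate")
    ok_count = sum(1 for status in statuses if status == "OK")
    # Integer comparison avoids floating-point rounding at the boundary.
    return ok_count * 100 >= threshold_percent * len(statuses)


# Illustrative sample data: 6 of 8 nodes OK is exactly 75%.
print(enough_nodes_ok(["OK"] * 6 + ["CRITICAL"] * 2))
# 5 of 8 nodes OK is 62.5%, below the threshold.
print(enough_nodes_ok(["OK"] * 5 + ["CRITICAL"] * 3))
```

If the stricter reading (more than 75%) is intended, replacing `>=` with `>` is the only change needed.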