The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "PROC06 Setting Nagios test status to operations"

(Remove deprecated content)
Tag: Replaced
 
(131 intermediate revisions by 10 users not shown)
'''draft'''
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}
 
[[Category:Deprecated]]
- ticket to COD
{| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;"
 
| style="padding-right: 15px; padding-left: 15px;" |  
- wait until the NGIs have had time to update their Nagios boxes.
|[[File:Alert.png]] This page is '''Deprecated'''; the content has been moved to https://confluence.egi.eu/display/EGIPP/PROC06+Setting+Nagios+test+status+to+operations  
 
- send an email to the ROD teams (or NGI) to inform them and ask them
whether they can verify that the test is acceptable to them
(I believe that at the end of EGEE it was proposed that more than 75% of
affected nodes should be OK.)
 
- mention the activation at the operations meeting.
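
The 75% acceptance figure mentioned in the list above could be checked along the following lines. This is only a sketch: the function name <code>is_test_acceptable</code> and the <code>"OK"</code> status string are assumptions made for illustration, and the threshold itself was a proposal rather than an agreed rule.

```python
def is_test_acceptable(statuses, threshold=0.75):
    """Return True when more than `threshold` of the affected nodes report OK.

    `statuses` is a list of per-node result strings; the "OK" label is an
    assumption for this sketch, not a value taken from the procedure.
    """
    if not statuses:
        # With no results there is nothing to base a decision on.
        return False
    ok_count = sum(1 for status in statuses if status == "OK")
    # "More than 75%" is read as a strict inequality.
    return ok_count / len(statuses) > threshold
```

Reading "more than 75%" as a strict inequality means that exactly three OK nodes out of four would not pass; a non-strict comparison is an equally plausible reading of the proposal.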
 
 
 
----
 
= Procedure for setting Nagios tests critical DRAFT =
 
The purpose of this document is to describe clearly the actions and related steps to be undertaken for setting Nagios tests critical.
 
= Revision history =
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Version
! Authors
! Date
! Comments
|-
| 0.1
| Małgorzata Krakowian, Marcin Radecki
|
| First draft
|-
|}
 
= EGEE ROC decommission request validation =
 
* The EGEE ROC decommission request should be submitted to COD by the EGEE ROC that wants to start the decommission procedure.
* The Central Operator on Duty (COD) team, in charge of EGI oversight, is responsible for validating the request and performing the final check of the process.
 
= How to start the decommission process =
 
* The decommission of an EGEE ROC starts when the ROC opens a ROC decommission ticket to COD (via GGUS).
* Once the ticket is filed, COD can start the validation of the request.
* Once COD has validated the request, the ROC triggers the actions described in this document by creating a set of new child tickets, each assigned to the partner responsible for the corresponding step. This should keep the process as transparent as possible to all parties involved. The required actions are described below.
 
An example/template for the EGEE ROC decommission ticket is provided here:
 
<pre>Subject: Decommission of EGEE ROC XXX
 
Dear COD,
 
According to the procedure https://wiki.egi.eu/wiki/Operations:EGEE_ROC_decommission we would like to start the decommission procedure for ROC XXX.
 
Best regards,
</pre>
 
 
== EGEE ROC prerequisites ==
 
Before opening a ROC decommission GGUS ticket, the EGEE ROC should ensure that the following prerequisites are met:
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Concerns
! Prerequisites
|-  
| GSTAT
| All sites should be reconfigured according to the instructions at: http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information
<pre>
change the old information from:
  SITE_OTHER_EGEE_ROC="XX"
  SITE_OTHER_GRID="EGEE"
to:
  SITE_OTHER_EGI_NGI="NGI_XXX"
  SITE_OTHER_GRID="EGEE|EGI"
</pre>
|-
| GGUS
| All the tickets assigned to the ROC SU should be closed.
|-
| Operational Dashboard
| All alarms and operational tickets should be closed.
|-
| GOC DB
| No sites should be defined in the GOC DB for the ROC, other than those in Closed status.
|-
|}
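
The GSTAT reconfiguration in the table above can be sketched as a small text transformation. This is a minimal illustration, assuming the settings appear as plain <code>KEY="value"</code> lines exactly as in the snippet; the function name <code>migrate_site_info</code> is hypothetical, and the NGI name must be supplied by the site.

```python
def migrate_site_info(text, ngi_name):
    """Rewrite the EGEE ROC settings into their EGI NGI equivalents.

    Assumes one KEY="value" assignment per line, as in the snippet above.
    Lines that are not affected are passed through unchanged.
    """
    migrated = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("SITE_OTHER_EGEE_ROC="):
            # The ROC entry is replaced by the NGI entry.
            migrated.append(f'SITE_OTHER_EGI_NGI="{ngi_name}"')
        elif stripped.startswith("SITE_OTHER_GRID="):
            # The grid value is extended from EGEE to EGEE|EGI.
            migrated.append('SITE_OTHER_GRID="EGEE|EGI"')
        else:
            migrated.append(line)
    return "\n".join(migrated)


old = 'SITE_OTHER_EGEE_ROC="XX"\nSITE_OTHER_GRID="EGEE"'
print(migrate_site_info(old, "NGI_XXX"))
```

The sketch only transforms text in memory; where the settings are stored and how the site is reconfigured afterwards is covered by the linked instructions, not by this example.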
 
= Setting Nagios tests critical steps =
 
The general idea is that the ticket for each step must be closed before moving on to the next step.
 
Decommission steps:  
 
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Step
! Action on
! Action
|-
| 1
| ROC
| ROC officials broadcast the announcement that the EGEE ROC has started the decommission process.
 
(This broadcast should be sent to VO managers and NOC/ROC managers)
 
See the template below for an indication of the message content.
 
<pre>
Subject: Decommission of ROC XXX has started
 
Dear All,
 
We would like to announce that EGEE ROC XXX has started the decommission procedure.
 
Best regards,
</pre>
|-
| 2
| GGUS
| Request to close the ROC SU.
|-
| 3
| GOC DB
| Request to deactivate the ROC in the GOC DB.
|-
| 4
| COD
| Request to:
* remove the ROD mailing list from the all-operator-on-duty@mailman.egi.eu mailing list
* remove the ROD mailing list from the operational manual
|-
| 5
| Operational portal
| Request to remove/move data related to the ROC.
|-
| 6
| Nagios
| Request to stop monitoring the ROC Nagios instance.
|-
| 7
| COD
| Final checks by the IPC.
(Were all steps taken and finished properly? Close the parent ticket.)
|-
| 8
| ROC
|
Ex-ROC officials broadcast the announcement that the EGEE ROC has been decommissioned.
 
(This broadcast should be sent to VO managers and NOC/ROC managers)
 
See the template below for an indication of the message content.
<pre>
Subject: ROC XXX was decommissioned  
 
Dear All,
 
We would like to announce that ROC XXX is now closed.
All references to ROC XXX have been removed from operational tools.
 
Best regards,
</pre>
|-
|}
|}
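
The rule that a ticket must be closed before the next step can begin can be sketched as follows. The <code>Step</code> class and the <code>next_step</code> function are hypothetical names used only for this illustration; the "Action on" values are taken from the steps table, and ticket state is modelled as a simple boolean rather than read from GGUS.

```python
from dataclasses import dataclass


@dataclass
class Step:
    number: int          # position in the steps table
    action_on: str       # party responsible for the step
    closed: bool = False  # whether the step's ticket has been closed


def next_step(steps):
    """Return the first step whose ticket is still open, or None if all are closed."""
    for step in sorted(steps, key=lambda s: s.number):
        if not step.closed:
            return step
    return None


parties = ["ROC", "GGUS", "GOC DB", "COD", "Operational portal", "Nagios", "COD", "ROC"]
steps = [Step(number, party) for number, party in enumerate(parties, start=1)]

steps[0].closed = True              # step 1 has been completed
print(next_step(steps).action_on)   # the party that has to act next
```

Because the function always returns the lowest-numbered open step, a later step cannot be picked up while an earlier ticket remains open, which is the ordering the table relies on.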

Latest revision as of 10:43, 15 April 2022