
Difference between revisions of "PROC06 Setting Nagios test status to operations"

----
= Procedure for setting Nagios tests critical DRAFT =
This document describes the actions and the related steps required to set Nagios tests critical.
= Revision history =
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Version
! Authors
! Date
! Comments
|-
| 0.1
| Małgorzata Krakowian, Marcin Radecki
|
| First draft
|-
|}
= EGEE ROC decommission request validation =
* The EGEE ROC that wants to start the decommission procedure should submit the decommission request to COD.
* The Central Operator on Duty (COD) team, which is in charge of EGI oversight, is responsible for validating the request and for the final check of the process.
= How to start the decommission process =
* The decommission of an EGEE ROC starts when the ROC opens a ROC decommission ticket to COD (via GGUS).
* Once the ticket is filed, COD can start the validation of the request.
* Once COD has validated the request, the ROC creates a set of child tickets to trigger the actions described in this document. Each child ticket is assigned to the partner responsible for the corresponding step, so that the process is as transparent as possible to all parties involved. The required actions are described below.
An example/template for the EGEE ROC decommission ticket is provided here:
<pre>Subject: Decommission of EGEE ROC XXX
Dear COD,
According to procedure https://wiki.egi.eu/wiki/Operations:EGEE_ROC_decommission we would like to start decommission procedure for ROC XXX.
Best regards,
</pre>
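The ticket flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ticket` class and the helper functions are hypothetical names, not part of GGUS or any EGI tool, and tickets are modelled in memory only. It shows the template being filled in for a ROC and the child tickets being created only after COD has validated the request.

```python
from dataclasses import dataclass, field
from typing import List

# Ticket body copied from the template above; only the ROC name varies.
PARENT_TEMPLATE = (
    "Subject: Decommission of EGEE ROC {roc}\n"
    "Dear COD,\n"
    "According to procedure "
    "https://wiki.egi.eu/wiki/Operations:EGEE_ROC_decommission "
    "we would like to start decommission procedure for ROC {roc}.\n"
    "Best regards,\n"
)


@dataclass
class Ticket:
    """Hypothetical in-memory ticket; not a GGUS data structure."""
    subject: str
    assigned_to: str
    validated: bool = False
    children: List["Ticket"] = field(default_factory=list)


def render_parent_ticket(roc: str) -> str:
    """Fill the ROC name into the decommission ticket template."""
    return PARENT_TEMPLATE.format(roc=roc)


def open_parent_ticket(roc: str) -> Ticket:
    """The parent ticket is opened by the ROC and assigned to COD."""
    return Ticket(subject=f"Decommission of EGEE ROC {roc}", assigned_to="COD")


def create_child_tickets(parent: Ticket, partners: List[str]) -> List[Ticket]:
    """Create one child ticket per partner, only after COD validation."""
    if not parent.validated:
        raise RuntimeError("COD has not validated the request yet")
    for partner in partners:
        parent.children.append(
            Ticket(subject=f"{parent.subject}: action for {partner}",
                   assigned_to=partner)
        )
    return parent.children
```

Keeping the validation flag on the parent ticket makes the ordering explicit: no child ticket can exist before COD has accepted the request.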
== EGEE ROC prerequisites ==
Before opening a ROC decommission GGUS ticket, the EGEE ROC should ensure that the following prerequisites are met:
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Concerns
! Prerequisites
|-
| GSTAT
| All sites should be reconfigured according to the instructions at: http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information
<pre>
change the old information from:
  SITE_OTHER_EGEE_ROC="XX"
  SITE_OTHER_GRID="EGEE"
to:
  SITE_OTHER_EGI_NGI="NGI_XXX"
  SITE_OTHER_GRID="EGEE|EGI"
</pre>
|-
| GGUS
| All the tickets assigned to the ROC SU should be closed.
|-
| Operational Dashboard
| All alarms and operational tickets should be closed.
|-
| GOC DB
| No sites should be defined in the GOC DB for the ROC, other than sites in Closed status.
|-
|}
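The GSTAT reconfiguration in the table above is a plain variable substitution, so it can be sketched as a small text rewrite. The function name `migrate_site_info` and the sample variable `OTHER_VAR` are hypothetical, and the sketch assumes the variables appear one per line as `NAME="value"`; it does not claim to know which file holds them at a given site.

```python
def migrate_site_info(text: str, ngi: str) -> str:
    """Rewrite the EGEE ROC variables to their EGI NGI form.

    Assumes one NAME="value" assignment per line; other lines are kept as-is.
    """
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]  # preserve leading spaces
        if stripped.startswith("SITE_OTHER_EGEE_ROC="):
            # The ROC variable is replaced by the NGI variable.
            out.append(f'{indent}SITE_OTHER_EGI_NGI="{ngi}"')
        elif stripped.startswith("SITE_OTHER_GRID="):
            # The grid value becomes EGEE|EGI, as in the instructions above.
            out.append(f'{indent}SITE_OTHER_GRID="EGEE|EGI"')
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    old = 'SITE_OTHER_EGEE_ROC="XX"\nSITE_OTHER_GRID="EGEE"'
    print(migrate_site_info(old, "NGI_XXX"))
```

A real migration would also need to follow the linked site information instructions; the sketch covers only the two variables shown in the table.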
= Steps for setting Nagios tests critical =
The ticket for each step must be closed before moving on to the next step.
Decommission steps:
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Step
! Action on
! Action
|-
| 1
| ROC
| ROC officials broadcast that the EGEE ROC has started the decommission process.
(This broadcast should be sent to VO managers and NOC/ROC managers.)
See the template below for an indication of the message content.
<pre>
Subject: Decommission of ROC XXX has started
Dear All,
We would like to announce that EGEE ROC XXX started decommission procedure.
Best regards,
</pre>
|-
| 2
| GGUS
| Request to close ROC SU.
|-
| 3
| GOC DB
| Request to deactivate the ROC in the GOC DB.
|-
| 4
| COD
| Request to:
* remove the ROD mailing list from the all-operator-on-duty@mailman.egi.eu mailing list
* remove the ROD mailing list from the operational manual
|-
| 5
| Operational portal
| Request to remove/move data related to the ROC.
|-
| 6
| Nagios
| Request to stop monitoring the ROC Nagios instance.
|-
| 7
| COD
| Final checks by the IPC.
(Were all steps taken and finished properly? Close the parent ticket.)
|-
| 8
| ROC
|
The ex-ROC officials broadcast that the EGEE ROC has been decommissioned.
(This broadcast should be sent to VO managers and NOC/ROC managers.)
See the template below for an indication of the message content.
<pre>
Subject: ROC XXX was decommissioned 
Dear All,
We would like to announce that ROC XXX is now closed.
All references to ROC XXX were removed from operational tools.
Best regards,
</pre>
|-
|}
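The rule that a step's ticket must be closed before the next step can begin can be sketched as a simple ordered gate. The `Step` class and the functions `build_steps`, `next_open_step` and `close_step` are hypothetical names used for illustration; the step list mirrors the table above in abbreviated form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    number: int
    action_on: str
    action: str
    ticket_closed: bool = False


def build_steps() -> List[Step]:
    """Steps abbreviated from the table above."""
    return [
        Step(1, "ROC", "Broadcast that decommission has started"),
        Step(2, "GGUS", "Close ROC SU"),
        Step(3, "GOC DB", "Deactivate ROC in the GOC DB"),
        Step(4, "COD", "Remove ROD mailing list references"),
        Step(5, "Operational portal", "Remove/move data related to the ROC"),
        Step(6, "Nagios", "Stop monitoring ROC Nagios instance"),
        Step(7, "COD", "Final checks, close the parent ticket"),
        Step(8, "ROC", "Broadcast that the ROC was decommissioned"),
    ]


def next_open_step(steps: List[Step]) -> Optional[Step]:
    """Return the lowest-numbered step whose ticket is still open."""
    for step in sorted(steps, key=lambda s: s.number):
        if not step.ticket_closed:
            return step
    return None


def close_step(steps: List[Step], number: int) -> None:
    """Close a step's ticket; refuse if an earlier step is still open."""
    current = next_open_step(steps)
    if current is None or current.number != number:
        raise ValueError(f"step {number} cannot be closed out of order")
    current.ticket_closed = True
```

For example, calling `close_step(steps, 3)` on a fresh list raises `ValueError`, because steps 1 and 2 are still open.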

Revision as of 16:07, 6 September 2010

draft

- ticket to COD

- wait until the NGIs have had time to update their Nagios boxes.

- send an email to the ROD teams (or NGIs) to inform them and to ask whether they can verify that the test is acceptable to them (I believe that at the end of EGEE it was proposed that more than 75% of affected nodes should be OK.)

- mention the activation at the operations meeting.
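
The 75% figure mentioned in the notes above could be checked with a few lines of Python. This is a sketch only: the function name `is_test_acceptable` and the input format (one boolean per affected node) are assumptions, and the threshold is the recollected proposal from the notes, not an agreed policy.

```python
from typing import Sequence


def is_test_acceptable(node_ok: Sequence[bool], threshold: float = 0.75) -> bool:
    """Return True if more than `threshold` of the affected nodes pass.

    `node_ok` holds one boolean per affected node (True = test result OK).
    An empty input is treated as not acceptable.
    """
    if not node_ok:
        return False
    # Strictly greater than the threshold, matching "more than 75%".
    return sum(node_ok) / len(node_ok) > threshold
```

With this reading, exactly 75% of nodes passing is not sufficient; whether the comparison should be strict is a point for the ROD teams to confirm.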


