The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Operations and Operations Support"

Line 28: Line 28:
|-
|-
! Action  
! Action  
! Description
! Responsible<br>
! Related procedures
! Procedure
! Instructions and related pages<br>
|-
|-
| '''GGUS tickets assigned to COD'''  
| '''ROD certification'''
| <br>
| OS<br>
*'''ROD certification'''
|
**see [[WI01 ROD certification ticket handling|New ROD team certification work instructions]]  
*[https://wiki.egi.eu/wiki/PROC02 Operations Centre Creation]
*'''Creation of a new NGI'''  
 
**see [[PROC02|Creation of a new Operations Centre process coordination]]
|
**see [[WI02 Operations centre creation|work instruction]]
*[[WI01 ROD certification ticket handling|WI01 - New ROD team certification work instructions]]
**In cases where COD is also the Integration Process Coordinator, COD is responsible for the whole procedure.
 
*'''Operations Centre decommission'''
|-
**see [[PROC03|Operations Centre decommission process coordination]]
| '''Creation of a new NGI'''
**COD validates the request and removes the ROD information from the all-operators mailing list
| OS<br>
*'''Setting a Nagios test to an operations test'''
|  
**see [[PROC06|Procedure for setting a Nagios test to an operations test]]
*[https://wiki.egi.eu/wiki/PROC02 Operations Centre Creation]
**The test can be switched to an operations test in the Operations Portal here: https://operations-portal.egi.eu/dashboard/regionalPreferences. Choose "ALL" as the scope.
**The broadcast can be sent here: https://operations-portal.egi.eu/broadcast Subject: New OPERATIONS tests related to (choose right scope here). There is no option to select RODs, so CC: all-operator-on-duty@mailman.egi.eu
**The Nagios ROC_OPERATORS profile must be updated by the SAM team: http://grid-monitoring.cern.ch/poem/admin/poem/profile/26/
**COD is responsible for coordinating the whole process.


If the shifter does not know what action to take, he/she should contact the COD managers.
|
*[[WI02 Operations centre creation|WI02 - New Operations Centre creation work instruction]]


|-
| '''Operations Centre decommission'''
| O<br>
|  
|  
*[[PROC02|Creation of a new Operations Centre process coordination]]
*[https://wiki.egi.eu/wiki/PROC03 Operations Centre decommissioning]
*[[PROC03|Operations Centre decommission process coordination]]
*[[PROC06|Procedure for setting Nagios test an operations test]]


<br>  
| <br>
|-
| '''Setting a Nagios test to an operations test'''
| O<br>
|
*[https://wiki.egi.eu/wiki/PROC06 Setting a Nagios test status to OPERATIONS]


| <br>
|-
|-
| '''Operational portal dashboard issues'''  
| '''Operational portal dashboard issues'''  
| O<br>
|  
|  
*[https://operations-portal.egi.eu/codDashboard/ngi/any/tab/list/filter/operators/page/list COD dashboard link]
*[https://wiki.egi.eu/wiki/PROC01 EGI Infrastructure Oversight Escalation]


|  
|  
*[[PROC01|COD escalation procedure]]
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI06 Tickets older than 30 days|WI06 - Tickets &gt; 30 days]]


|-
|-
| '''Availability/reliability followup procedure'''  
| '''Availability/reliability followup procedure'''  
| O<br>
|  
|  
*[[WI03 Availability and Reliability report followup|WI03 - Availability and reliability report work instruction]]
*[https://wiki.egi.eu/wiki/PROC04 Quality verification of monthly availability and reliability statistics]<br>
*[[Underperforming sites and suspensions|Underperforming sites and suspensions]]


|  
|  
*[[PROC04|Availability and reliability monthly statistics procedure]]
*[https://wiki.egi.eu/wiki/PROC10 Recomputation of monitoring results and availability statistics]
 
*[[WI03 Availability and Reliability report followup|WI03 - Availability and reliability report work instruction]]
*[[Underperforming sites and suspensions|Underperforming sites and suspensions<br>]]
[[Underperforming sites and suspensions|Underperforming sites and suspensions]]
|-
|-
| '''Unknown followup procedure'''  
| '''Unknown followup procedure'''  
| O<br>
|  
|  
*[[WI08 Unknown report followup|WI08 - Unknown report work instruction]]
*[https://wiki.egi.eu/wiki/PROC04 Quality verification of monthly availability and reliability statistics]
*[[Unknown issue|UNKNOWN issue ]]


|  
|  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[Unknown issue|UNKNOWN issue]]
*[[WI08 Unknown report followup|WI08 - Unknown report work instruction]]


|-
|-
| '''Top-level BDII followup procedure'''  
| '''Top-level BDII followup procedure'''  
| O<br>
|  
|  
*[[WI04 Core services report followup|WI04 - Core services report work instruction ]]
*[https://wiki.egi.eu/wiki/PROC04 Quality verification of monthly availability and reliability statistics]


|  
|  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI04 Core services report followup|WI04 - Core services report work instruction]]


|-
|-
| '''ROD performance index followup procedure'''  
| '''ROD performance index followup procedure'''  
| O<br>
|  
|  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI07 ROD performance index report follwup|WI07 - ROD Performance Index report work instruction]]  
*[[WI07 ROD performance index report follwup|WI07 - ROD Performance Index report work instruction]]  
*[[ROD performance index|ROD performance index]]
*[[ROD performance index|ROD performance index]]
|
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]


|}
|}
Line 170: Line 183:


[[Category:Grid_Oversight]]
[[Category:Grid_Oversight]]
<br>

Revision as of 18:08, 18 August 2014





Introduction

This page collects the internal materials that the EGI.eu Operations and EGI Operations Support teams need to perform the EGI Infrastructure operations oversight activities.

Contact

EGI.eu Operations:

  • GGUS Support Unit: Operation
  • operations @ egi.eu

EGI Operations Support:

  • GGUS Support Unit: EGI Operations Support
  • operations-support @ mailman.egi.eu

Duties

Shifters work instructions

This section collects all work instructions. Each one specifies exactly which steps to follow to carry out an activity.

Action | Responsible | Procedure | Instructions and related pages
ROD certification | OS | |
Creation of a new NGI | OS | |
Operations Centre decommission | O | |
Setting a Nagios test to an operations test | O | |
Operational portal dashboard issues | O | |
Availability/reliability followup procedure | O | | Underperforming sites and suspensions
Unknown followup procedure | O | |
Top-level BDII followup procedure | O | |
ROD performance index followup procedure | O | |
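
As an illustration only, the Action and Responsible columns listed above could be kept as a small lookup structure, for example to check quickly which actions carry a given responsibility code. This is a minimal sketch, not part of the EGI tooling: the names `DUTIES`, `actions_for` and `responsible_for` are hypothetical, and the codes "O" and "OS" are copied as they appear in the table, without assuming what they stand for.

```python
# Action -> responsibility code, copied from the table above.
# The page does not spell out what the codes "O" and "OS" mean,
# so they are kept as opaque strings here.
DUTIES = {
    "ROD certification": "OS",
    "Creation of a new NGI": "OS",
    "Operations Centre decommission": "O",
    "Setting a Nagios test to an operations test": "O",
    "Operational portal dashboard issues": "O",
    "Availability/reliability followup procedure": "O",
    "Unknown followup procedure": "O",
    "Top-level BDII followup procedure": "O",
    "ROD performance index followup procedure": "O",
}


def actions_for(code):
    """Return the actions whose Responsible column equals `code`, in table order."""
    return [action for action, owner in DUTIES.items() if owner == code]


def responsible_for(action):
    """Return the responsibility code for `action`; raises KeyError if it is not listed."""
    return DUTIES[action]
```

For example, `actions_for("OS")` returns the two actions marked OS, and `responsible_for("Unknown followup procedure")` returns the code shown in that row.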

Work Instructions

Events

Resources


Oct 2011 to date

  • Please provide a link here



Definition of Operations Support metrics

May 2010-Sep 2011

Until April 2010

  • EGEE-III Operations Support metrics

Nagios tests

OTAG topics

Operational Portal: Dashboard

Pages in draft state