Difference between revisions of "Operations and Operations Support"
Revision as of 16:14, 27 November 2012
Introduction
The COD (Central Operator on Duty) team is a small team responsible for coordinating the Regional Operators on Duty (RODs) at the global level. COD represents the whole ROD structure with respect to technical requirements for operations tools, as well as at the political level.
The purpose of this page is to collect all materials the COD team needs to perform Grid operations oversight activities.
People and contact
The COD team is formed from the Dutch and Polish teams and includes COD managers (responsible for managerial issues) and COD shifters (who perform day-to-day COD work).
- COD managers:
- Ron Trompert (Chair), Marcin Radecki, Luuk Uljee, Tadeusz Szymocha, Magda Szopa
- COD shifters:
- Tadeusz Szymocha, Magda Szopa, Ron Trompert, Luuk Uljee, Maarten van Ingen, Ernst Pijper, Alexander Verkooijen
There are two mailing lists, each used for a different purpose:
- manager-central-operator-on-duty AT mailman.egi.eu - for COD managerial issues, such as suggested changes to procedures or tools. COD managers are the recipients of this list.
- central-operator-on-duty AT mailman.egi.eu - for reporting day-to-day COD issues, such as problems with tools or Nagios tests. COD shifters are the recipients of this list.
COD Duties
- COD managers
- representing RODs/COD in OTAG, OMB and Operations meetings - collecting requirements and improvement proposals from RODs concerning operations tools and procedures
- suspending Resource Centres in case of operational issues
- taking part in OLA task force
- writing new procedures - when needed, COD takes part in the procedure creation process
- preparing ROD newsletters - informing RODs about recent and upcoming developments related to Grid Oversight
- preparing ROD metrics reports - providing an overview of the operations support process in the grid infrastructure
- COD shifters
- escalation of operational problems with RODs
- dealing with GGUS tickets assigned to COD
- process coordination of:
- creation and decommissioning of Operations Centres
- setting a Nagios test as an operations test
- getting explanations for low availability and reliability metrics
COD shifters work instructions
This section collects all work instructions, which specify exactly which steps to follow to carry out an activity.
Action | Description | Related procedures |
---|---|---|
GGUS tickets assigned to COD | The COD shifter is obliged to check the current status of all GGUS tickets assigned to COD. If the shifter does not know what action to take, he/she should contact the COD managers. | |
Operational portal dashboard issues | | |
Handover | | |
Availability/reliability follow-up procedure | | |
Unknown follow-up procedure | | |
Top-level BDII follow-up procedure | | |
ROD performance index follow-up procedure | | |
Work Instructions
- WI01 - New ROD team certification work instructions
- WI02 - New Operations Centre creation work instruction
- WI03 - Availability and reliability report work instruction
- WI04 - Core services report work instruction
- WI05 - Escalation procedure in case of unresponsive NGI
- WI06 - Tickets > 30 days
- WI07 - Top-BDII report work instruction
- WI08 - Unknown report work instruction
Events
- EGI indico page with COD meeting agendas.
- All open actions can be found in COD actions
Resources
- Document server: ROD newsletter
- Document server: Operations Support Metrics
- Operations Procedures
- YouTube channel
Oct 2011 to date
- Please provide a link here
Definition of Operations Support metrics
May 2010-Sep 2011
- Operations Support metrics
Until April 2010
- EGEE-III Operations Support metrics
Nagios tests
- Operations tests list: list of Nagios probes generating alarms for visualization in the Operations Dashboard
- Availability and reliability tests list: list of Nagios probes whose results are used for Availability and Reliability computation