Operations and Operations Support
Revision as of 15:20, 22 March 2011
Introduction
The purpose of this page is to collect all materials needed by the COD team to perform Grid operations oversight activities.
People and contact
The COD team is made up of the Dutch and Polish teams and includes COD managers (responsible for managerial issues) and COD shifters (who perform day-to-day COD work).
- COD managers:
  - Ron Trompert (Chair), Marcin Radecki, Luuk Uljee, Malgorzata Krakowian
- COD shifters:
  - Malgorzata Krakowian, Ron Trompert, Luuk Uljee, Maarten van Ingen, Ernst Pijper, Alexander Verkooijen
Two mailing lists are used, each for a different purpose:
- manager-central-operator-on-duty AT mailman.egi.eu - for COD managerial issues such as suggesting changes to procedures or tools. COD managers are the recipients of this list.
- central-operator-on-duty AT mailman.egi.eu - for reporting day-to-day COD issues such as problems with tools or Nagios tests. COD shifters are the recipients of this list.
COD Duties
- COD managers
  - representing RODs/COD in OTAG, OMB and Operations meetings - collecting requirements and improvement proposals from RODs concerning operations tools and procedures
  - suspending Resource Centres in case of operational issues
  - taking part in the OLA task force
  - writing new procedures - when needed, COD takes part in the procedure creation process
  - preparing ROD newsletters - informing RODs about recent and upcoming developments related to Grid Oversight
  - preparing ROD metrics reports - providing an overview of the operations support process in the grid infrastructure
- COD shifters
  - escalation of operational problems with RODs
  - dealing with GGUS tickets assigned to COD
  - process coordination of:
    - creation and decommission of Operations Centre
    - setting a Nagios test to an operations test
  - getting explanations for low availability and reliability metrics
COD shifters work instructions
This section collects all work instructions. Each one specifies exactly which steps to follow to carry out an activity.
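Action | Description | Related procedures |
---|---|---|
GGUS tickets assigned to COD | The COD shifter is obliged to check the current status of all GGUS tickets assigned to COD.<br>In case of a request for:<br>- ROD certification: see New ROD team certification work instructions (Grid_operations_oversight/WI01)<br>- Creation of a new NGI: see Creation of a new Operations Centre process coordination (PROC02). Where COD is also the Integration Process Coordinator, COD is responsible for the whole procedure.<br>- Operations Centre decommission: see Operations Centre decommission process coordination (PROC03). COD validates the request and removes ROD information from the all-operators mailing list.<br>- Setting a Nagios test to an operations test: see Procedure for setting a Nagios test to an operations test (PROC06). COD is responsible for coordinating the whole process.<br>If the shifter doesn't know what kind of action should be taken, he/she should contact the COD managers. | - Creation of a new Operations Centre process coordination (PROC02)<br>- Operations Centre decommission process coordination (PROC03)<br>- Procedure for setting a Nagios test to an operations test (PROC06) |
Availability/reliability reports | - Handling availability/reliability reports: Availability and reliability work instruction (Availability_and_reliability_work_instruction_for_COD)<br>- AR reports metrics (Underperforming_sites_and_suspensions) | - COD escalation procedure (Operations:COD_Escalation_Procedure) |
Operational portal dashboard issues | - COD dashboard link: https://operations-portal.in2p3.fr/dashboard/ccodView | - COD escalation procedure (PROC01) |
Handover | | |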
NOTE: all procedures should follow this template: https://wiki.egi.eu/wiki/PDT:Procedure_Template