Difference between revisions of "EGI Infrastructure operations oversight"

Line 36: Line 36:


== COD shifters daily work instructions ==
== COD shifters daily work instructions ==
In this section were collected all work instructions containing detailed instructions that specify exactly what steps to follow to carry out an activity.  
In this section are collected all work instructions containing detailed information specifying exactly what steps are to be followed to carry out an activity.  


=== Dealing with the GGUS tickets assigned to COD ===
=== Dealing with the GGUS tickets assigned to COD ===
* COD shifter is oblige to check current status of all tickets assigned in GGUS to COD: [http://tinyurl.com/2ws735h Link to all GGUS tickets assigned to COD]
* COD shifter is obliged to check the current status of all GGUS tickets assigned to COD: [http://tinyurl.com/2ws735h Link to all GGUS tickets assigned to COD]
* If the ticket is waiting for COD action then he/she should perform the action
* If the ticket is waiting for COD action then he/she should perform the action
* In case of request for:
* In case of a request for:
** '''ROD certification''' see [[Procedure_to_handle_new_ROD_certification_GGUS_tickets | New ROD team certification work instruction]]
** '''ROD certification''' - see [[Procedure_to_handle_new_ROD_certification_GGUS_tickets | New ROD team certification work instructions]]
** '''New NGI creation''' see [[Operations:NewNGIs_creation |  New NGI creation process coordination]]
** '''New NGI creation''' - see [[Operations:NewNGIs_creation |  New NGI creation process coordination]]
*** In case where COD is the Integration Process Coordinator, COD is responsible for the whole procedure.  
*** In case where COD is also the Integration Process Coordinator, COD is responsible for the whole procedure.  
** '''Operations Centre decommission''' see [[Operations:Operations_Centre_decommission|Operations Centre decommission process coordination]]
** '''Operations Centre decommission''' see [[Operations:Operations_Centre_decommission|Operations Centre decommission process coordination]]
*** COD validate the request and remove ROD information from all-operators mailing list
*** COD validates the request and removes ROD information from all-operators mailing list
** '''Setting Nagios test an operations test''' see [[Operations:Procedure_for_setting_Nagios_test_an_operations_test| Procedure for setting Nagios test an operations test]]  
** '''Setting Nagios test to an operations test''' see [[Operations:Procedure_for_setting_Nagios_test_an_operations_test| Procedure for setting Nagios test to an operations test]]  
*** COD is responsible for the coordination of the whole process.
*** COD is responsible for coordinating the whole process.
* If the shifter doesn't know what kind of action should be taken, he/she should contact with COD managers
* If the shifter doesn't know what kind of action should be taken, he/she should contact COD managers


=== Availability/reliability reports ===
=== Availability/reliability reports ===
Line 57: Line 57:


=== Issues on Operational portal dashboard ===
=== Issues on Operational portal dashboard ===
*[[Operations:Work_instruction_for_escalating_operational_problems_with_ROD | Escalation for operational problem with ROD - work instruction]]
*[[Operations:Work_instruction_for_escalating_operational_problems_with_ROD | Escalation for operational problems with ROD - work instruction]]


=== Handover ===
=== Handover ===
* At the and of the shift handover should be submitted send to COD via Handover tool in the Operational Portal  
* At the end of the shift a handover should be submitted (send to COD) via Handover tool in the Operational Portal  
** Problems on the dashboard which will pass to next week: the ggus id of the ticket and when next escalation step should be taken
** Problems on the dashboard which will pass to next week: the ggus id of the ticket and when next escalation step should be taken
** GGUS tickets assigned to COD: for each one should be provided last status and the action taken by the shifter
** GGUS tickets assigned to COD: for each ticket its last status and the action taken by the shifter should be provided
** Other issues: problems with the tools etc.
** Other issues: problems with tools etc.


== Internal area ==
== Internal area ==

Revision as of 15:58, 10 December 2010

EGI.eu Operations Oversight Pages

EGI Grid Operations oversight of the e-Infrastructure is a co-ordination task that ensures grid monitoring across EGI runs smoothly. The oversight team liaises among three groups: Operations and e-Infrastructure Oversight (OE); Operational Documentation (OD); and "Coordination of interoperations between NGIs and with other Grids".

The Operations oversight team works with the Tool Developers (particularly the OTAG group), the NGIs and their Operations Teams (ROD). There are regular phone meetings for the co-ordinators and others working on the tasks. The OE co-ordinators also organise face-to-face meetings for the ROD teams three to four times a year.

Co-ordinators:
Ron Trompert (Chair), Marcin Radecki, Luuk Uljee
Deputy:
Malgorzata Krakowian
Contact:
  • There are 3 mailing lists used for different cases:
    • manager-central-operator-on-duty AT mailman.egi.eu - for COD managerial issues, such as suggesting changes to procedures or tools. COD managers are recipients of this list.
    • central-operator-on-duty AT mailman.egi.eu - for reporting COD day-to-day issues like problems with tools or Nagios tests. COD shifters are recipients of this list.
    • all-central-operator-on-duty AT mailman.egi.eu - for contacting all ROD teams in NGIs. Each ROD team is a recipient of this list.



COD official web pages

Procedures used in COD activity

This section collects all procedures in force for COD.

COD shifters daily work instructions

In this section are collected all work instructions containing detailed information specifying exactly what steps are to be followed to carry out an activity.

Dealing with the GGUS tickets assigned to COD

Availability/reliability reports

Issues on Operational portal dashboard

Handover

  • At the end of the shift, a handover should be submitted (sent to COD) via the Handover tool in the Operational Portal
    • Problems on the dashboard which will pass to next week: the GGUS ID of the ticket and when the next escalation step should be taken
    • GGUS tickets assigned to COD: for each ticket, its last status and the action taken by the shifter should be provided
    • Other issues: problems with tools, etc.
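
The three parts of a handover listed above can be pictured as a small record. The following is only an illustrative sketch in Python: the class names, field names and the `summary` helper are hypothetical and do not correspond to the Handover tool in the Operational Portal or to any of its interfaces.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DashboardProblem:
    """A dashboard problem that passes to next week (hypothetical structure)."""
    ggus_id: int            # GGUS ID of the ticket
    next_escalation: date   # when the next escalation step should be taken


@dataclass
class CodTicket:
    """A GGUS ticket assigned to COD (hypothetical structure)."""
    ggus_id: int
    last_status: str        # last status of the ticket
    action_taken: str       # action taken by the shifter


@dataclass
class Handover:
    """End-of-shift handover with the three parts listed above."""
    dashboard_problems: List[DashboardProblem] = field(default_factory=list)
    cod_tickets: List[CodTicket] = field(default_factory=list)
    other_issues: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the handover as plain text, one part after another."""
        lines = ["Problems on the dashboard passing to next week:"]
        for p in self.dashboard_problems:
            lines.append(
                f"  GGUS {p.ggus_id}: next escalation step on "
                f"{p.next_escalation.isoformat()}"
            )
        lines.append("GGUS tickets assigned to COD:")
        for t in self.cod_tickets:
            lines.append(
                f"  GGUS {t.ggus_id}: status '{t.last_status}', "
                f"action taken: {t.action_taken}"
            )
        lines.append("Other issues:")
        for issue in self.other_issues:
            lines.append(f"  {issue}")
        return "\n".join(lines)
```

A shifter could fill such a record during the week and paste the output of `summary()` into the handover text; the ticket numbers used for that would of course be the real GGUS IDs, not invented ones.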

Internal area


NOTE: all procedures should follow this template: https://wiki.egi.eu/wiki/PDT:Procedure_Template

Procedures

Approved

To be approved by OMB

OTAG topics

Operational Portal: Dashboard

GOC DB

Pages in draft state