
Difference between revisions of "Operations and Operations Support"

From EGIWiki
Line 1: Line 1:
{{Template:Op menubar}}
{{Template:Op menubar}} {{TOC_right}}  
{{TOC_right}}


[[Category:COD]]
= Introduction  =
= Introduction  =


'''COD''''''team '''is a small team responsible for coordination of RODs, provided on a global layer. COD represents the whole ROD structure in terms of technical requirements for operations tools as well as on political level.
'''COD team '''is a small team responsible for coordination of RODs, provided on a global layer. COD represents the whole ROD structure in terms of technical requirements for operations tools as well as on political level.  


The purpose of this page is to collect all materials needed by COD team to perform the Grid operations oversight activities.
The purpose of this page is to collect all materials needed by COD team to perform the Grid operations oversight activities.  


= People and contact  =
= People and contact  =
Line 25: Line 23:
*'''central-operator-on-duty''' AT mailman.egi.eu - for reporting COD day-to-day issues like problems with tools or Nagios tests. '''COD shifters''' are recipients of this list.
*'''central-operator-on-duty''' AT mailman.egi.eu - for reporting COD day-to-day issues like problems with tools or Nagios tests. '''COD shifters''' are recipients of this list.


= COD Duties =
= COD Duties =
* COD managers
 
** '''representing RODs/COD in OTAG, OMB and Operations meetings''' - collecting requirements and improvements proposals from RODs concerning operations tools and procedures
*COD managers  
** '''suspending Resource Centres''' in case of operational issues
**'''representing RODs/COD in OTAG, OMB and Operations meetings''' - collecting requirements and improvements proposals from RODs concerning operations tools and procedures  
** '''taking part in OLA task force'''
**'''suspending Resource Centres''' in case of operational issues  
** '''writing new procedures''' - in case of need COD is taking part in procedures creation process  
**'''taking part in OLA task force'''  
** '''preparing ROD newsletters''' - informing RODs about recent and upcoming developments related to Grid Oversight  
**'''writing new procedures''' - in case of need COD is taking part in procedures creation process  
** '''preparing ROD metrics reports''' - providing an overview of operations support process in grid infrastructure.
**'''preparing ROD newsletters''' - informing RODs about recent and upcoming developments related to Grid Oversight  
* COD shifters
**'''preparing ROD metrics reports''' - providing an overview of operations support process in grid infrastructure.  
** '''escalation of operational problems with RODs'''  
*COD shifters  
** '''dealing with GGUS tickets assigned to COD'''
**'''escalation of operational problems with RODs'''  
** '''process coordination''' of:
**'''dealing with GGUS tickets assigned to COD'''  
*** creation and decommission of Operations Centre
**'''process coordination''' of:  
*** setting a Nagios test to an operations test
***creation and decommission of Operations Centre  
*** getting explanations for low availability and reliability metrics
***setting a Nagios test to an operations test  
***getting explanations for low availability and reliability metrics
 
= COD shifters work instructions  =


= COD shifters work instructions =
In this section are collected all work instructions containing detailed information specifying exactly what steps are to be followed to carry out an activity.  
In this section are collected all work instructions containing detailed information specifying exactly what steps are to be followed to carry out an activity.  


{| border="1" cellspacing="0" cellpadding="5" align="center"
{| cellspacing="0" cellpadding="5" border="1" align="center"
! Action
|-
! Description
! Action  
! Description  
! Related procedures
! Related procedures
|-v
|-
| '''GGUS tickets assigned to COD'''
| '''GGUS tickets assigned to COD'''  
|
|  
COD shifter is obliged to check the current status of all '''GGUS tickets assigned to COD'''
COD shifter is obliged to check the current status of all '''GGUS tickets assigned to COD'''  
* see [http://tinyurl.com/2ws735h Link to all GGUS tickets assigned to COD]
* If the ticket is waiting for COD action then he/she should perform the action


*see [http://tinyurl.com/2ws735h Link to all GGUS tickets assigned to COD]
*If the ticket is waiting for COD action then he/she should perform the action


In case of a request for:
<br> In case of a request for:  
* '''ROD certification'''  
 
** see [[Grid_operations_oversight/WI01| New ROD team certification work instructions]]
*'''ROD certification'''  
* '''Creation of a new NGI'''  
**see [[Grid operations oversight/WI01|New ROD team certification work instructions]]  
** see [[PROC02 | Creation of a new Operations Centre process coordination]]
*'''Creation of a new NGI'''  
** In case where COD is also the Integration Process Coordinator, COD is responsible for the whole procedure.  
**see [[PROC02|Creation of a new Operations Centre process coordination]]  
* '''Operations Centre decommission'''  
**In case where COD is also the Integration Process Coordinator, COD is responsible for the whole procedure.  
** see [[PROC03|Operations Centre decommission process coordination]]
*'''Operations Centre decommission'''  
** COD validates the request and removes ROD information from all-operators mailing list
**see [[PROC03|Operations Centre decommission process coordination]]  
* '''Setting a Nagios test to an operations test'''  
**COD validates the request and removes ROD information from all-operators mailing list  
** see [[PROC06| Procedure for setting a Nagios test to an operations test]]  
*'''Setting a Nagios test to an operations test'''  
** COD is responsible for coordinating the whole process.
**see [[PROC06|Procedure for setting a Nagios test to an operations test]]  
**COD is responsible for coordinating the whole process.
 
If the shifter doesn't know what kind of action should be taken, he/she should contact COD managers


If the shifter doesn't know what kind of action should be taken, he/she should contact COD managers
|  
|  
* [[PROC02| Creation of a new Operations Centre process coordination]]
*[[PROC02|Creation of a new Operations Centre process coordination]]  
* [[PROC03|Operations Centre decommission process coordination]]
*[[PROC03|Operations Centre decommission process coordination]]  
* [[PROC06| Procedure for setting Nagios test an operations test]]  
*[[PROC06|Procedure for setting Nagios test an operations test]]
 
|-
|-
| '''Availability/reliability reports'''
| '''Availability/reliability reports'''  
|  
|  
* Handling availability/reliability reports: [[Availability_and_reliability_work_instruction_for_COD | Availability and reliability work instruction]]
*Handling availability/reliability reports: [[Availability and reliability work instruction for COD|Availability and reliability work instruction]]  
** [[Underperforming_sites_and_suspensions | AR reports metrics]]
**[[Underperforming sites and suspensions|AR reports metrics]]
 
|  
|  
* [[Operations:COD_Escalation_Procedure|COD escalation procedure]]
*[[Operations:COD Escalation Procedure|COD escalation procedure]]  
* [[Availability_and_reliability_monthly_statistics | Availability and reliability monthly statistics procedure]]
*[[Availability and reliability monthly statistics|Availability and reliability monthly statistics procedure]]
 
|-
|-
| '''Operational portal dashboard issues'''
| '''Operational portal dashboard issues'''  
|  
|  
*[https://operations-portal.egi.eu/dashboard/ccodView COD dashboard link]
*[https://operations-portal.egi.eu/dashboard/ccodView COD dashboard link]
|
 
* [[PROC01|COD escalation procedure]]
|  
*[[PROC01|COD escalation procedure]]
 
|-
|-
| '''Handover'''
| '''Handover'''  
|
[https://operations-portal.egi.eu/dashboard/ccodView COD dashboard link]
 
*At the end of the shift a handover should be submitted (send to COD) via Handover tool in the Operational Portal
**Problems on the dashboard which will pass to next week: the ggus id of the ticket and when next escalation step should be taken
**GGUS tickets assigned to COD: for each ticket its last status and the action taken by the shifter should be provided
**Other issues: problems with tools etc.
 
|  
|  
[https://operations-portal.egi.eu/dashboard/ccodView COD dashboard link]
* At the end of the shift a handover should be submitted (send to COD) via Handover tool in the Operational Portal
** Problems on the dashboard which will pass to next week: the ggus id of the ticket and when next escalation step should be taken
** GGUS tickets assigned to COD: for each ticket its last status and the action taken by the shifter should be provided
** Other issues: problems with tools etc.
|
|-
|}
|}


<br> ''NOTE: all procedures should contain the following template: https://wiki.egi.eu/wiki/PDT:Procedure_Template''


''NOTE: all procedures should contain the following template: https://wiki.egi.eu/wiki/PDT:Procedure_Template''
= Events =
 
= Events =


* [[Grid_operations_oversight/CODOD|Phone conference Meetings, Agenda and Actions]]
*[[Grid operations oversight/CODOD|Phone conference Meetings, Agenda and Actions]]


= Resources  =
= Resources  =


*[https://documents.egi.eu/secure/ShowDocument?docid=298 Document server: ROD newsletter]  
*[https://documents.egi.eu/secure/ShowDocument?docid=298 Document server: ROD newsletter]  
*[https://documents.egi.eu/secure/ShowDocument?docid=155 Document server: Operations Support Metrics]
*[https://documents.egi.eu/secure/ShowDocument?docid=155 Document server: Operations Support Metrics]  
*[http://www.youtube.com/user/EGIGridOversight Youtube channel]
*[http://www.youtube.com/user/EGIGridOversight Youtube channel]


Line 143: Line 151:
*[[Grid operations oversight/ROD Quick Start Guide|ROD_Quick_Start_Guide (draft) ]]
*[[Grid operations oversight/ROD Quick Start Guide|ROD_Quick_Start_Guide (draft) ]]


*[[Grid operations oversight/ROD FAQ |ROD FAQ draft]]
*[[Grid operations oversight/ROD FAQ|ROD FAQ draft]]


[[Category:COD]]
[[Category:COD]]

Revision as of 10:24, 2 June 2011




Introduction

The COD team is a small team responsible for coordinating RODs at the global level. COD represents the whole ROD structure, both in terms of technical requirements for operations tools and at the political level.

The purpose of this page is to collect all materials needed by the COD team to perform Grid operations oversight activities.

People and contact

The COD team is formed from the Dutch and Polish teams and includes COD managers (people responsible for managerial issues) and COD shifters (people performing day-to-day COD work).

COD managers: 
Ron Trompert (Chair), Marcin Radecki, Luuk Uljee, Małgorzata Krakowian
COD shifters: 
Małgorzata Krakowian, Ron Trompert, Luuk Uljee, Maarten van Ingen, Ernst Pijper, Alexander Verkooijen


People behind the names


There are 2 mailing lists used for different cases:

  • manager-central-operator-on-duty AT mailman.egi.eu - for COD managerial issues such as suggesting changes to procedures or tools. COD managers are recipients of this list.
  • central-operator-on-duty AT mailman.egi.eu - for reporting COD day-to-day issues like problems with tools or Nagios tests. COD shifters are recipients of this list.

COD Duties

  • COD managers
    • representing RODs/COD in OTAG, OMB and Operations meetings - collecting requirements and improvement proposals from RODs concerning operations tools and procedures
    • suspending Resource Centres in case of operational issues
    • taking part in OLA task force
    • writing new procedures - when needed, COD takes part in the procedure creation process
    • preparing ROD newsletters - informing RODs about recent and upcoming developments related to Grid Oversight
    • preparing ROD metrics reports - providing an overview of the operations support process in the grid infrastructure.
  • COD shifters
    • escalation of operational problems with RODs
    • dealing with GGUS tickets assigned to COD
    • process coordination of:
      • creation and decommission of Operations Centre
      • setting a Nagios test to an operations test
      • getting explanations for low availability and reliability metrics

COD shifters work instructions

This section collects all work instructions, each specifying exactly which steps are to be followed to carry out an activity.

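Action: GGUS tickets assigned to COD

Description:

The COD shifter is obliged to check the current status of all GGUS tickets assigned to COD

  • see Link to all GGUS tickets assigned to COD
  • If the ticket is waiting for COD action then he/she should perform the action

In case of a request for:

  • ROD certification
    • see New ROD team certification work instructions
  • Creation of a new NGI
    • see Creation of a new Operations Centre process coordination
    • In case where COD is also the Integration Process Coordinator, COD is responsible for the whole procedure.
  • Operations Centre decommission
    • see Operations Centre decommission process coordination
    • COD validates the request and removes ROD information from all-operators mailing list
  • Setting a Nagios test to an operations test
    • see Procedure for setting a Nagios test to an operations test
    • COD is responsible for coordinating the whole process.

If the shifter doesn't know what kind of action should be taken, he/she should contact COD managers

Related procedures:

  • Creation of a new Operations Centre process coordination
  • Operations Centre decommission process coordination
  • Procedure for setting a Nagios test to an operations test

Action: Availability/reliability reports

Description:

  • Handling availability/reliability reports: Availability and reliability work instruction
    • AR reports metrics

Related procedures:

  • COD escalation procedure
  • Availability and reliability monthly statistics procedure

Action: Operational portal dashboard issues

Description:

  • COD dashboard link

Related procedures:

  • COD escalation procedure

Action: Handover

Description:

COD dashboard link

  • At the end of the shift a handover should be submitted (sent to COD) via the Handover tool in the Operational Portal
    • Problems on the dashboard which will pass to next week: the GGUS ID of the ticket and when the next escalation step should be taken
    • GGUS tickets assigned to COD: for each ticket its last status and the action taken by the shifter should be provided
    • Other issues: problems with tools etc.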


NOTE: all procedures should contain the following template: https://wiki.egi.eu/wiki/PDT:Procedure_Template

Events

  • Phone conference Meetings, Agenda and Actions

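Resources

  • Document server: ROD newsletter (https://documents.egi.eu/secure/ShowDocument?docid=298)
  • Document server: Operations Support Metrics (https://documents.egi.eu/secure/ShowDocument?docid=155)
  • Youtube channel (http://www.youtube.com/user/EGIGridOversight)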

ROD and COD Performance

Nagios tests

OTAG topics

Operational Portal: Dashboard

GOC DB

Pages in draft state