The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "Regional Operator on Duty"

From EGIWiki
(23 intermediate revisions by 3 users not shown)
{{Template:Op menubar}}  
{{Template:Op menubar}} {{Template:GO menubar}} {{TOC_right}}  
{{Template:GO menubar}} {{TOC_right}}  
[[Category:Grid Oversight]]


= Introduction =
= ROD =


The ROD team is responsible for solving problems on the infrastructure within its own Operations Centre according to agreed procedures. The team ensures that problems are properly recorded and progress according to specified timelines, and that the necessary information is available to all parties. The team is provided by each Operations Centre and requires procedural knowledge of the process.


The purpose of this page is to collect in one place all materials related to ROD work.


<br>
:'''ROD''' (Regional Operator on Duty) is a role which oversees the smooth operation of the EGI&nbsp;infrastructure in the respective NGI. The ROD team is responsible for solving problems on the infrastructure within its own Operations Centre according to agreed procedures. The team ensures that problems are properly recorded and progress according to specified timelines, and that the necessary information is available to all parties. The team is provided by each Operations Centre and requires procedural knowledge of the process. The role is usually covered by a team of people and is provided by each NGI. Depending on how an NGI is organised, there might be a number of members in the ROD team who work on a duty roster (shifts on a daily or weekly basis), or there may be one person working as ROD on a daily basis and a few deputies who take over the responsibilities when necessary. The latter model is generally more suitable for small NGIs.


'''If you are new to this activity, please first see the page '''[[Grid_operations_oversight/ROD_Welcome_page |'''ROD Welcome''']]
:In this text, the acronym '''ROD''' is used both for the whole team and for the person who is actually working on shift.


= People and Contact  =
:In order to become a ROD member, one first needs to go through the steps described in [[Regional Operator on Duty welcome|Joining operations]].


The list of people responsible for NGI oversight and contact points can be found in [https://operations-portal.in2p3.fr/dashboard/regionalPreferences Operations Portal].  
:The following text describes the duties that ROD (teams) are responsible for.


To contact all ROD teams, use the following mailing list, to which the mailing lists of all RODs are subscribed:


*'''all-operator-on-duty''' AT mailman.egi.eu


= ROD duties  =
'''Contact: '''all-operator-on-duty AT mailman.egi.eu <br>


The Regional Operations team is responsible for detecting problems, coordinating the diagnosis, and monitoring the problems through to a resolution. It monitors the sites in its region, reacts to problems identified by the monitors, either directly or indirectly, provides support to sites as needed, adds to the knowledge base, and provides information flow to oversight bodies in cases of non-reactive or non-responsive sites. ROD is a team responsible for solving problems on the infrastructure according to agreed procedures. The team ensures that problems are properly recorded and progress according to specified timelines, and that the necessary information is available to all parties. The team is provided by each ROC and requires procedural knowledge of the process (rather than technical skills) for its work.
== [[ROD Duties|Duties]]  ==


All duties listed are mandatory for the ROD team:<br>
:A list describing [[ROD Duties|Duties]].


*'''Handling incidents '''- The main responsibility of ROD is to deal with incidents at sites in the region. This includes making sure that the tickets are opened and handled properly. The procedure for handling tickets is described in [[PROC01| COD escalation procedure]]<br>
== [[ROD Alarms and tickets|Alarms and tickets]] ==
*'''Propagate actions from COD down to sites''' - ROD is responsible for ensuring that decisions taken on the COD level are propagated to sites.
*'''Putting a site in downtime or suspending it for urgent matters''' - In general, ROD can place a site in downtime (in the GOCDB) if it is either requested by the site, or ROD sees an urgent need to put the site into downtime. ROD may also suspend a site, under exceptional circumstances, without going through all the steps of the escalation procedure. For example, if a security hazard occurs, ROD must suspend the site on the spot. It is important to know that COD can also suspend a site in the case of an emergency, e.g. security incidents or lack of response.
*'''Notify COD about core or urgent matters''' - ROD should create tickets to COD in the case of core or urgent matters.


= Manuals and procedures  =
:Information on how to [[ROD Alarms and tickets|deal]] with alarms raised in the Dashboard and how to generate and deal with tickets.


This section links the manuals and procedures that RODs should be familiar with:
== [[ROD Downtimes|Downtimes]]  ==


*[[PROC01|COD Escalation Procedure]]  
:How [[ROD Downtimes|downtimes]] are managed.
*[https://documents.egi.eu/document/301 Dashboard HowTOs and Training Guides]


*[[Grid operations oversight/ROD FAQ|ROD FAQ ]]  
== [[ROD Communication|Communication]] ==


== Video tutorials  ==
:Communication [[ROD Communication|channels]] for ROD to Sites and to management.


*[http://www.youtube.com/watch?v=p-SrqJMDlOo 1. How to become a ROD member] - 7 steps which should be done to become a ROD member
== [[ROD Security|Security]] ==


*[http://www.youtube.com/watch?v=bNm4oupAmqI 2. Operations tools] - a brief introduction to the operations tools which a ROD member needs to perform duties
:How ROD should deal with [[ROD Security|security]] issues.


*[http://www.youtube.com/watch?v=rmgdaziDhUk 3. How to handle alarms] - instructions on how to manage alarms on the Operations Portal (ticket creation from an alarm, closing and masking alarms)
== Manuals and procedures  ==


*[http://www.youtube.com/watch?v=NKkbnwWnADw 4. How to handle tickets] - instructions on how to manage tickets on the Operations Portal (ticket creation, updating and closing tickets)
This section links the manuals and procedures that RODs should be familiar with:


*[http://www.youtube.com/watch?v=5EEInTO2dVE 5. Issues escalated to COD] - an introduction to the cases which are escalated to COD and how to deal with them
*[[PROC01 Grid Oversight escalation|PROC01_Grid_Oversight_escalation]]
*[https://documents.egi.eu/document/301 Dashboard HowTOs and Training Guides]
**Webinar shortcuts.
**Introduction.&nbsp;
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=230 ROD duties ]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=355 ROD – procedures&nbsp;]
**Becoming ROD team Member
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=427 Obtaining X509 certificate ]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=496 Registration in GOCDB ] [https://www.youtube.com/watch?feature=player_detailpage&v=SBelpfcc00Y#t=690]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=690 Registration in GGUS ]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=747 Registration in dteam VO ]
**ROD shift
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=872 Dashboard overview ]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=1493 Issues aka alarms ]  
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=1967 Tickets ]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=2232 Notepads ]
***[https://www.youtube.com/watch?feature=player_detailpage&v=pJsCx5sj9Uc#t=2475 Handover ]
**Webinar Presentation
***[https://documents.egi.eu/public/RetrieveFile?docid=301&version=7&filename=ROD-webinar.pdf Slides]
*[[FAQ Regional Operator on Duty|FAQ_Regional_Operator_on_Duty]]


*[http://www.youtube.com/watch?v=tsbcYoGNZls 6. Operations portal tools] - a brief introduction to the Operations Portal tools
== Resources  ==


= ROD performance - Operations Support Metrics  =
*[[Tools|Operations tools]]  
'''THIS SECTION IS OBSOLETE'''
*[[Operations Procedures|Procedures]]
<br>
The Operations Support Metrics are designed to provide an overview of the operations support process in the grid infrastructure. Operations support means all actions related to identifying, investigating and solving operational problems.
 
More information about metrics can be found in&nbsp; [[Grid operations oversight/OperationsSupportMetrics|Operations Support Metrics introduction]]  
 
*2011
**[https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-01.ods Jan] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-02.ods Feb] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-03.ods Mar] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-04.ods Apr] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-05.ods May] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-06.ods Jun] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-07.ods Jul] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-08.ods Aug] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2011-09.ods Sep]
*2010
**[https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-05.ods May] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-06.ods Jun] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-07.ods Jul] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-08.ods Aug] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-09.ods Sep] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-10.ods Oct] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-11.ods Nov] | [https://documents.egi.eu/secure/RetrieveFile?docid=155&version=2&filename=EGI-Operations_Support_Metrics-2010-12.ods Dec]
 
<br>
 
Old [https://documents.egi.eu/secure/ShowDocument?docid=829 EGEE 3 metrics]
 
= Newsletter =
 
A ROD Newsletter has been released periodically since December 2010 to consolidate the Grid oversight teams (central and local ones). The purpose of this newsletter is to inform about recent and upcoming developments related to Grid Oversight and to show the support performance indicators during the month. It is issued every month, and information about new releases is sent to the mailing list of all RODs and to NGI managers.<br> The newsletters may be found at: https://documents.egi.eu/public/ShowDocument?docid=298
 
= ROD presentations  =
 
This section collects all ROD presentations given at our face-to-face meetings.
 
*[https://www.egi.eu/indico/getFile.py/access?contribId=210&sessionId=9&resId=0&materialId=slides&confId=207 NGI_IBERGRID]


= Events  =
[[Category:Infrastructure_Oversight]]
 
Technical Forum 2012
 
*[https://indico.egi.eu/indico/contributionDisplay.py?sessionId=56&contribId=242&confId=1019 Grid Oversight Session]
 
Technical Forum 2011
 
*[https://www.egi.eu/indico/contributionDisplay.py?contribId=35&confId=452 Grid Oversight session]<br>
 
User Forum 2011
 
*[https://www.egi.eu/indico/contributionDisplay.py?sessionId=9&contribId=91&confId=207 ROD teams training session]
*[https://www.egi.eu/indico/contributionDisplay.py?sessionId=9&contribId=92&confId=207 Grid Oversight, ensuring the quality of the Grid infrastructure]
 
EGI Technical Forum 2010
 
*[https://www.egi.eu/indico/sessionDisplay.py?sessionId=117&confId=48#20100915 Grid Oversight, ensuring the quality of the Grid infrastructure]
*[https://www.egi.eu/indico/sessionDisplay.py?sessionId=116&confId=48#all Grid Oversight Training]
 
ROD teams workshop Jun 2010
 
*[https://www.egi.eu/indico/conferenceDisplay.py?ovw=True&confId=29 ROD teams workshop]
 
= Resources  =
 
*[[Tools|Operations tools]]
*[[Operations Procedures|Procedures]]

Latest revision as of 09:54, 30 March 2015


ROD

ROD (Regional Operator on Duty) is a role which oversees the smooth operation of the EGI infrastructure in the respective NGI. The ROD team is responsible for solving problems on the infrastructure within its own Operations Centre according to agreed procedures. The team ensures that problems are properly recorded and progress according to specified timelines, and that the necessary information is available to all parties. The team is provided by each Operations Centre and requires procedural knowledge of the process. The role is usually covered by a team of people and is provided by each NGI. Depending on how an NGI is organised, there might be a number of members in the ROD team who work on a duty roster (shifts on a daily or weekly basis), or there may be one person working as ROD on a daily basis and a few deputies who take over the responsibilities when necessary. The latter model is generally more suitable for small NGIs.
In this text, the acronym ROD is used both for the whole team and for the person who is actually working on shift.
In order to become a ROD member, one first needs to go through the steps described in Joining operations.
The following text describes the duties that ROD (teams) are responsible for.


Contact: all-operator-on-duty AT mailman.egi.eu

Duties

A list describing Duties.

Alarms and tickets

Information on how to deal with alarms raised in the Dashboard and how to generate and deal with tickets.

Downtimes

How downtimes are managed.

Communication

Communication channels for ROD to Sites and to management.

Security

How ROD should deal with security issues.

Manuals and procedures

This section links the manuals and procedures that RODs should be familiar with:

Resources