Difference between revisions of "Regional Operator on Duty"

Revision as of 10:24, 20 December 2012

Introduction

The ROD team is responsible for solving problems on the infrastructure within its own Operations Centre according to agreed procedures. The team ensures that problems are properly recorded and progress according to the specified timelines, and that the necessary information is available to all parties. Each Operations Centre provides its own team, whose members need to know the procedures involved in the process.

The purpose of this page is to collect in one place all materials related to ROD work.


If you are new to this activity, please first see the ROD Welcome page.

People and Contact

The list of people responsible for NGI oversight and the contact points can be found in the Operations Portal.

To contact all ROD teams, use the following mailing list, to which all ROD mailing lists are subscribed:

  • all-operator-on-duty AT mailman.egi.eu

ROD duties

The Regional Operations team is responsible for detecting problems, coordinating the diagnosis, and monitoring the problems through to a resolution. It monitors the sites in its region, reacts to problems identified by the monitors, either directly or indirectly, provides support to sites as needed, adds to the knowledge base, and provides information flow to oversight bodies in cases of non-reactive or non-responsive sites. ROD is a team responsible for solving problems on the infrastructure according to agreed procedures. The team ensures that problems are properly recorded and progress according to the specified timelines, and that the necessary information is available to all parties. The team is provided by each ROC and requires procedural knowledge of the process (rather than technical skills) for its work.

All of the duties listed below are mandatory for the ROD team:

  • Handling incidents - The main responsibility of ROD is to deal with incidents at sites in the region. This includes making sure that tickets are opened and handled properly. The procedure for handling tickets is described in PROC01_Grid_Oversight_escalation
  • Propagating actions from COD down to sites - ROD is responsible for ensuring that decisions taken at the COD level are propagated to sites.
  • Putting a site in downtime or suspending it for urgent matters - In general, ROD can place a site in downtime (in the GOCDB) if the site requests it or if ROD sees an urgent need to do so. Under exceptional circumstances, ROD may also suspend a site without going through all the steps of the escalation procedure. For example, if a security hazard occurs, ROD must suspend the site on the spot. Note that COD can also suspend a site in an emergency, e.g. security incidents or lack of response.
  • Notifying COD about core or urgent matters - ROD should create tickets to COD in the case of core or urgent matters.

Manuals and procedures

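This section links the manuals and procedures with which RODs should be familiar:

  • PROC01_Grid_Oversight_escalation
  • Dashboard HowTOs and Training Guides (https://documents.egi.eu/document/301)
  • FAQ_Regional_Operator_on_Duty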

Video tutorials

  • 2. Operations tools - a brief introduction to the operations tools that a ROD member needs to perform their duties
  • 3. How to handle alarms - instructions on how to manage alarms on the Operations Portal (creating a ticket from an alarm, closing and masking alarms)
  • 4. How to handle tickets - instructions on how to manage tickets on the Operations Portal (creating, updating and closing tickets)

ROD performance - Operations Support Metrics

THIS SECTION IS OBSOLETE
The Operations Support Metrics are designed to provide an overview of the operations support process in the grid infrastructure. Operations support means all actions related to identifying, investigating and solving operational problems.

More information about the metrics can be found in the Operations Support Metrics introduction


Old EGEE 3 metrics

Newsletter

A ROD Newsletter has been released periodically since December 2010 to consolidate the Grid oversight teams (central and local ones). The purpose of this newsletter is to inform about recent and upcoming developments related to Grid Oversight and to show the support performance indicators for the month. It is issued every month, and information about new releases is sent to the mailing list of all RODs and to NGI managers.
The newsletters may be found at: https://documents.egi.eu/public/ShowDocument?docid=298

ROD presentations

This section collects all ROD presentations given at our face-to-face (f2f) meetings.

Events

Technical Forum 2012

Technical Forum 2011

User Forum 2011

EGI Technical Forum 2010

ROD teams workshop Jun 2010

Resources