The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "ROD Duties"

From EGIWiki
(Removed 1st-line Support references)
Each NGI provides a ROD team, which is responsible for handling operational tickets for the sites in its region.
__NOTOC__
== Handling alarms and tickets ==


The main responsibility of ROD is to deal with alarms and tickets issued for sites in the region. This includes making sure that the tickets are created and handled properly. The procedure for handling tickets is described in the section on [[Operations/ROD/Dashboard|Dashboard]].


== Putting a site in downtime for urgent matters ==


ROD can place a site or a host in downtime (in the GOCDB) either at the site's request or when ROD sees an urgent need to do so.

Under exceptional circumstances, ROD may also suspend a site without going through all the steps of the escalation procedure. For example, if a security hazard occurs, ROD must suspend the site immediately. Note that COD can also suspend a site in an emergency, for example as a result of a security incident or a lack of response.
 

In both scenarios, it is important that communication channels between all parties involved are active.


== Notify COD and EGI CSIRT about urgent matters ==
 
ROD should create tickets to COD in the case of urgent matters. For security-related issues, ROD should also notify the [[EGI_CSIRT:Main_Page|CSIRT]] duty contact.


== Summary of ROD duties ==
* Receive alarm notifications from sites within the ROD team's scope
* Handle alarms less than 24 hours old
* Create tickets for alarms older than 24 hours that are not in an OK state
* Escalate tickets to COD if necessary (can be done directly through the Dashboard)
* Propagate actions from COD down to sites
* Monitor and update any GGUS tickets up to the ''solved'' status (via the Dashboard)
* Close alarms for ''solved'' problems
* Handle the final state of GGUS tickets not opened from the Operations Portal by marking them as verified
* Put the site in downtime for urgent matters. ''Note: This duty is optional; an NGI may adopt a different policy if its site administrators do not want ROD to set downtimes for them. However, it should be considered mandatory in the case of urgent security incidents.''
* Create tickets to COD for urgent matters
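The 24-hour rule in the duty list can be read as a simple triage decision per alarm. The following is a minimal, hypothetical sketch only: the `Alarm` record, its field names, the status strings and the `triage` function are illustrative assumptions, not the Operations Portal data model or any real API. It also assumes that an alarm back in an OK state corresponds to a solved problem that can be closed, and it treats an alarm exactly 24 hours old as due for a ticket (the duty list does not say which side that boundary falls on).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# 24-hour threshold taken from the ROD duty list.
TICKET_THRESHOLD = timedelta(hours=24)


@dataclass
class Alarm:
    # Hypothetical alarm record; field names are illustrative only.
    site: str
    status: str          # assumed values, e.g. "OK", "WARNING", "CRITICAL"
    raised_at: datetime  # timezone-aware time the alarm was first raised


def triage(alarm: Alarm, now: datetime) -> str:
    """Return the ROD action suggested by the 24-hour rule (hypothetical)."""
    if alarm.status == "OK":
        # Assumption: an OK state means the problem is solved, so close the alarm.
        return "close"
    if now - alarm.raised_at < TICKET_THRESHOLD:
        # Alarm less than 24 hours old: ROD handles it without a ticket yet.
        return "handle"
    # Alarm 24 hours old or more and still not OK: create a ticket.
    return "ticket"


if __name__ == "__main__":
    now = datetime(2011, 6, 16, 12, 0, tzinfo=timezone.utc)
    alarms = [
        Alarm("SITE-A", "CRITICAL", now - timedelta(hours=3)),
        Alarm("SITE-B", "WARNING", now - timedelta(hours=30)),
        Alarm("SITE-C", "OK", now - timedelta(hours=48)),
    ]
    for a in alarms:
        print(a.site, triage(a, now))
```

Escalation to COD and downtime decisions are deliberately left out of the sketch, since they depend on judgement and NGI policy rather than on a fixed age threshold.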

Revision as of 12:42, 16 June 2011
