Difference between revisions of "PROC01 EGI Infrastructure Oversight escalation"
Revision as of 13:25, 19 November 2012
Title | Grid Oversight escalation |
Document link | https://wiki.egi.eu/wiki/PROC01 |
Last modified | 2.0 - 30.09.2011 |
Policy Group Acronym | GOO/COD |
Policy Group Name | Grid Operations Oversight/Central Operator on Duty |
Contact Group | manager-central-operator-on-duty@mailman.egi.eu |
Document Status | Approved |
Approved Date | 26.07.201 |
Procedure Statement | The purpose of this document is to define escalation procedure for operational problems |
Owner | Owner of procedure |
Workflow and escalation procedures
Escalation for operational problem at site
This section covers a critical part of operations: detecting, identifying and solving problems at sites. ROD must follow the escalation procedure whenever a problem related to a site is detected. The main goal of the procedure is to track the problem follow-up process as a whole and keep it consistent from the time of detection until a final solution is reached. The steps below describe the escalation procedure to follow when no response to a problem notification is received or the problem has been left unattended.
When an alarm appears on the ROD dashboard, ROD should start the procedure below no later than 24 hours after the problem occurred:
(The Max. Duration column shows the time, in working days, that you have to wait before moving to the next step in the escalation procedure.)
Step [#] | Max. Duration [work days] (time before moving to next step) | Resp. Unit | Escalation procedure | Content of the message |
---|---|---|---|---|
1 | 3 | ROD | Send mail to the site administrator with CC to the NGI/ROC operations manager and GGUS (an operational ticket is created). | |
2 | 3 | ROD | Send mail to the site administrator with CC to the NGI/ROC operations manager and GGUS (optionally, a phone call to the site to make sure that the e-mail communication channel is working). After 3 days with no response from the site administrator, the issue should be escalated to the NGI/ROC operations manager. | |
3 | 5 | NGI manager | The NGI/ROC operations manager should, at the political level, make the site responsive or suspend the site (this can be done by phone, by mail or at a meeting). If the problem needs to be escalated to EGI level, the NGI/ROC operations manager asks ROD to send a mail to COD with CC to the site administrator, ROD and GGUS (see Content of the message). The ROD team remains responsible for taking care of the ticket on the Operations Portal. | |
4 | 1 | COD | | |
The communication should be recorded in the GGUS ticket.
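The Max. Duration values in the table above are counted in working days, so the latest date for moving to the next step depends on where weekends fall. The following is a minimal sketch of that arithmetic, not part of the procedure itself. It assumes that working days are Monday to Friday, that public holidays are ignored, and that each step starts on the day the previous one ends; the names `SITE_ESCALATION_STEPS`, `add_working_days` and `escalation_schedule` are hypothetical.

```python
from datetime import date, timedelta

# (step number, responsible unit, max. duration in working days),
# copied from the site escalation table above.
SITE_ESCALATION_STEPS = [
    (1, "ROD", 3),
    (2, "ROD", 3),
    (3, "NGI manager", 5),
    (4, "COD", 1),
]


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start`.

    Assumption: working days are Monday to Friday; holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0..4 means Monday..Friday
            remaining -= 1
    return current


def escalation_schedule(step1_start: date, steps=SITE_ESCALATION_STEPS):
    """Return a list of (step, unit, start date, latest date to move on)."""
    schedule = []
    start = step1_start
    for number, unit, max_days in steps:
        deadline = add_working_days(start, max_days)
        schedule.append((number, unit, start, deadline))
        # Assumption: the next step begins when the previous one expires.
        start = deadline
    return schedule


if __name__ == "__main__":
    for number, unit, start, deadline in escalation_schedule(date(2012, 11, 19)):
        print(f"step {number} ({unit}): {start} -> {deadline}")
```

Running the module prints one line per step with the start date and the latest date on which the ticket should be moved to the next step.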
Escalation for operational problem with unsupported MW at site
DRAFT
When an alarm appears on the ROD dashboard, ROD should start the procedure below no later than 24 hours after the problem occurred:
(The Max. Duration column shows the time, in working days, that you have to wait before moving to the next step in the escalation procedure.)
Step [#] | Dashboard step | Max. Duration [work days] (time before moving to next step) | Resp. Unit | Escalation procedure | Content of the message |
---|---|---|---|---|---|
1 | 1st step | 10? | ROD | Create a ticket through Operations Portal. Mail is sent to the site administrator with CC to the NGI/ROC operations manager and GGUS. | Ask the site to provide information about its upgrade plan, with a 2-week deadline. In case of no response or no plan, ROD will escalate the issue to the NGI manager. |
2 | NGI step | 5? | NGI manager | Escalate ticket to NGI manager through Operations Dashboard. Mail is sent to the site administrator with CC to the NGI/ROC operations manager and GGUS (optionally, a phone call to the site to make sure that the e-mail communication channel is working). The NGI manager should check why the site is unresponsive or why it cannot migrate to a supported software version. The site and the NGI manager should decide on an upgrade plan or on decommissioning the site/endpoint. If an issue cannot be solved at the NGI level, ROD should escalate the ticket to COD. | Inform NGI managers about the unresponsive site. In case of no upgrade plan by the end of XXXX, the site might be suspended by COD or CSIRT. |
3 | COD step | ? | COD | If the NGI cannot solve the problem at the NGI level, COD tries to help find a solution. | |
The communication should be recorded in the GGUS ticket.
Escalation for operational problem with ROD
This section covers a critical part of operations: handling problems with ROD. COD must follow the escalation procedure whenever a problem related to the work of ROD is detected. The main goal of the procedure is to track the problem follow-up process as a whole and keep it consistent from the time of detection until a final solution is reached.
The procedure applies only when ROD is not handling issues on the operational dashboard according to the operational procedures.
(The Max. Duration column shows the time, in working days, that you have to wait before moving to the next step in the escalation procedure.)
Step [#] | Max. Duration [work days] (time before moving to next step) | Resp. Unit | Escalation procedure | Content of the message |
---|---|---|---|---|
1 | 3 | COD | Send mail to the ROD with CC to NGI/ROC, COD and GGUS (operational ticket is being created). | |
2 | 3 | COD | Send mail to NGI/ROC manager with CC to ROD, COD and GGUS. | |
3 | (without delay) | COD | Send mail to COO with CC NGI/ROC manager, COD and GGUS. | |
The precondition for stopping the escalation is that all issues not handled according to the procedure have disappeared from the COD dashboard.
The communication should be recorded in the GGUS ticket.
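The ROD escalation table and its stop condition can be read as a small decision rule. The sketch below is one possible reading, not an official tool. It assumes that step 1 covers the first 3 working days, that step 2 covers the next 3, that step 3 follows immediately afterwards ("without delay"), and that escalation stops as soon as no unhandled issues remain on the COD dashboard. The names `Step`, `ROD_ESCALATION` and `current_step` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Step:
    number: int
    to: str                   # main recipient of the mail
    cc: Tuple[str, ...]       # recipients in CC
    max_days: Optional[int]   # None stands for "(without delay)"


# Recipients and durations copied from the ROD escalation table above.
ROD_ESCALATION = (
    Step(1, "ROD", ("NGI/ROC", "COD", "GGUS"), 3),
    Step(2, "NGI/ROC manager", ("ROD", "COD", "GGUS"), 3),
    Step(3, "COO", ("NGI/ROC manager", "COD", "GGUS"), None),
)


def current_step(working_days_elapsed: int, unhandled_issues: int) -> Optional[Step]:
    """Return the step COD is in, or None once escalation can stop.

    Assumption: escalation stops when no issue handled outside the
    procedure remains on the COD dashboard.
    """
    if unhandled_issues == 0:
        return None
    threshold = 0
    for step in ROD_ESCALATION:
        if step.max_days is None:
            return step  # final step, reached without further waiting
        threshold += step.max_days
        if working_days_elapsed < threshold:
            return step
    return ROD_ESCALATION[-1]


if __name__ == "__main__":
    step = current_step(working_days_elapsed=4, unhandled_issues=2)
    if step is not None:
        print(f"step {step.number}: mail to {step.to}, CC {', '.join(step.cc)}")
```

Counting elapsed working days is left to the caller, so the rule stays independent of any particular calendar.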
Revision history
Version | Authors | Date | Comments |
---|---|---|---|