Difference between revisions of "MAN02 Service intervention management"
Revision as of 18:14, 1 March 2011
DISCLAIMER: This manual obsoletes the previous EGEE version maintained on gocwiki
Title | Service intervention management |
Document link | https://wiki.egi.eu/wiki/MAN02_Service_intervention_management |
Last review | 01 Mar 2011 |
Policy Group Acronym | OMB |
Policy Group Name | Operations Management Board |
Contact Person | T. Ferrari. Original authors: M. Barroso, J. Shade |
Document Status | REVIEW |
Approved Date | specify. EGEE approved (Oct 2009) |
Procedure Statement | This manual provides information on how to manage service interventions. |
Service Intervention
A service intervention is defined as any action that causes, or may cause, the loss or noticeable degradation of a service. Depending on whether the outage is planned, there are two types of intervention:
- Scheduled interventions: planned and agreed in advance
- Unscheduled interventions: unplanned, usually triggered by an unexpected failure
How to manage an intervention
Interventions are recorded through the downtime management facility of GOCDB (https://goc.egi.eu/).
Scheduled interventions
- Scheduled interventions MUST be declared at least 24 h in advance, specifying reason and duration.
- Existing scheduled interventions CAN be extended, provided that the extension is made at least 24 hours in advance.
Unscheduled interventions
- Any intervention declared less than 24 h in advance will be considered unscheduled.
- Sites MUST declare unscheduled interventions as soon as they are detected, in order to inform the users. Unscheduled interventions CAN be declared up to 48 hours in the past (retroactive information to the user community).
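
The 24 h and 48 h rules above can be sketched as a small classifier. This is only an illustration of the policy, not GOCDB code: the function name `classify_declaration` and the `"rejected"` outcome for declarations more than 48 hours in the past are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the rules above.
SCHEDULED_NOTICE = timedelta(hours=24)  # minimum notice for a scheduled intervention
MAX_RETROACTIVE = timedelta(hours=48)   # how far in the past a declaration may start


def classify_declaration(declared_at: datetime, start: datetime) -> str:
    """Classify a downtime declaration (hypothetical helper).

    Returns "scheduled" when declared at least 24 h before the start,
    "unscheduled" when declared with less notice or up to 48 h after
    the start, and "rejected" otherwise (an assumed outcome, since the
    manual does not say what happens beyond 48 h).
    """
    notice = start - declared_at
    if notice >= SCHEDULED_NOTICE:
        return "scheduled"
    if notice >= -MAX_RETROACTIVE:
        return "unscheduled"
    return "rejected"


if __name__ == "__main__":
    now = datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc)
    print(classify_declaration(now, now + timedelta(hours=30)))
    print(classify_declaration(now, now - timedelta(hours=3)))
```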
Recommendations
- For interventions that impact end users, the downtime SHOULD be declared 5 working days in advance, specifying reason and duration.
- A post-mortem SHOULD be included in the downtime report.
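
As a rough aid for the 5-working-day recommendation, the latest recommended declaration date can be computed by counting back working days from the start of the intervention. The sketch below assumes that working days are Monday to Friday and ignores public holidays; the function name `latest_recommended_declaration` is hypothetical.

```python
from datetime import date, timedelta


def latest_recommended_declaration(start: date, working_days: int = 5) -> date:
    """Return the date that lies `working_days` working days before `start`.

    Assumption: working days are Monday to Friday; holidays are not considered.
    """
    day = start
    remaining = working_days
    while remaining > 0:
        day -= timedelta(days=1)
        if day.weekday() < 5:  # 0..4 are Monday..Friday
            remaining -= 1
    return day


if __name__ == "__main__":
    # An intervention starting on Monday 7 March 2011
    print(latest_recommended_declaration(date(2011, 3, 7)))
```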
Notifications
Intervention notifications (through broadcasts, RSS feeds, etc.) are sent automatically when a downtime is declared in GOCDB, at the following times: at declaration time, 24 h in advance, and 1 h before the intervention.
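
The notification schedule described above can be illustrated with a short sketch. It is not how GOCDB implements notifications; `notification_times` is a hypothetical helper, and dropping reminders whose time has already passed at declaration is an assumption.

```python
from datetime import datetime, timedelta, timezone


def notification_times(declared_at: datetime, start: datetime) -> list:
    """List the times at which notifications would go out (hypothetical helper).

    One notification at declaration time, one 24 h before the start and
    one 1 h before the start. Reminders that would fall before the
    declaration itself are dropped (assumption).
    """
    candidates = {
        declared_at,
        start - timedelta(hours=24),
        start - timedelta(hours=1),
    }
    return sorted(t for t in candidates if t >= declared_at)


if __name__ == "__main__":
    declared = datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc)
    begins = datetime(2011, 3, 3, 12, 0, tzinfo=timezone.utc)
    for t in notification_times(declared, begins):
        print(t.isoformat())
```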
Suspension policy
Sites on downtime for more than 1 month will be suspended/uncertified.
AT_RISK downtime declarations only serve to warn users and are ignored when calculating site availability (the actual status is used).
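
A minimal sketch of the two rules above follows. The manual does not define the length of "1 month", so the 30-day threshold is an assumption, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "1 month" is approximated as 30 days.
SUSPENSION_THRESHOLD = timedelta(days=30)


def should_suspend(downtime_start: datetime, downtime_end: datetime) -> bool:
    """True when the downtime lasts longer than the assumed 1-month threshold."""
    return (downtime_end - downtime_start) > SUSPENSION_THRESHOLD


def is_warning_only(severity: str) -> bool:
    """True for AT_RISK declarations, which do not enter the availability calculation."""
    return severity == "AT_RISK"


if __name__ == "__main__":
    begin = datetime(2011, 1, 1, tzinfo=timezone.utc)
    print(should_suspend(begin, begin + timedelta(days=45)))
    print(is_warning_only("AT_RISK"))
```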