Difference between revisions of "MAN02 Service intervention management"
Revision as of 15:27, 31 October 2016
Service intervention management
DISCLAIMER: This manual obsoletes the previous EGEE version maintained on CERN EDMS
Title | Service intervention management |
Document link | https://wiki.egi.eu/wiki/MAN02 |
Last modified | 19 August 2014 |
Policy Group Acronym | OMB |
Policy Group Name | Operations Management Board |
Contact Group | operations@egi.eu |
Document Status | Approved |
Approved Date | EGEE approved (Oct 2009) |
Procedure Statement | This manual provides information on how to manage service interventions. |
Owner | Owner of procedure |
Service Intervention
A service intervention is an action that involves, or may lead to, a loss or noticeable degradation of a service. Depending on how the outage is planned, there are two types of intervention:
- Scheduled interventions: planned and agreed in advance
- Unscheduled interventions: unplanned, usually triggered by an unexpected failure
How to manage an intervention
Interventions are recorded in GOCDB (https://goc.egi.eu/). For more information, see the downtimes description (https://wiki.egi.eu/wiki/GOCDB/Input_System_User_Documentation#Downtimes).
Scheduled interventions
- Scheduled interventions MUST be declared at least 24 h in advance, specifying reason and duration.
- Existing scheduled interventions CAN be extended, provided that the extension is declared 24 hours in advance.
Unscheduled interventions
- Any intervention declared less than 24 h in advance will be considered unscheduled.
- Sites MUST declare unscheduled interventions as soon as they are detected, to inform users. Unscheduled interventions CAN be declared up to 48 hours in the past (retroactive information to the user community).
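The timing rules above can be expressed as a small classifier. This is a minimal sketch, not part of GOCDB: the function name `classify_intervention` is hypothetical, and it assumes the 24 h notice is measured from the moment of declaration to the start of the intervention.

```python
from datetime import datetime, timedelta, timezone

# Thresholds from the rules above.
SCHEDULED_NOTICE = timedelta(hours=24)  # minimum notice for a scheduled intervention
MAX_RETROACTIVE = timedelta(hours=48)   # how far in the past an unscheduled one may start


def classify_intervention(declared_at: datetime, starts_at: datetime) -> str:
    """Return "scheduled" or "unscheduled" for a hypothetical downtime declaration.

    Use timezone-aware datetimes so that declarations made in different
    timezones compare correctly.
    """
    notice = starts_at - declared_at
    if notice >= SCHEDULED_NOTICE:
        return "scheduled"
    if notice < -MAX_RETROACTIVE:
        # The start lies more than 48 h before the declaration: not allowed.
        raise ValueError("unscheduled interventions can be declared at most 48 h in the past")
    return "unscheduled"


now = datetime(2016, 10, 31, 12, 0, tzinfo=timezone.utc)
print(classify_intervention(now, now + timedelta(days=2)))    # scheduled
print(classify_intervention(now, now + timedelta(hours=3)))   # unscheduled
print(classify_intervention(now, now - timedelta(hours=10)))  # unscheduled (retroactive)
```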
Required information
The following information is required when declaring an intervention:
- Severity (Outage or Warning)
- Description
- Timezone
- Starting and ending dates
- Affected site / Affected services and endpoints
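To illustrate the list above, a declaration can be checked for missing fields before it is submitted. This is a sketch only: `DowntimeDeclaration` and `missing_or_invalid` are hypothetical names, the field layout does not reflect GOCDB's actual input form or API, and it assumes at least one affected service or endpoint must be listed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

SEVERITIES = {"OUTAGE", "WARNING"}  # the two severities listed above


@dataclass
class DowntimeDeclaration:
    """Hypothetical record holding the required fields (not the GOCDB schema)."""
    severity: str
    description: str
    timezone: str
    start: datetime
    end: datetime
    site: str
    services: List[str] = field(default_factory=list)


def missing_or_invalid(d: DowntimeDeclaration) -> List[str]:
    """Return a list of problems; an empty list means the declaration looks complete."""
    problems = []
    if d.severity.upper() not in SEVERITIES:
        problems.append("severity must be Outage or Warning")
    if not d.description.strip():
        problems.append("description is empty")
    if not d.timezone:
        problems.append("timezone is missing")
    if d.end <= d.start:
        problems.append("end date must be after start date")
    if not d.site:
        problems.append("affected site is missing")
    if not d.services:
        problems.append("no affected services/endpoints listed")
    return problems
```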
Recommendations
- For interventions that impact end users, the downtime SHOULD be declared 5 working days in advance, specifying reason and duration.
- A post-mortem SHOULD be included in the downtime report.
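The 5-working-day recommendation can be checked with a simple calendar walk. This sketch assumes working days are Monday to Friday, ignores public holidays, and reads "5 working days in advance" as "the intervention starts on or after the fifth working day following the declaration date"; the function names are hypothetical.

```python
from datetime import date, timedelta

RECOMMENDED_WORKING_DAYS = 5


def add_working_days(day: date, n: int) -> date:
    """Move forward n working days (Mon-Fri); public holidays are not considered."""
    current = day
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def meets_user_notice(declared: date, start: date) -> bool:
    """True if the start date respects the recommended notice for user-facing downtimes."""
    return start >= add_working_days(declared, RECOMMENDED_WORKING_DAYS)
```

For example, a downtime declared on Monday 2016-10-31 meets the recommendation if it starts on Monday 2016-11-07 or later.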
Notifications
Intervention notifications (through broadcasts, RSS feeds, etc.) as specified in the following procedures are sent automatically when a downtime is declared in GOCDB: at declaration time, 24 h before the intervention and 1 h before the intervention.
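The notification schedule can be derived from the declaration and start times. This is a sketch: `notification_times` is a hypothetical helper, and skipping reminders whose time has already passed (for example the 24 h reminder for an unscheduled downtime declared 3 h ahead) is an assumption, since the manual does not say how GOCDB handles that case.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

REMINDERS = (("24 h before", timedelta(hours=24)), ("1 h before", timedelta(hours=1)))


def notification_times(declared_at: datetime, starts_at: datetime) -> List[Tuple[str, datetime]]:
    """List (label, time) pairs for the notifications of one downtime."""
    schedule = [("declaration", declared_at)]
    for label, offset in REMINDERS:
        when = starts_at - offset
        # Assumption: a reminder that would fall before the declaration is skipped.
        if when > declared_at:
            schedule.append((label, when))
    return schedule


declared = datetime(2016, 10, 31, 12, 0, tzinfo=timezone.utc)
for label, when in notification_times(declared, declared + timedelta(days=3)):
    print(label, when.isoformat())
```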
Suspension policy
Sites on downtime for more than 1 month will be suspended/uncertified. AT_RISK downtime declarations only serve as warnings to users and are ignored when calculating site availability (the actual status is used instead).
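A site may reach the suspension threshold through several back-to-back downtimes rather than a single long one. The sketch below merges overlapping or contiguous periods and compares the longest continuous span with the threshold. Treating "1 month" as 30 days, and merging consecutive declarations into one period, are assumptions; the policy does not specify either, nor which severities count, so callers are assumed to pass only the relevant downtimes. Function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

SUSPENSION_THRESHOLD = timedelta(days=30)  # assumption: "1 month" approximated as 30 days

Period = Tuple[datetime, datetime]


def longest_continuous_downtime(periods: Iterable[Period]) -> timedelta:
    """Merge overlapping or back-to-back (start, end) periods and return the longest span."""
    longest = timedelta(0)
    cur_start = cur_end = None
    for start, end in sorted(periods):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # extends the current continuous block
        else:
            cur_start, cur_end = start, end  # a gap: start a new block
        longest = max(longest, cur_end - cur_start)
    return longest


def should_suspend(periods: Iterable[Period]) -> bool:
    """True if any continuous downtime exceeds the suspension threshold."""
    return longest_continuous_downtime(periods) > SUSPENSION_THRESHOLD
```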
Revision History
| Version | Authors | Date | Comments |
|---|---|---|---|
| | A. Paolini | 2016-10-31 | Added required information paragraph and a link to downtimes description; changed the contact group to operations. |
| | M. Krakowian | 2014-08-19 | Change contact group -> Operations support |