The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "PROC12 Production Service Decommissioning"

{{Template:Op menubar}}  
{{Template:Doc_menubar}}  
{{TOC_right}}
[[Category:Deprecated]]
 
{| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;"
{| border="1"
| style="padding-right: 15px; padding-left: 15px;" |  
|-
|[[File:Alert.png]] This page is '''Deprecated'''; the content has been moved to https://confluence.egi.eu/display/EGIPP/PROC12+Production+Service+Decommissioning 
| '''Title'''
| ''Service Decommissioning Procedure''
|-
| '''Document link'''
|
|-
| '''Version - last modified'''
|
|-
| '''Policy Group Acronym'''
| ''OMB''
|-
| '''Policy Group Name'''
| ''Operations Management Board''
|-
| '''Contact Person'''
| operational-documentation@mailman.egi.eu
|-
| '''Document Status'''
| ''DRAFT''
|-
| '''Approved Date'''
| N/A<br>
|-
| '''Procedure Statement'''
| ''A procedure for the steps involved to decommission a Service operated by a Resource Centre in the EGI infrastructure. ''
|}
 
= Grid Service Decommissioning Procedure  =
 
This procedure outlines the good practice that a Resource Centre (aka site) and its users should follow when a grid service is being decommissioned.
 
= Definitions  =
 
*'''Resource Centre''' refers to the definition in the "Resource Centre OLA".
 
:''In this document, the term "'''site'''" is '''deprecated''', and '''Resource Centre''' has been used in its place.''
 
*Other entities involved in this procedure are defined in the [[Glossary|EGI Glossary]].
 
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
 
= Entities involved in the procedure  =
 
<!-- There are minimally two sets of players involved in this procedure -->
 
*'''Resource Centre Operations Manager''': person who is responsible for initiating the decommissioning procedure by contacting the Resource Infrastructure Operations Manager.
*'''Resource Infrastructure Operations Manager''' (aka NGI manager): person who is responsible for finding an agreement with the VO Manager about the migration of the service to another site, in case the service is a VO-specific service hosted by the site according to an agreement between the Resource Infrastructure Provider and the VO.
*'''Virtual Organizations (VO's)''': Data and other stateful objects of the supported VO's may be stored at the Resource Centre.
*'''Virtual Organizations (VO) managers''': persons who are responsible for retrieving this data from the Resource Centre in due time. Tracking is done through their support unit in GGUS. If no such support unit is available, the VOs should be contacted directly using the contact information available in the VO ID card.
 
= Contact information  =
 
*EGI Resource Infrastructure Providers are listed on the EGI [https://www.egi.eu/infrastructure/Resource-providers/index.html web site]  
*A list of EGI Operations Centres with their respective contact information is available from the [http://go.egi.eu/operations-centres GOCDB]  
*The list of VO's served by a specific Resource Centre and their ID cards can be retrieved from the [http://operations-portal.egi.eu/vo/rd Operations Portal].
*The VO managers and their contact information for a specific VO can be retrieved from the [http://operations-portal.egi.eu/vo Operations Portal].
 
= Actions and responsibilities  =
 
== Resource Centre Operations Manager  ==
 
#The Operations Centre is responsible for decommissioning the service.
#The Operations Centre is responsible for updating the corresponding entries in the EGI configuration repository [[GOCDB|GOCDB]].
#The Resource Centre Operations Manager is REQUIRED to provide the Resource Centre information needed to complete the decommissioning process, and he/she is responsible for its accuracy and maintenance.
 
 
 
<!--(<span style="background:#FFFF00">Res: How about an RC not being responsive any more?</span>) &lt;<span style="background:#FFFF00">Peter: If a site is no more responsive the RP Operations Center staff should provide -when available- the needed information. 7-12-2011</span>) -->
 
== Resource Infrastructure Operations Manager  ==
 
#A Resource Infrastructure Provider is REQUIRED to be responsible for all Resource Centres within its respective jurisdiction. For this reason the Resource Infrastructure Provider is responsible for ensuring that all the Resource Centres follow this procedure for service decommissioning.
 
== VO's and VO managers  ==
 
#give the users the relevant information about the decommissioning (deadlines, involved resources, files, how to handle it)
#follow-up and support users in their file migration procedures until the deadline
#inform the Resource Centre about the status of the migration(s)
 
<!--(<span style="background:#FFFF00">Tristan: if the VO is not responsive any more then the decommissioning could happen after x reminders. Res: We'll need to specify this in more detail.)
</span> -->
 
= Workflow  =
 
== Service Centre decommissioning  ==
 
=== Steps  ===
 
*Actions tagged '''RC''' are the responsibility of the Resource Centre Operations Manager.
*Actions tagged '''RIP''' are the responsibility of the Resource Infrastructure Operations Manager.
*Actions tagged '''OC''' are the responsibility of the Operations Centre.
*Actions tagged '''VO''' are the responsibility of the VO managers.
 
{| cellspacing="0" cellpadding="5" border="1"
|-
! #
! Responsible
! Action
|- valign="top"
| 1
| RC
|
# The Resource Centre Operations Manager opens a GGUS ticket, which will be used as ''master ticket'' to track the whole process. The ticket must remain in an open status until the service is removed from GOCDB.
# The Resource Centre Operations Manager contacts the Resource Provider regional staff, communicating the plan to decommission the service.
 
|- valign="top"
| 2
| RC
|
#The Resource Centre Operations Manager uses the broadcast tool to announce the start of the decommissioning procedure to the VO managers and users of all the VOs supported by the service being decommissioned:  
#*Announce a detailed timeline for the decommissioning and that the Resource Centre Manager will start a downtime of the service to prevent any further usage. The timeline must '''clearly''' list the deadlines for the VO Managers' actions.
#*The timeline is recorded in the ''master ticket''.
#*The broadcast link is recorded in the ''master ticket''.
#*The downtime should start no earlier than 15 days and no later than one month after the broadcast.
#*State that the aim is to remove the service in ''XX'' weeks (min 6 weeks for stateful services).
|- valign="top"
| 3
| RC
|
#[''If the service is a CE or a workload management service''] After the announcement of the service decommissioning the Resource Centre MAY disable VO job submissions to prevent further VO activity, except for monitoring jobs.
#:[''If the service is a storage or data management service''] After the announcement of the service decommissioning the Resource Centre MAY disable VO write access to prevent further VO activity, except for infrastructure VOs. (If selective permissions are not possible, the service must remain write-enabled until the beginning of the downtime.)
|- valign="top"
|3 bis
| VO
|
# [''If the service is a storage element''] Between the announcement of the decommissioning and the beginning of the downtime, the VO Manager SHOULD check the volume of data stored by the VO at the site. If it is large enough to require more than one month to be moved, the VO Manager can ask to reschedule the downtime period.
#* If no communication is sent to the Resource Centre by the first week of downtime, the schedule can be considered agreed by all VO Managers.
#* Any rescheduling request MUST be supported by technical reasons (e.g. total amount of data to move / site maximum data transfer throughput).
# [''If the service is a central service like VOMS or LFC for a given VO''] The VO Manager, Resource Centre Operations Manager and Resource Infrastructure Operations Manager should discuss finding a new Resource Centre to host these services, taking into account any pre-existing agreement between the VO and the NGI. For international VOs, this discussion could be held at the EGI level, especially if a solution cannot easily be found within that Resource Infrastructure Provider.
|- valign="top"
| 4
| RC
|
#According to the dates announced in the broadcast, or agreed otherwise in step '''3 bis''', the Resource Centre puts the service in downtime to prevent any further usage. This downtime shall last for the scheduled period or until step 5 is over, whichever is shorter.
#* The downtime must be recorded in the ''master ticket''.
 
|- valign="top"
| 5
| RC
|
If the service is a stateful service containing VO data:
 
*Once the service is in downtime and closed for write access (if possible) the Resource Centre Operations Manager opens N child tickets of the procedure's ''master ticket'' to each of the N VO managers of the N VOs the service supports.
*The VOs are given up to the amount of time agreed in step '''3 bis''' to retrieve their data from the service being decommissioned. During this period, the Resource Centre should make sure that the service works for the different VOs to allow them to migrate their data. The VO managers can specify any specific requirements in their child ticket. For instance:
**Request in the child ticket from the Resource Centre Operations Manager the time limit needed to retrieve data.
** (If the service is an SE) Request from VO central services admins the list of LFNs/DNs still having SURLs on SEs at that Resource Centre.
**VO Manager MUST communicate to the Resource Centre - if possible using the GGUS child ticket - when the data moving is completed.
**If the service's data cannot be migrated using the user interface (e.g. if there is the need to have access to a database dump) the Resource Centre administrators should cooperate with the VO Managers.
 
<br>
 
|- valign="top"
| 7
| OC
|
#At the end of the scheduled downtime period or when step 6 is completed and validated:
#*The service is set to "production=N" "monitored=N" in the GOCDB.
#*Once the service disappears from Nagios, it must be removed from the Resource Centre GIIS (e.g. Site-BDII).
#*The downtime is terminated.
#*All these actions must be recorded in the ''master ticket''.
#At this point the service is no longer listed in the top-BDIIs of EGI. If hardware is closed down, the Resource Centre will need to address this, possibly informing these users that their data could be at risk.
 
|- valign="top"
| 9
| RC
|
#Logs are to be kept at the Resource Centre and remain available for the period of time required by the [https://documents.egi.eu/document/81 Grid Security Traceability and Logging Policy] (90 days) after the service has been removed from the Resource Centre GIIS. In case of inquiries related to security incidents, this period could be extended. ''Note:'' If the logs are saved elsewhere, the service's hardware can be disposed of.
 
|- valign="top"
| 10
| OC
|
#Service is removed from ''GOCDB''.
#* This action must be recorded in the ''master ticket''.
|- valign="top"
| 11
| OC
|
# ''Master ticket'' is closed.
|}
|}
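
The timing rules in steps 2 and 3 bis can be sketched as a small calculation. This is only an illustrative sketch, not part of the procedure: it assumes that "one month" means 30 days and that the 6-week minimum for stateful services is counted from the broadcast date, neither of which the procedure states explicitly. All function and constant names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

# Assumptions (not stated in the procedure): "one month" is taken as 30 days,
# and the 6-week minimum for stateful services is counted from the broadcast.
MIN_NOTICE = timedelta(days=15)
MAX_NOTICE = timedelta(days=30)
MIN_REMOVAL_STATEFUL = timedelta(weeks=6)
RESCHEDULE_THRESHOLD_DAYS = 30


def downtime_window(broadcast: date) -> Tuple[date, date]:
    """Earliest and latest downtime start allowed for a broadcast date (step 2)."""
    return broadcast + MIN_NOTICE, broadcast + MAX_NOTICE


def earliest_removal(broadcast: date) -> date:
    """Earliest removal target for a stateful service (step 2)."""
    return broadcast + MIN_REMOVAL_STATEFUL


def transfer_days(data_tb: float, throughput_gbit_s: float) -> float:
    """Days needed to move data_tb terabytes at a sustained throughput in Gbit/s."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (throughput_gbit_s * 1e9)
    return seconds / 86400


def reschedule_justified(data_tb: float, throughput_gbit_s: float) -> bool:
    """True if moving the data would take more than one month (step 3 bis)."""
    return transfer_days(data_tb, throughput_gbit_s) > RESCHEDULE_THRESHOLD_DAYS


if __name__ == "__main__":
    start, end = downtime_window(date(2022, 3, 1))
    print("Downtime may start between", start, "and", end)
    print("Earliest removal:", earliest_removal(date(2022, 3, 1)))
    print("Reschedule justified:", reschedule_justified(1080, 1))
```

A VO Manager could attach the estimated transfer time to the child ticket as the technical justification for a rescheduling request; the sustained throughput figure would have to come from the Resource Centre.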

Latest revision as of 10:44, 15 April 2022