
PROC11 Resource Centre Decommissioning

Revision as of 18:05, 7 February 2012 by Psolagna





Title: Resource Centre Decommissioning Procedure
Document link:
Version - last modified:
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Person: operational-documentation@mailman.egi.eu
Document Status: DRAFT
Approved Date: N/A
Procedure Statement: A procedure for the steps involved in decommissioning Resource Centres (sites) in the EGI infrastructure.

Resource Centre Decommissioning Procedure

This procedure describes good practice for a Resource Centre (aka site) and its users when the Resource Centre is being decommissioned.

Decommissioning a Resource Centre in an orderly manner takes up to four months. Note: decommissioning of the Resource Centre hardware can start after one month.

Note: A separate document provides the process for Resource Centre Registration and Certification.

Definitions

  • Resource Centre refers to the definition in the "Resource Centre OLA".
In this document, the term "site" is deprecated, and Resource Centre has been used in its place.
  • Other entities involved in this procedure are defined in the EGI Glossary.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

  • Resource Centre Operations Manager: person who is responsible for initiating the decommissioning procedure by contacting the Resource Infrastructure Operations Manager.
  • Resource Infrastructure Operations Manager (aka NGI manager): person who is responsible for finding an agreement with the Resource Centre about the timeline, in order to minimize the impact on the user communities and infrastructure.
  • Virtual Organizations (VOs): data and other stateful objects of the supported VOs may be stored at the Resource Centre.
  • Virtual Organization (VO) managers: persons who are responsible for retrieving this data from the Resource Centre in due time. Tracking is done through their support unit in GGUS. If no such support unit is available, the VOs should be contacted directly using the contact information available in the VO ID card.
  • Operations Centre: entity which is technically responsible for carrying out the main ticket and database updates.

The Resource Infrastructure Operations Manager can determine the level of involvement of other actors together with the Resource Centre Operations Manager.

Contact information

  • EGI Operations: operations (at) mailman.egi.eu
  • EGI Resource Infrastructure Providers are listed on the EGI web site
  • A list of EGI Operations Centres with their respective contact information is available from the GOCDB
  • EGI CSIRT: egi-csirt-team (at) mailman.egi.eu
  • The list of VOs served by a specific Resource Centre and their ID cards can be retrieved from the Operations Portal.
  • The VO managers and their contact information for a specific VO can be retrieved from the Operations Portal.

Actions and responsibilities

Resource Centre Operations Manager

  1. A Resource Infrastructure Provider is responsible for all Resource Centres (RCs) within its respective jurisdiction (for example, an NGI is responsible for all Resource Centres in its country). For this reason, the Resource Centre Operations Manager is REQUIRED to announce the intention of the Resource Centre to cease operation by contacting
    • the respective NGI, if the Resource Centre is located in Europe,
    • the respective Resource Infrastructure Provider active in the relevant geographical area, if the Resource Centre is outside Europe.
  2. The Resource Centre Operations Manager is REQUIRED to provide the Resource Centre information needed to complete the decommissioning process, and he/she is responsible for its accuracy and maintenance.


Resource Infrastructure Operations Manager

  1. A Resource Infrastructure Provider is REQUIRED to be responsible for all Resource Centres within its respective jurisdiction. For example, an NGI is responsible for all Resource Centres in its respective country.
  2. The Resource Infrastructure Operations Manager MUST attend to Resource Centre decommissioning requests and MUST provide feedback to the requesting partners in a timely manner, accepting or rejecting the requests received.
  3. The Resource Infrastructure Operations Manager MUST contact the relevant Operations Centre to start the Resource Centre decommissioning procedure.

VOs and VO managers

  1. Give the users the relevant information about the decommissioning (deadlines, resources involved, files affected, how to handle the migration).
  2. Follow up and support users in their file migration procedures until the deadline.
  3. Inform the Resource Centre about the status of the migration(s).


Operations Centre

  1. The Operations Centre is responsible for decommissioning the Resource Centre.
  2. The Operations Centre is responsible for updating the corresponding entries in the EGI configuration repository GOCDB.
  3. The Operations Centre MUST keep Resource Centre information up to date in all operations tools as needed, such as the local NAGIOS server for monitoring of certified Resource Centres, the local helpdesk (if available) for the registration of the Resource Centre support staff, etc.

Workflow

The various steps required by both the Resource Infrastructure Operations Manager and the Resource Centre Operations Manager are explained in the tables below. The procedure below covers the transition from the Certified to the Closed status. The transition from the Suspended to the Closed status can be derived analogously.

The general status flow that a Resource Centre is allowed to follow is illustrated by the following diagram. Information on Resource Centre status and on how to manipulate it is available from GOCDB Documentation.

[Figure: SiteStatusFlow.png, diagram of the Resource Centre status flow]


A Resource Centre cannot be in the Candidate state for more than two months, or in the Suspended state for more than four months. After this period the Resource Centre SHOULD be closed.
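The time limits above can be checked mechanically. The following is a minimal sketch only: the function name, the state names as plain strings, and the approximation of one month as 30 days are assumptions made for illustration and are not taken from GOCDB.

```python
from datetime import date

# Maximum time a Resource Centre may stay in a state, per the rule above.
# Assumption of this sketch: one month is approximated as 30 days.
MAX_DAYS_IN_STATE = {"Candidate": 2 * 30, "Suspended": 4 * 30}


def should_be_closed(status: str, since: date, today: date) -> bool:
    """Return True if the Resource Centre has exceeded the time allowed in its state."""
    limit = MAX_DAYS_IN_STATE.get(status)
    if limit is None:
        # No time limit is stated for the other states.
        return False
    return (today - since).days > limit
```

For example, `should_be_closed("Suspended", date(2012, 1, 1), date(2012, 6, 1))` flags a Resource Centre that has been suspended for about five months.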

Resource Centre decommissioning

Steps

  • Actions tagged RC are the responsibility of the Resource Centre Operations Manager.
  • Actions tagged RIP are the responsibility of the Resource Infrastructure Operations Manager.
  • Actions tagged OC are the responsibility of the Operations Centre.
# Responsible Action
1 RC
  1. The Resource Centre Operations Manager informs the Resource Infrastructure Operations Manager that the Resource Centre is going to be decommissioned, and together they agree on the plan for decommissioning it.
    • The Resource Centre Operations Manager opens a GGUS ticket, which will be used as the master ticket to track the whole process. The ticket must remain open until the Resource Centre is closed in GOCDB.
2 RC
  1. The Resource Centre Operations Manager announces through the broadcast tool, to the VO managers and users of all the VOs supported by the Resource Centre and to EGI Operations (COD), that it is starting the decommissioning procedure:
    • Announce a detailed (agreed) timeline for the decommissioning, and that the Resource Centre will schedule downtimes of its resources, or a downtime of the whole Resource Centre, to prevent any further usage. The timeline must clearly list the deadlines for the VO managers' actions.
    • The timeline is recorded in the master ticket.
    • The broadcast link is recorded in the master ticket.
    • The downtime should start no earlier than 15 days and no later than one month after the broadcast.
    • State that the aim is to change the status to “suspended” in GOCDB within 6 (or 8) weeks from the broadcast date.
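The dates in step 2 follow directly from the broadcast date. As an illustrative sketch (the function names and dictionary keys are hypothetical, and "one month" is assumed to mean 30 days), they could be derived like this:

```python
from datetime import date, timedelta


def decommissioning_timeline(broadcast: date, suspend_weeks: int = 6) -> dict:
    """Derive the key dates of step 2 from the broadcast date.

    Assumption of this sketch: "one month" is taken as 30 days.
    suspend_weeks is 6 by default; the procedure also allows 8.
    """
    return {
        "downtime_earliest": broadcast + timedelta(days=15),
        "downtime_latest": broadcast + timedelta(days=30),
        "suspended_target": broadcast + timedelta(weeks=suspend_weeks),
    }


def downtime_start_is_valid(broadcast: date, downtime_start: date) -> bool:
    """Check that the downtime starts 15 days to one month after the broadcast."""
    timeline = decommissioning_timeline(broadcast)
    return timeline["downtime_earliest"] <= downtime_start <= timeline["downtime_latest"]
```

The resulting dates are the ones that would be recorded in the master ticket together with the broadcast link.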
3 RC
  1. After the decommissioning has been announced, the Resource Centre may disable VO job submission to prevent further VO activity, except for monitoring jobs.

3 bis VO
  1. Between the announcement of the decommissioning and the beginning of the downtime, the VO manager SHOULD check the volume of data stored by the VO at the Resource Centre. If it is large enough to require more than one month to be moved, the VO manager can ask to reschedule the downtime period.
    • If no communication is sent to the Resource Centre by the first week of downtime, the schedule can be considered agreed by all VO managers.
    • Any rescheduling request MUST be supported by technical reasons (e.g. total amount of data to move / maximum data transfer throughput of the Resource Centre).
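The technical justification mentioned in step 3 bis is essentially the ratio of data volume to sustained throughput. A minimal sketch of that estimate follows; the function names, the decimal units (1 TB = 1,000,000 MB) and the 30-day default window are assumptions chosen for illustration.

```python
def transfer_days(data_tb: float, throughput_mb_per_s: float) -> float:
    """Estimate the days needed to move data_tb terabytes at a sustained rate in MB/s."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_tb * 1_000_000 / throughput_mb_per_s  # TB -> MB, then MB / (MB/s)
    return seconds / 86_400  # seconds per day


def needs_reschedule(data_tb: float, throughput_mb_per_s: float,
                     window_days: float = 30.0) -> bool:
    """Return True if the data cannot be moved within the retrieval window."""
    return transfer_days(data_tb, throughput_mb_per_s) > window_days
```

For instance, 500 TB at a sustained 100 MB/s needs roughly 58 days, which would support a request to extend the retrieval period. Real transfers rarely sustain the maximum rate, so the estimate is a lower bound.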
4 RC
  1. According to the dates announced in the broadcast, or agreed otherwise in step 3 bis, the Resource Centre puts its services in downtime to prevent any further jobs being sent to it. This downtime shall last for the scheduled period or until step 6 is over, whichever is shorter.
    • The downtime must be recorded in the master ticket.
5 RIP
  1. Communicate to EGI operations (COD) the start of the three-month decommissioning period.
  2. The Operations Centre staff should check for possible bottlenecks known from past experience, e.g. make sure that communication from the Resource Centre to the VOs is working correctly.
6 RIP

If the Resource Centre has storage elements (SEs):

  • Once the Resource Infrastructure Operations Manager has received confirmation that the Resource Centre's SEs are closed for write access, he/she opens one child ticket of the procedure's master ticket for each VO supported by the Resource Centre, addressed to that VO's manager.
  • The VOs are given up to 4 weeks - or the amount of time agreed in step 3 bis - to retrieve their data from the Resource Centre. During this period, the Resource Centre should make sure that the SE works for the different VOs so that they can retrieve their files. The VO managers can state any specific requirements in their child ticket. For instance:
    • Request in the child ticket from the Resource Centre Operations Manager the time limit needed to retrieve data.
    • Request from VO central services admins the list of LFNs/DNs still having SURLs on SEs at that Resource Centre.
    • The VO manager MUST inform the Resource Centre, if possible through the GGUS child ticket, when the data transfer is complete.
7 RIP
  1. If the Resource Centre hosts central services like VOMS or LFC for a given VO, the VO manager, the Resource Centre Operations Manager and the Resource Infrastructure Operations Manager should discuss finding a new Resource Centre to host these services, taking into account any pre-existing agreement between the VO and the NGI. For international VOs, this discussion could be held at the EGI level, especially if a solution cannot easily be found within that Resource Infrastructure Provider.


8 OC
  1. Once step 6 is completed and validated AND, if applicable, step 7 has been successfully executed:
    • The Resource Centre's status is changed to suspended.
    • It is advised that the services of the Resource Centre are set to "production=N" "monitored=N" in the GOCDB.
    • The downtime is terminated.
    • All these actions must be recorded in the master ticket.
  2. At this point the Resource Centre is no longer listed in the topBDIIs of EGI and cannot be reached by simply submitting a job. It might still be possible to directly access the Resource Centre for members of VOs which the Resource Centre supported. If hardware is closed down, the Resource Centre will need to address this, possibly informing these users that their data could be at risk.


9 RC
  1. Logs are to be kept at the Resource Centre and remain available for the period required by the Grid Security Traceability and Logging Policy (90 days) after the status is set to suspended. In case of inquiries related to security incidents, this period could be extended. Note: if the logs are saved elsewhere, the service hardware can be disposed of.
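The end of the retention period in step 9 is a simple date offset from the day the status was set to suspended. The sketch below is illustrative only; the function name and the `extension_days` parameter (standing in for an extension due to a security inquiry) are hypothetical.

```python
from datetime import date, timedelta

# Retention period stated in step 9 (Grid Security Traceability and Logging Policy).
RETENTION_DAYS = 90


def logs_release_date(suspended_on: date, extension_days: int = 0) -> date:
    """Return the first date on which the logs no longer have to be kept.

    extension_days models an extension requested because of a security
    inquiry; its value is an assumption, the procedure does not fix one.
    """
    return suspended_on + timedelta(days=RETENTION_DAYS + extension_days)
```

This is also the date whose passing is reported to EGI operations and EGI CSIRT in step 10.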
10 OC
  1. The Resource Infrastructure Operations Manager communicates the end of the 90-day period to EGI operations AND EGI CSIRT. The roles of the Resource Centre Administrator and of other people relevant to this Resource Centre are revoked in GOCDB and, if appropriate, with the relevant CA. The Resource Infrastructure Operations Manager cleans the VOMRS dteam server accordingly. If no user relevant to this Resource Centre is left, the Resource Infrastructure Operations Manager has to inform his/her CA so that the entity is closed officially, to avoid keeping “ghost entities”.
  2. The Resource Centre is closed in GOCDB.
    • This action must be recorded in the master ticket.
  • NOTE: People will have to handle separately any mailing list subscriptions which were initiated by the Resource Centre Administrator and which were not triggered by contact definitions in the GOCDB.
11 OC
  1. Master ticket is closed.