PROC11 Resource Centre Decommissioning


Title: Resource Centre Decommissioning
Document link: https://wiki.egi.eu/wiki/PROC11
Last modified: 8 June 2016
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Group: operations@egi.eu
Document Status: Approved
Approved Date: 28/02/2012
Procedure Statement: A procedure for the steps involved to decommission Resource Centres (sites) in the EGI infrastructure.



Overview

This procedure defines the good practices between a Resource Centre (aka site) and its users when the resource centre/site is being decommissioned.

Decommissioning a Resource Centre in an orderly manner takes up to four months. Note: the site hardware decommissioning can start after one month.

Note: A separate document provides the process for Resource Centre Registration and Certification.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

The Resource Infrastructure Operations Manager can determine the level of involvement of other actors together with the Resource Centre Operations Manager.

Contact information

Actions and responsibilities

Resource Centre Operations Manager

  1. A Resource Centre Operations Manager is responsible for all Resource Centres (RCs) within its respective domain.
  2. When a Resource Centre is to be decommissioned, its Resource Centre Operations Manager is REQUIRED to report the Resource Centre's intention to cease operation
    • to the respective NGI, if the Resource Centre is located in Europe,
    • to the respective Resource Infrastructure Provider active in the relevant geographical area, if the Resource Centre is outside Europe.
  3. The Resource Centre Operations Manager is REQUIRED to provide the Resource Centre information needed to complete the decommissioning process, and he/she is responsible for its accuracy and maintenance.
  4. The Resource Centre Operations Manager MUST attend to Resource Centre decommissioning requests and MUST provide timely feedback to the requesting partners, accepting or rejecting the requests received.

Resource Infrastructure Operations Manager

  1. A Resource Infrastructure Operations Manager is REQUIRED to be responsible for all Resource Centres within its respective jurisdiction. For example, an NGI is responsible for all Resource Centres in its respective country.


VOs and VO managers

  1. Give users the relevant information about the decommissioning (deadlines, resources involved, affected files, and how to handle them).
  2. Follow up and support users in their file migration procedures until the deadline.
  3. Inform the Resource Centre about the status of the migration(s).

Operations Centre

  1. The Operations Centre is responsible for decommissioning the Resource Centre.
  2. The Operations Centre is responsible for updating the corresponding entries in the EGI configuration repository GOCDB.
  3. The Operations Centre MUST keep Resource Centre information up to date in all operations tools as needed, such as the local NAGIOS server for monitoring of certified Resource Centres, the local helpdesk (if available) for the registration of the Resource Centre support staff, etc.

Workflow

The various steps required by both the Resource Infrastructure Operations Manager and the Resource Centre Operations Manager are explained in the tables below. The procedure below covers the transition from the Certified to the Closed status. The transition from the Suspended to the Closed status can be derived analogously.

The general status flow that a Resource Centre is allowed to follow is illustrated by the following diagram. Information on Resource Centre status and on how to manipulate it is available from GOCDB Documentation.

[Figure: SiteStatusFlow.png, Resource Centre status flow diagram]


A Resource Centre cannot remain in the Candidate state for more than two months, or in the Suspended state for more than four months. After this period the Resource Centre SHOULD be closed.
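
The time limits stated above can be checked mechanically. The following is a minimal Python sketch and not part of the procedure itself: the names `MAX_MONTHS_IN_STATUS`, `add_months` and `should_be_closed` are hypothetical, and the sketch assumes that the date on which the Resource Centre entered its current status is known and that "months" means calendar months.

```python
import calendar
from datetime import date

# Limits taken from the paragraph above; the dictionary name is hypothetical.
MAX_MONTHS_IN_STATUS = {"Candidate": 2, "Suspended": 4}


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day if needed."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def should_be_closed(status: str, since: date, today: date) -> bool:
    """Return True when the time limit for the given status has passed."""
    limit = MAX_MONTHS_IN_STATUS.get(status)
    if limit is None:
        return False  # the procedure states no limit for this status
    return today > add_months(since, limit)
```

For instance, under these assumptions a Resource Centre that became Candidate on 15 January would be flagged from 16 March onwards.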

Resource Centre decommissioning

Steps

# Responsible Action
1 RC
  1. The Resource Centre Operations Manager informs the Resource Infrastructure Operations Manager that the Resource Centre is going to be decommissioned, and together they agree on a decommissioning plan.
    • The Resource Centre Operations Manager opens a GGUS ticket to the Support Unit of the Operations Centre the Resource Centre belongs to; this ticket serves as the Parent ticket for tracking the whole process. The ticket must remain open until the site is closed in GOCDB. The Parent ticket can also be used as the parent ticket for the Resource Centre's service decommissioning procedures (see PROC12, step 1).
2 RC
  1. The Resource Centre Operations Manager should use the broadcast tool (https://operations-portal.egi.eu/broadcast) to announce to both the VO managers and the VO users of the VOs supported by the RC (excluding the Ops and dteam VOs) that it is starting the decommissioning procedure:
    • Announce a detailed (agreed) timeline for the decommissioning, and that the Resource Centre will schedule downtimes of its resources, or a site downtime, to prevent any further usage. The timeline must clearly list the deadlines for the VO Managers' actions.
    • The ticket should also list all the Resource Centre's services being decommissioned and the scheduled decommissioning date (this supersedes PROC12 step 2).
    • The timeline is recorded in the Parent ticket (including the timelines of all the services).
    • The broadcast link is recorded in the Parent ticket.
    • The downtime should start no earlier than 15 days and no later than one month after the broadcast.
    • State that the aim is to change the status to “suspended” in GOCDB within 6 (or 8) weeks of the broadcast date.
3 RC, VO, RP
  1. The Resource Centre starts the Service Decommissioning Procedure (PROC12) for every production service of the site.
    • The procedures for the services can be run in parallel.
    • Service decommissioning procedures can start from step 3, using the Parent ticket of this procedure as the parent ticket for all of them.
4 OC
  1. Once PROC12 step 7 (the services end their scheduled downtime) is completed for all services of the site:
    • The Resource Centre's status is changed to suspended.
    • This action must be recorded in the parent ticket.
  2. At this point the Resource Centre is no longer listed in the topBDIIs of EGI and cannot be reached by simply submitting a job. It might still be possible to directly access the Resource Centre for members of VOs which the Resource Centre supported. If hardware is closed down, the Resource Centre will need to address this, possibly informing these users that their data could be at risk.
5 RC
  1. Logs are to be kept at the Resource Centre, available for the period of time requested by the Grid Security Traceability and Logging Policy.
6 OC
  1. The Resource Infrastructure Operations Manager should email the EGI operations team (operations 'at' egi.eu) and EGI CSIRT (contact) at the end of the 90-day period, informing them that the log retention period has ended and that the site is going to be closed. Revoke the roles of the Resource Centre Administrator and of other people relevant to this Resource Centre in GOCDB and, if appropriate, with the relevant CA. The Resource Infrastructure Operations Manager is to clean the VOMRS dteam server accordingly. If no user relevant to this Resource Centre is left, the Resource Infrastructure Operations Manager has to inform his/her CA so that this entity is officially closed, to avoid keeping “ghost entities”.
  2. The site is closed in GOCDB at the end of the log retention period.
    • This action must be recorded in the parent ticket.
  • NOTE: Subscriptions to mailing lists that were initiated by the Resource Centre Administrator, and were not triggered by contact definitions in GOCDB, have to be handled separately by the people concerned.
7 OC
  1. Parent ticket is closed.
    • This operation can be performed only if all the service decommissioning procedures are completed.
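
The timing constraints announced in step 2 (downtime start between 15 days and one month after the broadcast, and the "suspended" target 6 or 8 weeks after the broadcast) can be turned into concrete dates. The following Python sketch is illustrative only: the function names `one_month_after`, `decommissioning_timeline` and `downtime_start_is_valid` are hypothetical, and "one month" is assumed to mean one calendar month.

```python
import calendar
from datetime import date, timedelta


def one_month_after(start: date) -> date:
    """Same day in the following calendar month, clamped to its last day."""
    year = start.year + (1 if start.month == 12 else 0)
    month = 1 if start.month == 12 else start.month + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def decommissioning_timeline(broadcast: date, suspend_weeks: int = 6) -> dict:
    """Derive the step 2 dates from the broadcast date."""
    if suspend_weeks not in (6, 8):
        raise ValueError("the procedure mentions 6 (or 8) weeks")
    return {
        "downtime_earliest": broadcast + timedelta(days=15),
        "downtime_latest": one_month_after(broadcast),
        "suspend_by": broadcast + timedelta(weeks=suspend_weeks),
    }


def downtime_start_is_valid(broadcast: date, downtime_start: date) -> bool:
    """Check that a planned downtime start falls inside the allowed window."""
    plan = decommissioning_timeline(broadcast)
    return plan["downtime_earliest"] <= downtime_start <= plan["downtime_latest"]


if __name__ == "__main__":
    plan = decommissioning_timeline(date(2016, 6, 1))
    print(plan["downtime_earliest"], plan["downtime_latest"], plan["suspend_by"])
    # prints: 2016-06-16 2016-07-01 2016-07-13
```

Recording the computed dates in the Parent ticket alongside the broadcast link keeps the agreed timeline and the actual deadlines in one place.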

Revision history

Version Authors Date Comments
M. Krakowian 19 August 2014 Change contact group -> Operations support
Alessandro Paolini 2016-06-08 Changed contact group -> Operations