PROC21 Resource Center suspension

From EGIWiki



Title: Resource Center suspension
Document link: https://wiki.egi.eu/wiki/PROC21
Last modified: 0.1 - 8 June 2016
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Group: operations@egi.eu
Document Status: Approved
Approved Date: 25.06.2015
Procedure Statement: This document describes the process for suspending a Resource Center in the EGI infrastructure.
Owner: Alessandro Paolini


Overview

This document describes the process for suspending a Resource Center in the EGI infrastructure. The aim of the procedure is to ensure that all parties are notified of the suspension and that a record of it is kept.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

  • Resource Center Manager: the person responsible for the Resource Center.
  • NGI Representative (NGI): the person responsible for NGI Operations.
  • EGI Operations / EGI CSIRT (EGI): the party that decides on and performs the suspension at EGI level.
  • VO Manager: the person responsible for a VO, who is notified about the Resource Center suspension.

Use cases

A Resource Center can be suspended by the NGI or by EGI if it violates the Resource Centre Operational Level Agreement, in the following cases:

  • PROC01: the Resource Center fails the EGI Infrastructure Oversight escalation procedure
    • Level 3: the NGI/ROC operations manager should make the Resource Center responsive or suspend it
    • Level 4: if the NGI/ROC operations manager takes no action for 5 working days, Operations sends an email to the NGI/ROC operations manager, with the site administrator, ROD and GGUS in CC. If there is no response after 1 working day, Operations suspends the Resource Center.
  • PROC04: the Resource Center underperforms (is below the OLA target) for 3 consecutive months
  • PROC16: the Resource Center fails the Decommissioning of unsupported software procedure
    • Follow-up of the migration: a Resource Center that did not provide information on its migration plans can be suspended
  • SEC01-05: the Resource Center fails the Security Incident or Critical Security procedure
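
The PROC04 condition above (below the OLA target for 3 consecutive months) can be illustrated with a minimal sketch. It is not part of any EGI tool: the function name, the input format (one value per month, in chronological order) and the target value used in the usage note are assumptions made only for this example.

```python
from typing import Sequence


def underperforming_run(monthly_values: Sequence[float],
                        target: float,
                        months: int = 3) -> bool:
    """Return True if `monthly_values` contains at least `months`
    consecutive entries strictly below `target`.

    `monthly_values` is assumed to hold one performance figure per
    month, oldest first. The name and signature are hypothetical.
    """
    run = 0  # length of the current streak of months below target
    for value in monthly_values:
        if value < target:
            run += 1
            if run >= months:
                return True
        else:
            run = 0  # a month at or above target breaks the streak
    return False
```

For example, with an illustrative target of 80, the series `[90, 70, 75, 60]` contains three consecutive months below the target, while `[70, 75, 90, 60]` does not.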

Notes for suspended sites

  • Suspended sites are not displayed in GGUS.
  • When a site is suspended in GOC DB, the "Notified site" field is cleared automatically in the corresponding tickets, and the NGIs take over managing them (whether to process or close them is up to the NGIs).

Steps



Step  Responsible  Action / Notes
1     NGI/EGI      Decision about the suspension.
2     NGI/EGI      A notification is sent to the Resource Center Manager and to the NGI Representative, who have 3 working days to react.
3     NGI/EGI      If there is no reply from the Resource Center Manager:
                     1. If the NGI is carrying out the procedure: send a notification to the Resource Center Manager and the VO Managers of the supported VOs.
                     2. If EGI is carrying out the procedure: send notifications to the NGI Representative, the Resource Center Manager and the VO Managers of the supported VOs.
                   The VOs supported on a given site can be found via VAPOR, on the Resource Distribution page.
                   The notification should contain the reason for the suspension and who conducted it.
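
The reaction deadline in step 2 and the recipient rules in step 3 can be sketched as follows. This is an illustration under stated assumptions, not an EGI implementation: the function names are hypothetical, working days are taken to be Monday to Friday with public holidays ignored, and recipients are represented as plain role labels.

```python
from datetime import date, timedelta
from typing import List, Sequence


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start`.

    Assumption: working days are Monday to Friday; public holidays
    are not taken into account.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def suspension_recipients(initiator: str,
                          supported_vos: Sequence[str]) -> List[str]:
    """List who is notified in step 3, depending on who carries out
    the procedure ("NGI" or "EGI"). Role labels are illustrative.
    """
    if initiator not in ("NGI", "EGI"):
        raise ValueError("initiator must be 'NGI' or 'EGI'")
    recipients = []
    if initiator == "EGI":
        # EGI additionally informs the NGI Representative.
        recipients.append("NGI Representative")
    recipients.append("Resource Center Manager")
    recipients.extend(f"VO Manager ({vo})" for vo in supported_vos)
    return recipients
```

For a notification sent on Wednesday 3 January 2024, `add_working_days(date(2024, 1, 3), 3)` gives Monday 8 January 2024 as the end of the reaction period. The same helper could be applied to the 5-working-day and 1-working-day intervals of PROC01 Level 4.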


Revision history

Version  Authors             Date        Comments
         Alessandro Paolini  2016-06-08  Changed contact group -> Operations
         Alessandro Paolini  2016-09-22  Changed the link showing the VO supported in the sites. The feature is now provided by VAPOR.