
Difference between revisions of "PROC21 Resource Center suspension"

Line 2: Line 2:


{{Ops_procedures
{{Ops_procedures
|Doc_title = Site suspension
|Doc_title = Resource Center suspension
|Doc_link = [[PROC21|https://wiki.egi.eu/wiki/PROC21]]
|Doc_link = [[PROC21|https://wiki.egi.eu/wiki/PROC21]]
|Version = 0.1  - 14 May 2015
|Version = 0.1  - 14 May 2015
Line 10: Line 10:
|Doc_status = Draft
|Doc_status = Draft
|Approval_date =
|Approval_date =
|Procedure_statement = The document describes the process of site suspension in EGI infrastructure
|Procedure_statement = The document describes the process of Resource Center suspension in EGI infrastructure
}}  
}}  


= Overview  =
= Overview  =


The document describes the process of site suspension in EGI infrastructure. The aim of this procedure is to ensure that the all parties are notified about suspension and that record history is kept.  
The document describes the process of Resource Center suspension in EGI infrastructure. The aim of this procedure is to '''ensure that the all parties are notified about suspension and that record history is kept.'''


= Definitions  =
= Definitions  =
Line 27: Line 27:
<!-- There are minimally two sets of players involved in this procedure -->  
<!-- There are minimally two sets of players involved in this procedure -->  


*'''Site Manager (SiteM)''': person who is responsible for Site.  
*'''Resource Center Manager (RC)''': person who is responsible for Resource Center.  
*'''NGI Representative (NGIR)''': person who is responsible for NGI Operations.  
*'''NGI Representative (NGI)''': person who is responsible for NGI Operations.  
*'''EGI&nbsp;Operations / EGI&nbsp;CSIRT''' '''(EGI)''': person who decides and perform suspension at EGI&nbsp;Level  
*'''EGI&nbsp;Operations / EGI&nbsp;CSIRT''' '''(EGI)''': person who decides and perform suspension at EGI&nbsp;Level  
*'''VO&nbsp;Manager''' '''(VOM)''': person resposible for VO. Notified about site suspension.
*'''VO&nbsp;Manager''' '''(VOM)''': person resposible for VO. Notified about Resource Center suspension.<br>


<br>
= Use cases  =


== Requirements  ==
Resource Center can be suspended by NGI or EGI in case of breaking&nbsp; [https://documents.egi.eu/public/ShowDocument?docid=31 Resource Centre Operational Level Agreement]


The reason for suspension should be identified.
*[[PROC01|PROC01:]] Resource Center is failing EGI Infrastructure Oversight escalation procedure
**Level 3: NGI/ROC operations manager should make Resource Center responsive or suspend it
**Level 4: If no action was taken by NGI/ROC operations manager for 5 working days Operations send an mail to NGI/ROC operations manager with CC to site administrator, ROD and GGUS. If no response after 1 working day Operations performs Resource Center suspension.
*[[PROC04|PROC04:]]&nbsp;Resource Center is underperforming (below the OLA target) for 3 consecutive months
*[[PROC16|PROC16]]: Resource Center is failing Decommissioning of unsupported software procedure<br>
**Follow up the migration: Resource Center which didn't provide information on migration plans can be suspended
*[[Operations_Procedures#EGI_Policies_and_Procedures|SEC01-05:]] Resource Center is failing SecurityIncident or Critical Security procedure<br>


*PROC01: Site is failing EGI Infrastructure Oversight escalation procedure: https://wiki.egi.eu/wiki/PROC01_EGI_Infrastructure_Oversight_escalation
= Steps  =
**Level 3: NGI/ROC operations manager should make site responsive or suspend it
**Level 4: If no action was taken by NGI/ROC operations manager for 5 working days Operations send an mail to NGI/ROC operations manager with CC to site administrator, ROD and GGUS. If no response after 1 working day Operations performs site suspension.
*PROC16:&nbsp;Site is failing Decommissioning of unsupported software procedure https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
**Follow up the migration: sites which didn't provide information on migration plans can be suspended
*SEC01-05: Site is failing SecurityIncident or Critical Security procedure
**https://wiki.egi.eu/wiki/Operations_Procedures#Security
 
== Steps  ==


<br>  
<br>  
Line 55: Line 53:
! Responsible  
! Responsible  
! Action  
! Action  
! Prerequisites, if any
! Notes
|- valign="top"
|- valign="top"
| 1  
| 1  
| NGIR/EGI  
| NGI/EGI  
| &nbsp; '''Decision about suspension'''<br>  
| &nbsp; '''Decision about suspension'''<br>  
| Site is failing some procedure (could be unresponsive)<br>
|  
|- valign="top"
|- valign="top"
| 2  
| 2  
| NGIR/EGI  
| NGI/EGI  
|  
|  
'''Notification''' is sent to Site Manager and to NGI Representative – 1 working day on reaction  
'''Notification''' is sent to Resource Center Manager and to NGI Representative – '''3 working day on reaction'''


| <br>
| <br>
Line 71: Line 69:
| 3  
| 3  
| NGIR/EGI<br>  
| NGIR/EGI<br>  
| If there is no reply from Site Manager'''<br>'''<br>
| If there is no reply from Resource Center Manager'''<br>'''  
Change status of the site in the GOCDB to ‘suspended’.  
*Change status of the Resource Center in the GOCDB to ‘suspended’.  
*Register date, Resource Center and reason in [https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions]


#Register date, site and reason in [https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions]
#If NGI is realizing procedure: Send notification to: Resource Center Manager and VO&nbsp;Managers of supported VOs<br>
#If NGIR is realizing procedure: Send notification to: Site Manager and VO&nbsp;Manager
#If EGI is realizing procedure:&nbsp;Send notifications to: NGI Representative, Resource Center Manager and VO&nbsp;Manager of supported VOs
#If EGI is realizing procedure:&nbsp;Send notifications to: NGI Representative, Site Manager and VO&nbsp;Manager


|  
|  
Find related VO Managers (VO supported on site according to Dashboard?) accounting/Gstat&nbsp;
VO supported on given site can be found via:<br>
 
http://operations-portal.egi.eu/vo/rd


Send a Broadcast to related Managers (at once or monthy?!)
<br>


|}
|}
Line 94: Line 94:
! Comments
! Comments
|-
|-
|  
| <br>
|  
| <br>
|  
| <br>
|  
| <br>
|}
|}


[[Category:Operations_Procedures]]
[[Category:Operations_Procedures]]

Revision as of 11:39, 15 May 2015




Title Resource Center suspension
Document link https://wiki.egi.eu/wiki/PROC21
Last modified 0.1 - 14 May 2015
Policy Group Acronym OMB
Policy Group Name Operations Management Board
Contact Group operations-support@mailman.egi.eu
Document Status Draft
Approved Date
Procedure Statement The document describes the process of Resource Center suspension in EGI infrastructure
Owner Owner of procedure


Overview

The document describes the process of Resource Center suspension in the EGI infrastructure. The aim of this procedure is to ensure that all parties are notified about the suspension and that a record of it is kept.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

  • Resource Center Manager (RC): person who is responsible for the Resource Center.
  • NGI Representative (NGI): person who is responsible for NGI Operations.
  • EGI Operations / EGI CSIRT (EGI): person who decides on and performs the suspension at EGI level.
  • VO Manager (VOM): person responsible for a VO. Notified about the Resource Center suspension.

Use cases

A Resource Center can be suspended by NGI or EGI if it breaches the Resource Centre Operational Level Agreement:

  • PROC01: Resource Center is failing the EGI Infrastructure Oversight escalation procedure
    • Level 3: the NGI/ROC operations manager should make the Resource Center responsive or suspend it.
    • Level 4: if the NGI/ROC operations manager has taken no action for 5 working days, Operations sends an email to the NGI/ROC operations manager with CC to the site administrator, ROD and GGUS. If there is no response after 1 working day, Operations performs the Resource Center suspension.
  • PROC04: Resource Center is underperforming (below the OLA target) for 3 consecutive months
  • PROC16: Resource Center is failing the Decommissioning of unsupported software procedure
    • Follow up the migration: a Resource Center which did not provide information on its migration plans can be suspended.
  • SEC01-05: Resource Center is failing the Security Incident or Critical Security procedure
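
The PROC04 criterion above (below the OLA target for 3 consecutive months) can be sketched as a simple check. This is only an illustration: the function name, the input format and the example target of 80.0 are assumptions, not part of the procedure; the actual target is defined in the OLA.

```python
from typing import Sequence

# Hypothetical target used only for illustration; the real value is set by the OLA.
EXAMPLE_OLA_TARGET = 80.0


def is_underperforming(
    monthly_scores: Sequence[float],
    target: float = EXAMPLE_OLA_TARGET,
    months: int = 3,
) -> bool:
    """Return True if the most recent `months` monthly scores are all below `target`."""
    if len(monthly_scores) < months:
        return False  # not enough history to decide
    return all(score < target for score in monthly_scores[-months:])
```

Scores are assumed to be ordered from oldest to newest, so only the last three entries are inspected.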

Steps



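Step 1
  Responsible: NGI/EGI
  Action: Decision about suspension.
  Notes:

Step 2
  Responsible: NGI/EGI
  Action: Notification is sent to the Resource Center Manager and to the NGI Representative, who have 3 working days to react.
  Notes:

Step 3
  Responsible: NGI/EGI
  Action: If there is no reply from the Resource Center Manager:
    • Change the status of the Resource Center in the GOCDB to 'suspended'.
    • Register the date, Resource Center and reason in https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions
    1. If NGI is carrying out the procedure: send a notification to the Resource Center Manager and the VO Managers of supported VOs.
    2. If EGI is carrying out the procedure: send notifications to the NGI Representative, the Resource Center Manager and the VO Managers of supported VOs.
  Notes: VOs supported on a given site can be found via http://operations-portal.egi.eu/vo/rd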
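
The reaction deadline and the notification rules in the steps above can be sketched as follows. This is a minimal illustration, not an EGI tool: the function names are hypothetical, and working days are assumed to be Monday to Friday with public holidays ignored.

```python
from datetime import date, timedelta
from typing import List


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start`.

    Assumption: working days are Monday to Friday; public holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def suspension_recipients(performed_by: str) -> List[str]:
    """Return who is notified about the suspension, depending on who performs it."""
    if performed_by == "NGI":
        return ["Resource Center Manager", "VO Managers of supported VOs"]
    if performed_by == "EGI":
        return [
            "NGI Representative",
            "Resource Center Manager",
            "VO Managers of supported VOs",
        ]
    raise ValueError("performed_by must be 'NGI' or 'EGI'")
```

For example, a notification sent on Thursday 14 May 2015 with 3 working days to react gives `add_working_days(date(2015, 5, 14), 3)`, which is Tuesday 19 May 2015.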


Revision history

Version Authors Date Comments