The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Operations Procedures"

From EGIWiki
Line 11: Line 11:
|- style="background-color: lightgray;"
|- style="background-color: lightgray;"
| '''Number'''  
| '''Number'''  
| '''Title'''
| '''Comment'''
| '''Status'''  
| '''Status'''  
| '''Area'''  
| '''Area'''  
| '''Relevant to'''  
| '''Relevant to'''  
| '''Title'''
| '''Comment'''
|-
|-
| [[PROC01|PROC 01]]  
| [[PROC01|PROC 01]]  
| [[PROC01|COD Escalation Procedure]]
| Operations ticket escalation
| ''approved'', October 26 2010  
| ''approved'', October 26 2010  
| Ticket Management  
| Ticket Management  
| Resource Centre Administrators, Operations Centres, COD  
| Resource Centre Administrators, Operations Centres, COD  
| [[PROC01|COD Escalation Procedure]]
| Operations ticket escalation
|-
|-
| [[PROC02|PROC 02]]  
| [[PROC02|PROC 02]]  
| [[PROC02|Operations Centre Creation]]
| Step-by-step instructions on how to create a new Operations Centre
| ''approved'', August 17 2010  
| ''approved'', August 17 2010  
| Operations Centre Management  
| Operations Centre Management  
| Operations Centres, COD  
| Operations Centres, COD  
| [[PROC02|Operations Centre Creation]]
| Step-by-step instructions on how to create a new Operations Centre
|-
|-
| [[PROC03|PROC 03]]  
| [[PROC03|PROC 03]]  
| [[PROC03|Operations Centre decommissioning]]
| Step-by-step instructions on how to decommission an Operations Centre
| ''approved'', October 26 2010  
| ''approved'', October 26 2010  
| Operations Centre Management  
| Operations Centre Management  
| Operations Centres, COD  
| Operations Centres, COD  
| [[PROC03|Operations Centre decommissioning]]
| Step-by-step instructions on how to decommission an Operations Centre
|-
|-
| [[Availability and reliability monthly statistics#Process_for_quality_verification|PROC 04]]  
| [[Availability and reliability monthly statistics#Process_for_quality_verification|PROC 04]]  
| [[Availability and reliability monthly statistics#Process_for_quality_verification|Quality verification of monthly availability and reliability statistics]]
| Instructions for RODs and Operations Centres on how to handle justification for poor monthly performance through GGUS
| ''approved'', August 17 2010  
| ''approved'', August 17 2010  
| Availability and Monitoring  
| Availability and Monitoring  
| Resource Centre Administrators, Operations Centres, COD  
| Resource Centre Administrators, Operations Centres, COD  
| [[Availability and reliability monthly statistics#Process_for_quality_verification|Quality verification of monthly availability and reliability statistics]]
| Instructions for RODs and Operations Centres on how to handle justification for poor monthly performance through GGUS
|-
|-
| [[PROC05|PROC 05]]  
| [[PROC05|PROC 05]]  
| [https://twiki.cern.ch/twiki/bin/view/EGEE/ValidateROCNagios Validation of an Operations Centre Nagios]
| This procedure is part of the [[Operations Centre creation process coordination|Operations Centre creation]] procedure.
| ''approved'', August 17 2010  
| ''approved'', August 17 2010  
| Availability and Monitoring  
| Availability and Monitoring  
| Operations Centres, COD  
| Operations Centres, COD  
| [https://twiki.cern.ch/twiki/bin/view/EGEE/ValidateROCNagios Validation of an Operations Centre Nagios]
| This procedure is part of the [[Operations Centre creation process coordination|Operations Centre creation]] procedure.
|-
|-
| [[PROC06|PROC 06]]  
| [[PROC06|PROC 06]]  
| [[PROC06|Setting a Nagios test status to OPERATIONS]]
| A Nagios probe is set to OPERATIONS when its results are used to generate notifications for the Operations Dashboard. This procedure details the steps to set a Nagios test to OPERATIONS.
| ''approved'', Nov 23 2010  
| ''approved'', Nov 23 2010  
| Availability and Monitoring  
| Availability and Monitoring  
| Operations Centres, COD  
| Operations Centres, COD  
| [[PROC06|Setting a Nagios test status to OPERATIONS]]
| A Nagios probe is set to OPERATIONS when its results are used to generate notifications for the Operations Dashboard. This procedure details the steps to set a Nagios test to OPERATIONS.
|-
|-
| [[PROC07|PROC 07]] <!-- Procedure number -->  
| [[PROC07|PROC 07]] <!-- Procedure number -->  
| [[PROC07|Adding new probes to SAM]] <!-- Title -->
| Addition of new OPS Nagios probes to the SAM release. <!-- Comment -->
| ''approved'', Mar 28 2011 <!-- Status -->  
| ''approved'', Mar 28 2011 <!-- Status -->  
| Availability and Monitoring <!-- Area -->  
| Availability and Monitoring <!-- Area -->  
| Resource Centre Administrators, Operations Centres, COD <!-- Relevant to -->  
| Resource Centre Administrators, Operations Centres, COD <!-- Relevant to -->  
| [[PROC07|Adding new probes to SAM]] <!-- Title -->
| Addition of new OPS Nagios probes to the SAM release. <!-- Comment -->
|-
|-
| [[PROC08|PROC 08]] <!-- Procedure number -->  
| [[PROC08|PROC 08]] <!-- Procedure number -->  
| [[PROC08|Management of the EGI OPS Availability and Reliability Profile]] <!-- Title -->
| Request for a change to the EGI OPS Availability and Reliability profile. A change in the profile is needed every time a Nagios test is added to or removed from the profile, so that its results are included in or excluded from the monthly Availability and Reliability statistics. <!-- Comment -->
| ''approved'', Mar 28 2011 <!-- Status -->  
| ''approved'', Mar 28 2011 <!-- Status -->  
| Availability and Monitoring <!-- Area -->  
| Availability and Monitoring <!-- Area -->  
| Resource Centre Administrators, Operations Centres, COD <!-- Relevant to -->  
| Resource Centre Administrators, Operations Centres, COD <!-- Relevant to -->  
| [[PROC08|Management of the EGI OPS Availability and Reliability Profile]] <!-- Title -->
| Request for a change to the EGI OPS Availability and Reliability profile. A change in the profile is needed every time a Nagios test is added to or removed from the profile, so that its results are included in or excluded from the monthly Availability and Reliability statistics. <!-- Comment -->
|-
|-
|[[PROC09|PROC 09]] <!-- Procedure number -->  
|[[PROC09|PROC 09]] <!-- Procedure number -->  
| [[PROC09|Resource Centre Registration and Certification Procedure]] <!-- Title -->
| Registration of a new Resource Centre in the GOCDB
| ''approved'', May 17 2011
| ''approved'', May 17 2011
| Resource Centre Management
| Resource Centre Management
| Resource Centre Administrator, Operations Centres
| Resource Centre Administrator, Operations Centres
| [[PROC09|Resource Centre Registration and Certification Procedure]] <!-- Title -->
| Registration of a new Resource Centre in the GOCDB
|-
|-
|[[PROC10|PROC 10]] <!-- Procedure number -->  
|[[PROC10|PROC 10]] <!-- Procedure number -->  
| [[PROC10|Recomputation of monitoring results and availability statistics]] <!-- Title -->
| Procedure for reporting problems with the monitoring results gathered by SAM and requesting a recomputation of the results and the related availability and reliability statistics
| ''approved'', Oct 17 2011 <!-- Status -->  
| ''approved'', Oct 17 2011 <!-- Status -->  
| Availability and Monitoring <!-- Area -->  
| Availability and Monitoring <!-- Area -->  
| Resource Centre Administrators, Operations Centres<!-- Relevant to -->  
| Resource Centre Administrators, Operations Centres<!-- Relevant to -->  
| [[PROC10|Recomputation of monitoring results and availability statistics]] <!-- Title -->
| Procedure for reporting problems with the monitoring results gathered by SAM and requesting a recomputation of the results and the related availability and reliability statistics
|-
|-
| [[PROC11|PROC 11]]
| [[PROC11|PROC 11]]
| [[PROC11|Resource Centre Decommissioning Procedure]]
| Decommissioning of a Resource Centre before it is turned into CLOSED in GOCDB
| ''approved'', Feb 28 2012
| ''approved'', Feb 28 2012
| Resource Centre Management
| Resource Centre Management
| Resource Centre Administrator, Operations Centres
| Resource Centre Administrator, Operations Centres
| [[PROC11|Resource Centre Decommissioning Procedure]]
| Decommissioning of a Resource Centre before it is turned into CLOSED in GOCDB
|-
|-
| [[PROC12|PROC 12]]
| [[PROC12|PROC 12]]
| [[PROC12|Production Service Decommissioning Procedure]]
| Decommissioning of an EGI production service
| ''approved'', Feb 28 2012
| ''approved'', Feb 28 2012
| Resource Centre Management
| Resource Centre Management
| Resource Centre Administrator, Operations Centres
| Resource Centre Administrator, Operations Centres
| [[PROC12|Production Service Decommissioning Procedure]]
| Decommissioning of an EGI production service
|-
|-
| [[PROC13|PROC 13]]
| [[PROC13|PROC 13]]
| [[PROC13|VO Deregistration Procedure]]
| Decommissioning of a Virtual Organization supported by the European Grid Infrastructure
| ''approved'', Jul 17 2012
| ''approved'', Jul 17 2012
| VO Management
| VO Management
| VO Managers, Operations Manager
| VO Managers, Operations Manager
| [[PROC13|VO Deregistration Procedure]]
| Decommissioning of a Virtual Organization supported by the European Grid Infrastructure
|}
|}


Line 113: Line 113:
|- style="background-color:lightgray;"
|- style="background-color:lightgray;"
| '''Number'''
| '''Number'''
| '''Title'''
| '''Comment'''
| '''Status'''
| '''Status'''
| '''Area'''
| '''Area'''
| '''Relevant to'''
| '''Relevant to'''
| '''Title'''
| '''Comment'''
|-
|-
| SEC 01
| SEC 01
| [https://documents.egi.eu/document/710 EGI Security Incident Handling]
| The "Security Incident Handling Procedure" defines site and incident coordinator responsibilities when handling Grid-related security incidents. All EGI sites are required to follow this procedure to report and handle Grid-related security incidents.
| ''approved'', July 2010 (MS405)
| ''approved'', July 2010 (MS405)
| Security  
| Security  
| Resource Centres, EGI CSIRT
| Resource Centres, EGI CSIRT
| [https://documents.egi.eu/document/710 EGI Security Incident Handling]
| The "Security Incident Handling Procedure" defines site and incident coordinator responsibilities when handling Grid-related security incidents. All EGI sites are required to follow this procedure to report and handle Grid-related security incidents.
|-
|-
| SEC 02 <!-- number -->
| SEC 02 <!-- number -->
| [https://documents.egi.eu/document/717 EGI Vulnerability issue handling process] <!-- title and wiki link -->
| The process used to report and resolve Grid software vulnerabilities in the EGI-InSPIRE project. <!-- comment-->
| ''approved'', July 2010 (MS405) <!-- status, date of approval -->
| ''approved'', July 2010 (MS405) <!-- status, date of approval -->
| Security <!-- area -->
| Security <!-- area -->
| Resource Centres, Risk Assessment Team, Technology Providers, SVG <!-- Relevant to -->
| Resource Centres, Risk Assessment Team, Technology Providers, SVG <!-- Relevant to -->
| [https://documents.egi.eu/document/717 EGI Vulnerability issue handling process] <!-- title and wiki link -->
| The process used to report and resolve Grid software vulnerabilities in the EGI-InSPIRE project. <!-- comment-->
|-
|-
| [[SEC03|SEC 03]] <!-- number -->
| [[SEC03|SEC 03]] <!-- number -->
| [https://documents.egi.eu/document/283 Critical Vulnerability Operational Procedure] <!-- title and wiki link -->
| Once a problem has been assessed as critical and a solution is available, sites are required to take action. This document primarily defines the procedure from that point on: how sites are asked to take action, and what steps are taken if they do not respond or do not act. Failure to take action may lead to site suspension. <!-- comment-->
| ''approved'', March 15 2011 <!-- status, date of approval -->
| ''approved'', March 15 2011 <!-- status, date of approval -->
| Security <!-- area -->
| Security <!-- area -->
| Resource Centres, Operations Centres, EGI-CSIRT, SVG <!-- Relevant to -->
| Resource Centres, Operations Centres, EGI-CSIRT, SVG <!-- Relevant to -->
| [https://documents.egi.eu/document/283 Critical Vulnerability Operational Procedure] <!-- title and wiki link -->
| Once a problem has been assessed as critical and a solution is available, sites are required to take action. This document primarily defines the procedure from that point on: how sites are asked to take action, and what steps are taken if they do not respond or do not act. Failure to take action may lead to site suspension. <!-- comment-->
|-
|-
|}
|}

Revision as of 16:08, 23 October 2012


Operations

EGI Operational Procedures are prescriptive documents that describe step-by-step processes involving several partners. The purpose of a procedure is to define the related workflow. Procedures are approved by the OMB and are periodically reviewed.

{|
|- style="background-color: lightgray;"
| '''Number'''
| '''Title'''
| '''Comment'''
| '''Status'''
| '''Area'''
| '''Relevant to'''
|-
| [[PROC01|PROC 01]]
| [[PROC01|COD Escalation Procedure]]
| Operations ticket escalation
| ''approved'', October 26 2010
| Ticket Management
| Resource Centre Administrators, Operations Centres, COD
|-
| [[PROC02|PROC 02]]
| [[PROC02|Operations Centre Creation]]
| Step-by-step instructions on how to create a new Operations Centre
| ''approved'', August 17 2010
| Operations Centre Management
| Operations Centres, COD
|-
| [[PROC03|PROC 03]]
| [[PROC03|Operations Centre decommissioning]]
| Step-by-step instructions on how to decommission an Operations Centre
| ''approved'', October 26 2010
| Operations Centre Management
| Operations Centres, COD
|-
| [[Availability and reliability monthly statistics#Process_for_quality_verification|PROC 04]]
| [[Availability and reliability monthly statistics#Process_for_quality_verification|Quality verification of monthly availability and reliability statistics]]
| Instructions for RODs and Operations Centres on how to handle justification for poor monthly performance through GGUS
| ''approved'', August 17 2010
| Availability and Monitoring
| Resource Centre Administrators, Operations Centres, COD
|-
| [[PROC05|PROC 05]]
| [https://twiki.cern.ch/twiki/bin/view/EGEE/ValidateROCNagios Validation of an Operations Centre Nagios]
| This procedure is part of the [[Operations Centre creation process coordination|Operations Centre creation]] procedure.
| ''approved'', August 17 2010
| Availability and Monitoring
| Operations Centres, COD
|-
| [[PROC06|PROC 06]]
| [[PROC06|Setting a Nagios test status to OPERATIONS]]
| A Nagios probe is set to OPERATIONS when its results are used to generate notifications for the Operations Dashboard. This procedure details the steps to set a Nagios test to OPERATIONS.
| ''approved'', Nov 23 2010
| Availability and Monitoring
| Operations Centres, COD
|-
| [[PROC07|PROC 07]]
| [[PROC07|Adding new probes to SAM]]
| Addition of new OPS Nagios probes to the SAM release.
| ''approved'', Mar 28 2011
| Availability and Monitoring
| Resource Centre Administrators, Operations Centres, COD
|-
| [[PROC08|PROC 08]]
| [[PROC08|Management of the EGI OPS Availability and Reliability Profile]]
| Request for a change to the EGI OPS Availability and Reliability profile. A change in the profile is needed every time a Nagios test is added to or removed from the profile, so that its results are included in or excluded from the monthly Availability and Reliability statistics.
| ''approved'', Mar 28 2011
| Availability and Monitoring
| Resource Centre Administrators, Operations Centres, COD
|-
| [[PROC09|PROC 09]]
| [[PROC09|Resource Centre Registration and Certification Procedure]]
| Registration of a new Resource Centre in the GOCDB
| ''approved'', May 17 2011
| Resource Centre Management
| Resource Centre Administrator, Operations Centres
|-
| [[PROC10|PROC 10]]
| [[PROC10|Recomputation of monitoring results and availability statistics]]
| Procedure for reporting problems with the monitoring results gathered by SAM and requesting a recomputation of the results and the related availability and reliability statistics
| ''approved'', Oct 17 2011
| Availability and Monitoring
| Resource Centre Administrators, Operations Centres
|-
| [[PROC11|PROC 11]]
| [[PROC11|Resource Centre Decommissioning Procedure]]
| Decommissioning of a Resource Centre before it is turned into CLOSED in GOCDB
| ''approved'', Feb 28 2012
| Resource Centre Management
| Resource Centre Administrator, Operations Centres
|-
| [[PROC12|PROC 12]]
| [[PROC12|Production Service Decommissioning Procedure]]
| Decommissioning of an EGI production service
| ''approved'', Feb 28 2012
| Resource Centre Management
| Resource Centre Administrator, Operations Centres
|-
| [[PROC13|PROC 13]]
| [[PROC13|VO Deregistration Procedure]]
| Decommissioning of a Virtual Organization supported by the European Grid Infrastructure
| ''approved'', Jul 17 2012
| VO Management
| VO Managers, Operations Manager
|}

Security

{|
|- style="background-color:lightgray;"
| '''Number'''
| '''Title'''
| '''Comment'''
| '''Status'''
| '''Area'''
| '''Relevant to'''
|-
| SEC 01
| [https://documents.egi.eu/document/710 EGI Security Incident Handling]
| The "Security Incident Handling Procedure" defines site and incident coordinator responsibilities when handling Grid-related security incidents. All EGI sites are required to follow this procedure to report and handle Grid-related security incidents.
| ''approved'', July 2010 (MS405)
| Security
| Resource Centres, EGI CSIRT
|-
| SEC 02
| [https://documents.egi.eu/document/717 EGI Vulnerability issue handling process]
| The process used to report and resolve Grid software vulnerabilities in the EGI-InSPIRE project.
| ''approved'', July 2010 (MS405)
| Security
| Resource Centres, Risk Assessment Team, Technology Providers, SVG
|-
| [[SEC03|SEC 03]]
| [https://documents.egi.eu/document/283 Critical Vulnerability Operational Procedure]
| Once a problem has been assessed as critical and a solution is available, sites are required to take action. This document primarily defines the procedure from that point on: how sites are asked to take action, and what steps are taken if they do not respond or do not act. Failure to take action may lead to site suspension.
| ''approved'', March 15 2011
| Security
| Resource Centres, Operations Centres, EGI-CSIRT, SVG
|}

More information

EGI Policies and Procedures

See all EGI policies and procedures

Contacts

If you wish to report problems with this page, or want to suggest additions and improvements, please contact:

operational-documentation-manuals[at]mailman.egi.eu