The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "EGI Operations Start Guide"

From EGIWiki
(44 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Template:Doc_menubar}} {{Template:Man_menubar}} {{TOC_right}}  
{{Template:Op menubar}} {{TOC_right}}  


== Introduction  ==
== Introduction  ==


This document presents the procedures and responsibilities of the various parties involved in the running of the EGI infrastructure. As a newcomer, you need to understand the structure of the EGI project and the roles of operators at different levels, and read the parts of the manual which apply to you. You are also encouraged to read the other parts of the manual. This is not required, since we strive to keep the individual parts as independent as possible, but reading the whole document will give you a complete overall picture of daily operations within EGI.
The EGI Operations Start Guide was created to help you start with EGI Operations duties. It presents the responsibilities of the various parties involved in the running of the EGI infrastructure and explains how to join operations. As a newcomer, you need to understand the structure of the EGI project and the roles of operators at different levels. Reading the whole document will give you a complete overall picture of daily operations within EGI.


== [[Operations/General/Roles|Roles]] ==
== Roles  ==


A brief description of the various players in EGI Operations. There are more detailed descriptions found in the menu bar at the top of this page.  
The following describes the roles that are commonly found in the EGI Infrastructure and Operations. Other terms and definitions can be found in [[Glossary|EGI Glossary]].  


== [[Operations/General/Joining operations|Joining operations]] ==
=== '''Site level''' ===


A minimal list of requirements for joining operations teams.
==== Site Administrator  ====


== [[Operations/General/Security|Security]] ==
The person responsible for keeping the site operational. In the scope of Operations, site administrators primarily receive and react to notifications of one or more incidents at their site. They also need to react to security issues that arise at a global level but affect their site. Site administrators should respond to [http://ggus.eu GGUS tickets] in a suitable time frame and be aware of the alarms at their site, e.g. through the [https://operations-portal.egi.eu operations dashboard]. Sites must only operate supported middleware versions, which implies upgrading the middleware from time to time. Emergency releases are treated in a special way; see [[EGI CSIRT:Critical Vulnerability Handling]].


A brief note on the role the CSIRT and EGI security teams play in operations.  
All Site management responsibilities are listed in the [https://documents.egi.eu/document/31 RC OLA document].


== [[Operations/General/Tools|Tools]] ==
==== Site Operations Manager ====


A list of tools relevant to EGI operations. A full list of EGI tools can also be found at https://wiki.egi.eu/wiki/Tools
The person responsible for the site at the political and legal level. S/he is responsible for signing the Operations Level Agreement ([https://documents.egi.eu/public/ShowDocument?docid=31 OLA]) between the Site and the NGI that hosts the site operationally. The Site Operations Manager is also responsible for assigning and approving the other site roles in the [https://goc.egi.eu/ GOCDB]. Further, s/he should ensure that administrators are subscribed to relevant mailing lists.


== [[Operations/General/Links|Links]] ==
==== Site Security Officer ====


Other links which may be useful for operations.  
The person responsible for keeping the site compliant with the [[EGI CSIRT:Policies|Security policies]]. She/he is also the primary contact for the NGI Security officer and EGI CSIRT. The Site Security Officer deals with security incidents and shall respond to enquiries in a timely fashion as defined in the collection of [[EGI CSIRT:Policies|security procedures and policies]].  


== Updating these manuals ==
=== '''Regional level''' ===


The Operations Documentation group is primarily responsible for maintaining and updating these manuals. If you discover inconsistencies or obsolete procedures, please contact them via the [mailto:operational-documentation@mailman.egi.eu Operational documentation mailing list]
==== Regional Operator on Duty (ROD) ====


== [[Operations/General/Revision|Revision history and Approval]] ==
A team responsible for solving problems/incidents in the infrastructure according to agreed procedures. ROD teams monitor the sites in their region, react to problems identified by the monitoring tools, and oversee problems through to their resolution. They ensure that problems are properly recorded and that the solutions progress according to specified timelines. They also provide support to sites and VOs as needed and keep oversight bodies informed in cases of non-responsive sites. They ensure that all necessary information is available to all parties. The team is provided by each NGI and requires procedural knowledge of the process (rather than technical skills) for its work. New ROD team members are required to read the [[Grid operations oversight/ROD Welcome page|ROD Welcome page]] and be familiar with the [[Grid operations oversight/ROD|ROD wiki page]].


A page to track the revision history and approval dates for the entire manual. This keeps the individual pages clean of excess clutter and makes navigation easier.
==== NGI Security officer  ====


[[Category:TODO_DOC]]
The member of the EGI-CSIRT IRTF (Incident Response Task Force) currently on shift. Further information can be found at the [[EGI CSIRT:IRTF|CSIRT:IRTF]] page. The role of the IRTF team is to handle day-to-day operational security issues and coordinate Computer-Security-Incident-Response across the EGI infrastructure. NGIs and Sites '''MUST''' respond in a timely manner to its requests and alerts.
 
==== NGI operations manager  ====
 
NGI operations manager is the contact point for all operational matters and represents the NGI within the [[OMB|Operations Management Board]].
 
S/he is mainly responsible for:
 
*keeping the NGI entry in the GOCDB up to date and for managing the status of all sites under that NGI, and ensuring that that information is also kept current
*addressing problems with Site availability or reliability. The reports are issued on a monthly basis and the NGI operations manager has 10 days to respond to identified problems
*attending regular [[OMB|Operations-Management-Board (OMB) meetings]]
 
All NGI operations management responsibilities are listed in the [https://documents.egi.eu/document/463 RP OLA document].
 
=== '''Project level'''  ===
 
==== Chief Operations Officer  ====
 
Chief Operations Officer leads EGI Operations, and is responsible for coordinating the operations of the infrastructure across the project.
 
==== EGI CSIRT  ====
 
[[Security|EGI CSIRT]] is an official security team coordinator and contact point at project level.
 
==== Operations Support  ====
 
The Operations Support team is provided on a global layer and is responsible for supporting EGI Operations. Examples of its activities are service level management, service level reporting, service management in general and central technical.
 
==== VO  ====
 
A Virtual Organisation (VO) is a group of users and, optionally, resources, often not bound to a single institution or national borders, who, by reason of their common membership and in sharing a common goal, are given authority to use a set of resources. Each VO member signs the VO AUP (during registration) which is the policy document describing the goals of the VO thereby defining the expected and acceptable use of the Grid by the users of the VO. User documentation can be found [[User Documentation|here]].
 
==== VO manager  ====
 
An individual responsible for the membership registry of the VO including its accuracy and integrity.
 
== Joining operations  ==
 
In order to join any of the organisational groups in your NGI, you will need to go through the following steps in order:
 
=== Obtain a Grid certificate.  ===
 
If you do not already have a Grid certificate, [http://www.eugridpma.org/members/worldmap/ this page] provides a map of all certification authorities according to country (or NGI). Select your country on the map to find out who your local CA is. Follow the procedure for your local CA to request a certificate. When you have received your certificate, install it into your web browser.
 
In case of setting up a new Resource Center, please request a host certificate.
 
CERN provides a webpage for testing your certificate [https://grid-deployment.web.cern.ch/grid-deployment/cgi-bin/CertTest/CertTest.cgi here]. Please use this resource and contact your CA if your certificate does not work.
 
=== Join Dteam VO  ===
 
It is recommended to join the [[Dteam vo|dteam VO]] at the [https://voms.hellasgrid.gr:8443/vo/dteam/vomrs dteam Registration] page. You should request group membership for <tt>/dteam</tt> and <tt>/dteam/YOUR_NGI</tt>. The dteam group manager will then be notified by the vomrs software.
 
=== Request GOCDB access  ===
 
*Read [[GOCDB/Input System User Documentation|Input System User Documentation ]] first.
*Go to the [http://goc.egi.eu/ GOCDB instance] and follow [[GOCDB/Input System User Documentation#Users_and_roles|the instruction]]
 
All new members '''need to notify their NGI operations manager''' about their role request, as GOCDB currently '''does not''' send any notification about pending requests.
 
=== Register into GGUS  ===
 
To register into GGUS please follow the [https://ggus.eu/?mode=register Central GGUS registration] link. GGUS can be accessed with only your certificate. Do not forget to apply for [https://ggus.eu/?mode=register the support role] as well. (The GGUS support staff will approve you quickly as they get the notification automatically.)
 
Some NGIs also have a local helpdesk or a regional GGUS. Ask your NGI operations manager how to register with them.
 
=== Subscribe to mailing lists.  ===
 
NGIs and Sites have local mailing lists for ROD team members and Site Administrators respectively. Please ensure that you subscribe to them. Depending on your role ask your NGI operations manager or Site operations manager to have you included on the necessary mailing lists if there is no automatic subscription process.
 
The NGI operations manager should contact operations@egi.eu and state that they wish to be subscribed to the noc-managers mailing list, noc-managers@mailman.egi.eu.
 
<br>
 
== Documentation  ==
 
Documentation relevant to EGI operations can also be found on the [[Documentation|EGI Documentation wiki page]].
 
== Tools  ==
 
A list of tools relevant to EGI operations can also be found on the [[Tools|EGI Tools wiki page]].
 
{| width="100%" style="background: none repeat scroll 0% 0% rgb(249, 249, 249); margin: 1.2em 0px 6px; border: 1px solid rgb(221, 221, 221);"
|-
| style="width:61%; color:#000;" |
{| style="width:280px; border:none; background:none;"
|-
| style="width:280px; text-align:center; white-space:nowrap; color:#000;" | <div style="font-size:162%; border:none; margin:0; padding:.1em; color:#000;">[[Support |Need support?]]<br></div>
|}
 
| style="width:40%; font-size:95%;" | <br>
|}
 
[[Category:Operations]]

Revision as of 07:00, 3 May 2017




Introduction

The EGI Operations Start Guide was created to help you start with EGI Operations duties. It presents the responsibilities of the various parties involved in the running of the EGI infrastructure and explains how to join operations. As a newcomer, you need to understand the structure of the EGI project and the roles of operators at different levels. Reading the whole document will give you a complete overall picture of daily operations within EGI.

Roles

The following describes the roles that are commonly found in the EGI Infrastructure and Operations. Other terms and definitions can be found in EGI Glossary.

Site level

Site Administrator

The person responsible for keeping the site operational. In the scope of Operations, site administrators primarily receive and react to notifications of one or more incidents at their site. They also need to react to security issues that arise at a global level but affect their site. Site administrators should respond to GGUS tickets in a suitable time frame and be aware of the alarms at their site, e.g. through the operations dashboard. Sites must only operate supported middleware versions, which implies upgrading the middleware from time to time. Emergency releases are treated in a special way; see EGI CSIRT:Critical Vulnerability Handling.

All Site management responsibilities are listed in the RC OLA document.

Site Operations Manager

The person responsible for the site at the political and legal level. S/he is responsible for signing the Operations Level Agreement (OLA) between the Site and the NGI that hosts the site operationally. The Site Operations Manager is also responsible for assigning and approving the other site roles in the GOCDB. Further, s/he should ensure that administrators are subscribed to relevant mailing lists.

Site Security Officer

The person responsible for keeping the site compliant with the Security policies. She/he is also the primary contact for the NGI Security officer and EGI CSIRT. The Site Security Officer deals with security incidents and shall respond to enquiries in a timely fashion as defined in the collection of security procedures and policies.

Regional level

Regional Operator on Duty (ROD)

A team responsible for solving problems/incidents in the infrastructure according to agreed procedures. ROD teams monitor the sites in their region, react to problems identified by the monitoring tools, and oversee problems through to their resolution. They ensure that problems are properly recorded and that the solutions progress according to specified timelines. They also provide support to sites and VOs as needed and keep oversight bodies informed in cases of non-responsive sites. They ensure that all necessary information is available to all parties. The team is provided by each NGI and requires procedural knowledge of the process (rather than technical skills) for its work. New ROD team members are required to read the ROD Welcome page and be familiar with the ROD wiki page.

NGI Security officer

The member of the EGI-CSIRT IRTF (Incident Response Task Force) currently on shift. Further information can be found at the CSIRT:IRTF page. The role of the IRTF team is to handle day-to-day operational security issues and coordinate Computer-Security-Incident-Response across the EGI infrastructure. NGIs and Sites MUST respond in a timely manner to its requests and alerts.

NGI operations manager

NGI operations manager is the contact point for all operational matters and represents the NGI within the Operations Management Board.

S/he is mainly responsible for:

  • keeping the NGI entry in the GOCDB up to date and for managing the status of all sites under that NGI, and ensuring that that information is also kept current
  • addressing problems with Site availability or reliability. The reports are issued on a monthly basis and the NGI operations manager has 10 days to respond to identified problems
  • attending regular Operations-Management-Board (OMB) meetings

All NGI operations management responsibilities are listed in the RP OLA document.
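
The 10-day response window for availability and reliability reports can be tracked with simple date arithmetic. The following is a minimal sketch, assuming the window is counted in calendar days from the date the monthly report is issued (the text does not say whether working days are meant). The function names are hypothetical and not part of any EGI tool.

```python
from datetime import date, timedelta

# 10 days comes from the text above; counting calendar days is an assumption.
RESPONSE_WINDOW_DAYS = 10


def response_deadline(report_issued: date,
                      window_days: int = RESPONSE_WINDOW_DAYS) -> date:
    """Return the last day on which a response is still on time."""
    return report_issued + timedelta(days=window_days)


def is_overdue(report_issued: date, today: date,
               window_days: int = RESPONSE_WINDOW_DAYS) -> bool:
    """Return True once the response window has passed."""
    return today > response_deadline(report_issued, window_days)
```

For example, under these assumptions a report issued on 3 May 2017 would need a response by 13 May 2017.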

Project level

Chief Operations Officer

Chief Operations Officer leads EGI Operations, and is responsible for coordinating the operations of the infrastructure across the project.

EGI CSIRT

EGI CSIRT is an official security team coordinator and contact point at project level.

Operations Support

The Operations Support team is provided on a global layer and is responsible for supporting EGI Operations. Examples of its activities are service level management, service level reporting, service management in general and central technical.

VO

A Virtual Organisation (VO) is a group of users and, optionally, resources, often not bound to a single institution or national borders, who, by reason of their common membership and in sharing a common goal, are given authority to use a set of resources. Each VO member signs the VO AUP (during registration) which is the policy document describing the goals of the VO thereby defining the expected and acceptable use of the Grid by the users of the VO. User documentation can be found here.

VO manager

An individual responsible for the membership registry of the VO including its accuracy and integrity.

Joining operations

In order to join any of the organisational groups in your NGI, you will need to go through the following steps in order:

Obtain a Grid certificate.

If you do not already have a Grid certificate, this page provides a map of all certification authorities according to country (or NGI). Select your country on the map to find out who your local CA is. Follow the procedure for your local CA to request a certificate. When you have received your certificate, install it into your web browser.

In case of setting up a new Resource Center, please request a host certificate.

CERN provides a webpage for testing your certificate here. Please use this resource and contact your CA if your certificate does not work.
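
As a complementary local check (not a replacement for the test page), it can help to confirm that a certificate has not expired before contacting the CA. The sketch below is only an illustration. It assumes you already have the `notAfter=...` line that `openssl x509 -noout -enddate` prints for your certificate file, and the function name `seconds_until_expiry` is hypothetical.

```python
import ssl
import time


def seconds_until_expiry(enddate_line, now=None):
    """Return the seconds left before the certificate expires.

    `enddate_line` is assumed to look like
    'notAfter=Jan  1 00:00:00 2020 GMT'. A negative result means the
    certificate has already expired.
    """
    # Keep only the timestamp that follows 'notAfter='.
    _, _, stamp = enddate_line.strip().partition("=")
    if not stamp:
        raise ValueError("expected a line of the form 'notAfter=<date>'")
    # ssl.cert_time_to_seconds parses this certificate date format
    # into seconds since the epoch (UTC).
    expires = ssl.cert_time_to_seconds(stamp)
    if now is None:
        now = time.time()
    return expires - now
```

Dividing the result by 86400 gives the remaining validity in days. A certificate can still fail for other reasons, so an unexpired certificate that does not work should still be raised with your CA.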

Join Dteam VO

It is recommended to join the dteam VO at the dteam Registration page. You should request group membership for /dteam and /dteam/YOUR_NGI. The dteam group manager will then be notified by the vomrs software.

Request GOCDB access

All new members need to notify their NGI operations manager about their role request, as GOCDB currently does not send any notification about pending requests.

Register into GGUS

To register into GGUS please follow the Central GGUS registration link. GGUS can be accessed with only your certificate. Do not forget to apply for the support role as well. (The GGUS support staff will approve you quickly as they get the notification automatically.)

Some NGIs also have a local helpdesk or a regional GGUS. Ask your NGI operations manager how to register with them.

Subscribe to mailing lists.

NGIs and Sites have local mailing lists for ROD team members and Site Administrators respectively. Please ensure that you subscribe to them. Depending on your role ask your NGI operations manager or Site operations manager to have you included on the necessary mailing lists if there is no automatic subscription process.

The NGI operations manager should contact operations@egi.eu and state that they wish to be subscribed to the noc-managers mailing list, noc-managers@mailman.egi.eu.


Documentation

Documentation relevant to EGI operations can also be found on the EGI Documentation wiki page.

Tools

A list of tools relevant to EGI operations can also be found on the EGI Tools wiki page.