PROC09 Resource Centre Registration and Certification
Title | Site Certification Procedure |
Document link | to be determined |
Last modified | |
Policy Group Acronym | |
Policy Group Name | Operational Documentation |
Contact Person | Vera Hansper |
Document Status | DRAFT |
Approved Date | |
Procedure Statement | A procedure for the steps involved to both register and certify new sites in the EGI infrastructure. The certification step can also be used to re-certify suspended sites. |
Introduction
Certification is a prerequisite for a Resource Centre (aka site) to become part of a Resource Infrastructure, such as a National Grid Initiative (NGI) or an EIRO (in Europe), or a multi-country Resource Infrastructure.
This document describes the steps required
- to register and certify a new site,
- to re-certify a site which has been suspended.
Note: A separate document provides the process for decommissioning a site.
Through its parent Resource Infrastructure, a certified Resource Centre becomes a member of the EGI Resource Infrastructure and makes resources available to international user communities.
A certified site guarantees a minimum quality of service for these resources (currently expressed in terms of monthly availability and reliability). It must ensure that problems are handled in a timely fashion, and it must understand and adhere to a common set of policies and procedures. By contrast, an uncertified, or test, Resource Centre does not guarantee the availability or usability of its resources.
Definitions
The entities involved in this procedure are defined in the EGI Glossary.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Entities involved in the procedure
- Resource Centre (or Site) Operations Manager, who is responsible for initiating the certification process by applying for membership of a Resource Infrastructure
- Resource Infrastructure Operations Manager, who is responsible for approving the integration of a new Resource Centre into the respective Infrastructure
- Operations Centre, the entity that is technically responsible for carrying out the Resource Centre certification part of the procedure once the membership is approved
The Resource Infrastructure Operations Manager can determine with the Site Operations Manager the level of involvement of other actors.
Contact information
- EGI Operations: operations (at) mailman.egi.eu
- EGI Resource Infrastructure Providers are listed on the EGI [web site] // provide reference
- Operations Centre contact information is available on GOCDB // provide link to instructions page
Actions and responsibilities
Site Operations Manager
- A Resource Infrastructure Provider is responsible for all sites within its jurisdiction (for example, an NGI is the reference entity for each country). For this reason, the Site Operations Manager of a new site needs to inform the respective NGI (if in Europe), or a Resource Infrastructure Provider active in the relevant geographical area (if outside Europe), of the intention to join the EGI infrastructure. If needed, EGI Operations can help the Site Operations Manager get in contact with the relevant partners (see the Contact information section).
- In order for the site to be certified, the Site Operations Manager is responsible for reading, understanding and accepting the Resource Centre Operational Level Agreement (OLA), which defines the obligations of a Resource Centre and the commitment to deliver a minimum quality of service to its future users. Endorsement of the OLA implies, among other things, the acceptance of
- the Grid Security Policy
- the Grid Site Operations Policy
- the Site Registration Security Policy
- all other policies for all EGI participants from the Security Policy Group
Resource Infrastructure Provider Operations Manager
- The Resource Infrastructure Provider Operations Manager MUST handle site certification applications and MUST provide feedback to the requesting partners in a timely manner, accepting or rejecting the requests received.
- He/she MUST provide information to the Site Operations Manager about the Resource Centre OLA, and is responsible for keeping records of the Site Operations Manager's agreement, in a form deemed suitable by the Resource Infrastructure Provider (for example, a signed e-mail agreement, the collection of signatures on a paper copy of the OLA, or other means).
- If a request is accepted, the Resource Infrastructure Provider Operations Manager MUST contact the relevant Operations Centre to start the registration of the site as a candidate, followed by the certification procedure. Registration is only needed for new sites.
ROD actions and responsibilities
The steps then required of both the NGI manager and the Site Operations Manager are explained in the tables below. For a new site, the first part is the registration process. Note that a Site Security Officer is required for each site, and that mailing lists are required both for the site's CSIRT and for contacting the site administrators. Details of the required information are linked from step one of the first table (SiteCertMan/Required_information); this information is entered into EGI's infrastructure database, the GOCDB.
Further, before a site can be certified, it is important that the Site Operations Manager reads and accepts the Grid Site Operations Policy, the Site Registration Security Policy and the NGI/SITE OLA. The links for these are found in the required documentation section.
There are also a number of steps which require the integration of the site with monitoring tools; during the certification process, the site should be registered in the NGI's Nagios instance. Once the site has passed all tests consecutively for 2 to 3 days, it can be marked as certified. The actual certification process, in the second table, is applicable to both new and suspended sites.
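The "passed all tests consecutively for 2 to 3 days" criterion can be illustrated with a minimal sketch. It assumes that daily test outcomes have already been collected into a mapping from calendar date to pass/fail; the function name, the data shape and the default threshold of 3 days are illustrative assumptions, not part of this procedure or of any monitoring tool.

```python
from datetime import date, timedelta

# Assumption: the procedure says "2 to 3 days"; 3 is used here as the stricter value.
REQUIRED_CONSECUTIVE_DAYS = 3


def ready_for_certification(daily_results, required_days=REQUIRED_CONSECUTIVE_DAYS):
    """Return True if the most recent `required_days` calendar days all passed.

    `daily_results` maps a datetime.date to True (all tests passed that day)
    or False. A day with no entry is treated as a failure.
    """
    if not daily_results:
        return False
    latest = max(daily_results)
    return all(
        daily_results.get(latest - timedelta(days=offset), False)
        for offset in range(required_days)
    )
```

Treating a missing day as a failure is a deliberately conservative choice: a gap in the monitoring data should not count towards the consecutive run.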
The general status flow that a site is allowed to follow is neatly given by the following:
One final point: It is highly recommended that email contacts for the site's administrators and security officer(s) are mailing lists, and not individuals.
Site registration procedure
These steps describe what a site that wishes to join the EGI infrastructure needs to do. They apply to a site that is not already registered in the GOCDB.
Actions falling on the NGI are the responsibility of the NGI manager. Actions falling on the Site are the responsibility of the Site Manager/Representative.
Note that a site MUST be part of an NGI or a group of NGIs. If there is no suitable NGI for your country, the NGI may first have to be created. In this case, please see [this NewNGIs_creation link] for how to create a new NGI.
# | Responsible | Action |
---|---|---|
1 | Site |
|
2 | NGI |
The following actions can be done in parallel:
|
3 | NGI |
|
4 | Site |
|
5 | Site or NGI |
|
6 | NGI |
|
7 | NGI |
|
8 | NGI |
|
After the successful completion of all these steps, the site is considered to be in the "Candidate" state and is ready for the certification process.
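As an illustration of the registration prerequisites mentioned earlier (a Site Security Officer, a CSIRT mailing list and a site administrators' mailing list), a completeness check on a site record could look like the sketch below. The field names are hypothetical and do not correspond to actual GOCDB fields; the authoritative list of required information is the one referenced as SiteCertMan/Required_information.

```python
# Hypothetical field names chosen for illustration only; they are not GOCDB fields.
REQUIRED_FIELDS = ("security_officer", "csirt_email", "admin_email")


def missing_registration_fields(site):
    """Return the required fields that are absent or blank in `site` (a dict)."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(site.get(field) or "").strip()
    ]
```

An empty result means that all three items are present; it says nothing about whether the addresses are valid or are mailing lists rather than individuals.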
Site certification procedure
The Site Certification procedure is applicable both to new sites which have reached the "Candidate" state and to suspended sites. The following is a detailed description of the steps required for the transition from the "Uncertified" to the "Certified" state of the site.
# | Responsible | Action |
---|---|---|
1 | Site |
|
2 | NGI |
|
3 | NGI | Check that the GIIS (gLite: BDII) is working, and publishing coherent values, namely:
There are detailed examples for how to do this in SiteCertMan/GIIS_BDII_check. |
4 | NGI |
Check that the registered services are fully functional by performing manual tests, e.g. from the UI or from a dedicated SAM/Nagios testbed infrastructure provided by the NGI. There is an example of how to create a testbed Nagios at this page. Contact the site admins if there are problems, and ensure that they fix them. Include the ROD and help-desk teams if necessary. Iterate this step with the site admins until the tests pass. The prime tests to check are:
Details for submitting manual tests can be found at SiteCertMan/Grid_manual_tests. |
5 | NGI |
|
6 | NGI | After two days check that the site appears in all operational tools. If there are problems with a specific tool, open GGUS tickets to the relevant Support Units. The major tools that are relevant are:
|
7 | NGI | Ensure that, before the end of the maintenance downtime
|
8 | NGI |
|
9 | NGI |
|
10 | NGI | (Optional?) The NGI can broadcast that a new site is now part of the EGI infrastructure. |
After the successful completion of these steps, the site is considered to be "Certified".
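The site states named in this procedure ("Candidate", "Uncertified", "Certified" and suspended) can be sketched as a small transition table. The transitions below are an assumption inferred from the text of this document only: a registered site starts as a Candidate, certification moves a site from Uncertified to Certified, and a suspended site re-enters the certification procedure. They are not taken from the status-flow diagram or from the GOCDB, which may allow other transitions.

```python
# Assumed transitions, inferred from this procedure; not an authoritative status flow.
ALLOWED_TRANSITIONS = {
    "Candidate": {"Uncertified"},
    "Uncertified": {"Certified"},
    "Certified": {"Suspended"},
    "Suspended": {"Uncertified"},
}


def can_transition(current, target):
    """Return True if moving from `current` to `target` is in the assumed table."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Under these assumptions, `can_transition("Candidate", "Certified")` is false: a new site cannot skip the certification steps.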
Revision history
Version | Authors | Date | Comments |
---|---|---|---|
0.7 | Vera Hansper | 2011-02-02 | Updated introduction to include roles, etc. and added required documentation link for policies |