PROC08 Management of the EGI OPS Availability and Reliability Profile

Title: Management of the EGI OPS Availability and Reliability Profile
Document link: https://wiki.egi.eu/wiki/PROC08
Last modified: 08.06.2016
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Group: operations@egi.eu
Document Status: Approved
Approved Date: 29.10.2015
Procedure Statement: This document specifies the procedure for modifying the EGI OPS Availability and Reliability profile.
Owner: Alessandro Paolini



Overview

A change to the profile is needed whenever a Nagios test is added to or removed from the profile, so that its results are included in or excluded from the monthly Availability and Reliability statistics. A change in the OPS Availability and Reliability profile affects the computation of the monthly Availability and Reliability statistics of all EGI Resource Infrastructures and Resource Centres.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Scope

This procedure is applicable to the EGI OPS Availability and Reliability profile. Any change applied is global, as it has effects on all EGI Resource Centres. The ARGO compute engine (CE) uses profiles to generate monthly Availability and Reliability reports.

This procedure is NOT applicable to VO-specific Availability and Reliability profiles used by non-OPS VOs (e.g. user communities, national operations VOs, etc.).

Entities involved in the procedure

Pre-requirements

Steps

Step Action on Action
1 Applicant Sends a change request to their own Operations Centre. The request is submitted through a GGUS ticket.

Use the "Affected ROC/NGI" field to address the ticket to the appropriate Operations Centre. Template:

Subject: Request for adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS A/R Profile

We would like to request adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS Profile

Prerequisite data:
* name of SAM test(s):
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved with the new test,
 or description of users' problems which will be avoided in future; provide a list
 of GGUS tickets if possible)
2 Operations Centre The Operations Centre processes the request specified in the GGUS ticket and accepts or rejects it.

In case of rejection, the reasons need to be specified in the GGUS ticket.

In case of acceptance, a GGUS ticket is opened to EGI Operations Support Unit to forward the request for discussion in the OMB. Template:

Subject: Request for adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS Profile

We would like to request adding/removing XXX(,YYY,...) test(s) to/from the EGI OPS Profile. 
Please see details in GGUS ticket _link to Applicant's GGUS ticket_.
3 EGI Operations team and Resource Infrastructure Operations Manager The EGI Operations team schedules a presentation of the requested change at the next possible OMB meeting. The relevant Resource Infrastructure Operations Manager presents the request during the meeting. The Applicant is invited to attend the meeting. Only one request is processed at a time, as the impact of each change needs to be assessed. Requests are processed according to their priority, as agreed by the OMB.
4 (*) EGI Operations team Opens a GGUS ticket to the ARGO/SAM EGI SU requesting the creation of a new ARGO profile with a modified set of AVAILABILITY tests.
5 (*) ARGO team ARGO team creates the new ARGO profile.
6 EGI Operations team For the following month, two Availability and Reliability reports are generated, one with the current profile and one with the new profile. EGI Operations staff compare the figures and present them at the next OMB meeting with a recommendation for acceptance or rejection.
7 OMB If the availability and reliability statistics generated with the new profile are satisfactory, OMB approves the modification.
8 (*) EGI Operations team Opens a child GGUS ticket requesting that the new Availability and Reliability profile become the official profile for EGI.
9 EGI Operations team Updates the Availability SAM tests page.
10 EGI Operations team Broadcasts the modification to all relevant parties (i.e. Operations Centres and Resource Centres). Closes the GGUS tickets (parent and child).

(*) - These steps depend on the procedure for creating new profiles which will be defined by the ARGO team. Steps defined here have been provided by the ARGO team. This procedure will be updated if any change occurs.
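
The comparison in step 6 could be sketched as follows. This is only an illustration, not the method used by EGI Operations or ARGO: the `ARFigures` type, the `compare_reports` function, the site names, the figures, and the 1.0 percentage point threshold are all assumptions made for the example. It takes per-Resource-Centre monthly figures from two reports and lists the Resource Centres whose Availability or Reliability differs noticeably between the current and the new profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ARFigures:
    """Monthly Availability and Reliability of one Resource Centre, in percent."""
    availability: float
    reliability: float


def compare_reports(current, candidate, threshold=1.0):
    """Return {site: (availability_delta, reliability_delta)} for every site
    present in both reports whose figures differ by more than `threshold`
    percentage points (hypothetical threshold, not an EGI-defined value)."""
    flagged = {}
    for site in sorted(current.keys() & candidate.keys()):
        d_avail = candidate[site].availability - current[site].availability
        d_rel = candidate[site].reliability - current[site].reliability
        if abs(d_avail) > threshold or abs(d_rel) > threshold:
            flagged[site] = (round(d_avail, 2), round(d_rel, 2))
    return flagged


# Made-up figures for two hypothetical Resource Centres.
current = {"SITE-A": ARFigures(99.0, 99.5), "SITE-B": ARFigures(95.0, 97.0)}
candidate = {"SITE-A": ARFigures(98.8, 99.5), "SITE-B": ARFigures(90.0, 93.5)}

flagged = compare_reports(current, candidate)
for site, (d_avail, d_rel) in flagged.items():
    print(f"{site}: availability {d_avail:+.2f}, reliability {d_rel:+.2f}")
```

With these made-up figures only SITE-B is listed, since its Availability and Reliability both drop by more than the chosen threshold under the new profile. A list of this kind could support the recommendation presented to the OMB, but the acceptance criteria themselves remain an OMB decision.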

Revision History

Version Authors Date Comments

T. Ferrari 18/10/2011 removed obsolete information: "This procedure does not apply to modifications which have already been agreed with the SAM team: including CREAM-CE probes, switching from the old SAM CA probe to the new one, switching from the old SAM ARC probes to Nagios ones."

M. Krakowian 19 August 2014 Change contact group -> Operations support

P. Daoglou and P. Korosoglou 30 September 2015 Updates in procedure. Replacement of ACE references with ARGO.
Alessandro Paolini 2016-06-08 Change contact group -> Operations