
PROC08 Management of the EGI OPS Availability and Reliability Profile

From EGIWiki
Revision as of 11:56, 17 March 2011 by Tferrari


Title: Modification of the set of AVAILABILITY tests
Version: 1.0
Document link: https://wiki.egi.eu/wiki/PROC08_Modification_of_the_set_of_AVAILABILITY_tests
Last modified: 16:18, 16 March 2011 (UTC)
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Person: E. Imamagic
Document Status: DRAFT
Approved Date: specify
Procedure Statement: This document specifies the procedure for modifying the set of AVAILABILITY tests, i.e. of those tests whose results affect the computation of the monthly Availability and Reliability statistics.

Overview

This document describes the procedure for modifying the set of AVAILABILITY tests, i.e. those tests whose results affect the computation of the monthly Availability and Reliability (A/R) statistics.

A detailed description of the probes and tests can be found on the SAM Tests page.

Scope

This procedure applies to the set of AVAILABILITY tests that are run under the OPS VO. Its scope is global, as these tests are applied to all Resource Centres in the EGI project. These tests make up the official EGI ACE profile from which the monthly A/R reports are generated.

This procedure does not apply to availability/reliability statistics calculated for other VOs (e.g. user communities, national operations VOs).

This procedure does not apply to modifications which have already been agreed with the SAM team:

  • including CREAM-CE results
  • switching from the old SAM CA probe to the new one
  • switching from the old SAM ARC probes to Nagios ones.

Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Entities involved in the procedure

  • Applicant. The Applicant submits a request for modifying the set of AVAILABILITY tests, for example by adding a new AVAILABILITY probe. Anybody is allowed to submit such a request. The request is submitted to the respective Operations Centre, which, after acceptance, forwards it to the Operations Management Board (OMB) for discussion.
  • Operations Centre. The entity responsible for delivering local operational services to a Resource Infrastructure Provider. In order to contribute resources to EGI, a Resource Infrastructure Provider must be associated with an Operations Centre.
  • Chief Operations Officer (COO). The COO is the chairman of the Operations Management Board (OMB).
  • Resource Infrastructure Operations Manager. Represents the respective Resource Infrastructure within the OMB.
  • SAM Product Team. The SAM Product Team is responsible for scheduling, integrating and releasing the accepted probes.



Pre-requirements

This procedure requires the use of the ACE system for generating monthly availability and reliability statistics. The procedure is not applicable to the GridView system, which is currently used. The critical feature that ACE supports and GridView lacks is the definition of multiple profiles for availability and reliability statistics.

If the change request includes the addition of new tests, each test MUST first go through the following steps:

  • integration of the probe in the SAM release (see procedure PROC07);
  • integration of the probe in the Operations Dashboard (i.e. being an OPERATIONS test is a necessary condition to be an AVAILABILITY test) (see procedure PROC06).

The two procedures above ensure that the new tests are included in the SAM release, deployed on all Resource Infrastructure SAM instances and accepted by Operations Centre operators.
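
The prerequisite check described above can be sketched as follows. This is only an illustrative sketch: the `CandidateTest` record, its field names and both helper functions are hypothetical and are not part of SAM, ACE or any EGI tool. It merely encodes the rule that a new test must have completed PROC07 and PROC06 before a change request can include it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical record for a test proposed for the AVAILABILITY set."""
    name: str
    in_sam_release: bool      # PROC07 done: probe integrated in the SAM release
    is_operations_test: bool  # PROC06 done: probe integrated in the Operations Dashboard


def missing_prerequisites(test: CandidateTest) -> list[str]:
    """Return the procedures that still have to be completed for this test."""
    missing = []
    if not test.in_sam_release:
        missing.append("PROC07")
    if not test.is_operations_test:
        missing.append("PROC06")
    return missing


def may_be_requested(tests: list[CandidateTest]) -> bool:
    """A request adding new tests is admissible only if every test passed both steps."""
    return all(not missing_prerequisites(t) for t in tests)
```

For example, a test that is already in the SAM release but is not yet an OPERATIONS test would be reported as still missing PROC06, and a request containing it would not be admissible.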

Steps

Step Action on Action
1 Applicant Sends a change request to the respective Operations Centre. The request is submitted through a GGUS ticket.

Use the "Affected ROC/NGI" field to address the ticket to the appropriate Operations Centre. Template:

Subject: Request for adding/removing XXX(,YYY,...) test(s) from the set of AVAILABILITY tests

We would like to request adding/removing XXX(,YYY,...) test(s) from the set of AVAILABILITY tests

Prerequisite data:
* name of SAM test(s):
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved with the new probe
 or description of users' problems which will be avoided in future - provide a list
 of GGUS tickets if possible)
2 Operations Centre The Operations Centre processes the request specified in the GGUS ticket for acceptance/rejection.

The reasons for a rejection need to be specified in the GGUS ticket. In case of acceptance, an RT ticket is opened in the noc-managers queue to forward the request for discussion in the OMB. Template:

Subject: Request for adding/removing XXX(,YYY,...) test(s) from the set of AVAILABILITY tests

We would like to request adding/removing XXX(,YYY,...) test(s) from the set of AVAILABILITY tests. 
Please see details in GGUS ticket _link to Applicant's GGUS ticket_.
3 COO and Resource Infrastructure Operations Manager The COO schedules a presentation of the new probe at the next possible OMB meeting. The relevant Resource Infrastructure Operations Manager presents the request during the meeting. The Applicant is invited to attend the meeting. Only one request is processed at a time, as the impact of each change needs to be assessed. Requests are processed according to their priority, as agreed by the OMB.
4 (*) COO Opens a GGUS ticket, addressed to the GridView/Availabilities SU, requesting the creation of a new ACE profile with the modified set of AVAILABILITY tests.
5 (*) ACE team ACE team creates the new ACE profile.
6 SA1.8 task staff For the following month, two availability and reliability reports are generated (one per profile). SA1.8 task staff compares the figures and presents them at the next OMB meeting.
7 OMB If the availability and reliability statistics generated with the new profile are satisfactory, the OMB approves the modification.
8 (*) COO Opens a child GGUS ticket requesting that the new availability and reliability profile become the official profile for EGI.
9 COO Broadcasts the modification to all relevant parties (i.e. Operations Centres and Resource Centres). Closes the GGUS tickets (parent and child).

(*) - These steps depend on the procedure for creating new profiles, which will be defined by the ACE team once ACE is in production. The steps defined here have been provided by the ACE team. This procedure will be updated if any change occurs.
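
As an illustration of step 1, the change request template can be filled in programmatically before it is pasted into a GGUS ticket. The sketch below is only a text-formatting helper: the function `render_change_request` and its parameters are hypothetical, and it does not call any GGUS interface. The wording of the subject and of the prerequisite fields follows the template given in step 1.

```python
def render_change_request(action: str, tests: list[str], service: str,
                          documentation_url: str, motivation: str) -> tuple[str, str]:
    """Return (subject, body) for a request following the step 1 template.

    Hypothetical helper: it only formats text and does not submit a ticket.
    """
    if action not in ("adding", "removing"):
        raise ValueError("action must be 'adding' or 'removing'")
    if not tests:
        raise ValueError("at least one test name is required")

    names = ",".join(tests)
    # Wording copied from the template, with the placeholders substituted.
    request = f"{action} {names} test(s) from the set of AVAILABILITY tests"
    subject = f"Request for {request}"
    body = "\n".join([
        f"We would like to request {request}",
        "",
        "Prerequisite data:",
        f"* name of SAM test(s): {names}",
        f"* name of service on which the test runs: {service}",
        f"* link to documentation page: {documentation_url}",
        f"* motivation: {motivation}",
    ])
    return subject, body
```

The motivation argument is expected to carry the free text requested by the template, including a list of related GGUS tickets where available.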

Revision History