The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "PROC08 Management of the EGI OPS Availability and Reliability Profile"

(Replaced content with "{{Template:Op menubar}} {{Template:Doc_menubar}} Category:Operations Procedures Category:Deprecated {| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;" | style="padding-right: 15px; padding-left: 15px;" | |File:Alert.png This page is '''Deprecated'''; the content has been moved to https://confluence.egi.eu/display/EGIPP/PROC08+Management+of+the+EGI+OPS+Availability+and+Reliab...")
Tag: Replaced
 
(85 intermediate revisions by 7 users not shown)
{{Template: Op menubar}}
{{Template:Op menubar}}
{{Template:Doc_menubar}}
[[Category:Operations Manuals]]
[[Category:Operations Procedures]]
{{TOC_right}}
[[Category:Deprecated]]
 
{| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;"
{| border="1"
| style="padding-right: 15px; padding-left: 15px;" |  
|-
|[[File:Alert.png]] This page is '''Deprecated'''; the content has been moved to https://confluence.egi.eu/display/EGIPP/PROC08+Management+of+the+EGI+OPS+Availability+and+Reliability+Profile 
| '''Title'''
| ''Modification of the set of AVAILABILITY tests''
|-
|'''Version'''
| 1.0
|-
| '''Document link'''
| ''https://wiki.egi.eu/wiki/PROC08_Modification_of_the_set_of_AVAILABILITY_tests''
|-
| '''Last modified'''
| 16:18, 16 March 2011 (UTC)
|-
| '''Policy Group Acronym'''
| ''OMB''
|-
| '''Policy Group Name'''
| ''Operations Management Board''
|-
| '''Contact Person'''
| ''E. Imamagic''
|-
| '''Document Status'''
| ''DRAFT''
|-
| '''Approved Date'''
| ''specify''
|-
| '''Procedure Statement'''
| ''This document specifies the procedure for modifying the set of AVAILABILITY tests, i.e. of those tests whose results affect the computation of the monthly Availability and Reliability statistics.''
|-
|}
|}
----
= Overview =
The purpose of this document is to clearly describe the procedure for modifying the set of [[Availability_and_reliability_tests|AVAILABILITY tests]], i.e. of those tests whose results affect the computation of the monthly Availability and Reliability (A/R) statistics.
A detailed description of the probes and tests can be found on the [[SAM Tests]] page.
= Scope =
This procedure applies to the set of AVAILABILITY tests that run under the OPS VO. Its scope is global, as the tests are applied to all Resource Centres in the EGI project. These tests are used in the official EGI [[External_tools|ACE]] profile used for generating monthly A/R reports.
This procedure does not apply to availability/reliability statistics calculated for other VOs (e.g. user communities, national operations VOs).
This procedure does not apply to modifications which have already been agreed with the SAM team:
* including CREAM-CE results
* switching from the old SAM CA probe to the new one
* switching from the old SAM ARC probes to Nagios ones.
= Definitions =
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
= Entities involved in the procedure =
* '''Applicant'''. The Applicant (referred to as the Requester in the procedure steps) submits a request for modifying the set of AVAILABILITY tests, e.g. for adding a new AVAILABILITY probe.
<!--Anyone in the operations community - Resource Centre administrators, Operations Centre staff, Resource Infrastructure Operations Managers - is allowed to submit such a request. -->
* '''Chief Operations Officer'''. [[OTAG|OTAG]] is the Operations Tool Advisory Group, which is responsible for processing the request and for accepting or refusing it with the consensus of the Resource Infrastructure Providers. It is chaired by the person in charge of coordinating tool development activities in EGI.
* '''SAM Product Team'''. The SAM Product Team is responsible for scheduling, integrating and releasing the accepted probes.
= Pre-requirements =
This procedure requires the use of the [[External_tools|ACE]] system for generating monthly availability and reliability statistics. It is not applicable to the GridView system, which is currently in use. The critical feature that ACE supports and GridView lacks is the definition of multiple profiles for availability and reliability statistics.
<!-- Detailed information about ACE system can be found on the following link: https://tomtools.cern.ch/confluence/display/SAM/ACE.-->
If the change request includes the addition of new tests, each test MUST first go through the following steps:
* integration of the probe in the SAM release (see procedure [[PROC07_Adding_new_probes_to_SAM|PROC07]]);
* integration of the probe in the Operations Dashboard, i.e. being an OPERATIONS test is a necessary condition for being an AVAILABILITY test (see procedure [[PROC06_Setting_a_Nagios_test_status_to_OPERATIONS|PROC06]]).
The two procedures above ensure that the new tests are included in the SAM release, deployed on all NGI SAM instances and accepted by operators.
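The prerequisite rule above can be sketched as a simple check. This is only an illustration: the function name, the test names and the two input sets are hypothetical, and no such tool is defined by this procedure.

```python
def missing_prerequisites(requested_tests, sam_release_tests, operations_tests):
    """Map each requested test to the prerequisite procedures it has not completed.

    All arguments are hypothetical inputs: an iterable of requested test names,
    the set of tests shipped in the SAM release (PROC07), and the set of tests
    with OPERATIONS status (PROC06).
    """
    missing = {}
    for test in requested_tests:
        gaps = []
        if test not in sam_release_tests:
            gaps.append("PROC07")  # probe not yet integrated in the SAM release
        if test not in operations_tests:
            gaps.append("PROC06")  # test is not yet an OPERATIONS test
        if gaps:
            missing[test] = gaps
    return missing


# Example with made-up test names.
print(missing_prerequisites(
    ["example.test-a", "example.test-b"],
    sam_release_tests={"example.test-a", "example.test-b"},
    operations_tests={"example.test-a"},
))
```

A request would be ready for processing only when the returned mapping is empty.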
= Request =
* Anyone may submit a request for modifying the set of AVAILABILITY tests.
* The procedure requires the generation of two A/R reports for comparison (see details below), so only one request will be processed at a time. The order in which requests are processed will be defined by the SA1 activity leader.
= Procedure =
{| border="1" cellspacing="0" cellpadding="5" align="center"
! Step
! Action on
! Action
|-
| 1
| Requester
| Opens an RT ticket (https://rt.egi.eu/rt/index.html) in queue '''noc-managers'''.
<pre>
Subject: Request for adding/removing XXX(,YYY,...) test(s) from the set of AVAILABILITY tests
We would like to request adding/removing XXX(,YYY,...) test(s) from the set of AVAILABILITY tests
Prerequisite data:
* name of SAM test(s):
* name of service on which the test runs:
* link to documentation page:
* motivation (which part of the infrastructure will be improved with the new probe
or description of users' problems which will be avoided in future - provide a list
of GGUS tickets if possible)
</pre>
|-
| 2
| SA1 activity leader
| Schedules presentation of the new probe at the next possible OMB meeting.
|-
| 3
| Requester
| Explains the reason for modifying the set of AVAILABILITY tests.
|-
| 4 (*)
| SA1 activity leader
| Opens a ticket in the JIRA system (https://tomtools.cern.ch/jira/secure/) requesting the creation of a new ACE profile with the modified set of AVAILABILITY tests.
|-
| 5 (*)
| ACE team
| ACE team creates the new ACE profile.
|-
| 6
| SA1.8 task staff
| For the following '''one month''', two A/R reports are generated: one with the current official profile and one with the new profile. SA1.8 task staff compares the figures and presents them at the next OMB meeting.
|-
| 7
| OMB
| If the A/R statistics generated with the new A/R profile are satisfactory, the OMB approves the modification.
|-
| 8 (*)
| SA1 activity leader
| Opens a ticket in JIRA requesting that the new A/R profile become the official profile for EGI.
|-
| 9
| SA1 activity leader
| Broadcasts the modification to all relevant parties (i.e. noc-managers, inspire-sa1). Closes the initial RT ticket.
|}
(*) - These steps depend on the procedure for creating new profiles, which will be defined by the ACE team once ACE is in production. The steps defined here have been provided by the ACE team. This procedure will be updated if any change occurs.
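The comparison of the two A/R reports in step 6 could be sketched as follows. This is a minimal illustration under stated assumptions: the input format (a mapping from Resource Centre name to an availability/reliability pair in percent), the function name and the 5-point threshold are all hypothetical. The procedure itself does not define what counts as "satisfactory"; that decision rests with the OMB.

```python
def compare_profiles(official, candidate, threshold=5.0):
    """Compare per-site A/R figures from two profiles.

    official, candidate: dict mapping site name -> (availability %, reliability %).
    threshold: hypothetical cut-off (percentage points) above which a change
               is flagged for discussion; not defined by the procedure.
    Returns a list of (site, delta_availability, delta_reliability, flagged),
    covering only sites present in both reports.
    """
    rows = []
    for site in sorted(set(official) & set(candidate)):
        avail_old, rel_old = official[site]
        avail_new, rel_new = candidate[site]
        delta_a = round(avail_new - avail_old, 2)
        delta_r = round(rel_new - rel_old, 2)
        flagged = abs(delta_a) >= threshold or abs(delta_r) >= threshold
        rows.append((site, delta_a, delta_r, flagged))
    return rows


# Example with made-up figures for two made-up sites.
official = {"SITE-A": (99.0, 99.5), "SITE-B": (95.0, 96.0)}
candidate = {"SITE-A": (98.0, 99.5), "SITE-B": (85.0, 90.0)}
for row in compare_profiles(official, candidate):
    print(row)
```

Flagging by absolute change keeps the sketch simple; sites that appear in only one of the two reports are skipped and would need to be reviewed separately.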
= Revision History =
<!-- this section will track changes introduced in the document AFTER it is officially approved by OMB -->

Latest revision as of 10:43, 15 April 2022