The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations@egi.eu.

Difference between revisions of "PROC10 Recomputation of SAM results or availability reliability statistics"

From EGIWiki
(48 intermediate revisions by 6 users not shown)
{{Template:Op menubar}}
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}  
{{Template:Doc_menubar}}
[[Category:Deprecated]]
[[Category:Procedures]]
{| style="border:1px solid black; background-color:lightgrey; color: black; padding:5px; font-size:140%; width: 90%; margin: auto;"
__TOC__
| style="padding-right: 15px; padding-left: 15px;" |
|[[File:Alert.png]] This page is '''Deprecated'''; the content has been moved to https://confluence.egi.eu/display/EGIPP/PROC10+Recomputation+of+SAM+results+or+availability+reliability+statistics 
|}


= Procedure for the recomputation of SAM results and/or availability/reliability statistics=
{{Ops_procedures
|Doc_title = Recomputation of availability/reliability statistics
|Doc_link = [[PROC10|https://wiki.egi.eu/wiki/PROC10]]
|Version = 05.09.2019
|Policy_acronym = OMB
|Policy_name = Operations Management Board
|Contact_group = operations@egi.eu
|Doc_status = Approved
|Approval_date = 29.10.2015
|Procedure_statement = This procedure documents the steps for requesting a correction in the SAM test results and in the related availability/reliability statistics.
|Owner = Alessandro Paolini
}}


*'''Title''': Recomputation of SAM results and/or availability/reliability statistics
<br>
*'''Document link''': https://wiki.egi.eu/wiki/PROC10
*'''Last modified''': 16 Jan 2012
*'''Version''': 1.1
*'''Policy Group Acronym''': OMB
*'''Policy Group Name''': Operations Management Board
*'''Contact Person''': George Fergadis/AUTH
*'''Document Status''': APPROVED
*'''Approved Date''': 17 October 2011
*'''Procedure Statement''': This procedure documents the steps for requesting a correction in the SAM test results and, when applicable, in the related availability/reliability statistics.


= Overview  =
This procedure documents the steps for requesting a correction in the
[[SAM_Instances|SAM test results]] and in the related [[Availability_and_reliability_monthly_statistics|availability/reliability statistics]] if applicable. The statistics for the affected month do not need to be recomputed if problems with the test results are reported and corrected before that month's statistics are computed and distributed. Problems with the SAM results should therefore be reported as soon as they are detected, to allow enough time to fix them and to avoid having to recompute the monthly availability/reliability statistics for the affected month.


DISCLAIMER: This procedure is only applicable to EGI OPS test results. Procedures for the computation of VO-specific availability report are VO-specific and are out of scope.
This procedure documents the steps for requesting a correction in the OPS VO test results and in the related [https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics availability/reliability statistics] if applicable.  


= Who can submit a request? =
Figures are available through the web interface: http://egi.ui.argo.grnet.gr/
Re-computations can be requested by site administrators and by regional operations staff.


= Prerequisites =
DISCLAIMER: This procedure is only applicable to EGI OPS test results. Procedures for the computation of VO-specific availability report are VO-specific and are out of this scope.
Fixes in test results are accepted only when the failures were due to problems
caused by the monitoring infrastructure itself. Some examples:
* invalid proxy certificate used for submitting the monitoring probes in a Nagios instance;
* problems with the Storage Element used for replica management tests resulting in errors on CE's metrics.


= Steps =
= Definitions =


# '''STEP 1''': as soon as the problem is detected, notify by opening a [http://helpdesk.egi.eu/ GGUS ticket]. '''If the submitter is a Resource Centre administrator''': please address the ticket to your Operations Centre support unit. '''If the submitter is a member of a regional operations staff''': please address the ticket to the SAM support unit. In the GGUS ticket you must mention:
Please refer to the [[Glossary|EGI Glossary]] for the definitions of the terms used in this procedure.  
## the starting and ending time of the problem (including day and hour in UTC)
## the Site, NGI/federation of NGIs affected by the problem
## the VO affected by the problem (must be the OPS VO)
## a description of the problem
# '''STEP 2''': (only applicable if the submitter of the request is a Resource Centre administrator) the Operations Centre analyzes the request. If the request is validated, the ticket is re-assigned to the [[GGUS:SLM-FAQ|Service Level Management]] (SLM) Support Unit, which is responsible for (1) collecting all reported problems and (2) discussing the reported problems with the SAM Support Unit by re-assigning the ticket to the [[GGUS:SAM/Nagios_FAQ|SAM/Nagios SU]].
# '''STEP 3''': if the request for recomputation of the test results is accepted, the SAM Support Unit is responsible for fixing the results and for triggering a recomputation of the monthly availability statistics if necessary. The following steps are followed:
## All Nagios metric results for any site and service are set to ''unknown'' status from the beginning of the hour reported in the starting time to one hour after the ending time. This covers results that may have arrived late.
## Availability/reliability are then recomputed for that particular period, Site, NGI/federation of NGIs if necessary. As a consequence, the availability and reliability of other sites are not affected, as unknown periods are not considered in the computation.
# '''STEP 4''': if new availability/reliability statistics are computed, the SAM/Nagios SU reassigns the ticket to the SLM Support Unit once they are ready for distribution, to notify that a new set of reports can be re-distributed to EGI.
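
The time window described in STEP 3 can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the SAM or ARGO tooling. The availability formula (time up divided by known time) is an assumption used only to show why excluding ''unknown'' periods leaves the remaining figures unaffected.

```python
from datetime import datetime, timedelta, timezone


def unknown_window(start, end):
    """Interval to mark as 'unknown' for a reported problem (times in UTC).

    Starts at the beginning of the hour of the reported start time and
    ends one hour after the reported end time, as described in STEP 3.
    """
    window_start = start.replace(minute=0, second=0, microsecond=0)
    window_end = end + timedelta(hours=1)
    return window_start, window_end


def availability(up_hours, unknown_hours, total_hours):
    """Assumed formula: unknown time is removed from the denominator."""
    known_hours = total_hours - unknown_hours
    if known_hours <= 0:
        return None  # nothing known, so no figure can be computed
    return up_hours / known_hours


# Example: a problem reported from 14:25 to 17:40 UTC on 10 January 2012.
start = datetime(2012, 1, 10, 14, 25, tzinfo=timezone.utc)
end = datetime(2012, 1, 10, 17, 40, tzinfo=timezone.utc)
window = unknown_window(start, end)
```

In this example the window runs from 14:00 to 18:40 UTC on the same day.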


= External links =
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
* [https://tomtools.cern.ch/confluence/display/SAMDOC/Availability+Re-computation+Policy WLCG Availability re-computation policy]
 
= Who can submit a request?  =
 
Re-computations can be requested by:
 
*site administrators
*regional operations staff.
 
= Re-computation policy  =
 
Starting from 1 May 2012:
 
*'''Monitoring results can be recomputed only in the case of problems with the monitoring infrastructure itself.'''<br>
*'''No re-computations will be performed in case of issues with the deployed middleware''' (e.g. documented bugs affecting the availability of a production service end-point); such issues will consequently be reflected in lower availability/reliability.
 
<br>
 
Some examples of possible issues justifying a re-computation request:
 
*invalid proxy certificate used for submitting the monitoring probes in a Nagios instance;
*problems with the Storage Element used for replica management tests resulting in errors on CE's metrics.<br>
 
<br>
 
'''The deadline: '''10 calendar days after the publication and announcement of the monthly Availability/Reliability reports for a given month X (typically the announcement will be distributed on the 1st day of month X+1). <br>
 
According to the re-computation requests received, A/R reports will be regenerated only once for each month, after the 10th of month X+1.
 
= How to request a re-computation of OPS monitoring results  =
 
{| class="wikitable"
|-
! Step
! Action on
! Action
|-
| 1
| Site administrator / ROD team
|
As soon as the problem is detected, please fill in the form in the following page: https://ui.argo.grnet.gr/recomputation
 
Provide an explanation for the request.
 
The submission of the form will inform the ARGO team and your request will be in pending status.
 
|-
| 2
| ARGO&nbsp;team
|
A member of staff validates the request. <br>
 
You will be informed by email whether the request is confirmed or rejected.
 
|-
| 3
| ARGO&nbsp;team
|
If the request is accepted, the recomputation will be triggered as soon as possible.
 
The status of the recomputation will be visible through a web page (the link is given in the email from the previous step).
 
|}
 
<br>


= Revision history  =
17/01/2012: the text of the procedure is fixed to clarify that both RC administrators and regional operations staff can request a re-computation.


16/01/2012: the text of the procedure is fixed to clarify that the recomputation of test results can be requested before the end of the affected month, in which case if sufficient time is allowed for fixing of the test results, no re-computation of availability/reliability statistics will be needed.
{| class="wikitable"
{{Template:Creative_commons}}
|-
! Version
! Authors
! Date
! Comments
|-
| <br>
| George Fergadis/AUTH
| 03/05/2012
| updated policy and procedure to reflect the OMB decision of the March 2012 meeting
|-
| <br>
| George Fergadis/AUTH
| 17/01/2012
| the text of the procedure is fixed to clarify that both RC administrators and regional operations staff can request a re-computation.
|-
| <br>
| George Fergadis/AUTH
| 16/01/2012  
| the text of the procedure is fixed to clarify that the recomputation of test results can be requested before the end of the affected month, in which case if sufficient time is allowed for fixing of the test results, no re-computation of availability/reliability statistics will be needed.
|-
| <br>
| M. Krakowian
| 19 August 2014
| Change contact group -&gt; Operations support
|-
|
| M. Krakowian
| 3.11.2014
|
Add step2: Note:''&nbsp;It is recommended to open a [https://wiki.egi.eu/wiki/PROC10 ticket for recomputation as] soon as the problem has been detected.''
 
|-
|
| C. L'Orphelin
| 28.09.2015
| ARGO&nbsp;procedure: a ticket is no longer needed; it is replaced by a recomputation form
|-
|
| Alessandro Paolini
| 2016-06-08
| Changed contact group -&gt; Operations
|-
|
| Alessandro Paolini
| 2019-07-08
| the old link for re-computation (http://argo.egi.eu/lavoisier/recomputation) has been removed while waiting for the corresponding page on the new web UI; a GGUS ticket should be opened for the re-computation request
|-
|
| Alessandro Paolini
| 2019-09-05
| the new link for the re-computation request has been added; removed some old links.
|}
 
[[Category:Operations_Procedures]]

Revision as of 16:09, 30 July 2021



This page is Deprecated; the content has been moved to https://confluence.egi.eu/display/EGIPP/PROC10+Recomputation+of+SAM+results+or+availability+reliability+statistics
Title: Recomputation of availability/reliability statistics
Document link: https://wiki.egi.eu/wiki/PROC10
Last modified: 05.09.2019
Policy Group Acronym: OMB
Policy Group Name: Operations Management Board
Contact Group: operations@egi.eu
Document Status: Approved
Approved Date: 29.10.2015
Procedure Statement: This procedure documents the steps for requesting a correction in the SAM test results and in the related availability/reliability statistics.
Owner: Alessandro Paolini



Overview

This procedure documents the steps for requesting a correction in the OPS VO test results and in the related availability/reliability statistics if applicable.

Figures are available through the web interface: http://egi.ui.argo.grnet.gr/

DISCLAIMER: This procedure is only applicable to EGI OPS test results. Procedures for the computation of VO-specific availability reports are VO-specific and are out of scope for this procedure.

Definitions

Please refer to the EGI Glossary for the definitions of the terms used in this procedure.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Who can submit a request?

Re-computations can be requested by:

  • site administrators
  • regional operations staff.

Re-computation policy

Starting from 1 May 2012:

  • Monitoring results can be recomputed only in the case of problems with the monitoring infrastructure itself.
  • No re-computations will be performed in case of issues with the deployed middleware (e.g. documented bugs affecting the availability of a production service end-point); such issues will consequently be reflected in lower availability/reliability.


Some examples of possible issues justifying a re-computation request:

  • invalid proxy certificate used for submitting the monitoring probes in a Nagios instance;
  • problems with the Storage Element used for replica management tests resulting in errors on CE's metrics.


The deadline: 10 calendar days after the publication and announcement of the monthly Availability/Reliability reports for a given month X (typically the announcement will be distributed on the 1st day of month X+1).

According to the re-computation requests received, A/R reports will be regenerated only once for each month, after the 10th of month X+1.
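
The deadline rule above can be illustrated with a minimal Python sketch. The function names are hypothetical, and treating the deadline day itself as still acceptable is an assumption; the policy text does not say whether the boundary is inclusive.

```python
from datetime import date, timedelta

REQUEST_WINDOW_DAYS = 10  # "10 calendar days" from the policy text


def typical_announcement(year, month):
    """First day of the month following the reported month X."""
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)


def request_deadline(announcement):
    """Last day for requests, counted from the actual announcement date."""
    return announcement + timedelta(days=REQUEST_WINDOW_DAYS)


def is_within_deadline(request_day, announcement):
    # Assumption: a request made on the deadline day is still accepted.
    return request_day <= request_deadline(announcement)


# Example: reports for August 2019, announced on the typical date.
announced = typical_announcement(2019, 8)
deadline = request_deadline(announced)
```

Here the announcement falls on 1 September 2019 and the deadline on 11 September 2019. If the announcement is distributed later than the 1st, the actual announcement date should be passed to `request_deadline` instead.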

How to request a re-computation of OPS monitoring results

| Step | Action on | Action |
| 1 | Site administrator / ROD team | As soon as the problem is detected, please fill in the form on the following page: https://ui.argo.grnet.gr/recomputation Provide an explanation for the request. Submitting the form informs the ARGO team, and your request will be in pending status. |
| 2 | ARGO team | A member of staff validates the request. You will be informed by email whether the request is confirmed or rejected. |
| 3 | ARGO team | If the request is accepted, the recomputation will be triggered as soon as possible. The status of the recomputation will be visible through a web page (the link is given in the email from the previous step). |


Revision history

| Version | Authors | Date | Comments |
| | George Fergadis/AUTH | 03/05/2012 | updated policy and procedure to reflect the OMB decision of the March 2012 meeting |
| | George Fergadis/AUTH | 17/01/2012 | the text of the procedure is fixed to clarify that both RC administrators and regional operations staff can request a re-computation. |
| | George Fergadis/AUTH | 16/01/2012 | the text of the procedure is fixed to clarify that the recomputation of test results can be requested before the end of the affected month, in which case if sufficient time is allowed for fixing of the test results, no re-computation of availability/reliability statistics will be needed. |
| | M. Krakowian | 19 August 2014 | Change contact group -> Operations support |
| | M. Krakowian | 3.11.2014 | Add step2: Note: It is recommended to open a ticket for recomputation as soon as the problem has been detected. |
| | C. L'Orphelin | 28.09.2015 | ARGO procedure: a ticket is no longer needed; it is replaced by a recomputation form |
| | Alessandro Paolini | 2016-06-08 | Changed contact group -> Operations |
| | Alessandro Paolini | 2019-07-08 | the old link for re-computation (http://argo.egi.eu/lavoisier/recomputation) has been removed while waiting for the corresponding page on the new web UI; a GGUS ticket should be opened for the re-computation request |
| | Alessandro Paolini | 2019-09-05 | the new link for the re-computation request has been added; removed some old links. |