The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Resource Centres OLA and Resource infrastructure Provider OLA reports"

From EGIWiki
(Replaced content with "see https://confluence.egi.eu/display/EGISLM/RC+OLA+and+RIP+OLA+reports")
Tag: Replaced
 
(506 intermediate revisions by 20 users not shown)
==Introduction==
see https://confluence.egi.eu/display/EGISLM/RC+OLA+and+RIP+OLA+reports
EGI availability and reliability statistics are produced every month for all certified production sites. The current version of the [https://documents.egi.eu/document/31 Site-NGI Operational Level Agreement] defines the following requirements:
* minimum tolerated availability: 70%,
* minimum tolerated reliability: 75%.
For each monthly report, underperforming sites are asked through a GGUS ticket to explain their poor performance.
 
'''Suspension procedure''': sites which have an availability of less than 50% for three consecutive months will be suspended, i.e. removed from the production infrastructure.
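
The thresholds and the suspension rule above can be sketched in a few lines of Python. This is only an illustration, not the logic used by any EGI tool: the function and constant names are hypothetical, values exactly equal to a threshold are assumed to be acceptable, and "three consecutive months" is assumed to mean the three most recent monthly figures.

```python
# Thresholds taken from the text above (percentages).
MIN_AVAILABILITY = 70.0
MIN_RELIABILITY = 75.0
SUSPENSION_AVAILABILITY = 50.0
SUSPENSION_MONTHS = 3


def is_underperforming(availability, reliability):
    """True if either monthly figure is below its minimum tolerated value.

    Assumption: a value exactly at the threshold is still acceptable.
    """
    return availability < MIN_AVAILABILITY or reliability < MIN_RELIABILITY


def is_eligible_for_suspension(monthly_availability):
    """True if the last three monthly availabilities are all below 50%.

    monthly_availability is a list of percentages, oldest month first.
    Assumption: fewer than three months of data never triggers suspension.
    """
    recent = monthly_availability[-SUSPENSION_MONTHS:]
    if len(recent) < SUSPENSION_MONTHS:
        return False
    return all(value < SUSPENSION_AVAILABILITY for value in recent)
```

For example, a site reporting 69.9% availability and 80% reliability would be flagged as underperforming, while a site with monthly availabilities of 49%, 55% and 40% would not be eligible for suspension because the run below 50% is interrupted.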
 
==Tools and documentation==
* [http://gvdev.cern.ch/GVPC/Excel/ XLS availability/reliability report generator] (providing access to the database including Nagios results for OPS and SAM results for VOs)
* Availability/reliability computation [https://twiki.cern.ch/twiki/pub/LCG/GridView/Gridview_Service_Availability_Computation.pdf algorithm]
* [https://twiki.cern.ch/twiki/bin/view/LCG/ACE Availability Computation Engine] (ACE)
* OLD: [https://twiki.cern.ch/twiki/bin/view/EGEE/MonthlyAvailability EGEE-III Comments on site availability and reliability statistics]
 
Note: in EGI, sites no longer need to provide comments in this way. Comments are instead solicited and collected through GGUS tickets.
 
==Statistics==
* June 2010 [https://documents.egi.eu/document/96]
* May 2010 [https://documents.egi.eu/document/42]
* [https://edms.cern.ch/document/963325 January 2008 - April 2010] (EGEE league tables)
 
==Suspended sites==
* [https://twiki.cern.ch/twiki/bin/view/EGEE/SuspendedSites List of suspended sites (2009)]
 
==Description of the process==
 
* '''Generation of statistics'''
Availability and reliability statistics are generated automatically by GridView in the first week of each month, in PDF format, and placed under [http://gvdev.cern.ch/GRIDVIEW/downloads/Reports/]. An Excel version is available at [http://gvdev.cern.ch/GVPC/Excel/].
 
* '''Preliminary processing'''
Once the reports are generated, sanity checks are performed by EGI SA1 (Task TSA1.8). After this step is completed, the statistics are uploaded to the EGI document server. Links to the monthly statistics are provided on a regular basis on this wiki page.
 
* '''Publication'''
An announcement of the new results is distributed by EGI SA1 (TSA1.8) to the NGI Operations Managers mailing list. COD (TSA1.7) is responsible for following up on the statistics: it asks the NGIs to chase sites that must provide comments because thresholds were not met, as well as sites that are candidates for suspension. This phase starts with a ticket filed to the COD Support Unit. The whole comment-gathering process is handled through tickets.
 
* '''Handling of sites below targets'''
For a site that misses availability/reliability targets but is not eligible for suspension:
# a child ticket is opened by the COD team and assigned to the respective NGI, asking for an explanation
## the explanation must be provided within 7 working days of the site receiving the ticket
## if the explanation is found satisfactory, the ticket is closed
## conversely, if the explanation is not given in due time, or is found inadequate:
##* the EGI Chief Operations Officer can decide, within 3 working days after the deadline, whether to object to the site being added to the EGI "Hall of Shame" wiki page
##* after the 3 days pass, the site is added to the "Hall of Shame" wiki page, unless the EGI Chief Operations Officer objects or decides to accelerate the process
## the child ticket can then be closed
## the parent ticket will be closed when all child tickets have been closed
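
The two deadlines in the steps above (7 working days for the explanation, then 3 working days for the Chief Operations Officer) can be computed with a short Python sketch. It rests on assumptions not stated in the procedure: working days are taken to be Monday to Friday, public holidays are ignored, and counting starts on the day after the ticket is received. The function names are hypothetical.

```python
from datetime import date, timedelta

EXPLANATION_WORKING_DAYS = 7   # from the procedure above
OBJECTION_WORKING_DAYS = 3     # from the procedure above


def add_working_days(start, days):
    """Return the date reached after the given number of working days.

    Assumption: working days are Monday to Friday; holidays are ignored.
    Counting starts on the day after `start`.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            remaining -= 1
    return current


def escalation_deadlines(ticket_received):
    """Return (explanation deadline, end of the objection window)."""
    explanation_deadline = add_working_days(ticket_received, EXPLANATION_WORKING_DAYS)
    objection_deadline = add_working_days(explanation_deadline, OBJECTION_WORKING_DAYS)
    return explanation_deadline, objection_deadline
```

Under these assumptions, a ticket received on Tuesday 1 June 2010 gives an explanation deadline of 10 June 2010 and an objection window that ends on 15 June 2010.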
 
* '''Handling of sites that are eligible for suspension'''
For a site that is eligible for suspension:
# a child ticket is opened by the COD team and assigned to the appropriate NGI, notifying it that the site will be suspended within 7 working days
# after the 7-day period passes, the site is suspended unless the NGI has intervened or the EGI Chief Operations Officer objects
# if the NGI intervenes, the site is not suspended only if both the COD and the COO agree with the reasoning provided by the NGI
# the child ticket is closed either when the site is suspended or when the suspension is cancelled
# the parent ticket will be closed when all child tickets have been closed
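
One possible reading of the decision taken at the end of the 7-day period is sketched below in Python. The function and parameter names are hypothetical, and the sketch assumes that an objection by the Chief Operations Officer always cancels the suspension, while an NGI intervention cancels it only when both the COD and the COO accept the NGI's reasoning.

```python
def should_suspend(*, ngi_intervened, cod_agrees, coo_agrees, coo_objects):
    """Decide whether a site is suspended once the notice period has passed.

    Assumption: this is an interpretation of the steps above, not an
    official rule set. All arguments are booleans.
    """
    if coo_objects:
        # The Chief Operations Officer objects: no suspension.
        return False
    if ngi_intervened:
        # The NGI's reasoning must convince both the COD and the COO.
        return not (cod_agrees and coo_agrees)
    # No intervention and no objection: the site is suspended.
    return True
```

Keeping the COO objection as a separate flag reflects the fact that the procedure mentions it independently of any NGI intervention.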
 
* '''Removal from the "Hall of Shame" list'''
Every month, the COD checks whether each site listed in the "Hall of Shame" has managed to meet its targets. If it has, the COD removes the site from the "Hall of Shame" wiki page.

Latest revision as of 15:15, 25 August 2020