Resource Centres OLA and Resource infrastructure Provider OLA reports
Introduction
EGI Performance is measured using two parameters: Availability and Reliability (definition).
Availability/Reliability data is provided by the MyEGI portal. Note: GridView Availability/Reliability views are now obsolete.
Availability/Reliability are measured at a Resource Centre (RC) level and at a Resource infrastructure Provider (RP) level (for NGIs and EIROs).
SAM metric results are used for the calculation of Availability/Reliability.
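The definitions of Availability and Reliability are linked rather than stated on this page. As an illustration only, the sketch below assumes a commonly used formulation: Availability is uptime over the period for which status is known, and Reliability additionally excludes scheduled downtime from the denominator. The function and parameter names are hypothetical and are not taken from the Availability Computation Engine.

```python
def availability(up_hours, total_hours, unknown_hours=0.0):
    """Percentage of the known period during which the service was up.

    Assumed formula: up / (total - unknown).
    """
    known = total_hours - unknown_hours
    if known <= 0:
        raise ValueError("no period with known status")
    return 100.0 * up_hours / known


def reliability(up_hours, total_hours, scheduled_downtime_hours=0.0,
                unknown_hours=0.0):
    """Percentage of the expected-up period during which the service was up.

    Assumed formula: up / (total - scheduled downtime - unknown).
    """
    expected = total_hours - scheduled_downtime_hours - unknown_hours
    if expected <= 0:
        raise ValueError("no period in which the service was expected up")
    return 100.0 * up_hours / expected
```

Under these assumptions, a service that is up for 648 hours of a 720-hour month, with the remaining 72 hours declared as scheduled downtime, has 90% Availability and 100% Reliability.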
Service Level Targets
For a Resource Centre (RC)
It is mandatory that EGI-certified Resource Centres provide the minimum monthly Availability/Reliability specified below (see the RC Operational Level Agreement for details). Availability/Reliability statistics (OPS VO) are issued on a monthly basis.
minimum Availability | 70% |
minimum Reliability | 75% |
Profile | ROC_CRITICAL
IMPORTANT: as of 01 January 2012, ROC_CRITICAL replaces WLCG_CREAM_LCGCE_CRITICAL in Resource Centre monthly statistics |
Condition for suspension | Resource Centres which have an Availability of less than 70% for three consecutive months will be suspended, i.e. removed from the production infrastructure. Note: this suspension policy was introduced in April 2011 and raised the original threshold from 50% to 70%. |
Condition for justification | Resource Centres not providing minimum monthly performance (70% availability, 75% reliability) MUST provide justification through a GGUS ticket. |
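The two conditions above can be sketched as simple checks on a Resource Centre's monthly figures. This is a minimal illustration, not the tooling used by EGI: the function names are hypothetical, it assumes that missing either target triggers the justification requirement, and it assumes that "three consecutive months" refers to the three most recent months in the list.

```python
MIN_AVAILABILITY = 70.0   # percent, from the table above
MIN_RELIABILITY = 75.0    # percent, from the table above
SUSPENSION_MONTHS = 3     # consecutive months below minimum Availability


def needs_justification(availability, reliability):
    """True if the month misses either target (assumed: either one suffices)."""
    return availability < MIN_AVAILABILITY or reliability < MIN_RELIABILITY


def eligible_for_suspension(monthly_availability):
    """True if the most recent three monthly Availability values are all below 70%.

    monthly_availability is a list of percentages, oldest month first.
    """
    recent = monthly_availability[-SUSPENSION_MONTHS:]
    return (len(recent) == SUSPENSION_MONTHS
            and all(value < MIN_AVAILABILITY for value in recent))
```

For example, a site reporting 80%, 69%, 65% and 60% Availability over four months would be flagged by `eligible_for_suspension`, while a site that recovers to 80% in the latest month would not.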
For a Resource infrastructure Provider (NGI/EIRO)
As of January 2012, it is mandatory that top-BDII services operated by NGIs provide a minimum availability of 99% (see the RP Operational Level Agreement for details). Availability/Reliability NGI reports are distributed monthly.
Note: Service Level Targets specified below will come into force as of January 2011.
minimum top-BDII Availability | 99% |
minimum top-BDII Reliability | 99% |
Profile | ROC |
Liability | Resource infrastructure Providers not providing the minimum requested monthly performance for one month (99% Availability, 99% Reliability) MUST provide a service improvement plan. |
- See the list of NGIs' Top-BDIIs used for the Availability/Reliability computation.
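As a rough illustration of the liability rule above, the following sketch lists the providers that would owe a service improvement plan for a given month. The function name, the input layout and the NGI identifiers are hypothetical, and the sketch assumes that missing either the Availability or the Reliability target is enough to trigger the requirement.

```python
TOP_BDII_MIN_AVAILABILITY = 99.0  # percent, from the table above
TOP_BDII_MIN_RELIABILITY = 99.0   # percent, from the table above


def providers_needing_improvement_plan(monthly_results):
    """Return the providers that missed a top-BDII target this month.

    monthly_results maps a provider name to an
    (availability, reliability) pair, both in percent.
    """
    return sorted(
        name
        for name, (avail, rel) in monthly_results.items()
        if avail < TOP_BDII_MIN_AVAILABILITY or rel < TOP_BDII_MIN_RELIABILITY
    )


# Hypothetical figures for one month
results = {
    "NGI_A": (99.5, 99.8),
    "NGI_B": (98.7, 99.2),
    "NGI_C": (99.0, 99.0),
}
flagged = providers_needing_improvement_plan(results)
```

With these made-up figures only `NGI_B` is flagged, since values exactly at 99% meet the minimum.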
Performance reports
2011
Resource Centre Performance
Service Level | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
03/11 | 04/11 | 05/11 | 06/11 | 07/11 | 08/11 | 09/11 | 10/11 | 11/11 re-computations are in progress... | 12/11
Resource infrastructure Provider Performance
2010
EGI-wide Availability and Reliability
It is available here (xls file, data from May 01 2010)
Underperforming/Suspended RCs
Process for quality verification
Availability and reliability statistics are generated automatically, in PDF format, during the first week of each month by the Availability Computation Engine (Gridview until May 2011) using the profile, and are placed under [1]. An Excel version is available at [2].
Once the reports are generated, sanity checks are performed by EGI SA1 (Task TSA1.8). After this step is completed, the statistics are uploaded to the EGI document server. Links to the monthly statistics are added to this wiki page on a regular basis.
EGI SA1 (TSA1.8) announces the new results on the NGI Operations Managers mailing list. COD (TSA1.7) is responsible for supervising the statistics: it asks NGIs to follow up with sites that need to provide comments when thresholds are not met, and it identifies sites eligible for suspension. This phase starts with a ticket filed to the COD Support Unit. The whole comment-gathering process is handled through tickets.
For a site that misses availability/reliability targets but is not eligible for suspension:
For a site that is eligible for suspension:
Sites that fail to provide an explanation for missing OLA targets, or whose explanation is found inadequate, as well as sites that are suspended, will be recorded in a wiki page [8].
Should there be doubts about the validity of Availability/Reliability reports, an RC/NGI can request recomputations according to the procedure defined at [9].
Known issues and recommendations to NGIs
Operational Level Agreements
Links