Service Level Target - ROD performance index
Revision as of 14:41, 6 October 2014
The ROD performance index (formerly known as the ROD OLA metric) was introduced to track the level of Grid Oversight service delivered by Operations Centres according to the Resource Provider OLA.
The index was accepted during the Technical Forum 2011 in Lyon and is available on the EGI Operations Portal (choose the Metrics tab).
Definition
ROD performance index is the sum of:
- No. of tickets expired* in the operations dashboard daily
- No. of alarms older than 72h appearing in the operations dashboard daily
* A ticket is counted as expired in the Operations Portal dashboard if its "Expiration date" is set to a time in the past. The "Expiration date" field is set according to procedure, but can be freely changed by ROD. It refers to the date when the status of the issue should next be checked.
The ROD performance index is calculated monthly from the data gathered by the EGI Operations Portal. Weekends are not taken into account.
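The definition above can be sketched in a few lines of Python. This is only an illustration, assuming the monthly value is the sum of the two daily counts over all non-weekend days. The function name `rod_performance_index` and the input format (a mapping from date to a pair of counts) are hypothetical and do not reflect how the Operations Portal stores its data.

```python
from datetime import date, timedelta

def rod_performance_index(daily_counts):
    """Sum expired tickets and alarms older than 72h over the weekdays of a period.

    daily_counts maps a date to an (expired_tickets, old_alarms) tuple.
    Assumption: the monthly index is the plain sum of the daily counts.
    """
    total = 0
    for day, (expired_tickets, old_alarms) in daily_counts.items():
        if day.weekday() >= 5:  # Saturday (5) and Sunday (6) are skipped
            continue
        total += expired_tickets + old_alarms
    return total

# Hypothetical data: one expired ticket and two old alarms on each of seven days
start = date(2014, 9, 1)  # a Monday
counts = {start + timedelta(days=i): (1, 2) for i in range(7)}
index = rod_performance_index(counts)
```

With this sample data only the five weekdays contribute, so the weekend entries do not affect the result.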
Threshold
The index must not exceed 10. Above this value, the ROD team has to provide an explanation and a plan for improving the oversight service.
Performance reports
Service Level:
ROD Performance Index ticket/Report |
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
2011 | - | - | - | - | - | - | - | - | - | | | |
2012 | | | | | | | | | | | | |
2013 | | | | | | 06/13 | 07/13 | 08/13 | 09/13 | 10/13 | 11/13 | 12/13 |
2014 | 01/14 | 02/14 | ---- | 04/14 | 05/14 | 06/14 | 07/14 | 08/14 | | | | |
Recalculation procedure in case of intervention on the NGI SAM or the operations dashboard
Prerequisite:
- In case of problems with the regional SAM, the Resource infrastructure Provider should create a GGUS ticket to the SAM team.
- In case of work carried out on the regional SAM, the Resource infrastructure Provider should declare a downtime in GOC DB.
Procedure steps:
- When an Operations Centre gets a ticket from EGI Operations about ROD performance, the Operations Centre should provide the relevant GGUS ticket or a link to the SAM or operations dashboard downtime page in GOC DB.
- If the ROD performance index is below 10 items, the NGI can create a GGUS ticket to EGI Operations asking for recalculation.
- Based on the GGUS trouble tickets referenced in the prerequisites, on the GGUS ticket opened by the Operations Centre to MyEGI requesting A/R recalculation, or on the GOC DB service downtime entry, EGI Operations, knowing when the problem occurred, can remove the metrics items for the given days from the final PDF report.
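The last step, removing the metrics items for the affected days, can be sketched as follows. This is a minimal illustration, assuming the per-day item counts are available as a simple mapping. The function name `recalculate_index` and the sample figures are hypothetical and are not part of any EGI tool.

```python
from datetime import date

def recalculate_index(daily_items, excluded_days):
    """Return the monthly total after dropping items recorded on excluded days."""
    excluded = set(excluded_days)
    return sum(items for day, items in daily_items.items() if day not in excluded)

# Hypothetical month fragment: items counted per day
daily_items = {
    date(2014, 9, 2): 4,
    date(2014, 9, 3): 9,  # day of a regional SAM problem documented in a GGUS ticket
    date(2014, 9, 4): 2,
}
before = recalculate_index(daily_items, [])
after = recalculate_index(daily_items, [date(2014, 9, 3)])
```

In this example, excluding the single affected day brings the total from above the threshold of 10 to below it.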
Future plans
In the future, the metric will also include the number of alarms closed in NON-OK status without explanation. This will require some implementation effort.
Issues to be implemented:
- Taking holiday periods into account in alarm ageing
- Automatic check whether the site/node is in downtime when an alarm is closed
- Automatic check whether the node is not in production when an alarm is closed
- In case of SCHEDULED interventions, the monthly metrics calculation should automatically take the scheduled downtime into account. When the metrics are computed, the application performing the calculation should access the GOC PI to determine which regional Nagios machines were in downtime, and include that restriction in the calculation.
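As a rough sketch of the last item, the snippet below turns a list of downtime intervals into the set of calendar days to leave out of the monthly calculation. It does not query the GOC PI: the intervals are assumed to have been retrieved already and are passed in as `(start, end)` pairs. The function name `days_in_downtime` and the rule that any day touched by a downtime is excluded in full are assumptions made for illustration only.

```python
from datetime import date, datetime, timedelta

def days_in_downtime(downtimes):
    """Return the set of calendar days touched by any (start, end) downtime."""
    days = set()
    for start, end in downtimes:
        day = start.date()
        # Walk day by day from the start date to the end date, inclusive
        while day <= end.date():
            days.add(day)
            day += timedelta(days=1)
    return days

# Hypothetical scheduled downtime of a regional Nagios machine spanning midnight
downtimes = [(datetime(2014, 9, 9, 22, 0), datetime(2014, 9, 10, 6, 0))]
excluded = days_in_downtime(downtimes)
```

A finer-grained policy, for example excluding only alarms raised during the downtime window itself, would need timestamps for each item rather than daily totals.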