The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Service Level Target - ROD performance index

From EGIWiki
Revision as of 16:01, 19 January 2012




The ROD performance index (formerly known as the ROD OLA metric) was introduced to track the level of Grid Oversight service delivered by Operations Centres according to the Resource Provider OLA.

The index was accepted during the Technical Forum 2011 in Lyon and is available on the EGI Operations Portal.

Definition

The ROD performance index is calculated monthly from data gathered by the EGI Operations Portal. Weekends are not taken into account.
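As a minimal sketch of which days count towards a monthly index, assuming that "weekends" means Saturdays and Sundays and that no other holidays are skipped (the Operations Portal's actual implementation is not described here), the working days of a month could be enumerated as follows. The function name `working_days` is hypothetical.

```python
import calendar
from datetime import date


def working_days(year: int, month: int) -> list[date]:
    """Return the Monday-to-Friday dates of the given month.

    Assumption: only Saturdays and Sundays are skipped; public
    holidays are NOT excluded.
    """
    _, n_days = calendar.monthrange(year, month)
    days = (date(year, month, d) for d in range(1, n_days + 1))
    # date.weekday(): Monday == 0 ... Sunday == 6
    return [d for d in days if d.weekday() < 5]


if __name__ == "__main__":
    print(len(working_days(2012, 1)))  # 22
```

January 2012, for example, has 22 such days, so under these assumptions at most 22 daily dashboard snapshots contribute to that month's index.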

The ROD performance index is the sum of:

  • the number of daily appearances of expired tickets* in the operations dashboard
  • the number of daily appearances of alarms older than 72h in the operations dashboard

The threshold is set to 10 items. Above this value, the ROD team has to provide an explanation and a plan for improving the oversight service.
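The sum and the threshold can be illustrated with a short sketch. It assumes, hypothetically, that the dashboard content is available as one snapshot per day holding the number of expired tickets and the number of alarms older than 72h; the `DailySnapshot` type, the function names and the sample numbers are illustrative only and are not part of the Operations Portal.

```python
from dataclasses import dataclass
from datetime import date

THRESHOLD = 10  # items; above this an explanation and improvement plan are required


@dataclass
class DailySnapshot:
    """Hypothetical daily view of one ROD team's operations dashboard."""
    day: date
    expired_tickets: int  # tickets whose "Expiration date" is in the past
    old_alarms: int       # alarms older than 72h


def rod_performance_index(snapshots: list[DailySnapshot]) -> int:
    """Sum the daily counts over the month, skipping Saturdays and Sundays."""
    return sum(
        s.expired_tickets + s.old_alarms
        for s in snapshots
        if s.day.weekday() < 5
    )


def needs_explanation(index: int, threshold: int = THRESHOLD) -> bool:
    """True when the index is strictly above the threshold."""
    return index > threshold


if __name__ == "__main__":
    month = [  # made-up sample data
        DailySnapshot(date(2012, 1, 6), expired_tickets=2, old_alarms=1),  # Friday
        DailySnapshot(date(2012, 1, 7), expired_tickets=5, old_alarms=5),  # Saturday, ignored
        DailySnapshot(date(2012, 1, 9), expired_tickets=3, old_alarms=4),  # Monday
    ]
    idx = rod_performance_index(month)
    print(idx, needs_explanation(idx))  # 10 False
```

Because the index counts daily appearances, a single ticket that stays expired for three working days contributes three items.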


* Expired ticket: a ticket whose "Expiration date" is in the past.
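Both counted conditions reduce to simple time comparisons. In this sketch, the parameters `expiration_date` and `created` are assumptions about how a ticket and an alarm might be represented, and the current time is passed in explicitly so that the check is reproducible.

```python
from datetime import datetime, timedelta

ALARM_MAX_AGE = timedelta(hours=72)


def is_expired(expiration_date: datetime, now: datetime) -> bool:
    """A ticket is expired when its "Expiration date" lies in the past."""
    return expiration_date < now


def is_old_alarm(created: datetime, now: datetime) -> bool:
    """An alarm counts once it is strictly older than 72 hours.

    Note: plain wall-clock age; holiday periods are not discounted.
    """
    return now - created > ALARM_MAX_AGE


if __name__ == "__main__":
    now = datetime(2012, 1, 19, 16, 0)
    print(is_expired(datetime(2012, 1, 18), now))          # True
    print(is_old_alarm(datetime(2012, 1, 16, 16, 0), now))  # False (exactly 72h)
```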

Performance reports

Performance reports can be found on the Availability and reliability monthly statistics wiki page.

Recalculation procedure in case of intervention on the NGI SAM or the operations dashboard

Prerequisites:

  1. In case of problems with synchronization of the regional operations dashboard, the Resource infrastructure Provider should create a GGUS ticket to the Operations Portal team.
  2. In case of problems with the regional SAM, the Resource infrastructure Provider should create a GGUS ticket to the SAM team.
  3. In case of work carried out on the regional SAM or operations dashboard, the Resource infrastructure Provider should declare a downtime in GOC DB.

Procedure steps:

  1. When an Operations Centre gets a ticket from COD about ROD performance, the Operations Centre should provide the GGUS ticket, or a link to the SAM or operations dashboard downtime page in GOC DB.
  2. Based on the GGUS tickets referenced in the prerequisites, on the GGUS ticket opened by the Operations Centre to MyEGI requesting A/R recalculation, or on the GOC DB service downtime entry, COD, knowing when the problem occurred,
    can remove the metric items for the given days from the final report PDF.
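The removal step can be sketched as recomputing the index without the affected days. This assumes, hypothetically, that the per-day item totals (weekends already excluded) are available as a dictionary and that each referenced GGUS ticket or GOC DB downtime has been reduced to an inclusive date range; in practice COD performs this step on the report, and the function names below are illustrative.

```python
from datetime import date, timedelta


def days_in(start: date, end: date) -> set[date]:
    """All calendar days from start to end, inclusive."""
    return {start + timedelta(days=i) for i in range((end - start).days + 1)}


def recalculate(daily_items: dict[date, int],
                excluded: list[tuple[date, date]]) -> int:
    """Recompute the index after dropping the items of every day covered
    by a documented problem (GGUS ticket) or a declared GOC DB downtime."""
    skip: set[date] = set()
    for start, end in excluded:
        skip |= days_in(start, end)
    return sum(n for day, n in daily_items.items() if day not in skip)


if __name__ == "__main__":
    items = {date(2012, 1, 9): 4, date(2012, 1, 10): 6, date(2012, 1, 11): 3}
    print(recalculate(items, [(date(2012, 1, 10), date(2012, 1, 10))]))  # 7
```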

Future plans

In the future the metric will also include the number of alarms closed in NON-OK status without explanation. This will require some implementation effort.
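The planned extra term could be counted along these lines. This is only a sketch of the idea: the `ClosedAlarm` record, its field names, and the rule that any status other than "OK" is NON-OK are assumptions, not a description of how the Operations Portal stores alarms.

```python
from dataclasses import dataclass


@dataclass
class ClosedAlarm:
    """Hypothetical record of an alarm at the moment it was closed."""
    status_at_close: str  # assumed Nagios-style state, e.g. "OK", "WARNING", "CRITICAL"
    explanation: str      # free-text comment left by the ROD team (may be empty)


def unexplained_non_ok(alarms: list[ClosedAlarm]) -> int:
    """Count alarms closed in a non-OK state without any explanation."""
    return sum(
        1 for a in alarms
        if a.status_at_close != "OK" and not a.explanation.strip()
    )
```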

Issues to be implemented:

  • Taking holiday periods into account in alarm ageing
  • Automatic check whether the site/node is in downtime when an alarm is closed
  • Automatic check whether the node is not in production when an alarm is closed
  • In case of SCHEDULED interventions, the monthly metrics calculation should automatically take the scheduled downtime into account. At the time the metrics are computed, the application performing the calculation should query the GOC PI to determine which regional Nagios machines were in downtime, and apply that restriction in the calculation.
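The scheduled-downtime restriction could work roughly as follows. The sketch assumes the downtimes have already been retrieved from GOC DB and reduced to `(host, start, end)` tuples; the retrieval is not shown, and the tuple layout, the host name `nagios.example.org` and the function names are hypothetical.

```python
from datetime import datetime

Downtime = tuple[str, datetime, datetime]  # (host, start, end); assumed layout


def in_downtime(host: str, when: datetime, downtimes: list[Downtime]) -> bool:
    """True if `host` has a downtime window covering `when` (inclusive)."""
    return any(h == host and start <= when <= end for h, start, end in downtimes)


def count_outside_downtime(alarm_times: list[datetime], nagios_host: str,
                           downtimes: list[Downtime]) -> int:
    """Count alarm appearances, ignoring those that occurred while the
    regional Nagios instance was in a scheduled downtime."""
    return sum(1 for t in alarm_times if not in_downtime(nagios_host, t, downtimes))


if __name__ == "__main__":
    dts = [("nagios.example.org", datetime(2012, 1, 10, 8), datetime(2012, 1, 10, 18))]
    times = [datetime(2012, 1, 10, 9), datetime(2012, 1, 11, 9)]
    print(count_outside_downtime(times, "nagios.example.org", dts))  # 1
```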