The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Difference between revisions of "WI03 RC and RP OLA violation report followup"


Revision as of 15:18, 19 August 2014





Availability and reliability report work instruction for EGI Operations

This page describes the steps to take when following up on availability/reliability issues.

General info

  • Receiver: Site
  • Subject: Availability under target for last 3 months
  • Threshold: Availability: 80%, Reliability: 85%
  • Goal: We expect to see improvement
  • Deadline for answers: 10 days
  • No response: site suspension

Steps

Parent ticket

  1. Ticket is submitted by EGI SLM team.
  2. Add ticket URL to Monthly actions
  3. Add ticket URL to Underperforming sites and suspensions

Submit child tickets to sites

  1. Go to ..........
  2. Prepare the input file EGI_sus.csv from the records marked red in the source PDF. Include only sites whose availability is red for each of the last 3 months. Input file syntax:
    NGI;Site;Ava1(oldest);Ava2(middle);Ava3(newest)
    Include only sites that are in Certified state in GocDB.
    Make sure NGIs are named as described under "Naming the NGIs" below.
  3. Run ticket creator:
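    perl start-suspend.pl ticket_number 'date, e.g. Sep 2012' "EGI_sus.csv"
    More info about Ticket generator for A/R
    If you get errors, retype every " and ' character in the terminal: text copied from a web page may contain typographic quotes, which the shell does not treat as quoting characters.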
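
The input-file step above can be sketched in Python. This is only an illustration, not part of the ticket creator: it assumes that "red" means availability below the 80% target from "General info", that the file has no header row, and that the list of certified sites has already been looked up in GocDB by hand. The function names, NGI names and site names are hypothetical.

```python
import csv
import io

AVAILABILITY_TARGET = 80.0  # percent; assumption: "red" means below this target


def select_underperforming(records, certified_sites):
    """Keep certified sites whose availability was below target in all 3 months."""
    rows = []
    for ngi, site, months in records:
        if site not in certified_sites:
            continue  # only sites in Certified state are taken into account
        if all(value < AVAILABILITY_TARGET for value in months):
            rows.append([ngi, site, *months])
    return rows


def write_input_file(rows, handle):
    """Write rows as NGI;Site;Ava1(oldest);Ava2(middle);Ava3(newest)."""
    writer = csv.writer(handle, delimiter=";", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical sample data; real values come from the monthly report.
records = [
    ("NGI_EXAMPLE", "SITE-A", (62.0, 71.5, 55.0)),  # red in all three months
    ("NGI_EXAMPLE", "SITE-B", (62.0, 91.0, 55.0)),  # recovered in the middle month
    ("NGI_OTHER", "SITE-C", (10.0, 20.0, 30.0)),    # not certified
]
certified = {"SITE-A", "SITE-B"}

rows = select_underperforming(records, certified)
buffer = io.StringIO()  # stands in for open("EGI_sus.csv", "w", newline="")
write_input_file(rows, buffer)
print(buffer.getvalue(), end="")
```

With the sample data, only SITE-A is written, because SITE-B was above target in one month and SITE-C is not certified.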

Handling the child tickets

  1. NGIs that replied within 10 days: check the explanation. If it is unclear whether to suspend, discuss with COO by submitting a ticket to them.
    1. If, 3 days after receiving the site's explanation, performance shows no improvement (Availability is still <80%, Reliability <85%), COD should suspend the site. Inform the NGI and the site about the suspension.
    2. If COD agrees that the site should not be suspended (for example, availability has risen above 70% and reliability above 75%, or there is another important reason such as an NGI SAM problem), the site can be left certified.
  2. NGIs that did not reply: after 7 days, put a reminder in the ticket. If there is no answer 10 days after the tickets were submitted, suspend the site. Inform the NGI and the site about the suspension.
    Tip: It is recommended to send an e-mail to the NGI managers mailing list and to all NGI managers informing them about the situation, and to suspend the site if there is no reply or improvement.
  3. Prepare summary report and place it in the parent ticket.
  4. Update:
    1. Underperforming sites and suspensions
    2. List of sites for which the availability followup procedures were not applicable

The whole process should be completed by the end of the month.
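
The handling rules above can be summarised as a small decision function. This is a hedged sketch rather than an official tool: the `ChildTicket` type and `next_action` function are hypothetical, "after 7 days" and "after 10 days" are read as "on or after that day", and performance is treated as below target if either availability or reliability is under its threshold. The final decision always rests with COD.

```python
from dataclasses import dataclass
from typing import Optional

AVAILABILITY_TARGET = 80.0  # percent, from "General info"
RELIABILITY_TARGET = 85.0   # percent, from "General info"


@dataclass
class ChildTicket:
    days_since_submission: int
    days_since_explanation: Optional[int]  # None if the NGI has not replied
    availability: float
    reliability: float
    cod_exception: bool = False  # COD agreed the site should stay certified


def next_action(ticket: ChildTicket) -> str:
    """Return the next step for a child ticket under the stated assumptions."""
    if ticket.days_since_explanation is None:
        # No reply: reminder after 7 days, suspension after 10 days.
        if ticket.days_since_submission >= 10:
            return "suspend"
        if ticket.days_since_submission >= 7:
            return "remind"
        return "wait"
    if ticket.cod_exception:
        return "leave certified"
    if ticket.days_since_explanation < 3:
        return "wait"  # performance is re-checked 3 days after the explanation
    below_target = (
        ticket.availability < AVAILABILITY_TARGET
        or ticket.reliability < RELIABILITY_TARGET
    )
    return "suspend" if below_target else "leave certified"


print(next_action(ChildTicket(8, None, 50.0, 50.0)))
print(next_action(ChildTicket(5, 3, 75.0, 90.0)))
```

The first call describes a ticket with no reply after 8 days, the second a site that is still below the availability target 3 days after its explanation.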

Additional info

Naming the NGIs

In the grid view, NGIs/ROCs are named differently than in GGUS. Change each NGI/ROC name to the one used in GGUS. A table of the names that differ can be found here: NGIs GGUS names

A mapping from countries to NGIs is available here: Operations centres
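
If the renaming is done in a script rather than by hand, a lookup table is enough. The sketch below is an assumption about how one might do it; the single entry in `GGUS_NAMES` is a made-up placeholder, and the real pairs must be taken from the "NGIs GGUS names" table.

```python
# Placeholder mapping: grid-view name -> GGUS name (hypothetical entry only).
GGUS_NAMES = {
    "ExampleROC": "NGI_EXAMPLE",
}


def to_ggus_name(grid_view_name, mapping=GGUS_NAMES):
    """Return the GGUS name, or the name unchanged if the table has no entry."""
    return mapping.get(grid_view_name, grid_view_name)


print(to_ggus_name("ExampleROC"))
print(to_ggus_name("NGI_UNCHANGED"))
```

Names that are absent from the table are passed through unchanged, on the assumption that the table lists only the names that differ.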

Ticket content

Subject: $SU/$siteName - site suspension

Dear $SU,
	
According to recent availability/reliability report $siteName has achieved poor performance below Availability target
threshold in three consecutive months.
Availability for last 3 months was as follows: $availability1, $availability2, $availability3.
More details: https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics

The aim of submitting this ticket is the intervention of the NGI and immediate improvement of the situation.

According to procedures approved on OMB 17.08.2010, the site will be suspended 10 working days after receiving 
this ticket unless NGI intervene. If NGI intervene and performance is still below targets 3 days after the 
intervention, the site will also be suspended.

If you think that the site should not be suspended please provide justification in this ticket within 10 
working days. In case the site performance rises above targets within 3 days from providing explanation, 
the site will not be suspended. Otherwise COD may decide on suspension of the site.

You will be notified about the outcome in this ticket.

Best Regards,
EGI Operations Team
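
To illustrate how the `$SU`, `$siteName` and `$availabilityN` placeholders in the ticket text are filled in, here is a small sketch using Python's `string.Template`, which happens to use the same `$name` syntax. The real substitution is done by the Perl ticket creator; this is not its implementation, and the values below are hypothetical.

```python
from string import Template

# Subject and one body line, copied from the ticket text above.
SUBJECT = Template("$SU/$siteName - site suspension")
AVAILABILITY_LINE = Template(
    "Availability for last 3 months was as follows: "
    "$availability1, $availability2, $availability3."
)

# Hypothetical values for one row of the input file.
values = {
    "SU": "NGI_EXAMPLE",
    "siteName": "SITE-A",
    "availability1": "62%",
    "availability2": "71%",
    "availability3": "55%",
}

subject = SUBJECT.substitute(values)
line = AVAILABILITY_LINE.substitute(values)
print(subject)
print(line)
```

`substitute` raises `KeyError` when a placeholder has no value, which helps catch an incomplete input row before a ticket is sent.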

More info about Ticket generator for A/R