The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "WI03 RC and RP OLA violation report followup"

Line 4: Line 4:
[[Category:Grid Oversight]]


= Internal procedure for COD - '''Availability and reliability work instruction for COD''' =
 
= Availability and reliability work instruction for COD  =


This page describes steps which should be taken by COD shifter to follow availability/reliability issues.  


<br> When GGUS ticket about availability/reliability metrics is assigned to COD:
== General info ==


<br>
*'''Receiver''': Site
*'''Subject''': Availability under target for last 3 months
*'''Threshold''': 70%
*'''Goal''': We expect to see improvement
*'''Deadline for answers''': 10 days
*'''No response''': site suspension


{| align="center" cellspacing="0" cellpadding="5" border="1"
== Steps ==
|-
! Timelines
! Step
! Substep
! Description
|-
|
| 1
|
| Add ticket url to [[Underperforming_sites_and_suspensions| Underperforming_sites_and_suspensions]] page
|-
|
| 2
|
| Ava/Rel report review
|-
|
|
| 1
| Prepare ''''sites for suspension'''' list: Look at the availability metrics in the AR report for the current month and the two previous months. If all are below 70%, the site qualifies for suspension.
Check if the site was mentioned in [[List_of_sites_for_which_the_availability_followup_procedures_were_not_applicable| List of sites for which the availability followup procedures were not applicable]] page. In some cases there could be no need to open a ticket.


|-
=== Parent ticket ===
|
#Ticket is submitted by Georgios Kaklamanos or George Fergadis.
|
#Add ticket URL to [https://wiki.egi.eu/wiki/Grid_operations_oversight/CODOD_actions#Monthly_Actions Monthly actions]
| 2
#Add ticket URL to [https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions Underperforming sites and suspensions]
| Prepare ''''sites to be asked for explanation'''' list: Look at the current month in the AR report. If Ava. is below 70% or Rel. is below 75%, the site qualifies to be asked for an explanation. This list should be prepared according to the requirements for the input file for the [[Grid_operations_oversight/WI03#How_to_use_ticket_generator |ticket generator]].
Check if the site was mentioned in [[List_of_sites_for_which_the_availability_followup_procedures_were_not_applicable| List of sites for which the availability followup procedures were not applicable]] page. In some cases there could be no need to open a ticket.
 
|-
|
| 3
|
| Create tickets for each case as a child to the tickets assigned to COD
|-
|
|
| 1
| For ''''sites for suspension'''' list please use [[Grid_operations_oversight/WI03#How_to_use_ticket_generator| ticket generator]]
|-
|
|
| 2
| For ''''sites to be asked for explanation'''' list please use [[Grid_operations_oversight/WI03#How_to_use_ticket_generator|ticket generator]]
|-
| '''Within''' 10 working days from when the tickets are created.  
| 4
|
|
'''Handling of sites below targets'''


When explanation is provided and is found satisfactory put as a solution of the ticket  
=== Submit child tickets to NGIs ===
<pre>'The explanation is satisfactory. Thank you!'. </pre>  
#Go to Dropbox - COD - TicketCreator - AvaRel report
After that you should set child ticket to 'verified' status.
#Prepare the input file EGI_sus.csv based on the records marked in red in the source PDF. Input file syntax: <br /> <pre>NGI;Site;Availability;Reliability;</pre> Make sure the NGIs are named according to the table below.
#Run the ticket creator: <br /><pre>perl start-suspend.pl ticket_number 'date, e.g. Sep 2012' "EGI_sus.csv"</pre> More info about [[Ticket generator]]


|-
=== Handling the child tickets ===
| '''After''' 10 working days from when the tickets are created.  
#NGIs that replied within 10 days - check the explanation. If uncertain whether to suspend or not, discuss with COO by submitting a ticket to them.
| 5
##If, 3 days after the NGI's explanation was received, availability shows no improvement (is still <70%), COD should suspend the site. Inform the NGI and the site about the suspension.
|
##If COD agrees that the site should not be suspended (for example, availability has risen above 70%, or there is another important reason such as an NGI SAM problem), the site can be left certified.
| Final actions.
#NGIs that didn’t reply - after 10 days suspend the site. Inform NGI and site about the suspension.
|-
#Prepare summary report and place it in the parent ticket.
|
#Update  [[Underperforming_sites_and_suspensions| Underperforming_sites_and_suspensions]] and [[List_of_sites_for_which_the_availability_followup_procedures_were_not_applicable |List of sites for which the availability followup procedures were not applicable]]
|
The whole process should be completed '''by the end of the month'''.
| 1
| '''Handling of sites that are eligible for suspension'''
*in the case of '''no''' NGI intervention, the site is suspended in GOC DB - as a reason put a link to GGUS ticket created for the site
*in the case of NGI intervention:
** non suspension will occur only if both the COD and COO agree on the reasoning provided by the NGI (COO should be involved in the ticket)
** if availability shows no improvement COD can suspend the site
 
|-
|
|
| 2
| '''Handling of sites below targets'''
If the explanation is not given in due time, or the explanation is found inadequate, COD sends an email to the NGI/ROC manager with CC to ROD and GGUS:
 
*informing the NGI/ROC manager that they should make the site react to the ticket or suspend the site within 3 days
*stating that, if the NGI does not react, COD will suspend the site on the 4th day.
<pre>Dear XX
 
I would like to inform you that 10 working days passed.
Please make the site react on the ticket or suspend the site within 3 days.
If NGI will not react COD will suspend the site on the 4th day.
 
Best Regards
XXX
On behalf of COD team
</pre>
|-
|
| 6
|
| Prepare summary report (it should be placed in parent ticket):
#sites which were not responsive or did not provide a satisfactory explanation
#sites which were suspended
#ROCs/NGIs which are not responsive
#...
 
|-
|
| 7
|
| Update [[List_of_sites_for_which_the_availability_followup_procedures_were_not_applicable |List of sites for which the availability followup procedures were not applicable]] page. Put here outstanding cases which should be recorded. This could be used for example to avoid opening a ticket next month for a solved issue.
|-
|
| 8
|
| Update [[Underperforming_sites_and_suspensions| Underperforming_sites_and_suspensions]] page.
|}


== Additional info ==


<br> <span style="color: rgb(255, 0, 0);">'''VERY IMPORTANT'''</span>
=== Naming the NGIs ===


<span style="background: none repeat scroll 0% 0% rgb(255, 0, 0);"> In Gridview, NGIs/ROCs are named differently than in GGUS. You should change the NGI/ROC name to the one used in GGUS.</span>
In Gridview, NGIs/ROCs are named differently than in GGUS. You should change the NGI/ROC name to the one used in GGUS.


<br>  
<br>  
Line 173: Line 89:
|}
|}


= Ticket content  =
=== Ticket content  ===


<pre>Subject:$SU/$siteName site suspension
<pre>Subject:$SU/$siteName site suspension
Line 190: Line 106:
</pre>  
</pre>  


[[Ticket generator]]
More info about [[Ticket generator]]

Revision as of 12:23, 28 November 2012




Availability and reliability work instruction for COD

This page describes the steps a COD shifter should take to follow up on availability/reliability issues.

General info

  • Receiver: Site
  • Subject: Availability under target for last 3 months
  • Threshold: 70%
  • Goal: We expect to see improvement
  • Deadline for answers: 10 days
  • No response: site suspension
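The threshold and the three-month window above can be read as a simple check. The following is a minimal sketch, assuming availability is supplied as monthly percentages ordered oldest first; the function name and the input shape are hypothetical and are not part of the COD tooling.

```python
THRESHOLD = 70.0  # availability target in percent, taken from "General info"


def under_target_three_months(monthly_availability):
    """Return True if each of the last three monthly values is below THRESHOLD.

    monthly_availability: list of percentages, oldest first (assumed input shape).
    """
    last_three = monthly_availability[-3:]
    # Fewer than three months of data cannot satisfy the three-month condition.
    return len(last_three) == 3 and all(a < THRESHOLD for a in last_three)
```

A site with values such as 80, 65, 60, 55 would match, because only the last three months are considered.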

Steps

Parent ticket

  1. Ticket is submitted by Georgios Kaklamanos or George Fergadis.
  2. Add ticket URL to Monthly actions
  3. Add ticket URL to Underperforming sites and suspensions

Submit child tickets to NGIs

  1. Go to Dropbox - COD - TicketCreator - AvaRel report
  2. Prepare the input file EGI_sus.csv based on the records marked in red in the source PDF. Input file syntax:
    NGI;Site;Availability;Reliability;
    Make sure the NGIs are named according to the table below.
  3. Run the ticket creator:
    perl start-suspend.pl ticket_number 'date, e.g. Sep 2012' "EGI_sus.csv"
    More info about Ticket generator
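The input file syntax from step 2 can be produced with a few lines of Python. This is only a sketch: the helper name and the sample record are hypothetical, and it assumes one record per line with no header row, which this page does not state.

```python
def build_input_file(records):
    """Render records as 'NGI;Site;Availability;Reliability;' lines.

    records: iterable of (ngi, site, availability, reliability) tuples.
    Assumption: one record per line, no header row, trailing semicolon kept.
    """
    lines = []
    for ngi, site, availability, reliability in records:
        lines.append(f"{ngi};{site};{availability};{reliability};")
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical record; real values come from the AvaRel report.
    sample = [("NGI_FRANCE", "EXAMPLE-SITE", 45, 50)]
    with open("EGI_sus.csv", "w", encoding="utf-8") as handle:
        handle.write(build_input_file(sample))
```

The NGI column should already use the GGUS name, as required in step 2.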

Handling the child tickets

  1. NGIs that replied within 10 days: check the explanation. If it is unclear whether the site should be suspended, discuss the case with COO by submitting a ticket to them.
    1. If, 3 days after the NGI's explanation was received, availability shows no improvement (is still <70%), COD should suspend the site. Inform the NGI and the site about the suspension.
    2. If COD agrees that the site should not be suspended (for example, availability has risen above 70%, or there is another important reason such as an NGI SAM problem), the site can be left certified.
  2. NGIs that did not reply: suspend the site after 10 days. Inform the NGI and the site about the suspension.
  3. Prepare a summary report and place it in the parent ticket.
  4. Update Underperforming_sites_and_suspensions and List of sites for which the availability followup procedures were not applicable

The whole process should be completed by the end of the month.
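The handling rules above can be summarised as a small decision function. This is a hedged sketch rather than an official tool: the function name, its parameters and the returned labels are hypothetical, "after 10 days" is read as more than 10 days, and "after 3 days" as at least 3 days. In practice the decision stays with the COD shifter, with COO consulted in unclear cases.

```python
def next_action(replied, days_since_ticket, days_since_reply=0,
                availability=None, accepted_reason=False):
    """Suggest the next step for one child ticket (illustrative only).

    replied: whether the NGI answered the ticket.
    availability: current availability in percent, or None if unknown.
    accepted_reason: COD accepts another important reason (e.g. NGI SAM problem).
    """
    if not replied:
        # No reply: suspend once the 10-day deadline has passed.
        return "suspend" if days_since_ticket > 10 else "wait"
    if accepted_reason or (availability is not None and availability > 70):
        return "leave certified"
    if days_since_reply < 3:
        # Give the site 3 days after the explanation to show improvement.
        return "wait"
    if availability is not None and availability < 70:
        return "suspend"
    # Anything else is treated as an unclear case.
    return "discuss with COO"
```

For example, a site whose NGI replied 3 days ago and whose availability is still 60% would be suggested for suspension.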

Additional info

Naming the NGIs

In Gridview, NGIs/ROCs are named differently than in GGUS. You should change the NGI/ROC name to the one used in GGUS.


GGUS               Gridview
ROC_DECH           GermanySwitzerland
NGI_FRANCE         NGI_France
NGI_CYGRID         NGI_CY
ROC_Asia/Pacific   AsiaPacific
ROC_Italy          Italy
ROC_CERN           CERN
ROC_Russia         Russia
ROC_North          NorthernEurope
ROC_UK/Ireland     UKI
ROC_SE             SouthEasternEurope
ROC_SW             SouthWesternEurope
NGI_UA             Ukraine
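When the input file is prepared by script, the table above can be kept as a lookup from the Gridview name to the GGUS name. The sketch below copies the pairs from the table; the names GRIDVIEW_TO_GGUS and to_ggus_name are hypothetical, and returning unlisted names unchanged is an assumption, not something this page specifies.

```python
# Gridview name -> GGUS name, copied from the table above.
GRIDVIEW_TO_GGUS = {
    "GermanySwitzerland": "ROC_DECH",
    "NGI_France": "NGI_FRANCE",
    "NGI_CY": "NGI_CYGRID",
    "AsiaPacific": "ROC_Asia/Pacific",
    "Italy": "ROC_Italy",
    "CERN": "ROC_CERN",
    "Russia": "ROC_Russia",
    "NorthernEurope": "ROC_North",
    "UKI": "ROC_UK/Ireland",
    "SouthEasternEurope": "ROC_SE",
    "SouthWesternEurope": "ROC_SW",
    "Ukraine": "NGI_UA",
}


def to_ggus_name(gridview_name):
    """Return the GGUS name for a Gridview name.

    Assumption: names missing from the table are the same in both systems.
    """
    return GRIDVIEW_TO_GGUS.get(gridview_name, gridview_name)
```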

Ticket content

Subject:$SU/$siteName site suspension

Dear $SU,

According to recent availability/reliability report $siteName has achieved
poor performance below target Ava. 50% or Rel. 50% in three consecutive months.
More details: [[Availability_and_reliability_monthly_statistics]].

According to procedures approved on OMB 17.08, the site will be suspended within 10 working days unless the NGI intervenes.
If you think that the site should not be suspended, please provide justification within 10 working days.

Best Regards,
EGI Central Operator on Duty
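The placeholders $SU and $siteName in the ticket text are presumably filled in by the ticket generator. As an illustration only, the same substitution can be reproduced with Python's string.Template, which uses the same $name syntax. This is not how the Perl generator is implemented; the function name and sample values are hypothetical.

```python
from string import Template

# Subject and greeting lines copied from the ticket content above.
TICKET_SUBJECT = Template("$SU/$siteName site suspension")
TICKET_GREETING = Template("Dear $SU,")


def render_ticket_header(su, site_name):
    """Return (subject, greeting) with the placeholders filled in."""
    values = {"SU": su, "siteName": site_name}
    return TICKET_SUBJECT.substitute(values), TICKET_GREETING.substitute(values)


if __name__ == "__main__":
    # Hypothetical support unit and site name.
    subject, greeting = render_ticket_header("NGI_FRANCE", "EXAMPLE-SITE")
    print(subject)
    print(greeting)
```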

More info about Ticket generator