The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations@egi.eu.

WI03 RC and RP OLA violation report followup

From EGIWiki

Internal procedure for COD

This page describes the steps a COD shifter should take to follow up availability/reliability issues.


When a GGUS ticket about availability/reliability metrics is assigned to COD:


Step 1. Add the ticket URL to the "Availability and reliability internal procedure for COD tickets" page.
Step 2. Review the Ava/Rel report.
  2.1 Prepare the 'sites for suspension' list: look at the current month and the two previous months in the AR report. If the site is below 50% in all three months, it qualifies for suspension.
  2.2 Prepare the 'sites to be asked for explanation' list: look at the current month in the AR report. If availability is below 70% or reliability is below 75%, the site qualifies to be asked for an explanation. Prepare this list in the input-file format required by the ticket generator.
Step 3. Create a ticket for each case as a child of the ticket assigned to COD.
  3.1 For the 'sites for suspension' list, use the "Site for suspension" template.
  3.2 For the 'sites to be asked for explanation' list, use the ticket generator.
Step 4 (within 7 working days from when the tickets are created). When an explanation is provided and found satisfactory, set the child ticket to 'verified' status.
Step 5 (after 7 working days from when the tickets are created). Final action:
  5.1 Close all open tickets.
  5.2 Suspend in GOC DB the sites from the 'sites for suspension' list that qualify for suspension.
  5.3 Prepare a summary report of the explanations and place it in the parent ticket. The report should list:
      1. sites that did not respond
      2. sites that provided an unsatisfactory explanation
      3. ROCs/NGIs that did not respond
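The two review rules in step 2 can be sketched as follows. This is a minimal illustration, not part of any EGI tool. The procedure does not say which metric "below 50%" refers to. Following the "Site for suspension" ticket template, this sketch assumes a month counts as below target if either availability or reliability is below 50%. The names MonthlyStats and classify are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the procedure (percent).
SUSPENSION_THRESHOLD = 50.0
AVA_EXPLANATION_THRESHOLD = 70.0
REL_EXPLANATION_THRESHOLD = 75.0


@dataclass
class MonthlyStats:
    availability: float  # percent
    reliability: float   # percent


def below_suspension_target(stats: MonthlyStats) -> bool:
    # Assumption: either metric below 50% counts, as in the suspension template.
    return (stats.availability < SUSPENSION_THRESHOLD
            or stats.reliability < SUSPENSION_THRESHOLD)


def qualifies_for_suspension(last_three_months: list[MonthlyStats]) -> bool:
    """True if the site is below target in the two previous months and the current one."""
    if len(last_three_months) != 3:
        raise ValueError("expected exactly three months of statistics")
    return all(below_suspension_target(m) for m in last_three_months)


def needs_explanation(current: MonthlyStats) -> bool:
    """True if the current month has Ava. below 70% or Rel. below 75%."""
    return (current.availability < AVA_EXPLANATION_THRESHOLD
            or current.reliability < REL_EXPLANATION_THRESHOLD)


def classify(history: dict[str, list[MonthlyStats]]) -> tuple[list[str], list[str]]:
    """Build the 'sites for suspension' and 'sites to be asked for explanation' lists.

    `history` maps a site name to its last three months, oldest first.
    """
    suspend: list[str] = []
    explain: list[str] = []
    for site, months in history.items():
        if qualifies_for_suspension(months):
            suspend.append(site)
        if needs_explanation(months[-1]):  # only the current month matters here
            explain.append(site)
    return suspend, explain
```

The procedure does not say the two lists are mutually exclusive. The sketch therefore evaluates them independently, so a site can appear on both.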

Tickets content

Request for explanation

Subject: $SU/$siteName - availability/reliability statistics for $date

Dear $SU,

According to the recent availability/reliability report, $siteName has achieved
poor performance: Ava. $availability, Rel. $realiability.
More details: https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics.

Could you please explain the poor performance of the $siteName site?

Your explanation must be provided within 7 working days from when the ticket is created.
If no explanation is given in due time, or the explanation is found inadequate,
the EGI Chief Operations Officer may decide, within 3 working days after the deadline,
to suspend the site.

If the site was certified during the last month, please close this ticket and
state this in the ticket's solution field. There is a known bug in the report
generation tool that is being worked on.


Best Regards,
EGI Central Operator on Duty
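The $name placeholders in these templates use the same syntax as Python's string.Template. A sketch of how they could be filled in is shown below. It is not the ticket generator's code, which is the Perl script start.pl. The body is a shortened excerpt, and render_request is a hypothetical helper. This page does not define $SU. Here it is assumed to hold the name of the ROC/NGI support unit, such as the first column of the generator's input file.

```python
from string import Template

# Placeholders copied from the 'Request for explanation' template
# (including its spelling of $realiability); the body is shortened.
SUBJECT = Template("$SU/$siteName - availability/reliability statistics for $date")
BODY = Template(
    "Dear $SU,\n"
    "\n"
    "According to the recent availability/reliability report, $siteName has achieved\n"
    "poor performance: Ava. $availability, Rel. $realiability.\n"
)


def render_request(su: str, site: str, report_date: str,
                   availability: str, reliability: str) -> tuple[str, str]:
    """Return (subject, body) with every placeholder filled in.

    Template.substitute() raises KeyError if a placeholder has no value,
    so an incomplete ticket is never produced silently.
    """
    values = {
        "SU": su,
        "siteName": site,
        "date": report_date,
        "availability": availability,
        "realiability": reliability,
    }
    return SUBJECT.substitute(values), BODY.substitute(values)
```

Using substitute() rather than safe_substitute() is deliberate. A missing value fails loudly instead of leaving a literal "$date" in a ticket sent to a site.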

Site for suspension

Subject: $SU/$siteName site suspension

Dear $SU,

According to the recent availability/reliability report, $siteName has performed
below the target (Ava. 50% or Rel. 50%) for three consecutive months.
More details: https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics.

According to the procedures, the site will be suspended within 7 working days unless the NGI intervenes.
If you think that the site should not be suspended, please provide a justification within 7 working days.

Best Regards,
EGI Central Operator on Duty
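Both templates count deadlines in working days: 7 working days for a reply, plus up to 3 working days afterwards for the COO decision. The sketch below computes these dates under two stated assumptions. Working days are Monday to Friday with public holidays ignored, and the day the ticket is created is not counted. The helper names are hypothetical.

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start`.

    Assumptions: Saturdays and Sundays are skipped, public holidays are not
    considered, and `start` itself is not counted.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def ticket_deadlines(created: date) -> dict[str, date]:
    explanation_due = add_working_days(created, 7)
    return {
        "explanation_due": explanation_due,
        # The COO may decide on suspension within 3 working days after the deadline.
        "suspension_decision_by": add_working_days(explanation_due, 3),
    }
```

For example, a ticket created on Monday 3 May 2010 gets an explanation deadline of Wednesday 12 May 2010. The latest suspension decision date is then Monday 17 May 2010.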


How to use ticket generator

  • Configure the script. At the beginning of the start.pl file, fill in the following variables:
# PRODUCTION
my $endpoint = "https://gusiwr.fzk.de/arsys/services/ARService?server=gusiwr&webService=Grid_HelpDesk";
my $user = ""; # login to GGUS web-services
my $pass = ""; # password to GGUS web-services

# Submitter data: these values are used as the submitter's details when creating tickets
my $Mail = ""; # your email address
my $DN = "";   # your DN
my $Name = ""; # your first name and surname


  • Prepare the input file. The input is a plain-text file in the following format:

ROC/NGI; Site name; Availability; Reliability;

Each line must describe exactly one site and must always contain exactly 4 semicolons, including the trailing one.

example:

NGI_PL; CYFRONET_LCG2; 50%; 10%;
NGI_PL; IFJ-PAN; 15%; 3%;
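Before running the generator, you may want to check the input file. The sketch below validates the "ROC/NGI; Site name; Availability; Reliability;" format, including the four-semicolon rule. It can also write a line back in the same format, for example from the list prepared in step 2.2. The percentage check is an extra assumption based only on the example above. This page does not say whether start.pl requires it. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputRow:
    ngi: str
    site: str
    availability: str
    reliability: str


def parse_input_line(line: str) -> InputRow:
    """Validate one line of the 'ROC/NGI; Site name; Availability; Reliability;' format."""
    count = line.count(";")
    if count != 4:
        raise ValueError(f"expected 4 semicolons, got {count}: {line!r}")
    # The trailing semicolon leaves an empty fifth field, which must stay empty.
    fields = [f.strip() for f in line.split(";")]
    if fields[4] != "":
        raise ValueError(f"unexpected text after the last semicolon: {line!r}")
    ngi, site, ava, rel = fields[:4]
    for value in (ava, rel):
        # Assumption: values look like the example, i.e. a number followed by '%'.
        if not value.endswith("%") or not value[:-1].replace(".", "", 1).isdigit():
            raise ValueError(f"expected a percentage such as '50%': {value!r}")
    return InputRow(ngi, site, ava, rel)


def parse_input(text: str) -> list[InputRow]:
    # Blank lines are skipped; every other line must describe exactly one site.
    return [parse_input_line(line) for line in text.splitlines() if line.strip()]


def format_input_line(row: InputRow) -> str:
    # Four semicolons per line, the last one trailing.
    return f"{row.ngi}; {row.site}; {row.availability}; {row.reliability};"
```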
  • Execute the tool

Log in to a machine with Perl installed and run the script as follows:

perl start.pl PARENT_TICKET_ID "DATE" FILE_NAME

PARENT_TICKET_ID - number of the parent "Availability/reliability statistics for *" ticket

DATE - date of the report, in the format "month year" (e.g. "May 2010")

FILE_NAME - input file with the availability/reliability data

example:

  perl start.pl 4121 "May 2010" dane.txt
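The DATE argument contains a space, so it has to be quoted on the shell command line. If the invocation is scripted, building the argument list programmatically avoids quoting mistakes. The sketch below only builds and prints the command; build_command is a hypothetical helper. It assumes the English month names that Python's default C locale produces.

```python
import calendar
import shlex


def build_command(parent_ticket_id: int, year: int, month: int, file_name: str) -> list[str]:
    """Build the argument list for: perl start.pl PARENT_TICKET_ID "DATE" FILE_NAME."""
    # DATE uses the "month year" format, e.g. "May 2010".
    report_date = f"{calendar.month_name[month]} {year}"
    return ["perl", "start.pl", str(parent_ticket_id), report_date, file_name]


if __name__ == "__main__":
    cmd = build_command(4121, 2010, 5, "dane.txt")
    print(shlex.join(cmd))  # perl start.pl 4121 'May 2010' dane.txt
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the list directly to subprocess.run avoids the shell entirely. In that case "May 2010" reaches start.pl as a single argument without any quoting.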