WI03 RC and RP OLA violation report followup
Revision as of 12:21, 10 August 2011
Internal procedure for COD
This page describes the steps a COD shifter should take to follow up availability/reliability issues.
When a GGUS ticket about availability/reliability metrics is assigned to COD:
| Timelines | Step | Substep | Description |
|---|---|---|---|
| | 1 | | Add the ticket URL to the Underperforming_sites_and_suspensions page (https://wiki.egi.eu/wiki/Underperforming_sites_and_suspensions). |
| | 2 | | Review the availability/reliability (Ava/Rel) report. |
| | | 1 | Prepare the 'sites for suspension' list: look at the availability metrics in the AR report for the current month and the two previous months. If all three are below 70%, the site qualifies for suspension. Check whether the site is mentioned on the List of sites for which the availability followup procedures were not applicable page (https://wiki.egi.eu/wiki/List_of_underperforming_sites); in some cases there is no need to open a ticket. |
| | | 2 | Prepare the 'sites to be asked for explanation' list: look at the current month in the AR report. If availability is below 70% or reliability is below 75%, the site qualifies to be asked for an explanation. Prepare this list according to the input file requirements of the ticket generator (see How to use ticket generator). Check whether the site is mentioned on the List of sites for which the availability followup procedures were not applicable page; in some cases there is no need to open a ticket. |
| | 3 | | Create a ticket for each case as a child of the ticket assigned to COD. |
| | | 1 | For the 'sites for suspension' list, use the ticket generator. |
| | | 2 | For the 'sites to be asked for explanation' list, use the ticket generator. |
| Within 10 working days of ticket creation | 4 | | When an explanation is provided and found satisfactory, put 'The explanation is satisfactory. Thank you!' as the ticket solution, then set the child ticket to 'verified' status. |
| After 10 working days from ticket creation | 5 | | Final actions. |
| | | 1 | Handling of sites that are eligible for suspension: if the NGI does not intervene, the site is suspended in GOC DB, with a link to the GGUS ticket created for the site given as the reason. If the NGI intervenes, the site is not suspended provided that both COD and COO agree with the reasoning given by the NGI; COO should be involved in the ticket. |
| | | 2 | Handling of sites below targets: if the explanation is not given in due time, or is found inadequate, the COD escalation procedure is followed starting from step #3 (https://wiki.egi.eu/wiki/Operations:COD_Escalation_Procedure). |
| | 6 | | Prepare a summary report and place it in the parent ticket, listing: (1) sites which are not responsive and did not provide a satisfactory explanation; (2) sites which were suspended; (3) ROCs/NGIs which are not responsive; (4) ... |
| | 7 | | Update the List of sites for which the availability followup procedures were not applicable page with outstanding cases that should be recorded, for example to avoid opening a ticket next month for an issue that has already been solved. |
| | 8 | | Update the Underperforming_sites_and_suspensions page. |
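Steps 4 and 5 hinge on a deadline of 10 working days from ticket creation. The Perl sketch below (Perl, like the ticket generator scripts) computes that date under stated assumptions: working days are Monday to Friday, public holidays are ignored, and counting starts on the day after the tickets are created. `add_working_days` is a hypothetical helper, not part of the ticket generator.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper, not part of the ticket generator: return the date that
# lies $n working days after (year, month 1-12, day), as "YYYY-MM-DD".
# Assumptions: working days are Monday to Friday; public holidays are ignored.
sub add_working_days {
    my ($year, $month, $day, $n) = @_;
    # Work in UTC so that adding 86400 seconds always moves one calendar day.
    my $t = timegm(0, 0, 12, $day, $month - 1, $year);
    while ($n > 0) {
        $t += 24 * 60 * 60;
        my $wday = (gmtime $t)[6];    # 0 = Sunday ... 6 = Saturday
        $n-- unless $wday == 0 || $wday == 6;
    }
    my ($d, $m, $y) = (gmtime $t)[3, 4, 5];
    return sprintf '%04d-%02d-%02d', $y + 1900, $m + 1, $d;
}

# Tickets created on Wednesday 10 August 2011:
print add_working_days(2011, 8, 10, 10), "\n";    # 2011-08-24
```

If the NGI observes holidays, the real deadline will be later than this sketch reports.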
Questions/issues
MR: what do we do with sites marked with "n/a"?
MK: we don't take into account months with "N/A"
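The review criteria of step 2, together with the answer above on "N/A" months, can be expressed as two small predicates. This is a minimal sketch, assuming availability and reliability are given as plain numbers in percent and that a month reported as "N/A" is simply not counted, so it can never make a site qualify; `qualifies_for_suspension` and `qualifies_for_explanation` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the ticket generator. Values are numbers
# in percent; the string "N/A" (any case) or undef marks a month without data.
sub is_na {
    my ($v) = @_;
    return 1 unless defined $v;
    return $v =~ m{^\s*n/a\s*$}i ? 1 : 0;
}

# Step 2.1: availability below 70% in the current month and both previous
# months. Assumption: an "N/A" month is not taken into account, so it can
# never count towards suspension.
sub qualifies_for_suspension {
    my @ava = @_;    # availability for the three months
    return 0 unless @ava == 3;
    for my $v (@ava) {
        return 0 if is_na($v) || $v >= 70;
    }
    return 1;
}

# Step 2.2: availability below 70% or reliability below 75% in the current month.
sub qualifies_for_explanation {
    my ($ava, $rel) = @_;
    return 1 if !is_na($ava) && $ava < 70;
    return 1 if !is_na($rel) && $rel < 75;
    return 0;
}
```

With the values from the input file example, CYFRONET_LCG2 (Ava. 50%, Rel. 10%) would qualify to be asked for an explanation.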
VERY IMPORTANT
In Gridview, NGIs/ROCs are named differently than in GGUS. Always change the NGI/ROC name to the one used in GGUS.
GGUS | Gridview |
---|---|
ROC_DECH | GermanySwitzerland |
NGI_FRANCE | NGI_France |
ROC_Asia/Pacific | AsiaPacific |
ROC_Italy | Italy |
ROC_CERN | CERN |
ROC_Russia | Russia |
ROC_North | NorthernEurope |
ROC_UK/Ireland | UKI |
ROC_SE | SouthEasternEurope |
ROC_SW | SouthWesternEurope |
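Translating Gridview names into GGUS support units is a plain lookup. A minimal sketch with the pairs from the table above; `ggus_name` is a hypothetical helper, and returning names missing from the table unchanged is an assumption.

```perl
use strict;
use warnings;

# Gridview name => GGUS support unit, copied from the table above.
my %GGUS_NAME_FOR = (
    'GermanySwitzerland' => 'ROC_DECH',
    'NGI_France'         => 'NGI_FRANCE',
    'AsiaPacific'        => 'ROC_Asia/Pacific',
    'Italy'              => 'ROC_Italy',
    'CERN'               => 'ROC_CERN',
    'Russia'             => 'ROC_Russia',
    'NorthernEurope'     => 'ROC_North',
    'UKI'                => 'ROC_UK/Ireland',
    'SouthEasternEurope' => 'ROC_SE',
    'SouthWesternEurope' => 'ROC_SW',
);

# Hypothetical helper: map a Gridview NGI/ROC name to its GGUS name.
# Assumption: names missing from the table are returned unchanged.
sub ggus_name {
    my ($gridview) = @_;
    return exists $GGUS_NAME_FOR{$gridview} ? $GGUS_NAME_FOR{$gridview} : $gridview;
}

print ggus_name('UKI'), "\n";    # ROC_UK/Ireland
```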
Tickets content
Request for explanation
```
Subject:$SU/$siteName - availability/reliability statistics for $date

Dear $SU,

According to the recent availability/reliability report, $siteName has achieved poor performance: Ava. $availability, Rel. $realiability.
More details: https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics.

Could you please provide an explanation for the poor performance of the $siteName site?
Your explanation must be returned within 10 working days from when the ticket is created.
If the explanation is not given in due time, or the explanation is found inadequate, the COD escalation procedure will be followed: https://wiki.egi.eu/wiki/Operations:COD_Escalation_Procedure

If the site was certified during the last month, please close this ticket and put this information in the ticket solution field. There is a known bug in the report generation tool that is being worked on.

Best Regards,
EGI Central Operator on Duty
```
Site for suspension
```
Subject:$SU/$siteName site suspension

Dear $SU,

According to the recent availability/reliability report, $siteName has achieved poor performance, below target (Ava. 50% or Rel. 50%), in three consecutive months.
More details: https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics.

According to the procedures approved at the OMB on 17.08, the site will be suspended within 10 working days unless the NGI intervenes.
If you think that the site should not be suspended, please provide a justification within 10 working days.

Best Regards,
EGI Central Operator on Duty
```
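Both templates use `$`-prefixed placeholders ($SU, $siteName, $date, $availability, $realiability). How the ticket generator fills them in is not documented on this page; the sketch below only illustrates plain placeholder substitution, using a hypothetical `fill_template` helper that leaves unknown placeholders visible so that a misspelled name is easy to spot.

```perl
use strict;
use warnings;

# Hypothetical helper, not the ticket generator's code: replace each $name in
# the template with $values{name}; unknown placeholders are left untouched.
sub fill_template {
    my ($template, %values) = @_;
    (my $text = $template) =~ s/\$(\w+)/exists $values{$1} ? $values{$1} : "\$$1"/ge;
    return $text;
}

my $subject = fill_template(
    'Subject:$SU/$siteName site suspension',
    SU => 'NGI_PL', siteName => 'IFJ-PAN',
);
print "$subject\n";    # Subject:NGI_PL/IFJ-PAN site suspension
```

The placeholder `$realiability` is spelled as in the template; a filler keyed on `reliability` would leave it unreplaced.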
How to use ticket generator
Current version of the script: 3.0
Features:
- bulk child ticket creation
- 'assigned to' set
- 'affected site' set
- 'type of problem' set to Operations
Usage:
- Configure the script.
At the beginning of start-explanations.pl and start-suspend.pl, fill in the following variables:
```perl
# PRODUCTION
my $endpoint = "https://gusiwr.fzk.de/arsys/services/ARService?server=gusiwr&webService=Grid_HelpDesk";
my $user = "";   # login to GGUS web-services
my $pass = "";   # password to GGUS web-services

# Submitter data: these values are used as the submitter's data when creating tickets
my $Mail = "";   # your email address
my $DN   = "";   # your DN
my $Name = "";   # Name and Surname
```
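Leaving one of these variables empty is an easy mistake before a bulk run. A hypothetical pre-flight check, not part of start-explanations.pl or start-suspend.pl, could collect the values of $user, $pass, $Mail, $DN and $Name into a hash and report which ones are still empty:

```perl
use strict;
use warnings;

# Hypothetical pre-flight check, not part of the scripts: given the
# configuration values as a hash, return the names of the empty ones, sorted.
sub missing_config {
    my (%config) = @_;
    my @missing = grep { !defined $config{$_} || $config{$_} eq '' } keys %config;
    @missing = sort @missing;
    return @missing;
}

my @empty = missing_config(
    user => '', pass => '',
    Mail => 'someone@example.org', DN => '', Name => 'Name Surname',
);
print "Fill in: @empty\n" if @empty;    # Fill in: DN pass user
```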
- Prepare the input file.
Both scripts read a plain-text input file with one site per line, in the following format:
ROC/NGI support unit in GGUS; Site name; Availability; Reliability;
Each line must always contain exactly 4 semicolons. For the start-suspend.pl script, the Availability and Reliability values are omitted, but the semicolons are still required.
Example:
```
NGI_PL; CYFRONET_LCG2; 50%; 10%;
NGI_PL; IFJ-PAN; 15%; 3%;
```
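The format rules above are easy to check mechanically before running the generator. A minimal sketch, assuming fields are separated by "; " as in the example; `format_input_line` and `is_valid_input_line` are hypothetical helpers, not functions of the ticket generator.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the ticket generator.

# Build one start-explanations.pl input line, matching the example above.
sub format_input_line {
    my ($su, $site, $ava, $rel) = @_;
    return sprintf '%s; %s; %s%%; %s%%;', $su, $site, $ava, $rel;
}

# Check one line: exactly 4 semicolons and a non-empty support unit and
# site name (Availability/Reliability may be empty, as for start-suspend.pl).
sub is_valid_input_line {
    my ($line) = @_;
    return 0 unless ($line =~ tr/;//) == 4;    # tr/// with no replacement only counts
    my ($su, $site) = (split /;/, $line, -1)[0, 1];
    s/^\s+|\s+$//g for $su, $site;             # trim surrounding whitespace
    my $ok = length($su) && length($site);
    return $ok ? 1 : 0;
}

print format_input_line('NGI_PL', 'IFJ-PAN', 15, 3), "\n";    # NGI_PL; IFJ-PAN; 15%; 3%;
```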
- Execute the tool.
Log in to a machine with Perl installed and run the script as follows:
```
perl start-explanations.pl PARENT_TICKET_ID "DATE" FILE_NAME
perl start-suspend.pl PARENT_TICKET_ID "DATE" FILE_NAME
```
PARENT_TICKET_ID - number of the parent "Availability/reliability statistics for *" ticket
DATE - date of the report, in the format "month year"
FILE_NAME - file with the input availability/reliability data
Example:
```
perl start-explanations.pl 4121 "May 2010" dane.txt
```
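Before launching a bulk run, it may be worth checking the three arguments. This sketch is hypothetical and not part of the scripts: it assumes DATE uses a full English month name followed by a four-digit year (as in "May 2010") and only checks that FILE_NAME is a readable file.

```perl
use strict;
use warnings;

# Assumption: "month year" means a full English month name and a
# four-digit year, as in the example "May 2010".
my %IS_MONTH = map { ($_ => 1) } qw(January February March April May June
    July August September October November December);

# Hypothetical pre-flight check, not part of the scripts. Returns an error
# message, or undef when PARENT_TICKET_ID, DATE and FILE_NAME look usable.
sub check_args {
    my ($parent_id, $date, $file) = @_;
    return 'PARENT_TICKET_ID must be a ticket number'
        unless defined $parent_id && $parent_id =~ /^\d+$/;
    return 'DATE must have the form "month year", e.g. "May 2010"'
        unless defined $date && $date =~ /^(\w+) (\d{4})$/ && $IS_MONTH{$1};
    return 'FILE_NAME must be a readable file'
        unless defined $file && -f $file && -r $file;
    return undef;    # no problem found
}

my $error = check_args(4121, 'May 2010', 'dane.txt');
print defined $error
    ? "Not running: $error\n"
    : qq{perl start-explanations.pl 4121 "May 2010" dane.txt\n};
```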
Best practice
- If the site explains that the site administrator was on holiday, put the following as the solution: "This time the explanation is found satisfactory, although in future, in case of administrator holidays, the site should provide an administrator deputy. If this is not possible, the NGI should put the failing site in downtime. Thank you!". Then close the ticket and verify it.