The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can contact the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "Operations and Operations Support"

From EGIWiki
 
(20 intermediate revisions by 2 users not shown)
Line 5: Line 5:
= Introduction  =
= Introduction  =


This page collects internal materials needed by EGI.eu Operations and EGI Operations Support team to perform the EGI Infrastructure operations oversight activities.  
'''New version on https://wiki.egi.eu/wiki/EGI_Operations_Team'''
 
This page collects internal materials needed by EGI.eu Operations and EGI Operations Support team to perform the EGI Infrastructure operations oversight activities.
 
'''NOTE''': on April 30th 2016 the EGI Operations Support activity stopped and all its tasks passed to Operations


= Contact  =
= Contact  =
Line 11: Line 15:
EGI.eu Operations:  
EGI.eu Operations:  


*GGUS Support Unit: [https://ggus.eu/?mode=ticket_search&show_columns_check[]=TICKET_TYPE&show_columns_check[]=AFFECTED_VO&show_columns_check[]=AFFECTED_SITE&show_columns_check[]=PRIORITY&show_columns_check[]=RESPONSIBLE_UNIT&show_columns_check[]=STATUS&show_columns_check[]=DATE_OF_CHANGE&show_columns_check[]=SHORT_DESCRIPTION&ticket_id=&supportunit=Operations&su_hierarchy=0&vo=&user=&keyword=&involvedsupporter=&assignedto=&affectedsite=&specattrib=none&status=open&priority=&typeofproblem=all&ticket_category=all&mouarea=&date_type=creation+date&tf_radio=1&timeframe=any&from_date=19+Aug+2014&to_date=20+Aug+2014&untouched_date=&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO! Operations]
*GGUS Support Unit:Operations  
*operations @ egi.eu
*operations @ egi.eu


EGI Operations Support:
= Actions =
 
*GGUS Suport Unit: [https://ggus.eu/?mode=ticket_search&show_columns_check[]=TICKET_TYPE&show_columns_check[]=AFFECTED_VO&show_columns_check[]=AFFECTED_SITE&show_columns_check[]=PRIORITY&show_columns_check[]=RESPONSIBLE_UNIT&show_columns_check[]=STATUS&show_columns_check[]=DATE_OF_CHANGE&show_columns_check[]=SHORT_DESCRIPTION&ticket_id=&supportunit=COD&su_hierarchy=0&vo=&user=&keyword=&involvedsupporter=&assignedto=&affectedsite=&specattrib=none&status=open&priority=&typeofproblem=all&ticket_category=all&mouarea=&date_type=creation+date&tf_radio=1&timeframe=any&from_date=19+Aug+2014&to_date=20+Aug+2014&untouched_date=&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO! COD]
*operations-support @ mailman.egi.eu
 
= Actions =


This section collects all work instructions, which specify exactly which steps to follow to carry out an activity.
This section collects all work instructions, which specify exactly which steps to follow to carry out an activity.
Line 45: Line 44:


|  
|  
*[[WI02 Operations centre creation|WI02 - New Opertions Centre creation work instruction]]
*[[WI02 Operations centre creation|WI02 - New Operations Centre creation work instruction]]
 
|-
| '''Monthly operations broadcast'''
| OS
|
|
*[[WI04_Monthly_broadcast| WI04 - Monthly Operations broadcast]]


|-
|-
Line 68: Line 74:


|  
|  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI06 Tickets older than 30 days|WI06 - Tickets > 30 days]]


|-
|-
Line 79: Line 84:
|  
|  
*[https://wiki.egi.eu/wiki/PROC10 Recomputation of monitoring results and availability statistics]  
*[https://wiki.egi.eu/wiki/PROC10 Recomputation of monitoring results and availability statistics]  
*[[WI03 Availability and Reliability report followup|WI03 - Availability and reliability report work instruction]]  
*[[WI03 RC and RP OLA violation report followup|WI03 RC and RP OLA violation report followup]]  
*[[Underperforming sites and suspensions|Underperforming sites and suspensions]]
*[[Underperforming sites and suspensions|Underperforming sites and suspensions]]


Line 91: Line 96:
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[Unknown issue|UNKNOWN issue]]  
*[[Unknown issue|UNKNOWN issue]]  
*[[WI08 Unknown report followup|WI08 - Unknown report work instruction]]
*[[WI03 RC and RP OLA violation report followup|WI03 RC and RP OLA violation report followup]]


|-
|-
Line 101: Line 106:
|  
|  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[WI04 Core services report followup|WI04 - Core services report work instruction]]
*[[WI03 RC and RP OLA violation report followup|WI03 RC and RP OLA violation report followup]]


|-
|-
| '''ROD performance index followup procedure'''  
| '''ROD performance index followup procedure'''  
| O<br>  
| O<br>  
|  
| <br>
|  
|  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[WI07 ROD performance index report follwup|WI07 - ROD Performance Index report work instruction]]  
*[[WI03 RC and RP OLA violation report followup|WI03 RC and RP OLA violation report followup]]  
*[[ROD performance index|ROD performance index]]
*[[ROD performance index|ROD performance index]]


Line 118: Line 123:
*[[WI01 ROD certification ticket handling|WI01 - New ROD team certification work instructions]]  
*[[WI01 ROD certification ticket handling|WI01 - New ROD team certification work instructions]]  
*[[WI02 Operations centre creation|WI02 - New Operations Centre creation work instruction]] 
*[[WI02 Operations centre creation|WI02 - New Operations Centre creation work instruction]] 
*[[WI03 Availability and Reliability report followup|WI03 - Availability and reliability report work instruction]]  
*[[WI03 RC and RP OLA violation report followup|WI03 - RC and RP OLA violation report followup]]  
*[[WI04 Core services report followup|WI04 - Core services report work instruction ]]  
*[[WI04 Monthly broadcast|WI04 - Monthly Operations broadcast]]  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]  
*[[WI05 Unresponsive NGI escalation|WI05 - Escalation procedure in case of unresponsive NGI]]
*[[WI06 Tickets older than 30 days|WI06 - Tickets &gt; 30 days]]
*[[WI06_Core_services_process| Core services process]]
*[[WI07 ROD performance index report follwup|WI07 - ROD Performance Index report work instruction]]
*[[WI08 Unknown report followup|WI08 - Unknown report work instruction]]
 
= Events  =
 
*[https://www.egi.eu/indico/categoryDisplay.py?categId=11 EGI indico page] with COD meeting agendas.
*All open actions can be found from [[COD actions|COD actions]]


= Resources =
== Pages listing NGIs<br> ==


*[https://documents.egi.eu/secure/ShowDocument?docid=298 Document server: ROD newsletter]
For EGI&nbsp;Operations:&nbsp;to be updated whenever an OC&nbsp;is created or decommissioned
*[https://documents.egi.eu/secure/ShowDocument?docid=155 Document server: Operations Support Metrics]
*[[Operations Procedures|Operations Procedures]]
*[http://www.youtube.com/user/EGIGridOversight Youtube channel]
*[https://operations-portal.in2p3.fr/dashboard/regionalPreferences Mailing lists for each ROD]
*[https://wiki.egi.eu/wiki/COD_Knowledge_database Knowledge database]


<!--
*[https://wiki.egi.eu/wiki/GOCDB_grouping_action https://wiki.egi.eu/wiki/GOCDB_grouping_action ]<br>
== ROD and COD Performance  ==
*[https://wiki.egi.eu/wiki/Operations_centres https://wiki.egi.eu/wiki/Operations_centres] <br>
 
*https://wiki.egi.eu/wiki/Top-BDII_list_for_NGI <br>  
*[[Grid operations oversight/OperationsSupportMetrics summary|Operations Support Metrics - reports summary]]-->  
*https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&amp;id=1205<br>
 
*https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&amp;id=1206
=== Oct 2011 to date  ===
*https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&amp;id=1184
 
*https://docs.google.com/a/egi.eu/spreadsheets/d/1Zsk3ykVllc5GzNG2Hhref7wzTvz_rSKcckV8nnWWZIs/edit#gid=163292516
*Please provide a link here
*folder "08 - sites-history Q"


<br>  
<br>  
Line 152: Line 145:
<br>  
<br>  


Definition of [[Operations support metrics|Operations Support metrics]]
<br>


=== May 2010-Sep 2011 ===
= Resources =


*Operations Support [https://documents.egi.eu/document/155 metrics]
*[[Operations Procedures|Operations Procedures]]
*[http://www.youtube.com/user/EGIGridOversight Youtube channel]


=== Until April 2010 ===
<!--
== ROD and COD Performance ==


*EGEE-III Operations Support [https://documents.egi.eu/document/829 metrics]
*[[Grid operations oversight/OperationsSupportMetrics summary|Operations Support Metrics - reports summary]]-->
 
== Nagios tests  ==
 
*[[Operations SAM tests|Operations tests list ]]: list of Nagios probes generating alarms for visualization in the Operations Dashboard
*[[Availability SAM tests|Availability and reliability tests list]]: list of Nagios probes whose results are used for Availability and Reliability computation
 
== OTAG topics  ==
 
=== Operational Portal: Dashboard  ===
 
*[http://bit.ly/dZ3RWN RT tickets]
*[[COD Interaction with Dashboard team|COD interactions with Dashboard team (draft)]]
*[[COD OTAG topics|COD topics to be discussed on OTAG meeting]]
 
== Pages in draft state  ==
 
*[[Availability procedure improvements|Improvements to Availability Calculation Procedure (draft)]]
*[[Candidate or Suspended sites|Candidate Suspended Sites List]]


<br>  
<br>  


[[Category:Grid_Oversight]]
[[Category:Infrastructure_Oversight]]

Latest revision as of 17:17, 28 July 2016





Introduction

New version on https://wiki.egi.eu/wiki/EGI_Operations_Team

This page collects internal materials needed by EGI.eu Operations and EGI Operations Support team to perform the EGI Infrastructure operations oversight activities.

NOTE: on April 30th 2016 the EGI Operations Support activity stopped and all its tasks passed to Operations

Contact

EGI.eu Operations:

  • GGUS Support Unit:Operations
  • operations @ egi.eu

Actions

This section collects all work instructions, which specify exactly which steps to follow to carry out an activity.

Action                                         Responsible   Procedure   Instructions and related pages
ROD certification                              OS
Creation of a new NGI                          OS
Monthly operations broadcast                   OS
Operations Centre decommission                 O
Setting a Nagios test to an operations test    O
Operational portal dashboard issues            O
Availability/reliability followup procedure    O
Unknown followup procedure                     O
Top-level BDII followup procedure              O
ROD performance index followup procedure       O

Work Instructions

Pages listing NGIs

For EGI Operations: to be updated whenever an OC is created or decommissioned




Resources