The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "EGI Infrastructure operations oversight"

From EGIWiki


* ensuring a timely response to emails sent to: operations@egi
* verifying the status of GGUS tickets that are marked as URGENT or have not been attended to in a timely manner.  Useful filters are: [http://go.egi.eu/ggus_operations tickets opened against Operations SU] and [http://go.egi.eu/ggus_egi_toppriority open top priority tickets in the EGI scope] and also [http://go.egi.eu/ggus_egi_veryurgent open very urgent tickets in the EGI scope].  Note that the target response times for these are 1 day; for lower priorities it is 5 days.
* checking the ROD Dashboard on the [https://operations-portal.egi.eu/rodDashboard/ngi/any/tab/list/filter/operators/page/list Operations Portal], especially for long-standing critical alarms, and checking with NGIs on their status as necessary
* checking for new tickets in the [https://rt.egi.eu/ RT] Change Management queue, especially changes that need the CAB to be convened urgently, and, if there are any, informing the CHM Manager

Revision as of 12:01, 30 April 2018




EGI Infrastructure operations oversight activity is provided by:

  • EGI Foundation Operations team
  • Regional Operators on Duty (ROD) teams

Oversight of the NGI infrastructures is needed to detect problems, coordinate their diagnosis, and monitor them throughout their lifecycle until resolution. It is based on monitoring the status of services operated by sites, opening tickets, and following them up until the problem is resolved. EGI.eu supports and actively oversees the overall status of services and sites, opens tickets to request fixes, and tackles residual problems that were not successfully distributed to NGIs.


The EGI.eu Operations team is the central team responsible for the EGI Production Infrastructure. It is also responsible for providing:

  1. Coordination of activities with the Operations Management Board and the User Community Board.
  2. Central Technical Support to site administrators, NGI operators and new user communities. This includes:
    • technical support to the EGI Foundation operations activities
    • technical support to ROD teams through targeted training activities
    • coordination of technical working groups
    • technical support to new resource centres in their certification phase when requested by the Operations Centre because of lack of sufficient local expertise
    • certification and technical support for new infrastructures being integrated by providing assistance and training about EGI operations services, policies and procedures, and developing documentation as needed
  3. Resource Allocation
    • defining service management processes for resource allocation and other EGI.eu operations services
    • providing training on these processes, and communicating, adapting and enforcing them at NGI level
    • defining requirements for the operations tools that arise from the provisioning of these new services
    • managing the resource allocation process (e.g. operating the e-grant tool)


The EGI Foundation Operator on Duty (OD) is a member of the central EGI Foundation Operations team who is primarily responsible for responding to tickets. The duty rotates among team members according to a rota, which ensures that everyone in the team gains hands-on experience of dealing with tickets and coordinating operations activities. OD duties include:

  • ensuring a timely response to emails sent to: operations@egi
  • verifying the status of GGUS tickets that are marked as URGENT or have not been attended to in a timely manner. Useful filters are: tickets opened against Operations SU and open top priority tickets in the EGI scope and also open very urgent tickets in the EGI scope. Note that the target response times for these are 1 day; for lower priorities it is 5 days.
  • checking the ROD Dashboard on the Operations Portal, especially for long-standing critical alarms, and checking with NGIs on their status as necessary
  • checking for new tickets in the RT Change Management queue, especially changes that need the CAB to be convened urgently, and, if there are any, informing the CHM Manager
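
The response-time targets mentioned in the duty list (1 day for top priority and very urgent tickets, 5 days for lower priorities) can be turned into a simple check. The following is only a minimal sketch: the `Ticket` record, its field names and the priority labels are hypothetical and do not reflect the actual GGUS data model, and the targets are treated as calendar days, which is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Targets quoted in the OD duty list; counting calendar days is an assumption.
URGENT_TARGET = timedelta(days=1)
DEFAULT_TARGET = timedelta(days=5)

# Hypothetical labels for the priorities that get the 1-day target.
URGENT_PRIORITIES = {"top priority", "very urgent"}


@dataclass
class Ticket:
    """Hypothetical ticket record, not the real GGUS schema."""
    ticket_id: str
    priority: str
    opened_at: datetime
    first_response_at: Optional[datetime] = None  # None: no response yet


def response_target(priority: str) -> timedelta:
    """Return the target response time for a priority label."""
    if priority.lower() in URGENT_PRIORITIES:
        return URGENT_TARGET
    return DEFAULT_TARGET


def overdue_tickets(tickets, now):
    """Return unanswered tickets past their target, oldest first."""
    late = [
        t for t in tickets
        if t.first_response_at is None
        and now - t.opened_at > response_target(t.priority)
    ]
    return sorted(late, key=lambda t: t.opened_at)


if __name__ == "__main__":
    now = datetime(2018, 4, 30, 12, 0)
    sample = [
        Ticket("A", "very urgent", datetime(2018, 4, 28, 9, 0)),
        Ticket("B", "less urgent", datetime(2018, 4, 28, 9, 0)),
        Ticket("C", "less urgent", datetime(2018, 4, 20, 9, 0)),
    ]
    # A is past the 1-day target and C is past the 5-day target; B is not.
    print([t.ticket_id for t in overdue_tickets(sample, now)])  # ['C', 'A']
```

Sorting by opening time puts the longest-waiting tickets first, which matches the duty of following up tickets that have not been attended to in a timely manner.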


Regional Operators on Duty (ROD) is a team responsible for solving problems on the infrastructure within an NGI according to agreed procedures. They ensure that problems are properly recorded and progress according to specified timelines, and that the necessary information is available to all parties. The team is provided by each NGI, and its work requires procedural knowledge of the process rather than technical skills. Depending on how an NGI is organized, the ROD team may have a number of members who work on a duty roster (shifts on a daily or weekly basis), or there may be one person working as ROD on a daily basis and a few deputies who take over the responsibilities when necessary. The latter model is generally more suitable for small NGIs.