Difference between revisions of "EGI-InSPIRE:SA1.7-QR6"

Revision as of 17:23, 1 November 2011


1. Task Meetings

Date (dd/mm/yyyy) | Url Indico Agenda | Title | Outcome


2. Main Achievements

Grid Oversight

ROD teams newsletter

This quarter we published a ROD teams newsletter in both May and June. The rationale behind the newsletter is described in the QR4 report.

ROD session at EGI TF

We will organise a ROD teams session at the EGI TF in Lyon. We have drawn up a programme that will be published on the web soon. The session will give an overview of the Grid oversight work and then discuss some more detailed topics. We aim to include a presentation on the operations portal, and we will conclude the session with a presentation of the results of a survey that we will publish soon.

Procedures

We have proposed a change to the COD escalation procedure that removes an unnecessary step. At present, when a site is unresponsive to a GGUS ticket issued after an alarm, the ROD team eventually escalates the issue to COD, which bounces it back to the NGI and asks the NGI to try again to contact the site. We consider this step unnecessary, and the proposed change to the procedure is currently under discussion.

Availability follow-up

  • A Nagios probe is under development that will raise an alarm when a site's availability and/or reliability falls below the 70%/75% threshold. The COD has provided input, which was added to the RT ticket: https://rt.egi.eu/rt/Ticket/Display.html?id=289.
  • Recently, we discovered that the availability and reliability metrics contained a substantial number of UNKNOWN test results, both for individual sites and for all sites in an entire NGI. Since UNKNOWN test results are not taken into account when computing availability and reliability, a large share of them distorts both metrics. This issue is currently under investigation.
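
The effect of UNKNOWN results on the metric can be illustrated with a minimal sketch. It assumes that availability is the fraction of OK results among the results with a known status, which is one reading of "not taken into account", and that the 70% and 75% figures apply to availability and reliability respectively. The function names and status labels are hypothetical and are not taken from the Nagios probe itself.

```python
from collections import Counter

# Thresholds quoted in the text; mapping 70% to availability and
# 75% to reliability is an assumption.
AVAILABILITY_THRESHOLD = 0.70
RELIABILITY_THRESHOLD = 0.75


def availability(results):
    """Fraction of OK results among results with a known status.

    UNKNOWN results are dropped from the denominator (assumed formula).
    Returns None when no result has a known status.
    """
    counts = Counter(results)
    known = len(results) - counts["UNKNOWN"]
    if known == 0:
        return None
    return counts["OK"] / known


def unknown_fraction(results):
    """Share of all test results whose status is UNKNOWN."""
    if not results:
        return 0.0
    return results.count("UNKNOWN") / len(results)


def below_threshold(value, threshold):
    """True when a defined metric value is under its threshold."""
    return value is not None and value < threshold


# Hypothetical site: 100 test results, 90 of them UNKNOWN.
sample = ["OK"] * 8 + ["CRITICAL"] * 2 + ["UNKNOWN"] * 90
print(availability(sample))      # computed over the 10 known results only
print(unknown_fraction(sample))  # share of results that were ignored
print(below_threshold(availability(sample), AVAILABILITY_THRESHOLD))
```

In this made-up sample the site would not trigger an alarm, although only 8 of its 100 test results were actually OK. Reporting the share of UNKNOWN results next to the metric is one possible way to make such cases visible.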

3. Issues and Mitigation

Issue Description | Mitigation Description

4. Plans for the next period