Difference between revisions of "EGI-InSPIRE:SA1.7-QR7"

Revision as of 19:16, 5 February 2012

1. Task Meetings

Date (dd-mm-yyyy) | Indico agenda URL | Title | Outcome
17-11-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=688 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=688
21-11-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=695 | COD-EGI.eu | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=1&confId=695
23-11-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=701 | unknown meeting | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=701
05-12-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=703 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=703
15-12-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=708 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=708
19-12-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=716 | COD (availability probe) | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=716
19-01-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=803 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=803

2. Main Achievements

Grid Oversight

ROD teams newsletter

This quarter we published a ROD teams newsletter in November, December and January. The rationale behind the newsletter is described in the SA1.7-QR4 report.

ROD performance index

For background information on this, have a look at SA1.7-QR6, section RP OLA and ROD metrics. Since October, we have asked every NGI with more than 10 items in the COD dashboard during one month to explain, through GGUS, the reason for this result and how it plans to improve the situation. The good news is that we have seen a continuous decline in the number of items in the COD dashboard.

Non-OK Alarms Followup

For background information on this, have a look at SA1.7-QR6, section Non-OK Alarms Followup. We have continued this activity in QR7.

Availability followup

  • A Nagios probe is under development that will raise an alarm when a site's availability and/or reliability falls below the 70%/75% threshold. The COD provided input, which was added to the RT ticket: https://rt.egi.eu/rt/Ticket/Display.html?id=289. We organised a phone conference on the requirements that this probe should fulfil. We have made a new proposal in this field and hope to reach an agreement with all parties involved so that this issue can make some progress.
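
The threshold check described above can be sketched as follows. This is only an illustration, not the probe tracked in the RT ticket: the function name `check_site`, the site name and the input figures are hypothetical, and reading "70%/75%" as 70% for availability and 75% for reliability is an assumption. The exit codes are the standard Nagios plugin codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: "70%/75%" means 70% availability and 75% reliability.
AVAILABILITY_THRESHOLD = 70.0
RELIABILITY_THRESHOLD = 75.0


def check_site(site, availability, reliability):
    """Return (exit_code, status_line) for one site's figures, given in percent.

    Hypothetical helper: pass None when no figure could be obtained.
    """
    if availability is None or reliability is None:
        return UNKNOWN, f"UNKNOWN - {site}: no availability/reliability data"

    problems = []
    if availability < AVAILABILITY_THRESHOLD:
        problems.append(
            f"availability {availability:.1f}% < {AVAILABILITY_THRESHOLD:.0f}%"
        )
    if reliability < RELIABILITY_THRESHOLD:
        problems.append(
            f"reliability {reliability:.1f}% < {RELIABILITY_THRESHOLD:.0f}%"
        )

    if problems:
        # Either figure below its threshold is enough to raise the alarm.
        return CRITICAL, f"CRITICAL - {site}: " + ", ".join(problems)
    return OK, (
        f"OK - {site}: availability {availability:.1f}%, "
        f"reliability {reliability:.1f}%"
    )


if __name__ == "__main__":
    code, line = check_site("EXAMPLE-SITE", 65.0, 80.0)
    print(line)
    raise SystemExit(code)
```

The sketch raises a single CRITICAL alarm when either figure is below its threshold, matching the "and/or" wording; whether the real probe should distinguish the two cases is one of the requirements still under discussion.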

Unknown Followup

  • Recently, we discovered that the availability and reliability metrics contained a substantial number of UNKNOWN test results, both for individual sites and for all sites in an entire NGI. Since UNKNOWN test results are not taken into account when the availability and reliability metrics are computed, a large share of them distorts these metrics. This issue is currently under investigation. More information on this topic may be found at: https://wiki.egi.eu/wiki/Grid_operations_oversight/Unknown_issue
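
To illustrate why excluded UNKNOWN results can hide problems, here is a minimal sketch. The formulas are an assumption, not taken from this report: availability is taken as uptime divided by the time with a known status, and reliability additionally excludes scheduled downtime. The function name `site_metrics` and the hour counts are hypothetical.

```python
def site_metrics(up, down, scheduled_down, unknown):
    """Compute percentages from hours spent in each status.

    Assumed formulas (UNKNOWN hours are left out of both denominators):
      availability = up / (total - unknown)
      reliability  = up / (total - unknown - scheduled_down)
    Returns None for a metric whose denominator is zero.
    """
    total = up + down + scheduled_down + unknown
    known = total - unknown
    unscheduled_known = known - scheduled_down

    def percent(part, whole):
        return 100.0 * part / whole if whole > 0 else None

    return {
        "availability": percent(up, known),
        "reliability": percent(up, unscheduled_known),
        "unknown_fraction": percent(unknown, total),
    }


if __name__ == "__main__":
    # Hypothetical 30-day month (720 hours): only 72 hours have a known
    # status, all of them UP; the remaining 648 hours are UNKNOWN.
    print(site_metrics(up=72, down=0, scheduled_down=0, unknown=648))
```

Under these assumptions the hypothetical site reports full availability and reliability although its status is known for only a tenth of the month, which is why the share of UNKNOWN results is worth tracking next to the two metrics.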

TPM

Network Support

3. Issues and Mitigation

Issue Description | Mitigation Description

4. Plans for the next period

Grid Oversight

TPM

Network Support