From EGIWiki
Revision as of 17:34, 29 October 2012 by Ron (3. Issues and Mitigation)

1. Task Meetings

Date (dd/mm/yyyy) | Title
16/08/2012 | COD meeting
05/09/2012 | COD meeting
21/09/2012 | ROD session at EGI TF12
03/10/2012 | COD meeting
23/10/2012 | COOCOD meeting

2. Main Achievements

Grid Oversight

Followup upgrades of unsupported software

A large number of sites were still running glite-3.1 and glite-3.2 software, which is no longer supported. This quarter, a campaign was started to get these sites to upgrade the services that run this software. COD has issued GGUS tickets to these sites and is following them up.

ROD teams newsletter

This quarter we published a ROD teams newsletter in October. The rationale behind the newsletter is described in the SA1.7-QR4 report.

ROD performance index

For background information, see SA1.7-QR6, section RP OLA and ROD metrics. Since October 2011, we have been asking every NGI with more than 10 items in the COD dashboard during a month, via GGUS, to explain the reason for this result and how it plans to improve the situation. We are continuing to collect and investigate these metrics and to correlate them with other metrics, to see whether we can draw any conclusions from them. The number of issues in the COD dashboard appears to be decreasing.

Availability followup

See SA1.7-QR6 for more background information. A probe measuring the availability and reliability of a site has been supplied to the ops portal developers and is now deployed. The algorithm of this probe is incorporated into the ops portal, which now generates alarms when a site's availability/reliability falls below 70%/75%. As a consequence, COD will stop issuing monthly GGUS tickets to these sites as of 1 November 2012.

Unknown Followup

See SA1.7-QR6 for more background information. In Q10 we continued this activity.

Followup NGI Core Services availability

We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. This activity started in February 2012; initially, we only submitted GGUS tickets informing NGIs of their low top-level BDII availability.


We are developing a procedure to incorporate test resources into the EGI infrastructure and to identify any changes needed in the operational tools.


We organised a session for ROD teams at EGI TF12 in Prague, with 26 participants. In addition, COD gave two presentations in the Future of Ops session at EGI TF12.

COD F2F meeting

We have organised a COD face-to-face meeting. The topics for this meeting will be:

* activities for the remainder of EGI InSPIRE
* pilot resource allocation
* how to raise availability and reliability, and can we raise them further?
* reporting: is the tooling sufficient, and what can be improved?

3. Issues and Mitigation

Issue description | Mitigation description
Grid Oversight: unresponsiveness of NGI_ZA during the NGI certification process | We will propose closing the GGUS ticket and rolling back all activities carried out so far in this area.
Grid Oversight: unresponsiveness of some NGIs observed during followup activities | So far, NGIs seem to respond well to personal emails, in which they are asked to make checking GGUS a few times a day part of their working routine.

4. Plans for the next period