EGI-InSPIRE:SA1.7-QR10


1. Task Meetings

Date | Indico agenda URL | Title | Outcome
August 16th 2012 | https://indico.egi.eu/indico/conferenceDisplay.py?confId=1144 | COD meeting | https://indico.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1144
September 5th 2012 | https://indico.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1144 | COD meeting | https://indico.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1147
September 21st 2012 | https://indico.egi.eu/indico/contributionDisplay.py?sessionId=56&contribId=242&confId=1019 | ROD session at EGI TF12 | https://indico.egi.eu/indico/materialDisplay.py?contribId=242&sessionId=56&materialId=slides&confId=1019
October 3rd 2012 | https://indico.egi.eu/indico/conferenceDisplay.py?confId=1171 | COD meeting | https://indico.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1171
October 23rd 2012 | https://indico.egi.eu/indico/conferenceDisplay.py?confId=1200 | COOCOD meeting |

Network Support

EGI TF 2012 F2F meeting 18 September 2012 https://indico.egi.eu/indico/sessionDisplay.py?sessionId=37&tab=contribs&confId=1019

F2F meeting between the HINTS Team and the PerfSONAR MDM Team in Erlangen, September 12, 2012


2. Main Achievements

Grid Oversight

Followup upgrades of unsupported software

A fairly large number of sites were still running glite-3.1 and glite-3.2 software, which is no longer supported. This quarter a campaign was started to get these sites to upgrade the services that run this software. COD has issued GGUS tickets to these sites and is following them up.

ROD teams newsletter

We published a ROD teams newsletter in October. The rationale behind the newsletter is described in the SA1.7-QR4 report.

ROD performance index

For background information, see SA1.7-QR6, section RP OLA and ROD metrics. Since October 2011 we have been asking, through GGUS, every NGI with more than 10 items in the COD dashboard during one month to explain the reason for this result and how it plans to improve the situation. We are continuing to collect and investigate these metrics and to correlate them with other metrics to see whether any conclusions can be drawn. The number of issues in the COD dashboard appears to be going down.

Availability followup

See SA1.7-QR6 for more background information. A probe measuring the availability and reliability of a site has been supplied to the ops portal developers and is now deployed. The algorithm of this probe is incorporated into the ops portal, which will now generate alarms when a site's availability and reliability are below 70% and 75% respectively. As a consequence, COD will stop issuing monthly GGUS tickets to these sites as of November 1st 2012.
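The threshold check described above can be sketched roughly as follows. This is only an illustration, not the ops portal's actual algorithm: the `SiteMetrics` type, the function name and the sample sites are hypothetical, and raising an alarm when either metric is below its threshold is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the text: availability 70%, reliability 75%.
AVAILABILITY_THRESHOLD = 70.0
RELIABILITY_THRESHOLD = 75.0


@dataclass
class SiteMetrics:
    """Monthly figures for one site (hypothetical structure)."""
    name: str
    availability: float  # percent
    reliability: float   # percent


def sites_needing_alarm(sites):
    """Return the names of sites below either threshold.

    Assumption: an alarm is raised if availability OR reliability
    is below its threshold; the report does not spell this out.
    """
    return [
        s.name
        for s in sites
        if s.availability < AVAILABILITY_THRESHOLD
        or s.reliability < RELIABILITY_THRESHOLD
    ]


# Invented sample data, for illustration only.
sample = [
    SiteMetrics("SITE-A", 95.0, 97.0),
    SiteMetrics("SITE-B", 65.0, 80.0),
    SiteMetrics("SITE-C", 72.0, 74.0),
]
print(sites_needing_alarm(sample))  # ['SITE-B', 'SITE-C']
```

A site sitting exactly on a threshold is not flagged here, since the text speaks of values "below" the limits.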

Unknown Followup

See SA1.7-QR6 for more background information. In Q10 we continued this activity.

Followup NGI Core Services availability

We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. We started this activity in February 2012. At first we only submitted GGUS tickets to NGIs informing them of their low top-level BDII availability.

OMB

We are busy developing a procedure to incorporate test resources into the EGI infrastructure and to identify possible changes to the operational tools.

EGI TF12

We organised a session for ROD teams at EGI TF12 in Prague. There were 26 participants. In addition, we gave two COD presentations in the Future of Ops session at EGI TF12.

COD F2F meeting

We have organised a COD face-to-face meeting. Topics for this meeting will be:

* activities for the remainder of EGI InSPIRE
* pilot resource allocation
* how to raise availability and reliability, and can we raise them further?
* reporting, is the tooling for this sufficient and what can be improved?

Network Support

Ran preliminary tests of CREAM CE and DPM using IPv6 in 4 different network configurations. Set up workload component services in the IPv6 testbed. Started structuring a global IPv6 testbed for EGI. Restructured the whole IPv6 wiki and made it more usable. Completed testing of ARC CE.

HINTS was further consolidated. Discussions with the pS-MDM team on possible integration of probes are still ongoing.

Software Support

Software support followed the original processes while waiting for official approval by the project review. On October 8, the main switch to the new process was made:

  • INFN handles the incoming tickets permanently (instead of rotating TPM shift).
  • Frequency of the "hands on tickets" meetings, where non-trivial issues are discussed collectively, was increased to twice a week.
  • KIT focuses on the defined ticket monitoring procedures as well as on the implementation of supporting procedures in GGUS.

Although minor issues are still to be clarified, the new process has been running smoothly for 4 weeks so far.

In the reporting period, 157 tickets were assigned to software support, of which 48 (about 30%) were solved by the unit. This is a higher ratio than in previous periods, but because the numbers oscillate strongly, the statistical significance is questionable. Ticket solution times are 28 days on average and 11 days in the median. The external reasons for such high numbers were discussed in the previous report; because of the main vacation season, the average is even worse, while the median remains the same.
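The arithmetic behind these figures can be illustrated with a short sketch. The ticket counts come from the report; the individual solution times are invented so that they reproduce the reported 28/11 days, and they serve only to show why a few long-running tickets move the average while leaving the median untouched.

```python
from statistics import mean, median

# Ticket counts as reported for the period.
assigned = 157
solved = 48
solved_ratio = solved / assigned
print(f"solved by the unit: {solved_ratio:.1%}")  # 30.6%

# Hypothetical solution times in days (NOT real ticket data), chosen
# so that the average is 28 and the median is 11, as in the report.
solution_days = [2, 5, 8, 11, 14, 30, 126]
print(mean(solution_days), median(solution_days))  # 28 11

# Dropping the single long-running ticket shows how strongly one
# outlier pulls the average, while the median barely moves.
without_outlier = solution_days[:-1]
print(round(mean(without_outlier), 1), median(without_outlier))  # 11.7 9.5
```

This is why the median is the more stable indicator across periods with unusually slow tickets, such as the vacation season.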

3. Issues and Mitigation

Issue Description | Mitigation Description
Grid Oversight: unresponsiveness of NGI_ZA during the NGI certification process | We will propose to close the GGUS ticket and roll back all activities that have been carried out so far in this field.
Grid Oversight: unresponsiveness of some NGIs observed during followup activities | So far NGIs seem to respond well to personal emails. In these emails NGIs are asked to make it a working habit to check GGUS a few times a day.

Network Support

Need to clarify the relationship with the CERN site and further integrate sites into the global testbed. Need to investigate the issue of the LRMS not working over IPv6.

Software Support

The issue of scattered documentation, reported previously, was addressed by agreeing with SA2 to integrate it into the UMD documentation.

The related issue of missing information flow from software support to EGI operations was discussed, and appropriate communication channels (reporting at the operations meetings) were defined.

4. Plans for the next period

Network Support

Further extend the global IPv6 testbed to include new sites and services. Continue reporting outcomes at https://wiki.egi.eu/wiki/IPv6TestReports. Finalize the discussion on HINTS-pS-MDM integration.

Software Support

The transition to the new support process will be finalized by clarifying the remaining minor issues. No major qualitative or quantitative changes are expected in the "payload" work, that is, supporting users and resolving tickets.