
EGI-InSPIRE:SA1.7-QR6



1. Task Meetings

Date (dd/mm/yyyy) Url Indico Agenda Title Outcome


2. Main Achievements

Grid Oversight

ROD teams newsletter

This quarter we published ROD teams newsletters in August and October. The rationale behind the newsletter is described in the QR4 report.

ROD teams questionnaire

Some time ago we sent out a questionnaire to the ROD teams. We wanted to learn how they perceive their work and to collect their opinions on the operational tools, documentation, video tutorials, the newsletter, and so on. We received no fewer than 44 responses, which we found very valuable. From 12 NGIs we received more than one response. The outcome was discussed during the Grid Oversight session at the EGI Tech Forum (https://www.egi.eu/indico/getFile.py/access?contribId=35&resId=0&materialId=slides&confId=452).

ROD session at EGI TF

At this edition of the EGI Tech Forum we organised a 1.5-hour session covering three topics. First, COD staff gave a presentation on the new simplified escalation procedure that came into effect on October 1st. The ROD metrics and their incorporation into the OLA were also discussed, which caused a fair amount of debate. The outcome was that these metrics will continue to be collected and published in the ROD newsletter; later on we will restart the discussion on how they should enter the OLA. In addition, results were presented of an investigation into the reasons for closing alarms in non-OK status, and some tips were given on how to do this properly. Next, COD staff presented the results of the survey held among our RODs about the work that they do, which included questions about the operational tools, documentation, and so on. The COD provided its feedback on the responses in this slot. Helpfully, the operational tools developers Cyril l’Orphelin and Emir Imamagic were in the audience, so part of the slot became a Q&A session between users and developers of the operational tools, which was very useful.

Finally, Cyril l’Orphelin gave an interesting presentation on the recent developments and improvements of the dashboard. A security dashboard is planned to detect security issues and inform sites about them, and a VO-oriented dashboard is planned as well. Links to the presentations may be found at: https://www.egi.eu/indico/contributionDisplay.py?contribId=35&confId=452

RP OLA and ROD metrics

For the last few months the COD team has been working within the OLA task force to create the Resource Provider OLA, which will contain the obligations between EGI and the NGIs. This OLA was approved at the OMB on October 25th 2011. One of the actions was to define a ROD metric on the basis of which EGI will check whether the ROD service is properly delivered by the NGIs. During our COD session at the Technical Forum in Lyon we presented our proposal for this metric (see the presentation, from page 9 onwards). As a result of the discussion we decided first to provide a monthly simulation of this metric in order to check the current status. We decided to set the threshold initially at 10 items. This means that from October onwards we will ask every NGI above 10 items, through GGUS, to explain the reason for the result and how it plans to improve the situation.
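The follow-up rule can be illustrated with a minimal sketch. It assumes the monthly simulation yields one item count per NGI; the function name, the data layout, and the NGI names are hypothetical and are not part of any EGI tool. Only the threshold of 10 items and the "above the threshold" rule come from the text.

```python
# Initial threshold agreed for the monthly simulation of the ROD metric.
ROD_METRIC_THRESHOLD = 10


def ngis_needing_followup(monthly_items, threshold=ROD_METRIC_THRESHOLD):
    """Return the NGIs whose item count is strictly above the threshold.

    monthly_items maps an NGI name to its number of items for the month
    (hypothetical layout). The result is sorted by NGI name.
    """
    return sorted(
        ngi for ngi, items in monthly_items.items() if items > threshold
    )


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    sample = {"NGI_A": 3, "NGI_B": 10, "NGI_C": 14}
    for ngi in ngis_needing_followup(sample):
        print(f"{ngi}: ask for an explanation through GGUS")
```

An NGI with exactly 10 items is not selected here, since the rule applies to NGIs above the threshold.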

Non-OK Alarms Followup

In general, alarms should not be closed in non-OK status. However, in some cases it is inevitable. Closing alarms in non-OK status is allowed, but the ROD team in question should give a reason for doing so. We collected the reasons why ROD teams closed alarms in non-OK status during August and September in order to identify whether the reasons given were valid, or whether there are deficiencies in the operational tools or a lack of training or documentation/information. NGIs that were closing alarms in non-OK status for invalid or insufficient reasons were identified and approached by the COD team.

Availability followup

  • There is a Nagios probe under development that will raise an alarm when a site's availability and/or reliability is below the 70%/75% threshold. The COD has provided input, which was added to the RT ticket: https://rt.egi.eu/rt/Ticket/Display.html?id=289. We organised a phone conference on the requirements that this probe should fulfil. We have made a new proposal in this area and hope to reach an agreement with all parties involved so that this issue can make progress.
  • Unknown issue
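
The threshold check behind such a probe can be sketched as follows. This is not the probe under development, only an illustration under stated assumptions: that 70% applies to availability and 75% to reliability, that both values arrive as percentages, and that a breach maps to a CRITICAL result. The function name and message format are hypothetical; the numeric return codes follow the standard Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin return codes used in this sketch.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_site(availability, reliability,
               min_availability=70.0, min_reliability=75.0):
    """Return (exit_code, message) for one site's monthly figures.

    availability and reliability are percentages, or None when no data
    is available. The default thresholds are assumptions based on the
    70%/75% figures quoted in the text.
    """
    if availability is None or reliability is None:
        return UNKNOWN, "UNKNOWN - no availability/reliability data"

    problems = []
    if availability < min_availability:
        problems.append(
            f"availability {availability:.1f}% < {min_availability:.0f}%")
    if reliability < min_reliability:
        problems.append(
            f"reliability {reliability:.1f}% < {min_reliability:.0f}%")

    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, (f"OK - availability {availability:.1f}%, "
                f"reliability {reliability:.1f}%")


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    code, message = check_site(65.0, 80.0)
    print(code, message)
```

A real plugin would print the message and exit with the returned code; how the figures are obtained is left open here.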

TPM

Network Support

3. Issues and Mitigation

Issue Description Mitigation Description

4. Plans for the next period

Grid Oversight

TPM

Network Support