
EGI-InSPIRE:SA1.7-QR6

Revision as of 14:10, 7 November 2011


1. Task Meetings

| Date (dd/mm/yyyy) | Indico agenda URL | Title | Outcome |
|---|---|---|---|
| 18/08/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=574 | NetSup Status Update Video Conference | https://www.egi.eu/indico/conferenceDisplay.py?confId=574 |
| 30/08/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=553 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=573 |
| 19/09/2011 | https://www.egi.eu/indico/conferenceTimeTable.py?confId=452#20110919 | Network Support workshop at EGI TF | https://www.egi.eu/indico/conferenceTimeTable.py?confId=452#20110919 |
| 22/09/2011 | https://www.egi.eu/indico/contributionDisplay.py?contribId=35&confId=452 | Grid Oversight session at EGI TF | https://www.egi.eu/indico/contributionDisplay.py?contribId=35&confId=452 |
| 22/09/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=682 | COD F2F | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=682 |
| 19/10/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=674 | Nagios A/R probe | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=674 |
| 26/10/2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=677 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=677 |
| weekly | https://www.egi.eu/indico/categoryDisplay.py?categId=27 | TPM shopping list meeting | https://www.egi.eu/indico/categoryDisplay.py?categId=27 |

2. Main Achievements

Grid Oversight

ROD teams news letter

This quarter we published ROD teams newsletters in August and October. The rationale behind the newsletter is described in the QR4 report.

ROD teams questionnaire

Some time ago we sent a questionnaire to the ROD teams to learn how they perceive their work. We asked for their opinion on the operational tools, documentation, video tutorials and the ROD newsletter, among other topics. We received no fewer than 44 responses, which we found very valuable; 12 NGIs sent more than one response. The outcome was discussed during the Grid Oversight session at the EGI Technical Forum (https://www.egi.eu/indico/getFile.py/access?contribId=35&resId=0&materialId=slides&confId=452).

ROD session at EGI TF

At this edition of the EGI Technical Forum we organised a 1.5-hour session covering three topics. First, COD staff presented the new simplified escalation procedure, which came into effect on October 1st. The ROD metrics and their incorporation into the OLA were also discussed. This topic caused a fair amount of discussion, the outcome of which was that these metrics will be collected continuously and published in the ROD newsletter; the discussion on how they should enter the OLA will be resumed later. In the same presentation, results were shown of an investigation into the reasons for closing alarms in non-OK status, together with tips on how to do this properly. Second, COD staff presented the results of the survey held among the RODs about their work, which covered the operational tools, documentation and other topics, and gave COD's feedback on the answers. Since the operational tools developers Cyril L'Orphelin and Emir Imamagic were in the audience, part of the slot became a very useful Q&A session between users and developers of the operational tools.

Finally, Cyril L'Orphelin gave an interesting presentation on recent developments and improvements of the operations dashboard. A security dashboard is planned that will detect security issues and inform the sites concerned, as well as a VO-oriented dashboard. Links to the presentations can be found at: https://www.egi.eu/indico/contributionDisplay.py?contribId=35&confId=452

RP OLA and ROD metrics

Over the last few months the COD team has worked within the OLA task force to create the Resource Provider OLA, which defines the obligations between EGI and the NGIs. This OLA was approved at the OMB on October 25th 2011. One of the actions was to define a ROD metric on the basis of which EGI will check whether the ROD service is properly delivered by the NGIs. During our COD session at the Technical Forum in Lyon we presented our proposal for this metric (see the presentation, from page 9 onwards). As a result of the discussion we decided first to run a monthly simulation of this metric to assess the current status. The threshold has initially been set at 10 items: starting from October, every NGI above 10 items will be asked through GGUS to explain the reason for this result and how it plans to improve the situation.
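The monthly check described above amounts to a simple filter over per-NGI metric values. The sketch below is hypothetical: this section does not define what an "item" is (that belongs to the metric proposal in the Lyon presentation), so each NGI's metric is simply assumed to be an integer count, and the function name, NGI names and values are invented for illustration. Opening the GGUS tickets is not shown.

```python
from typing import Mapping

# Initial threshold from the report: NGIs above 10 items are contacted.
ROD_METRIC_THRESHOLD = 10


def ngis_to_contact(monthly_metric: Mapping[str, int],
                    threshold: int = ROD_METRIC_THRESHOLD) -> list[str]:
    """Return NGIs whose monthly ROD metric is strictly above the threshold.

    Each selected NGI would then be asked, through a GGUS ticket, for the
    reason behind the result and an improvement plan (not shown here).
    """
    return sorted(ngi for ngi, value in monthly_metric.items() if value > threshold)


if __name__ == "__main__":
    # Hypothetical values for one month of the simulation.
    october = {"NGI_A": 3, "NGI_B": 12, "NGI_C": 10, "NGI_D": 27}
    print(ngis_to_contact(october))
```

With these made-up values only NGI_B and NGI_D are selected; NGI_C, sitting exactly at 10 items, is not, because the report says "above 10 items".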

Non-OK Alarms Followup

In general, alarms should not be closed in non-OK status, but in some cases this is unavoidable. Closing an alarm in non-OK status is therefore allowed, provided the ROD team gives a reason for doing so. We collected the reasons given by ROD teams for closing alarms in non-OK status in August and September, to determine whether these reasons were valid or whether they pointed to deficiencies in the operational tools or to a lack of training or documentation. The COD team contacted the NGIs that were found to be closing alarms in non-OK status for invalid or insufficient reasons.

Availability followup

  • A Nagios probe is under development that will raise an alarm when a site's availability falls below the 70% threshold and/or its reliability falls below the 75% threshold. COD has provided input, which was recorded in RT ticket 289: https://rt.egi.eu/rt/Ticket/Display.html?id=289. We organised a phone conference on the requirements this probe should fulfil, and we have made a new proposal which we hope all parties involved will agree on, so that this issue can make progress.
  • We recently discovered a substantial number of UNKNOWN test results in the availability and reliability metrics, both for individual sites and for all sites of an entire NGI. Since UNKNOWN test results are not taken into account in the availability and reliability computation, they distort these metrics. The issue is currently under investigation; more information is available at: https://wiki.egi.eu/wiki/Grid_operations_oversight/Unknown_issue
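The threshold rule and the UNKNOWN problem described above can be illustrated with a short Python sketch. It is a minimal, hypothetical example and not the probe being developed under RT ticket 289: it assumes the usual definitions availability = up / (total - unknown) and reliability = up / (total - unknown - scheduled downtime), with time spent in each status measured in hours. The `SiteHours` structure and the function names are invented for the example.

```python
import math
from dataclasses import dataclass

# Thresholds quoted in the report: 70% availability, 75% reliability.
AVAILABILITY_THRESHOLD = 0.70
RELIABILITY_THRESHOLD = 0.75


@dataclass
class SiteHours:
    """Hours a site spent in each status over one period (hypothetical)."""
    up: float
    down: float                # unscheduled downtime / failing tests
    scheduled_downtime: float
    unknown: float             # periods covered only by UNKNOWN results

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(h: SiteHours) -> float:
    """Up time over all time with a known status (UNKNOWN is dropped)."""
    known = h.total - h.unknown
    return h.up / known if known > 0 else math.nan


def reliability(h: SiteHours) -> float:
    """Like availability, but scheduled downtime is dropped as well."""
    known = h.total - h.unknown - h.scheduled_downtime
    return h.up / known if known > 0 else math.nan


def needs_alarm(h: SiteHours) -> bool:
    """True if either metric is below its threshold.

    Comparisons with NaN are always False, so a site whose whole period
    is UNKNOWN never raises an alarm: UNKNOWN results can hide problems.
    """
    return (availability(h) < AVAILABILITY_THRESHOLD
            or reliability(h) < RELIABILITY_THRESHOLD)


if __name__ == "__main__":
    # Site A: every hour of a 720-hour month has a known status.
    site_a = SiteHours(up=500, down=220, scheduled_downtime=0, unknown=0)
    # Site B: only 300 hours verified up, but 400 hours are UNKNOWN.
    site_b = SiteHours(up=300, down=20, scheduled_downtime=0, unknown=400)
    for name, site in (("A", site_a), ("B", site_b)):
        print(name, round(availability(site), 3), needs_alarm(site))
```

In this made-up example site A (500/720, about 69% availability) triggers an alarm, while site B reports 300/320 = 93.75% availability and does not, even though fewer of its hours were actually verified as up. This is the kind of distortion that a large share of UNKNOWN results can introduce.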

Procedures

The new escalation procedure came into effect on October 1st, and no problems have occurred so far. The procedure is described at: https://wiki.egi.eu/wiki/PROC01

TPM

Nothing in particular to report on this task, except that the TPM shift handover between the Italian and German TPM teams takes place at the same local time, 14:00, regardless of summer or winter time.

Network Support

The main task meeting was held in Lyon during the EGI Technical Forum, at the Network Support Operations workshop on September 19th, 2011 (agenda: https://www.egi.eu/indico/conferenceTimeTable.py?confId=452#20110919). As in the previous quarter, further task meetings were held by video conference and phone. The main one was the Network Support coordination video conference on August 18th, 2011, whose agenda was:
1. Update on the ongoing activities on the three tools to be presented at the Network Support workshop at the EGI Technical Forum at the end of September (Lyon, France): HINTS, perfSONAR live CD for e2eMON, NetJobs
2. Questionnaire for NRENs about IPv6
3. Collaboration with the HEPiX IPv6 WG
4. IPv6 strategy
Another video conference on IPv6 activities (GARR-SWITCH) was held on October 12th, 2011. Its agenda focused on the next IPv6 steps to be carried out jointly by SWITCH and GARR.

3. Issues and Mitigation

| Task | Issue description | Mitigation description |
|---|---|---|
| Grid Oversight | None | N/A |
| TPM | None | N/A |
| Network Support | None | N/A |

4. Plans for the next period

Grid Oversight

1. Continue investigating the impact of new middleware in EGI on the operations support model.

2. Continue investigating how to improve the availability and reliability metrics. In this respect we will keep monitoring progress on RT ticket 289, which requests a Nagios probe that measures availability and reliability.

3. Evaluate upcoming releases of the operations dashboard.

4. Continue reviewing the ROD metrics.

TPM

Nothing in particular to report; the work will continue as usual.

Network Support

TBD