1. Task Meetings
|Date (dd/mm/yyyy)||Url Indico Agenda||Title||Outcome|
|weekly||https://www.egi.eu/indico/categoryDisplay.py?categId=27||shopping list meeting TPM||https://www.egi.eu/indico/categoryDisplay.py?categId=27|
2. Main Achievements
ROD teams newsletter
This quarter we published a ROD teams newsletter in May and in June. The rationale behind the newsletter is described in the QR4 report.
ROD session at EGI TF
We will organise a ROD teams session at the EGI TF in Lyon. We have drawn up a programme that will be published on the web soon. The session will give an overview of the Grid oversight work and will also cover some more detailed topics. We aim to have a presentation on the operations portal, and we will conclude the session with a presentation of the results of a survey that we will publish soon.
We have proposed a change to the COD escalation procedure that removes an unnecessary step. At present, when a site is unresponsive to a GGUS ticket issued after an alarm, the ROD team eventually escalates the issue to COD, which bounces it back to the NGI and asks the NGI to try again to contact the site. We deem this step unnecessary, and the proposed change to the procedure is currently under discussion.
- A Nagios probe is under development that will raise an alarm when a site's availability and/or reliability is below the 70%/75% threshold. The COD has provided input, which was added to the RT ticket: https://rt.egi.eu/rt/Ticket/Display.html?id=289.
- Recently, we discovered that the availability and reliability metrics contain a substantial number of UNKNOWN test results, both for individual sites and for all sites in an entire NGI. Since UNKNOWN test results are not taken into account when the metrics are computed, a large share of them clouds the availability and reliability figures. This issue is currently under investigation.
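The two points above can be illustrated with a minimal sketch. It is not the Nagios probe under development and not the official metric computation: the formulas are simplified assumptions (availability = OK / (total - UNKNOWN), reliability = OK / (total - UNKNOWN - DOWNTIME)), the status names and function names are hypothetical, and the 70%/75% figures are mapped to availability/reliability in the order given in the text.

```python
from collections import Counter

# Thresholds taken from the text; the mapping 70% -> availability and
# 75% -> reliability is an assumption based on the order they are listed in.
AVAILABILITY_THRESHOLD = 0.70
RELIABILITY_THRESHOLD = 0.75


def availability_and_reliability(statuses):
    """Return (availability, reliability, unknown_fraction) for a list of samples.

    Simplified, assumed formulas (not the official computation):
      availability = OK / (total - UNKNOWN)
      reliability  = OK / (total - UNKNOWN - DOWNTIME)
    A metric is None when its denominator is zero.
    """
    counts = Counter(statuses)
    total = len(statuses)
    ok = counts["OK"]
    known = total - counts["UNKNOWN"]
    known_unscheduled = known - counts["DOWNTIME"]
    availability = ok / known if known else None
    reliability = ok / known_unscheduled if known_unscheduled else None
    unknown_fraction = counts["UNKNOWN"] / total if total else 0.0
    return availability, reliability, unknown_fraction


def should_alarm(availability, reliability):
    """True when either metric is defined and below its threshold."""
    low_availability = availability is not None and availability < AVAILABILITY_THRESHOLD
    low_reliability = reliability is not None and reliability < RELIABILITY_THRESHOLD
    return bool(low_availability or low_reliability)


if __name__ == "__main__":
    # Hypothetical site: 30 OK, 10 CRITICAL and 60 UNKNOWN samples.
    samples = ["OK"] * 30 + ["CRITICAL"] * 10 + ["UNKNOWN"] * 60
    availability, reliability, unknown_fraction = availability_and_reliability(samples)
    print(availability, reliability, unknown_fraction)
    print(should_alarm(availability, reliability))
```

In this hypothetical example only 30 of 100 samples are OK, yet both metrics come out at 75% because the 60 UNKNOWN samples are dropped from the denominator, so no alarm would be raised. Reporting the UNKNOWN fraction alongside the metrics is one possible way to make such cases visible.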
ROD Team Responsiveness
In general RODs do their work well. When RODs do appear on the COD dashboard and are issued a ticket by the COD, in most cases they respond promptly and the issue is resolved quickly. This holds for the new NGIs as well as the existing ones. However, a small number of NGIs populate the COD dashboard almost continuously.
The set of tutorial videos described in the QR4 report has been finished and may be found at: Grid_operations_oversight/ROD. The videos deal with the following subjects:
- 1. How to become a ROD member - the 7 steps required to become a ROD member
- 2. Operations tools - a brief introduction to the operations tools that a ROD member needs to perform their duties
- 3. How to handle alarms - instructions on how to manage alarms on the Operations Portal (ticket creation from an alarm, closing and masking alarms)
- 4. How to handle tickets - instructions on how to manage tickets on the Operations Portal (ticket creation, updating and closing tickets)
- 5. Issues escalated to COD - an introduction to the cases that are escalated to COD and how to deal with them
- 6. Operations portal tools - a brief introduction of the Operations Portal tools
Nothing in particular to report on this task, except that the change of TPM shift between the Italian and German TPM teams takes place at the same local time (regardless of summer or winter time), at 14:00.
Further testing of HINTS has been carried out both in France and in Italy. Probes have been installed in Bologna at INFN CNAF, in Rome at INFN Roma Tre and GARR, and in Toulouse and Paris. A development server has been set up in Paris and a production server is located in Rome. Testing of the system has started and feedback has been provided to the developers. This will improve the quality of the tool, including the documentation. Cross-registration of probes between the French and Italian sites has been carried out. Participation of the EGI NetSup coordination team in the HEPiX IPv6 Working Group has been organized and the first 2 meetings have been attended. A brainstorming session with GEANT3 PERT has been organized and basic ideas for a fruitful collaboration have emerged. They will be reported in a forthcoming short document.
3. Issues and Mitigation
|Issue Description||Mitigation Description|
|Grid Oversight: None|
|Network Support: None|
4. Plans for the next period
1. Continue investigating the impact of new middleware in EGI on the operations support model.
2. Continue the investigation on how to improve availability and reliability metrics.
3. Evaluation of upcoming new releases of the operational dashboard. In this respect we will continue to monitor the progress on RT ticket 289.
4. Review the ROD metrics
Nothing in particular to report except continuing the work as usual.
Continue the testing and validation campaign for HINTS, pS DVD e2e Live DVD and NetJobs, and present the tools in September at the EGI NetSup workshop OMB during the EGI TF in Lyon. Organize a questionnaire on IPv6 for the NGIs.