
EGI-InSPIRE:SA1.7-QR5

1. Task Meetings

Date (dd-mm-yyyy) | Indico agenda URL | Title | Outcome
16-05-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=480 | CODOC F2F | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=480
20-06-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=504 | CODOC F2F | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=504
25-07-2011 | https://www.egi.eu/indico/conferenceDisplay.py?confId=553 | CODOC F2F | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=553
01-06-2011 | | Network Update |
weekly | https://www.egi.eu/indico/categoryDisplay.py?categId=27 | shopping list meeting TPM | https://www.egi.eu/indico/categoryDisplay.py?categId=27

2. Main Achievements

Grid Oversight

ROD teams newsletter

This quarter we published a ROD teams newsletter in May and in June. The rationale behind the newsletter is described in the QR4 report.

ROD session at EGI TF

We will organise a ROD teams session at the EGI TF in Lyon. We have drawn up a programme that will be published on the web soon. The session will give an overview of the Grid oversight work and will also cover some more detailed topics. We aim to have a presentation on the operations portal, and we will conclude the session with a presentation of the results of a survey that we will publish soon.

Procedures

We have proposed a change to the COD escalation procedure that removes an unnecessary step. At present, when a site is unresponsive to a GGUS ticket issued after an alarm, the ROD team eventually escalates the issue to COD, which bounces it back to the NGI and asks the NGI to try again to contact the site. We deem this step unnecessary, and the proposed change to the procedure is currently under discussion.

Availability followup

  • A Nagios probe is under development that will raise an alarm when a site's availability and/or reliability is below the 70%/75% threshold. The COD has provided input, which was added to the RT ticket: https://rt.egi.eu/rt/Ticket/Display.html?id=289.
  • Recently, we discovered that the availability and reliability metrics contained a substantial number of UNKNOWN test results, both for individual sites and for all sites in an entire NGI. Since UNKNOWN test results are not taken into account when the metrics are calculated, a large share of them distorts the reported availability and reliability. This issue is currently under investigation.
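
To illustrate why excluded UNKNOWN results matter, here is a minimal Python sketch. It is not the actual probe or the official metric calculation: the status labels, the function names and the formulas (UNKNOWN slots dropped from the denominator, scheduled downtime additionally dropped for reliability) are assumptions made for illustration, and the thresholds read "70%/75%" as 70% availability and 75% reliability.

```python
from collections import Counter

# Assumed reading of the "70%/75%" threshold: availability 70%, reliability 75%.
AVAILABILITY_THRESHOLD = 0.70
RELIABILITY_THRESHOLD = 0.75


def site_metrics(samples):
    """Return (availability, reliability, unknown_fraction) for one site.

    `samples` holds one status string per equally sized time slot. The labels
    "OK", "CRITICAL", "SCHEDULED_DOWNTIME" and "UNKNOWN" are hypothetical.
    A metric is None when no slot is left to compute it from.
    """
    counts = Counter(samples)
    total = len(samples)
    known = total - counts["UNKNOWN"]  # UNKNOWN slots are ignored (assumed formula)
    up = counts["OK"]
    availability = up / known if known else None
    in_service = known - counts["SCHEDULED_DOWNTIME"]
    reliability = up / in_service if in_service else None
    unknown_fraction = counts["UNKNOWN"] / total if total else None
    return availability, reliability, unknown_fraction


def should_alarm(availability, reliability):
    """Alarm when either metric is below its threshold or cannot be computed."""
    if availability is None or reliability is None:
        return True  # sketch choice: a site with no usable results is flagged
    return (availability < AVAILABILITY_THRESHOLD
            or reliability < RELIABILITY_THRESHOLD)


# Six good slots, one failure, three UNKNOWN slots.
a, r, u = site_metrics(["OK"] * 6 + ["CRITICAL"] + ["UNKNOWN"] * 3)
print(f"availability={a:.2f} reliability={r:.2f} unknown={u:.0%}")
print("alarm:", should_alarm(a, r))
```

In this example the site is judged on only 7 of its 10 slots, so it stays above both thresholds; had the three UNKNOWN slots been failures, availability would have been 60% and the alarm would have fired. Reporting the UNKNOWN fraction next to the metrics makes this effect visible.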

ROD Team Responsiveness

In general RODs do their work well. When RODs do appear on the COD dashboard and are issued a ticket by the COD, in most cases the responsiveness is also OK and the issue is resolved quickly. This holds for the new NGIs as well as the existing ones. However, a small number of NGIs populate the COD dashboard almost continuously.

Tutorial Videos

The set of tutorial videos described in the QR4 report has been finished and may be found at: Grid_operations_oversight/ROD. The videos deal with the following subjects:

  • 2. Operations tools - a brief introduction to the operations tools that a ROD member needs to perform their duties
  • 3. How to handle alarms - instructions on how to manage alarms on the Operations Portal (ticket creation from an alarm, closing and masking alarms)
  • 4. How to handle tickets - instructions on how to manage tickets on the Operations Portal (ticket creation, updating and closing tickets)

TPM

Nothing in particular to report on this task, except that the change of TPM shift between the Italian and German TPM teams takes place at the same local time, 14:00, regardless of summer or winter time.
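
A fixed local handover time means the corresponding UTC time shifts with daylight saving. The short Python sketch below shows the effect; the time zone `Europe/Rome` (Italy and Germany share the same offset), the helper name `handover_utc` and the two example dates are assumptions chosen for illustration only.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

# Assumption: "local time" means Central European time; Europe/Berlin
# would give the same offsets.
LOCAL_ZONE = ZoneInfo("Europe/Rome")
HANDOVER_LOCAL = time(14, 0)  # handover time stated in the report


def handover_utc(day):
    """Return the UTC instant of the 14:00 local handover on `day`."""
    local = datetime.combine(day, HANDOVER_LOCAL, tzinfo=LOCAL_ZONE)
    return local.astimezone(timezone.utc)


# Arbitrary example dates, one in winter time and one in summer time.
print(handover_utc(date(2011, 1, 17)).strftime("%H:%M UTC"))  # 13:00 UTC
print(handover_utc(date(2011, 7, 18)).strftime("%H:%M UTC"))  # 12:00 UTC
```

Tools or schedules that store the handover in UTC therefore need to apply the time zone conversion rather than a fixed offset.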

Network Support

Further testing of HINTS has been carried out in both France and Italy. Probes have been installed in Bologna at INFN CNAF, in Rome at INFN Roma Tre and GARR, and in Toulouse and Paris. A development server has been set up in Paris, and a production server is located in Rome. Testing of the system has started and feedback has been provided to the developers, which will improve the quality of the tool, including its documentation. Cross-registration of probes between the French and Italian sites has been carried out.

Participation of the EGI NetSup coordination team in the HEPiX IPv6 Working Group has been organized, and the first two meetings have been attended. A brainstorming session with GEANT3 PERT has been organized, and basic ideas for a fruitful collaboration have emerged. They will be reported in a forthcoming short document.

3. Issues and Mitigation

Issue Description Mitigation Description
Grid Oversight: None
TPM: None
Network Support: None

4. Plans for the next period

Grid Oversight

1. Continue investigating the impact of new middleware in EGI on the operations support model.

2. Continue the investigation on how to improve availability and reliability metrics.

3. Evaluation of upcoming new releases of the operational dashboard. In this respect we will continue to monitor the progress on RT ticket 289.

4. Review the ROD metrics.

TPM

Nothing in particular to report except continuing the work as usual.

Network Support

Keep carrying out the testing and validation campaign for HINTS, pS DVD e2e Live DVD and NetJobs and present the tools in September at the EGI NetSup workshop OMB during the EGI TF in Lyon. Organize a questionnaire on IPv6 for the NGIs.