1. Task Meetings
2. Main Achievements
ROD teams newsletter
This quarter we published a ROD teams newsletter in February and in April. The rationale behind the newsletter is described in the SA1.7-QR4 report.
ROD performance index
For background information on this, have a look at SA1.7-QR6, section RP OLA and ROD metrics. Since October we have been asking, through GGUS, every NGI with more than 10 items in the COD dashboard during one month to explain the reason for this result and how it plans to improve the situation. We are continuing to collect and investigate these metrics, and to correlate them with other metrics to see whether we can draw any conclusions from them.
Non-OK Alarms Followup
For background information on this, have a look at SA1.7-QR6, section Non-OK Alarms Followup. We have continued this activity in Q8.
See SA1.7-QR6 for more background information. There has been a phone conference with JRA1 (https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=716) in which the availability probe was discussed. There will be a probe that meets the following specifications:
- The probe only measures availability
- The probe computes the availability over the past 30 days
- The probe returns a WARNING when: 70% <= availability <= 75%
- The probe returns a CRITICAL when: availability <70%
We are waiting for this probe to be available for testing.
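The threshold logic in the specification above can be sketched as follows. This is only an illustration, not the JRA1 probe itself: the function and class names are hypothetical, the WARNING band is read as 70% to 75% inclusive, and returning OK above 75% is an assumption, since the specification does not state it. The numeric values follow the standard Nagios plugin exit codes.

```python
from enum import IntEnum


class Status(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify_availability(availability_pct: float) -> Status:
    """Map a 30-day availability percentage to a Nagios status.

    Hypothetical helper: thresholds are taken from the specification
    (CRITICAL below 70%, WARNING from 70% to 75% inclusive).
    OK above 75% is an assumption.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    if availability_pct < 70.0:
        return Status.CRITICAL
    if availability_pct <= 75.0:
        return Status.WARNING
    return Status.OK


if __name__ == "__main__":
    for value in (65.0, 70.0, 75.0, 90.0):
        print(value, classify_availability(value).name)
```

A real probe would additionally have to compute the 30-day availability itself and exit with the returned code; that part is left out because the data source is not described here.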
Apart from this, in Q8 we have continued the follow-up in the traditional way, by means of GGUS tickets.
Followup NGI Core Services availability
We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. We started this activity in February. At first we only submitted GGUS tickets to NGIs informing them of their low top-level BDII availability. In the last month we have also pointed them to documentation on how to set up a reliable top-level BDII service. We hope this helps to reduce the number of NGIs getting this kind of ticket.
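As a rough illustration of the selection step, the sketch below picks the NGIs whose monthly top-level BDII availability falls below the 99% requirement. The function name, the input format and the NGI names are hypothetical; in practice the figures would come from the availability reports, and the tickets are submitted through GGUS.

```python
from typing import Dict, List

# Availability requirement for NGI core services, as stated in the text.
REQUIRED_AVAILABILITY_PCT = 99.0


def ngis_needing_ticket(
    monthly_availability: Dict[str, float],
    required_pct: float = REQUIRED_AVAILABILITY_PCT,
) -> List[str]:
    """Return the NGIs below the required availability, sorted by name.

    Hypothetical helper: the input maps an NGI name to its monthly
    top-level BDII availability in percent.
    """
    return sorted(
        name for name, pct in monthly_availability.items() if pct < required_pct
    )


if __name__ == "__main__":
    # Placeholder names and values, for illustration only.
    sample = {"NGI_A": 99.5, "NGI_B": 97.2, "NGI_C": 99.0}
    print(ngis_needing_ticket(sample))
```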
Two items of information that should be taken into account in the TPM's daily work:
- The Turkish NGI has agreed to provide temporary operational support to Azerbaijan for the coming 12 months. This means that basic operational problems and tickets originated by site managers from Azerbaijan have to be addressed by NGI_TR. Most of the tickets in GGUS are originated by Parvin Aliyeva (the site manager has a CERN e-mail account). The site manager was instructed to contact NGI_TR to arrange the details of the operational support that NGI_TR will provide. For the moment, only a single site is known to be in the process of being configured.
- EGI has asked NGIs to configure their Nagios instances to probe the glexec capabilities of the CEs accepting pilot jobs. One of the steps for the Nagios administrators is to request the "/pilot" role for the VO ops. In the next couple of weeks or so, if a user asks in a GGUS ticket for the '/pilot' role (the pilot role is a VO role) without specifying any VO, it is very likely that the ticket has to be assigned to "VOsupport, ops".
New support units were added in the recent past:
- 3rd level EMI Support unit -- caNL
- 3rd level EMI Support unit -- EMIR
- EMI support unit for WNodes
Support unit renamed in the last quarter: 'GridView/Availabilities' SU/FE to just 'GridView'.
VOs integrated as new support units: "snoplus.snolab.ca" and "vo.cta.in2p3.fr".
VO renamed: "mice.gridpp.ac.uk" to "mice".
Total number of submitted tickets: 853.
Tickets resolved by the TPM: 45.
- Deployed gLite CE and ARC CE on IPv6 testbed
- Set up new netsup VO and corresponding VOMS server
- Installed new HINTS server and new probes in Rome at GARR
- New probe RPMs available for ia64 architecture for HINTS for SL6
- Fully recovered tools/servers from the security incident at GARR
- Started the process of integrating HINTS within the pS-MDM packages (discussions ongoing)
3. Issues and Mitigation
|Issue Description||Mitigation Description|
|Grid Oversight: Unresponsive NGIs with respect to NGI core services follow-up tickets||We will discuss with the COO a procedure for dealing with this|
|Grid Oversight: NGI creation procedure getting stuck on NGI unresponsiveness||We will discuss with the COO a procedure for dealing with this|
|Network Support: UNICORE middleware testing in IPv6 not assigned so far.|
4. Plans for the next period
The plan for the next period is to proceed with the current activities and to come up with a proposal to include test resources in the infrastructure.
- Keep consolidating HINTS - possibly providing RPMs for Server and Probes for SL5 (to be evaluated)
- Continue the deployment/dissemination campaign for HINTS
- Pursue possible integration/inclusion of HINTS within the pS-MDM packages
- Provide detailed test reports on CE/WNs/workload tests using gLite and ARC
- Consider merging of HEPiX and EGI efforts on the IPv6 testbed at some point