EGI-InSPIRE:SA1.7-QR7
1. Task Meetings
2. Main Achievements
Grid Oversight
ROD teams newsletter
This quarter we published ROD teams newsletters in November, December and January. The rationale behind the newsletter is described in the SA1.7-QR4 report.
ROD performance index
For background information on this, have a look at SA1.7-QR6, section RP OLA and ROD metrics. Since October, every NGI with more than 10 items in the COD dashboard during a month has been asked, through a GGUS ticket, to explain the reason for this result and how it plans to improve the situation. The good news is that we have seen a continuous decline in the number of items in the COD dashboard.
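The monthly selection rule can be sketched as follows. This is a minimal illustration, assuming the COD dashboard items can be exported as one hypothetical (NGI, month) pair per item; the NGI names and the export format are made up for the example and are not taken from the COD dashboard itself.

```python
from collections import Counter

THRESHOLD = 10  # NGIs with more than 10 items in one month get a GGUS ticket

def ngis_to_ticket(items, threshold=THRESHOLD):
    """Return sorted (ngi, month) pairs with more than `threshold` COD dashboard items.

    items: iterable of (ngi, month) pairs, one per dashboard item
    (a hypothetical export format).
    """
    counts = Counter(items)
    return sorted(key for key, n in counts.items() if n > threshold)

if __name__ == "__main__":
    # Hypothetical data: NGI_B has exactly 10 items, which is not "above 10".
    items = ([("NGI_A", "2010-11")] * 12
             + [("NGI_B", "2010-11")] * 10
             + [("NGI_A", "2010-12")] * 3)
    print(ngis_to_ticket(items))  # [('NGI_A', '2010-11')]
```

Counting per (NGI, month) pair keeps each month independent, so an NGI is only contacted for the months in which it actually crossed the threshold.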
Non-OK Alarms Followup
For background information on this, have a look at SA1.7-QR6, section Non-OK Alarms Followup. We have continued this activity in QR7.
Availability followup
See SA1.7-QR6 for more background information. The availability probe was discussed in a phone conference (https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=716). The agreed probe will meet the following specifications:
- The probe only measures availability
- The probe computes the availability over the past 30 days
- The probe returns a WARNING when: 70% <= availability <= 75%
- The probe returns a CRITICAL when: availability <70%
A test version of the probe will be available in March.
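The thresholds above can be illustrated with a small Nagios-style check. This is only a sketch, not the probe under development: it assumes availability is the share of "up" samples in a 30-day window, uses a hypothetical (timestamp, is_up) input format, and returns UNKNOWN when there is no data, which the specification does not cover. The exit codes are the standard Nagios plugin convention.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def availability(samples, now, window_days=30):
    """Return the percentage of 'up' samples in the last window_days, or None.

    samples: iterable of (timestamp, is_up) pairs; this input format is an
    assumption, not the format used by the real EGI availability computation.
    """
    start = now - timedelta(days=window_days)
    window = [up for ts, up in samples if start <= ts <= now]
    if not window:
        return None
    return 100.0 * sum(window) / len(window)

def check(avail):
    """Map an availability percentage to a Nagios (status, message) pair."""
    if avail is None:
        return UNKNOWN, "UNKNOWN - no samples in the 30-day window"
    if avail < 70.0:
        return CRITICAL, f"CRITICAL - availability {avail:.1f}%"
    if avail <= 75.0:  # 70% <= availability <= 75%
        return WARNING, f"WARNING - availability {avail:.1f}%"
    return OK, f"OK - availability {avail:.1f}%"

if __name__ == "__main__":
    now = datetime(2011, 2, 1, tzinfo=timezone.utc)
    # Hypothetical hourly samples: down for the most recent 200 hours, up before that.
    samples = [(now - timedelta(hours=h), h >= 200) for h in range(720)]
    status, message = check(availability(samples, now))
    print(message)  # WARNING - availability 72.2%
```

A real plugin would also exit with `status` so that Nagios picks up the result; the sketch only prints the message to keep it easy to test.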
Unknown Followup
See SA1.7-QR6 for more background information.
TPM
With the new SU “MPI User Support”, the problem of tickets bouncing back to TPM because of the lack of a proper support unit should be solved.
Instructions for TPM on how to use the SU “Operations”: this SU is meant for managerial problems that concern operations and is not a catch-all. Its purpose is to provide a contact with the EGI.eu team that coordinates EGI operations on any technical and operational matter, and to handle requests from Resource Centers and Resource Infrastructure Providers that wish to be integrated into the EGI production infrastructure. It is also the place to report operational issues that are general and do not concern a specific Resource Infrastructure or deployed Grid middleware. Middleware-related issues that cannot be handled by TPM, including middleware configuration and documentation problems, must be assigned to the DMSU. For deployment problems that concern the vast majority of production sites, where it is infeasible to open an individual ticket for every site or NGI, TPM can assign the ticket to the Operations SU. The staff of the SU “Operations” can then coordinate the handling of such incidents when their scale cannot be managed by TPM.
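The routing rules above can be summarised as a decision function. This is a hypothetical sketch: the `Ticket` flags stand for the judgement a TPM makes during triage and do not correspond to GGUS fields, the "site/NGI" result for small-scale deployment problems is inferred from the instructions, and the final fallback (keeping the ticket in TPM) reflects that the Operations SU is not a catch-all.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    # Hypothetical triage flags; not GGUS ticket fields.
    tpm_can_solve: bool = False
    about_mpi: bool = False
    middleware_issue: bool = False      # incl. middleware configuration/documentation
    deployment_issue: bool = False
    affects_most_sites: bool = False    # infeasible to ticket every site/NGI
    integration_request: bool = False   # RC/RP asking to join production
    general_operational: bool = False   # not specific to one infrastructure or middleware

def route(t: Ticket) -> str:
    """Return the support unit a TPM would assign the ticket to."""
    if t.tpm_can_solve:
        return "TPM"
    if t.about_mpi:
        return "MPI User Support"
    if t.middleware_issue:
        return "DMSU"
    if t.deployment_issue:
        # Only large-scale deployment problems go to Operations.
        return "Operations" if t.affects_most_sites else "site/NGI"
    if t.integration_request or t.general_operational:
        return "Operations"
    return "TPM"  # keep for further triage: Operations is not a catch-all

if __name__ == "__main__":
    print(route(Ticket(deployment_issue=True, affects_most_sites=True)))  # Operations
```

Checking the middleware case before the deployment case mirrors the instructions: a middleware problem goes to DMSU even if it shows up at many sites.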
Network Support
3. Issues and Mitigation
Issue Description | Mitigation Description |
---|---|