
EGI-InSPIRE:SA1.7-QR8




1. Task Meetings

Date (dd-mm-yyyy) | Indico agenda URL | Title | Outcome
23-02-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=827 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=827
22-03-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=963 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=963
17-04-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=1016 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1016

2. Main Achievements

Grid Oversight

ROD teams newsletter

This quarter we published ROD teams newsletters in February and April. The rationale behind the newsletter is described in the SA1.7-QR4 report.

ROD performance index

For background information on this, have a look at SA1.7-QR6, section RP OLA and ROD metrics. Since October we have been asking, through GGUS, every NGI with more than 10 items in the COD dashboard during one month to explain the reason for this result and how it plans to improve the situation. We are continuing to collect and investigate these metrics, and to correlate them with other metrics to see whether we can draw conclusions from them.

Non-OK Alarms Followup

For background information on this, have a look at SA1.7-QR6, section Non-OK Alarms Followup. We have continued this activity in Q8.

Availability followup

See SA1.7-QR6 for more background information. There has been a phone conference with JRA1 (https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=716) at which the availability probe was discussed. There will be a probe that meets the following specifications:

  • The probe only measures availability
  • The probe computes the availability 30 days in the past
  • The probe returns a WARNING when: 70% <= availability <= 75%
  • The probe returns a CRITICAL when: availability <70%

We are waiting for this probe to be available for testing.
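
The thresholds listed above can be sketched as a small classification function. This is only an illustration, not the probe being developed by JRA1: the function and constant names are hypothetical, the WARNING band is read as 70% to 75% inclusive, availability above 75% is assumed to be OK, and a missing or out-of-range figure is assumed to map to UNKNOWN. The numeric values of the statuses are the standard Nagios plugin exit codes.

```python
from enum import IntEnum


class Status(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


WINDOW_DAYS = 30        # look-back period from the specification
WARNING_MAX = 75.0      # upper bound of the WARNING band (inclusive)
CRITICAL_BELOW = 70.0   # availability below this value is CRITICAL


def classify_availability(availability):
    """Map a 30-day availability percentage to a Nagios status."""
    # Assumption: a missing or out-of-range figure is reported as UNKNOWN.
    if availability is None or not 0.0 <= availability <= 100.0:
        return Status.UNKNOWN
    if availability < CRITICAL_BELOW:
        return Status.CRITICAL
    if availability <= WARNING_MAX:
        return Status.WARNING
    # Assumption: anything above the WARNING band is OK.
    return Status.OK


def plugin_output(availability):
    """Build the one-line status message a Nagios plugin would print."""
    status = classify_availability(availability)
    if status is Status.UNKNOWN:
        return f"{status.name}: no valid availability figure"
    return (f"{status.name}: availability over last "
            f"{WINDOW_DAYS} days is {availability:.1f}%")
```

A real probe would print the message and exit with `int(status)`; how the 30-day availability figure itself is obtained is left open here.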

Apart from this, in Q8 we have continued the followup in the traditional way, by means of GGUS tickets.

Unknown Followup

See SA1.7-QR6 for more background information. In Q8 we have continued this activity.

Followup NGI Core Services availability

We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. We started this activity in February. At first we only submitted GGUS tickets to NGIs informing them of their low top-level BDII availability. In the last month we have also pointed them to documentation on how to set up a reliable top-level BDII service. We hope this helps to reduce the number of NGIs getting these kinds of tickets.

TPM

Two items that should be taken into account in the TPM's daily work:

  1. I would like to inform you that the Turkish NGI has agreed to provide temporary operational support to Azerbaijan for the coming 12 months. This means that basic operational problems and tickets originating from site managers in Azerbaijan have to be addressed by NGI_TR. Most of the tickets in GGUS originate from Parvin Aliyeva (the site manager has a CERN e-mail account). The site manager was instructed to contact NGI_TR to arrange the details of the operational support that NGI_TR will provide. For the moment I'm aware of a single site that is being configured.
  2. EGI has asked NGIs to configure their Nagios instances to probe the glexec capabilities of the CEs that accept pilot jobs. One of the steps for the Nagios administrators is to request the "/pilot" role for the VO ops. In the next couple of weeks or so, if a user asks in a GGUS ticket for the "/pilot" role (the pilot role is a VO role) without specifying any VO, it is very likely that the ticket has to be assigned to "VOsupport, ops".

New support units were added in the recent past:

  • NGI_UA
  • 3rd level EMI Support unit -- caNL
  • 3rd level EMI Support unit -- EMIR
  • EMI support unit for WNodes

One support unit was renamed in the last quarter: 'GridView/Availabilities' SU/FE became just 'GridView'.

The VOs "snoplus.snolab.ca" and "vo.cta.in2p3.fr" were integrated as new VO support units. The VO "mice.gridpp.ac.uk" was renamed to "mice".

Total number of submitted tickets: 853. Tickets resolved by TPM: 45.

Network Support 

  • Deployed gLite CE and ARC CE on IPv6 testbed
  • Set up a new netsup VO and the corresponding VOMS server
  • Installed a new HINTS server and new probes in Rome at GARR
  • New HINTS probe RPMs available for the ia64 architecture for SL6
  • Fully recovered tools and servers from the security incident at GARR
  • Started the process of integrating HINTS into the pS-MDM packages (discussions are ongoing)

3. Issues and Mitigation

Issue Description | Mitigation Description
Grid Oversight: NGIs unresponsive to NGI core services followup tickets | We will discuss with the COO a procedure for dealing with this
Grid Oversight: NGI creation procedure getting stuck on NGI unresponsiveness | We will discuss with the COO a procedure for dealing with this
Network Support: UNICORE middleware testing in IPv6 not assigned so far. |

4. Plans for the next period

Grid Oversight

The plan for the next period is to proceed with the current activities and to come up with a proposal to include test resources in the infrastructure.

TPM

Network Support

  • Keep consolidating HINTS, possibly providing RPMs for the server and probes for SL5 (to be evaluated)
  • Continue the deployment and dissemination campaign for HINTS
  • Pursue possible integration/inclusion of HINTS within the pS-MDM packages
  • Provide detailed test reports on CE, WN and workload tests using gLite and ARC
  • Consider merging the HEPiX and EGI efforts on the IPv6 testbed at some point