
Difference between revisions of "EGI-InSPIRE:SA1.7-QR8"

Revision as of 16:19, 3 May 2012

1. Task Meetings

Date (dd-mm-yyyy) | Indico agenda URL | Title | Outcome
23-02-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=827 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=827
22-03-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=963 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=963
17-04-2012 | https://www.egi.eu/indico/conferenceDisplay.py?confId=1016 | COD | https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1016

2. Main Achievements

Grid Oversight

ROD teams newsletter

This quarter we published ROD teams newsletters in February and April. The rationale behind the newsletter is described in the SA1.7-QR4 report.

ROD performance index

For background information on this, have a look at SA1.7-QR6, section RP OLA and ROD metrics. Since October we have asked every NGI with more than 10 items in the COD dashboard during one month to explain, through GGUS, the reason for this result and how it plans to improve the situation. We are continuing to collect and investigate these metrics, and to correlate them with other metrics to see whether we can draw conclusions from them.

Non-OK Alarms Followup

For background information on this, have a look at SA1.7-QR6, section Non-OK Alarms Followup. We have continued this activity in Q8.

Availability followup

See SA1.7-QR6 for more background information. A phone conference was held with JRA1 (https://www.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=716) in which the availability probe was discussed. The probe will meet the following specifications:

  • The probe only measures availability
  • The probe computes the availability over the past 30 days
  • The probe returns a WARNING when: 70% <= availability <= 75%
  • The probe returns a CRITICAL when: availability <70%

We are waiting for this probe to be available for testing.
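
As an illustration of the thresholds listed above, here is a minimal sketch of how such a probe could map a 30-day availability figure to a status. It is not the JRA1 probe. It assumes the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), reads the WARNING band as 70% to 75% inclusive, and assumes that values above 75% are OK and that out-of-range input is UNKNOWN; the specification above states neither of the last two points. The names `NagiosStatus` and `classify_availability` are hypothetical.

```python
from enum import IntEnum


class NagiosStatus(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify_availability(availability_pct: float) -> NagiosStatus:
    """Map a 30-day availability percentage to a probe status."""
    # Assumption: anything outside 0..100 (including NaN) is reported as UNKNOWN.
    if not 0.0 <= availability_pct <= 100.0:
        return NagiosStatus.UNKNOWN
    # From the specification: CRITICAL below 70%.
    if availability_pct < 70.0:
        return NagiosStatus.CRITICAL
    # From the specification, read as an inclusive band: WARNING from 70% to 75%.
    if availability_pct <= 75.0:
        return NagiosStatus.WARNING
    # Assumption: the specification does not define OK; everything above 75% is OK here.
    return NagiosStatus.OK


if __name__ == "__main__":
    for value in (65.0, 70.0, 75.0, 90.0):
        print(value, classify_availability(value).name)
```

A real probe would also have to obtain the availability figure itself; that part is left out because the report does not describe its source.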

Apart from this we have continued the followup of this in the traditional way by means of GGUS tickets in Q8.

Unknown Followup

See SA1.7-QR6 for more background information. In Q8 we have continued this activity.

Followup NGI Core Services availability

We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. We started this activity in February.

TPM

Two notes that TPMs should take into account in their daily work:

  1. The Turkish NGI has agreed to provide temporary operational support to Azerbaijan for the coming 12 months. This means that basic operational problems and tickets originating from site managers in Azerbaijan have to be addressed by NGI_TR. Most of the tickets in GGUS are submitted by Parvin Aliyeva (the site manager has a CERN e-mail account). The site manager was instructed to contact NGI_TR to arrange the details of the operational support that NGI_TR will provide. For the moment, we are aware of a single site that is being configured.
  2. EGI asked the NGIs to configure their Nagios instances to probe the glexec capabilities of the CEs accepting pilot jobs. One of the steps for the Nagios administrators is to request the "/pilot" role for the VO ops. In the next couple of weeks or so, if a user asks in a GGUS ticket for the "/pilot" role (a VO role) without specifying any VO, it is very likely that the ticket has to be assigned to "VOsupport, ops".

New support units were added in the recent past:

  • NGI_UA
  • 3rd level EMI Support unit -- caNL
  • 3rd level EMI Support unit -- EMIR
  • EMI support unit for WNodes

Support units renamed in the last quarter: the 'GridView/Availabilities' SU/FE was renamed to just 'GridView'.

The following VOs were integrated as new support units: "snoplus.snolab.ca" and "vo.cta.in2p3.fr".

The VO "mice.gridpp.ac.uk" was renamed to "mice".

Ticket statistics: 853 tickets were submitted in total; TPM resolved 45 tickets.
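
The two routing notes for TPMs in this section can be sketched as a small helper. This is only an illustration under stated assumptions: the `Ticket` fields and the function `suggest_support_unit` are hypothetical and do not reflect the GGUS data model or any GGUS API, and the order in which the two rules are checked is an arbitrary choice. Since the "/pilot" assignment is described as "very likely" rather than certain, the helper only returns a suggestion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical, simplified view of a ticket; not the GGUS data model."""
    description: str
    submitter_country: Optional[str] = None  # assumed to be known to the TPM
    vo: Optional[str] = None                 # VO named in the ticket, if any


def suggest_support_unit(ticket: Ticket) -> Optional[str]:
    """Return a suggested support unit, or None if neither note applies."""
    # Note 2: a request for the '/pilot' role without any VO most likely
    # concerns the ops VO.
    if "/pilot" in ticket.description and not ticket.vo:
        return "VOsupport, ops"
    # Note 1: operational tickets from site managers in Azerbaijan are
    # handled by NGI_TR.
    if ticket.submitter_country == "Azerbaijan":
        return "NGI_TR"
    return None


if __name__ == "__main__":
    print(suggest_support_unit(Ticket("Please grant me the /pilot role")))
    print(suggest_support_unit(Ticket("CE is failing", submitter_country="Azerbaijan")))
```

A ticket that matches neither note returns `None` and would follow the usual TPM assignment process.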

Network Support

TBD

3. Issues and Mitigation

Issue Description | Mitigation Description
Grid Oversight: Unresponsive NGIs with respect to NGI core services followup tickets | We will discuss a procedure for dealing with this with the COO

4. Plans for the next period

Grid Oversight

The plan for the next period is to proceed with the current activities and to come up with a proposal to include test resources in the infrastructure.

TPM

Network Support