Difference between revisions of "EGI-InSPIRE:SA1.7-QR11"

From EGIWiki
(2. Main Achievements)
 
(21 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{Template:Op menubar}}
+
{{Template:EGI-Inspire menubar}}
 +
 
 
{{Template:Inspire_reports_menubar}}
 
{{Template:Inspire_reports_menubar}}
 
{{TOC_right}}
 
{{TOC_right}}
[[Category:SA1 Task QR Reports]]
 
 
= 1. Task Meetings = <!--
 
= 1. Task Meetings = <!--
 
Notes. Report here all task-specific meetings held. This includes (a) face-to-face meetings and (b) phone meetings. Make sure that for all task meetings participants are ALWAYS recorded either on indico from the registrants’ list, or in the minutes.  
 
Notes. Report here all task-specific meetings held. This includes (a) face-to-face meetings and (b) phone meetings. Make sure that for all task meetings participants are ALWAYS recorded either on indico from the registrants’ list, or in the minutes.  
Line 15: Line 15:
 
! style="width: 50%" | Outcome
 
! style="width: 50%" | Outcome
 
|-
 
|-
| ...
+
|3-1-2013
| ....
+
|https://indico.egi.eu/indico/conferenceDisplay.py?confId=1287
| ...
+
|COD
| ...
+
|https://indico.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1287
 +
|-
 +
|18-12-2012
 +
|https://indico.egi.eu/indico/conferenceDisplay.py?confId=1100
 +
|OMB
 +
|https://indico.egi.eu/indico/getFile.py/access?contribId=4&resId=0&materialId=slides&confId=1100
 +
|-
 +
| 8/9-11-2012
 +
| https://indico.egi.eu/indico/conferenceDisplay.py?confId=1243
 +
| COD F2F
 +
| https://indico.egi.eu/indico/getFile.py/access?resId=2&materialId=minutes&confId=1243
 
|-
 
|-
 
|}
 
|}
Line 29: Line 39:
  
 
'''Followup upgrades of unsupported software'''
 
'''Followup upgrades of unsupported software'''
There were quite a large number of sites that were still running glite-3.1 and glite-3.2 software that is no longer supported. In this quarter a campaign was started to make these sites upgrade their services that run this software. COD has issued GGUS tickets to these sites and is following this up.
+
There were quite a large number of sites that were still running glite-3.1 and glite-3.2 software that is no longer supported. Last quarter a campaign was started to make these sites upgrade their services that run this software. This campaign was continued this quarter. COD has requested RODs to issued GGUS tickets to these sites and is following this up.
  
 
'''ROD teams newsletter'''
 
'''ROD teams newsletter'''
  
This quarter we have published a ROD teams newsletter in October. The rationale behind the newsletter is descibed in the [[SA1.7-QR4]] report.
+
This quarter we have published a ROD teams newsletter in Januaryr. The rationale behind the newsletter is descibed in the [[SA1.7-QR4]] report.
  
 
'''ROD performance index'''
 
'''ROD performance index'''
  
 
For background information on this, have a look at [[SA1.7-QR6]], section '''RP OLA and ROD metrics'''.
 
For background information on this, have a look at [[SA1.7-QR6]], section '''RP OLA and ROD metrics'''.
Since October 2011 we have been asking all NGIs above 10 items in the COD dashboard duting one month about the explanation through GGUS, what was the reason of such result and how do you plan to improve the situation. Currently we are continuing to collect and investigate these metrics and also to correlate this with other metrics and see if we can draw some conclusions from them. It appears that the amount of issues in the COD dashboard is going down.
+
Since October 2011 we have been asking all NGIs above 10 items in the COD dashboard duting one month about the explanation through GGUS, what was the reason of such result and how do you plan to improve the situation. Currently we are continuing to collect and investigate these metrics and also to correlate this with other metrics and see if we can draw some conclusions from them.  
  
 
'''Availability followup'''
 
'''Availability followup'''
  
See [[SA1.7-QR6]] for more background information. A probe measuring the availability and reliability of a site has been supplied to the ops portal developers and is now deployed. The algorithm of this probe is incorporated into the ops portal and it will now generated alarms when a site's availability and reliability is below 70%/75%. As a consequence, COD will stop the activity of monthly issuing GGUS tickets to these sites as of November 1st 2012.
+
See [[SA1.7-QR6]] for more background information. COD has issued GGUS tickets to sites that are below 70% availability for more than three consecutive months that are eligible for suspension.
  
 
'''Unknown Followup'''
 
'''Unknown Followup'''
  
See [[SA1.7-QR6]] and [[SA1.7-QR6]] for more background information. In Q10 we have continued this activity.
+
See [[SA1.7-QR6]] and [[SA1.7-QR6]] for more background information. In Q11 we have continued this activity. In addition, we have started discussions with the SAM nagios team to have a nagios probe that will raise alarms on the operations dashboard when the unknown percentage is higher than a certain threshold. These discussions are nearly completed.
  
 
'''Followup NGI Core Services availability'''
 
'''Followup NGI Core Services availability'''
  
We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. In februari 2012 we have started up this activity. At first we have only submitted GGUS tickets to NGIs informing the of their low top-level BDII availability.
+
We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. In februari 2012 we have started up this activity. At first we have only submitted GGUS tickets to NGIs informing the of their low top-level BDII availability. This activity has been continued in this quarter.
 +
 
 +
'''Review of certification procedures etc'''
 +
 
 +
We are busy developing a procedure to incorporate test resources into the EGI infrastructure, review the certification procedures and to identify possible changes to the operational tools. This discussion is now finished and a presentation about the outcome has been given in the OMB of december, https://indico.egi.eu/indico/getFile.py/access?contribId=4&resId=0&materialId=slides&confId=1100.
 +
 
 +
 
 +
'''COD F2F meeting'''
 +
 
 +
In november we have had a COD f2f meeting. The minutes may be found at: https://indico.egi.eu/indico/getFile.py/access?resId=2&materialId=minutes&confId=1243.
 +
 
 +
'''Plan for 2013'''
  
'''OMB'''
+
As a result of the COD F2F meeting we have written a plan about in what direction the activity should move. These may be found at: https://documents.egi.eu/public/ShowDocument?docid=1529.
  
We are busy developing a procedure to incorporate test resources into the EGI infrastructure and to identify possible changes to the operational tools.
+
== Software support ==
  
'''EGI TF12'''
+
The adapted software support process, as established at the end of PQ10, run smoothly,
 +
without any noticable issues.
  
We have organised a session for ROD teams at  EGI TF12 in Prague. There were 26 participants. Further we gave two presentations from COd in the Future of Ops session at EGI TF12.
+
In this quarter, 172 tickets were identified to be software issue, and 52 (30%) were solved.
 +
The absolute number is higher wrt. PQ10 but comparable, percentage
 +
of solved tickets remains the same, indicating stable ratio of software
 +
defects (ie. the tickets that are reassigned to 3rd line).
  
'''COD F2F meeting'''
+
Ticket solution times (average/median) are 19/4 days, are slightly better
 +
than preceeding quarters (however, these numbers tend to oscillate
 +
considerably, depending on the actual tickets solved), with the same external
 +
reasons for "long tail" of the distribution, yielding rather high average
 +
times.
  
 
= 3. Issues and Mitigation = <!-- fill the table below
 
= 3. Issues and Mitigation = <!-- fill the table below
Line 70: Line 99:
 
! scope="col" | Mitigation Description
 
! scope="col" | Mitigation Description
 
|-
 
|-
|  
+
| Grid Oversight: None
|  
+
| None
 +
|-
 +
| Software Support: None
 +
|
 
|-
 
|-
 
|  
 
|  
Line 78: Line 110:
  
 
= 4. Plans for the next period = <!-- provide your text below. PLEASE PROVIDE TEXT IN A GOOD EDITED FORM (NO BULLET LISTS OF SHORT ITEMS THAT REQUIRE EXPANSION WHEN INSERTED IN A REPORT) -->
 
= 4. Plans for the next period = <!-- provide your text below. PLEASE PROVIDE TEXT IN A GOOD EDITED FORM (NO BULLET LISTS OF SHORT ITEMS THAT REQUIRE EXPANSION WHEN INSERTED IN A REPORT) -->
 +
We will continue the activities that we already doing. Further we are going to proceed with carrying out the plan outlined in https://indico.egi.eu/indico/getFile.py/access?contribId=4&resId=0&materialId=slides&confId=1100 and startup the plans described in https://documents.egi.eu/public/ShowDocument?docid=1529.
 +
 +
== Software support ==
 +
 +
We will focus on implementation of the agreed ticket followup process, which is still not fully in place for the low-priority tickets.
 +
 +
We will discuss and work on the model of the software support in the "post May 2014" era, aligned with the foreseen model of community platform integrators.
 +
 +
For the main work of the unit, triage and resolution of the incoming tickets, no major changes are foreseen.

Latest revision as of 19:11, 6 January 2015




1. Task Meetings

Date (dd/mm/yyyy) | Indico agenda URL | Title | Outcome
03/01/2013 | https://indico.egi.eu/indico/conferenceDisplay.py?confId=1287 | COD | https://indico.egi.eu/indico/getFile.py/access?resId=0&materialId=minutes&confId=1287
18/12/2012 | https://indico.egi.eu/indico/conferenceDisplay.py?confId=1100 | OMB | https://indico.egi.eu/indico/getFile.py/access?contribId=4&resId=0&materialId=slides&confId=1100
08-09/11/2012 | https://indico.egi.eu/indico/conferenceDisplay.py?confId=1243 | COD F2F | https://indico.egi.eu/indico/getFile.py/access?resId=2&materialId=minutes&confId=1243

2. Main Achievements

Grid Oversight

Followup upgrades of unsupported software

A large number of sites were still running glite-3.1 and glite-3.2 software, which is no longer supported. Last quarter a campaign was started to make these sites upgrade the services that run this software, and the campaign continued this quarter. COD has requested RODs to issue GGUS tickets to these sites and is following this up.

ROD teams newsletter

This quarter we published a ROD teams newsletter in January. The rationale behind the newsletter is described in the SA1.7-QR4 report.

ROD performance index

For background information, see SA1.7-QR6, section RP OLA and ROD metrics. Since October 2011 we have been asking, through GGUS, every NGI with more than 10 items in the COD dashboard during one month to explain the reason for that result and how it plans to improve the situation. We are continuing to collect and investigate these metrics, and to correlate them with other metrics to see whether any conclusions can be drawn.
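
The selection rule described above (NGIs with more than 10 items in the COD dashboard during one month) could be sketched as follows. This is only an illustration: the function name, the NGI names and the counts are hypothetical, and the actual COD dashboard tooling is not described in this report.

```python
def ngis_to_contact(monthly_items, limit=10):
    """Return the NGIs whose monthly COD dashboard item count exceeds `limit`.

    `monthly_items` maps an NGI name to its item count for one month.
    The default limit of 10 is the value quoted in the report.
    """
    return sorted(ngi for ngi, count in monthly_items.items() if count > limit)


# Hypothetical counts for one month (not real data).
counts = {"NGI_X": 14, "NGI_Y": 10, "NGI_Z": 3}
to_contact = ngis_to_contact(counts)
print(to_contact)
```

Note that an NGI with exactly 10 items is not selected, since the report speaks of NGIs "above 10 items".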

Availability followup

See SA1.7-QR6 for more background information. COD has issued GGUS tickets to sites that have been below 70% availability for more than three consecutive months and are therefore eligible for suspension.
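
As a rough illustration of the criterion above, the sketch below flags sites whose availability stayed below 70% for more than three consecutive months. The function names, site names and availability figures are hypothetical. Reading "more than three" as a run of at least four months is an assumption, so the run length is left as a parameter.

```python
def longest_low_run(monthly_availability, threshold=70.0):
    """Return the longest run of consecutive months strictly below `threshold`."""
    longest = current = 0
    for value in monthly_availability:
        if value < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a month at or above the threshold breaks the run
    return longest


def eligible_for_suspension(monthly_availability, threshold=70.0, months=3):
    """True when availability stayed below `threshold` for more than `months` months in a row."""
    return longest_low_run(monthly_availability, threshold) > months


# Hypothetical monthly availability percentages, oldest month first.
sites = {
    "SITE-A": [65.0, 60.5, 68.0, 69.9],
    "SITE-B": [65.0, 72.0, 60.0, 61.0],
}
flagged = [name for name, history in sites.items() if eligible_for_suspension(history)]
print(flagged)
```

In this example only SITE-A is flagged: SITE-B has a month above the threshold that interrupts its run of low months.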

Unknown Followup

See SA1.7-QR6 for more background information. In Q11 we continued this activity. In addition, we have started discussions with the SAM Nagios team about a Nagios probe that will raise alarms on the operations dashboard when the unknown percentage is higher than a certain threshold. These discussions are nearly complete.

Followup NGI Core Services availability

We have issued GGUS tickets to NGIs that do not meet the 99% availability requirement. This activity started in February 2012; at first we only submitted GGUS tickets to NGIs informing them of their low top-level BDII availability. The activity was continued this quarter.

Review of certification procedures etc

We have been developing a procedure to incorporate test resources into the EGI infrastructure, reviewing the certification procedures, and identifying possible changes to the operational tools. This discussion is now finished, and a presentation about the outcome was given at the December OMB: https://indico.egi.eu/indico/getFile.py/access?contribId=4&resId=0&materialId=slides&confId=1100.


COD F2F meeting

In November we held a COD F2F meeting. The minutes may be found at: https://indico.egi.eu/indico/getFile.py/access?resId=2&materialId=minutes&confId=1243.

Plan for 2013

As a result of the COD F2F meeting we have written a plan describing the direction in which the activity should move. It may be found at: https://documents.egi.eu/public/ShowDocument?docid=1529.

Software support

The adapted software support process, as established at the end of PQ10, ran smoothly, without any noticeable issues.

In this quarter, 172 tickets were identified as software issues, and 52 (30%) were solved. The absolute number is higher than in PQ10 but comparable, and the percentage of solved tickets remains the same, indicating a stable ratio of software defects (i.e. the tickets that are reassigned to 3rd line).

Ticket solution times (average/median) are 19/4 days, slightly better than in preceding quarters (however, these numbers tend to oscillate considerably, depending on the actual tickets solved). The same external reasons as before cause the "long tail" of the distribution, which yields rather high average times.
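
The gap between the average and the median is the typical signature of a long-tailed distribution. The short sketch below illustrates this with a hypothetical list of solution times, chosen only so that it reproduces the reported 19/4 days; it is not the real ticket data. It also recomputes the solved percentage from the two counts quoted above.

```python
from statistics import mean, median

# Hypothetical solution times in days (not real ticket data): most tickets
# are solved within a week, while a single long-running ticket forms the tail.
solution_days = [1, 2, 3, 4, 4, 5, 7, 126]

average = mean(solution_days)
middle = median(solution_days)
print(f"average/median: {average}/{middle} days")

# Without the one long-running ticket the average drops sharply,
# while the median barely moves.
without_tail = solution_days[:-1]
print(f"without tail: {mean(without_tail):.1f}/{median(without_tail)} days")

# Solved percentage from the counts reported for this quarter.
identified, solved = 172, 52
solved_percent = round(100 * solved / identified)
print(f"solved: {solved_percent}%")
```

This is why the median is the more stable indicator of typical ticket handling time, while the average mostly reflects the few tickets in the tail.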

3. Issues and Mitigation

Issue Description | Mitigation Description
Grid Oversight: None | None
Software Support: None |

4. Plans for the next period

We will continue the activities that we are already doing. Further, we will proceed with carrying out the plan outlined in https://indico.egi.eu/indico/getFile.py/access?contribId=4&resId=0&materialId=slides&confId=1100 and start up the plans described in https://documents.egi.eu/public/ShowDocument?docid=1529.

Software support

We will focus on implementation of the agreed ticket followup process, which is still not fully in place for the low-priority tickets.

We will discuss and work on the model of the software support in the "post May 2014" era, aligned with the foreseen model of community platform integrators.

For the main work of the unit, triage and resolution of the incoming tickets, no major changes are foreseen.