
Difference between revisions of "EGI-InSPIRE:Plan 2012 SA1.8"

= Plans 2012 SA1.8 =


== Assessment of progress, 2011 ==
 
=== Core Grid Services  ===
 
==== DTEAM VO Services ====
 
The migration of the DTEAM VO was finalized in January 2011. The DTEAM VO is served by 2 geographically distributed VOMS servers in Thessaloniki and Athens (voms.hellasgrid.gr and voms2.hellasgrid.gr). During the year, 7 NGI groups were created in the DTEAM VO (NGI_FI, NGI_NDGF, NGI_DE, NGI_IT, NGI_IE, NGI_UK, NGI_ZA) and 2 ROC Groups were decommissioned (ROC_Italy, SEE).
 
==== EGI Catch All CA ====
 
During 2011 the EGI Catch All CA set up three new Registration Authorities: in Senegal, in Egypt, and for SixSq (a partner in StratusLab) in Switzerland. This brings the total number of RAs to 7.
 
==== Core Services for Site Certification ====
 
A TOP-BDII, a WMS and an LB service were installed as catch-all services for NGIs that do not operate their own services for the site certification process. In addition, a portal was built that syncs with GOCDB and allows NGI Managers to add and remove uncertified sites from the catch-all TOP-BDII on demand.
 
 
=== Operations tool and availability computation ===
 
==== Propose Changes for Operations tools ====
 
An assessment of the operations tools was completed and the results were presented at the EGI Technical Conference in Lyon: https://wiki.egi.eu/wiki/POEM_and_ACE_requirements
 
==== Data more readily available to NGIs ====
 
This has been provided by MyEGI. Improvements may be suggested as more experience is gained from its use.
 
==== Follow-up with developers for issues that affect accuracy ====
 
A high number of unknown statuses is reported by certain NGI Nagios instances / sites. This is still being investigated, but it appears to involve mostly NGI Nagios operations rather than the developers. This is an ongoing activity.
 
=== Operational Level Agreements (OLAs) ===
 
==== MSA 411 ====
 
The milestone MSA 411 "Operational Level Agreements within the EGI Production Infrastructure" was achieved during 2011.
 
https://documents.egi.eu/document/524
 
 
==== Continue adaptations to the OLA between NGI and sites ====
 
The RC OLA has been finalized and is available at: https://documents.egi.eu/document/31
 
==== Produce OLA between EGI and NGIs, as well as a Core services OLA ====
 
The RP OLA, which was started during 2011 (https://documents.egi.eu/document/463), partially covers this: the NGI responsibilities include the services that the NGI provides as core services. The work is ongoing, because as the tools evolve more service thresholds should be included in this OLA. The first release of the RP OLA was finalized in 2011 and the second release will follow in early 2012.
 
In 2012 the EGI.eu OLA will cover the services offered by EGI.
 
==== Propose an OLA amendment procedure (Spring 2011) ====
 
This action was not completed, as the OLAs were not finalized. It is an action for 2012.
 
==== Evaluate the impact of increased availability suspension threshold ====
 
During 2011 TSA1.8 evaluated the impact of increasing the availability suspension threshold. The results of the evaluation were presented at the Technical Forum in Lyon:
 
https://www.egi.eu/indico/conferenceDisplay.py?confId=267
 
 
==== Reconvene with the OLA task force at least once per 2 months ====
 
A fixed schedule was not needed. Meeting frequency depended on the requirements, and at times 2 meetings took place within 1 month, since the TF work has to go through the OMB for approval and any additional comments have to be addressed.
 
==== Availability/Reliability ====
 
TSA1.8 is responsible for the distribution of monthly league tables. Useful material continues to be added to the wiki:
 
https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics
 
The investigation into whether advances in the operational tools can simplify the procedure is ongoing and will continue in 2012:
 
https://rt.egi.eu/guest/Ticket/Display.html?id=289
 
Regarding the investigation of the prime causes of site failures: this is ongoing. The first step is to determine the causes of the high percentage of UNKNOWN states in the NGI Nagios instances (mentioned above under the accuracy issues) before looking deeper into individual sites. Categorization of site replies to COD tickets for the reports could start in 2012. Initial results of the investigation show that the problems are mostly operational.
 
== Plans for 2012 ==
 
=== Core Grid Services ===
 
==== DTEAM VO Services ====
 
The plan for 2012 is to finalize the decommissioning of the legacy ROC Groups (ROC_Benelux, ROC_France, ROC_UKI). Currently the DTEAM VO services are provided using the VOMRS service. It will be investigated whether the new VOMS service provides all the needed functionality.
 
==== EGI Catch All CA ====
 
Continue the support and operation of the EGI Catch All CA and the expansion of the RA Network as needed.
 
==== Core Services for Site Certification ====
 
Continue the support and operation of the Site Certification Core Services.
 
=== Operations tool and availability computation ===
 
==== Follow-up with developers for issues that affect accuracy ====
 
Continue the investigation of the relatively high number of unknown statuses from certain NGI Nagios instances / sites. Target date: 2012Q2.
 
=== Operational Level Agreements (OLAs) ===
 
==== MSA 418 ====
 
The milestone MSA 418 "Operational Level Agreements (OLAs) within the EGI production infrastructure" is planned for 2012Q1, with a deadline at the end of the first month of 2012Q2.
 
==== Produce OLA between EGI and NGIs, as well as a Core services OLA ====
 
The 2nd release of the RP OLA will be finalized in early 2012Q1. A new work item for 2012 is the EGI.eu OLA: a draft version will be ready in 2012Q2 and the final version is expected in 2012Q3. In 2012Q3 a new revision of the RP OLA will be drafted including any a
 
==== Propose an OLA amendment procedure ====
 
The amendment procedure for the OLA is scheduled for 2012Q2.
 
==== OLA Task Force Meetings ====
 
The OLA Task Force will reconvene via video conference and/or face-to-face meetings as needed.
 
==== Availability/Reliability ====
 
TSA1.8 will continue the distribution of monthly league tables and the maintenance of the relevant wiki space:
 
https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics
 
The investigation into whether advances in the operational tools can simplify the procedure will continue in 2012, and recommendations will be made to operations and tools developers:
 
https://rt.egi.eu/guest/Ticket/Display.html?id=289
 
Regarding the investigation of the prime causes of site failures: this is ongoing. The first step is to determine the causes of the high percentage of UNKNOWN states in the NGI Nagios instances (mentioned above under the accuracy issues) before looking deeper into individual sites. Categorization of site replies to COD tickets for the reports could start in 2012. Initial results of the investigation show that the problems are mostly operational.

Revision as of 11:00, 19 December 2011
