Difference between revisions of "EGI-InSPIRE:Plan 2012 SA1.8"

From EGIWiki

Revision as of 21:11, 16 January 2012

Plans 2012 SA1.8

Assessment of progress, 2011

Core Grid Services

DTEAM VO Services

The migration of the DTEAM VO was finalized in January 2011. The DTEAM VO is served by two geographically distributed VOMS servers in Thessaloniki and Athens (voms.hellasgrid.gr and voms2.hellasgrid.gr). During the year, seven NGI groups were created in the DTEAM VO (NGI_FI, NGI_NDGF, NGI_DE, NGI_IT, NGI_IE, NGI_UK, NGI_ZA) and three ROC groups were decommissioned (ROC_Italy, SEE, dech).

EGI Catch All CA

During 2011 the EGI Catch All CA set up three new Registration Authorities (RAs): in Senegal, in Egypt, and in Switzerland for SixSq (a partner in StratusLab). This brings the total number of RAs to seven.

Core Services for Site Certification

A TOP-BDII, a WMS and an LB service were installed as catch-all services for NGIs that do not operate their own services for the site certification process. In addition, a portal was built that synchronizes with GOCDB and allows NGI Managers to add uncertified sites to, and remove them from, the catch-all TOP-BDII on demand.
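The synchronization step can be pictured with a minimal Python sketch. It is not the portal's actual code: the `Site` record and its fields, the set of GOCDB certification statuses treated as uncertified, and the helper `top_bdii_entries` are hypothetical, and the output assumes a TOP-BDII site list made of `NAME ldap://host:2170/mds-vo-name=NAME,o=grid` lines. What it illustrates is the rule described above: an entry is kept only while the site is still uncertified in GOCDB and its NGI manager has selected it, so a site that becomes certified drops out on the next sync.

```python
from dataclasses import dataclass

# GOCDB certification statuses treated as "not yet certified" (assumption).
UNCERTIFIED_STATUSES = {"Candidate", "Uncertified"}


@dataclass(frozen=True)
class Site:
    """Hypothetical snapshot of a GOCDB site record."""
    name: str
    ngi: str
    status: str     # certification status
    bdii_host: str  # host name of the site BDII


def top_bdii_entries(sites, selected):
    """Return catch-all TOP-BDII site-list lines.

    `selected` maps an NGI name to the set of site names its NGI manager
    has added. A site is listed only if it is still uncertified and selected.
    """
    lines = []
    for site in sorted(sites, key=lambda s: s.name):
        if site.status not in UNCERTIFIED_STATUSES:
            continue  # certified or closed sites drop out on the next sync
        if site.name not in selected.get(site.ngi, set()):
            continue  # not requested by the NGI manager
        lines.append(
            f"{site.name} ldap://{site.bdii_host}:2170/"
            f"mds-vo-name={site.name},o=grid"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical GOCDB snapshot and NGI manager selections.
    sites = [
        Site("SITE-A", "NGI_XX", "Uncertified", "bdii.site-a.example.org"),
        Site("SITE-B", "NGI_XX", "Certified", "bdii.site-b.example.org"),
        Site("SITE-C", "NGI_YY", "Candidate", "bdii.site-c.example.org"),
    ]
    selected = {"NGI_XX": {"SITE-A", "SITE-B"}}
    for line in top_bdii_entries(sites, selected):
        print(line)
```

With the sample data only SITE-A is emitted: SITE-B is already certified, and SITE-C has not been selected by its NGI manager.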


Operations tools and availability computation

Propose Changes for Operations tools

An assessment of the operations tools was completed, and the results were presented at the EGI Technical Forum in Lyon.

https://wiki.egi.eu/wiki/POEM_and_ACE_requirements

Data more readily available to NGIs

This is now provided by MyEGI. Further improvements may be suggested as more experience is gained from its use.

Follow-up with developers for issues that affect accuracy

Certain NGI Nagios instances and sites report a high number of UNKNOWN statuses. This is still under investigation, but it appears to involve mostly the operation of the NGI Nagios instances rather than issues requiring action by the developers. This is an ongoing activity and will be followed up by TSA1.7.

Operational Level Agreements (OLAs)

MSA 411

The milestone MSA 411 "Operational Level Agreements within the EGI Production Infrastructure" was achieved during 2011.

https://documents.egi.eu/document/524


Continue adaptations to the OLA between NGI and sites

The RC OLA has been finalized and is available at:

https://documents.egi.eu/document/31

Produce OLA between EGI and NGIs, as well as a Core services OLA

The RP OLA, started during 2011, partially covers this: the NGI responsibilities it defines include the services that each NGI provides as core services. The work is ongoing, however, since thresholds for more services should be added to this OLA as the tools evolve. The first release of the RP OLA was finalized in 2011, and the second release is due in early 2012.

https://documents.egi.eu/document/463

In 2012 the EGI.eu OLA will cover the services offered by EGI.

Propose an OLA amendment procedure (Spring 2011)

This action was not completed, as the OLAs had not yet been finalized. It is carried over as an action for 2012.

Evaluate the impact of increased availability suspension threshold

During 2011 TSA1.8 evaluated the impact of increasing the availability suspension threshold. The results of the evaluation were presented at the Technical Forum in Lyon:

https://www.egi.eu/indico/conferenceDisplay.py?confId=267
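One way to quantify such an impact is to count, for each candidate threshold, the sites whose monthly availability stayed below it for several consecutive months. The sketch below assumes this rule; the 50% and 70% thresholds, the three-month window, the helper names and the site figures are illustrative assumptions, not the policy or the data evaluated by TSA1.8.

```python
def below_for_consecutive_months(history, threshold, months):
    """True if `history` (monthly availability values in 0..1, oldest first)
    contains a run of at least `months` consecutive values below `threshold`."""
    run = 0
    for value in history:
        run = run + 1 if value < threshold else 0
        if run >= months:
            return True
    return False


def threshold_impact(histories, thresholds, months=3):
    """Map each candidate threshold to the sorted list of sites that would
    qualify for suspension under it."""
    return {
        t: sorted(
            site
            for site, history in histories.items()
            if below_for_consecutive_months(history, t, months)
        )
        for t in thresholds
    }


if __name__ == "__main__":
    # Hypothetical monthly availability figures, not real EGI data.
    histories = {
        "SITE-A": [0.95, 0.97, 0.96],
        "SITE-B": [0.62, 0.65, 0.60],
        "SITE-C": [0.40, 0.45, 0.30],
    }
    for t, sites in threshold_impact(histories, [0.50, 0.70]).items():
        print(f"threshold {t:.0%}: {len(sites)} site(s) {sites}")
```

For these hypothetical figures, raising the threshold from 50% to 70% adds SITE-B to the sites that would qualify for suspension.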


Reconvene with the OLA task force at least once per 2 months

A fixed schedule was not really needed: depending on the requirements, sometimes two meetings took place within one month, as the task force's work has to go through the OMB for approval and any additional comments have to be addressed.

Availability/Reliability

TSA1.8 is responsible for distributing the monthly league tables and continues to add useful material to the wiki:

https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics 

The investigation into whether advances in the operational tools can simplify this procedure is ongoing and will continue in 2012:

https://rt.egi.eu/guest/Ticket/Display.html?id=289

The investigation into the prime causes of site failures is ongoing. The first step is to determine the causes of the high percentage of UNKNOWN states in NGI Nagios (mentioned above under the accuracy issues) before looking more deeply into the sites. Site replies to COD tickets concerning the reports could start to be categorized in 2012. The initial results of the investigation show that the problems are mostly related to operations.
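Why a high share of UNKNOWN results matters can be seen from the way availability and reliability are usually computed for the monthly reports: UNKNOWN time is excluded from the denominator, so it counts neither for nor against a site, and the figure is then based on less measured time. The sketch below assumes the definitions Availability = UP / (total - UNKNOWN) and Reliability = UP / (total - scheduled downtime - UNKNOWN); the class and function names and the sample hours are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStatus:
    """Hours a site spent in each monitored state during one month."""
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(s: MonthlyStatus) -> float:
    """UP time over all time except UNKNOWN (assumed definition)."""
    known = s.total - s.unknown
    return s.up / known if known > 0 else float("nan")


def reliability(s: MonthlyStatus) -> float:
    """UP time over all time except UNKNOWN and scheduled downtime (assumed definition)."""
    known = s.total - s.unknown - s.scheduled_downtime
    return s.up / known if known > 0 else float("nan")


def unknown_share(s: MonthlyStatus) -> float:
    """Fraction of the month for which no reliable status was available."""
    return s.unknown / s.total if s.total > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical 720-hour month, not real monitoring data.
    month = MonthlyStatus(up=600, down=20, scheduled_downtime=40, unknown=60)
    print(f"unknown share: {unknown_share(month):.1%}")
    print(f"availability:  {availability(month):.1%}")
    print(f"reliability:   {reliability(month):.1%}")
```

For the hypothetical month shown, 60 of 720 hours (8.3%) are UNKNOWN; availability is 600/660 (90.9%) and reliability 600/620 (96.8%).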

Plans for 2012

Core Grid Services

VO Services

The plan for 2012 is to finalize the decommissioning of the legacy ROC groups in the DTEAM VO (ROC_Benelux, ROC_France, ROC_UKI). The DTEAM VO services are currently provided using the VOMRS service; TSA1.8 will investigate whether the new VOMS service provides all the needed functionality.

During 2012Q1 TSA1.8 will assess the need for, and the feasibility of, setting up a replicated service for the OPS VO on the GRNET VOMS infrastructure; the primary VO services for the OPS VO are provided by CERN.

EGI Catch All CA

Continue the support and operation of the EGI Catch All CA and the expansion of the RA Network as needed.

Core Services for Site Certification

Continue the support and operation of the Site Certification Core Services.

Operational Level Agreements (OLAs)

MSA 418

The milestone MSA 418 "Operational Level Agreements (OLAs) within the EGI production infrastructure" is planned for 2012Q1, with a deadline at the end of the first month of 2012Q2.

Produce OLA between EGI and NGIs, as well as a Core services OLA

The second release of the RP OLA will be finalized in early 2012Q1. A new work item for 2012 is the EGI.eu OLA: a draft version will be ready in 2012Q2, and the final version is expected in 2012Q3.

OLA Task Force

The OLA Task Force has finished its work as a fixed group. For the upcoming work on the EGI.eu OLA, TSA1.8 will establish direct communication channels with the people operating services at the EGI.eu level.

Availability/Reliability

TSA1.8 will continue handling the validation and distribution of the monthly league tables for Resource Center and EGI.eu services, and maintaining the relevant wiki space:

https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics