EGI-InSPIRE:PY2 periodic report (SA1)

Revision as of 15:22, 4 June 2012

Executive Summary

SA1 was responsible for the continued operation and expansion of the production infrastructure. The transition started in PY1, which evolved the EGEE federated Operations Centre into independent NGIs, was completed. The total number of Resource Centres (RCs) in March 2011 amounts to 352 instances (+3.22% yearly increase). The installed capacity and Resource Centres grew considerably to comprise 270,800 logical cores (+30.7% yearly increase), 2.96 million HEP-SPEC 06 (+49.5%), 139 PB of disk space (+31.4%) and 134.3 PB of tape (+50%).

EGI currently comprises 27 national operations centres and 9 federated operations centres encompassing multiple NGIs. Availability and Reliability reached 94.50% and 95.42% respectively (yearly average), which amounts to a +1% increase in PY2. Overall resource utilization has progressed satisfactorily, confirming the trends of PY1. The total number of jobs executed in the infrastructure in the period May 2011-April 2012 increased by 46.42% over the job workload from May 2010 to April 2011. The overall quantity of EGI computing resources used in PY2 amounts to 10.5 billion HEP-SPEC 06 hours.

Operational security was run effectively during PY2, ensuring day-to-day security monitoring and timely response to incidents. Security in EGI was reviewed following the PY1 reviewers’ suggestions and documented in Deliverable D4.4. The EGI Security Threat Risk assessment team was formed. 75 threats in 20 categories were identified, and an initial risk assessment and preliminary report were produced describing the assessment process, progress and initial findings. Specialized tools for incident response tracking and for streamlining operational security tasks were prototyped and rolled to production.

The Staged Rollout workflow introduced during PY1 is being progressively refined. The Staged Rollout infrastructure has been gradually expanding, reflecting the deployment needs of VRCs and NGIs, and resources were reallocated to ensure testing of a broader range of products. The Staged Rollout infrastructure currently comprises 60 Early Adopter teams.

The operations integration of GLOBUS, UNICORE, QosCosGrid and Desktop Grids was completed, with the exception of accounting, which requires further integration development. Extensions are being implemented in collaboration with the external technology providers.

GGUS was updated to decommission various legacy support units and to add new ones for VO support, operations support and third-level support. A new report generator was designed and prototyped. GGUS FAQs were migrated to the EGI wiki, the usability of the system was enhanced, and GGUS was interfaced to a new helpdesk system (Service NOW). The GGUS failover configuration was hardened with auto-switching between different front-ends.

VO SAM, VO Admin Dashboard, and LFCBrowseSE are now mature systems supporting VO operations and are being deployed by interested NGIs and/or VOs to assist them in daily VO operations and management. The first prototype of the VO Operations Portal, released by JRA1 and fully integrated into the Operations Portal, was deployed, and feedback was provided so that it can finally be rolled to production.

Central Grid Oversight (COD) of EGI was responsible for the certification of new NGIs created either as a result of legacy EGEE federated operations centres stopping operations, or because of new Resource Providers joining the infrastructure. COD was involved in training and dissemination activities, in the follow-up of underperformance at both the Resource Centre and the Resource Provider level, and in monitoring the instability of the distributed SAM infrastructure.

The EGI.eu central tools were significantly advanced. The first Metrics Portal was rolled to production in PQ6. The message broker network was repeatedly upgraded to improve the reliability of message delivery, stability, manageability and scalability. The transition of the accounting infrastructure from R-GMA to messaging was completed, and a new central consumer based on ActiveMQ STOMP was deployed in pre-production. The Canopus release of the accounting portal (v4.0) brought, among other things, many bug fixes, extended FQAN-based views and new graphics. GOCDB functionality was also significantly extended with support for virtual sites, new roles and permissions, scoping of Resource Centres and sites, and a hardened DNS-based failover configuration. The Service Availability Monitoring (SAM) underwent five different upgrades and is currently the largest and most distributed operational infrastructure, comprising 32 distributed instances. The operations portal rolled to production new major components: the VO Dashboard and the Security Dashboard. In addition, the VO management features were greatly enhanced.

The EGI Operations Level Agreement framework was considerably extended in PY2 with the first Resource Centre Operational Level Agreement, defining the target levels of the services provided by sites for resource access, and the Resource infrastructure Provider Operational Level Agreement, defining the target levels of the community services provided by the NGIs, which came into force in January 2012.

A new set of catch-all services for the monitoring of uncertified Resource Centres was rolled to production. EGEE legacy documentation pages were phased out, updated and migrated to the EGI wiki; three new operational procedures were approved, and training and support pages were improved.