EGI-InSPIRE:UK-QR11
Latest revision as of 13:15, 9 January 2015
1. MEETINGS AND DISSEMINATION
Note: Complete the tables below by adding as many rows as needed.
1.1. CONFERENCES/WORKSHOPS ORGANISED
Quarterly Report Number | NGI Name | Partner Name | Author |
---|---|---|---|
QR 11 | NGI_UK | STFC | Denise Small and Claire Devereux |
Date | Location | Title | Participants | Outcome (Short report & Indico URL) |
---|---|---|---|---|
27 November 2012 | QMUL | London Impact and Dissemination | 9 | Discussions about increasing impact of NGI activities |
16-17 January 2013 | Rome | Security for Collaborating Infrastructures | 1 | Organised and chaired. Produced final version 1 of the document describing the requirements and best practices, and considered self-assessments against these criteria. http://indico.cern.ch/conferenceDisplay.py?confId=227273 |
14 January 2013 | STFC RAL | UK NGI Meeting to discuss sustainability and plan post EGI Inspire | 11 | Involvement of UK NGI Management, UK funders and UK Global Service/task leaders to discuss UK NGI sustainability post EGI InSPIRE |
1.2. OTHER CONFERENCES/WORKSHOPS ATTENDED
Date | Location | Title | Participants | Outcome (Short report & Indico URL) |
---|---|---|---|---|
1 December 2012 | Manchester | Integrated cluster management Atlas | 2 | |
10-12 December 2012 | CERN | Atlas Software and Computing T1/T2/T3 Jamboree | 1 | |
14-16 January 2013 | Rome | EU Grid PMA Meeting | 1 | Representing interests of EGI and WLCG as a Relying Party https://indico.egi.eu/indico/conferenceTimeTable.py?confId=1252#20130128 |
17 - 18 December 2012 | FNAL Chicago | WLCG Security Coordination Meeting | 1 | Discussed all operational and policy issues. https://indico.cern.ch/conferenceDisplay.py?confId=221987 |
28-30 January 2013 | Amsterdam | EGI Futures and co-located eFiscal Workshops | 6 | https://indico.egi.eu/indico/conferenceTimeTable.py?confId=1252#20130128 |
28 January - 1 February 2013 | CERN | LHCb Software Analysis Week | 1 | http://lhcb.web.cern.ch/lhcb/ |
12 December 2012 | Imperial College | NSCCS User Meeting | 2 | Disseminated information about the UK NGI, the wider landscape and the Training Marketplace (TMP) tool to the UK computational chemistry community, and presented TMP to the UK HPC "HECToR" team. http://www.nsccs.ac.uk/UM2012.php |
1.3. PUBLICATIONS
Publication title | Journal / Proceedings title | Journal references (volume number, issue, pages from - to) | Authors (1., 2., 3., et al.) |
---|---|---|---|
Integrated cluster management at Manchester Tier2 | J. Phys.: Conf. Ser. (2012) | 396 042039 | 1. Andrew McNab; 2. Alessandra Forti |
2. ACTIVITY REPORT
2.1. Progress Summary
The UK NGI suffered two major power outages in Q11. The first incident occurred when the UPS generator failed to provide sustained power during a power failure; the fault with the generator has since been found, addressed and tested. The second outage, caused by voltage surges during electrical maintenance work, resulted in a significant number of hardware failures. All services were brought back online in a timely manner, and critical procurement of new hardware was arranged within a very short timeframe.
The UK community has been working closely with EGI Operations to provide an EMI WN tarball as gLite WNs are decommissioned.
The UK NGI is exploring opportunities for sustainability post EGI InSPIRE and has submitted a successful bid to the EGI mini-project proposal call.
2.2. Main Achievements
Procurement of resources to meet the 2013 WLCG MoU commitments is on track, and hardware delivery is partially complete. The upgrade of the batch farm to EMI-2 is complete; more generally, the EMI-2 rollout on all services is close to completion. FTS 3 testing continues. Test queues for SL6 are available on the batch farm. A CVMFS Stratum-0 is available for small VOs; we are currently working with NA62 and MICE. The batch farm configuration has changed: hyperthreading is now switched on and the farm runs more jobs than cores (CPU normalisation and accounting have been adjusted appropriately). Preparation to upgrade the Tier-1 network backbone is underway, and the upgrade is likely to be scheduled for early Q2.
Imperial College has continued to maintain and develop the UK 'state of the nation' web pages (http://www.hep.ph.ic.ac.uk/~dbauer/grid/state_of_the_nation.html and http://www.hep.ph.ic.ac.uk/~dbauer/grid/staged_rollout_emi2.html), which track the rollout of EMI services in the UK. We submitted an EMI2-CREAM staged-rollout report, and elsewhere in London UKI-LT2-Brunel submitted an EMI2-WN staged-rollout report. We have also been helping UK sites to transition away from gLite to EMI, and we have been contributing to the ROD rota. Now that an EMI tarball has become available, we have been testing it at our site and have sent feedback to the developer.
gLite 3.1 retirement went without any problems at Glasgow. gLite 3.2 retirement is on target to remove all services before the end of January, with the majority of services moved to the EMI-2 release. DPM file balancing software is being developed to alleviate issues with disk hot spotting.
Alessandra Forti (MU) is now part of the WLCG middleware deployment and perfSONAR deployment coordination task forces. Andrew has made three trips to CERN: to undertake LHCb Grid Production shifts; to train for LHCb Grid Expert On Call (GEOC) shifts; to work with the ops team and LHCb DIRAC developers on site-orientated views of the LHCb job/site monitoring, which make it easier for sites to resolve problems with LHCb jobs; and to begin co-ordinating a review of the DIRAC system and its interoperation with sites and resources in preparation for Long Shutdown 1 and the LHC and LHCb upgrades. Robert Frank has continued to work on updating the GridPP/NGS VOMS system as it transitions from NGS to GridPP operations. He and the NGS Support Centre Manager are co-ordinating this, with VOMS backup installations being set up at other GridPP sites (Oxford and Imperial).
Oxford, which leads the EGI Federated Cloud Task Force, reports that the integration of cloud resources into the current EGI production infrastructure is proceeding as planned. Three new types of endpoints have been created within GOCDB, and the resource providers contributing resources to the task force's federation test bed are now registering their OCCI, CDMI and accounting endpoints. A SAM instance dedicated to cloud resources has been deployed. Data is retrieved from GOCDB, and the state of the federated cloud resources is monitored by a set of dedicated probes. Furthermore, the profile we created for the accounting usage records is undergoing peer review and updating, while the cloud accounting infrastructure is being merged into the EGI production-grade APEL service. The integration of cloud resources within the EGI infrastructure is completed by making available two general-purpose OCCI clients that will allow every EGI user to access federated cloud resources in a transparent and standardised way. Two use cases have been successfully supported via the federated test bed and two more are in the pipeline. The work planned for the coming months includes setting up multiple demos for the upcoming EGI 2013 Community Forum in Manchester, opening the test bed to generic users, and using the federation test bed as the back-end for multiple scientific portals.
2.3. Issues and mitigation
Issue Description | Mitigation Description |
---|---|
Two major power incidents at STFC RAL caused considerable disruption and severely impacted availability; the second also caused hardware damage (SiRs available) | All services were brought back up efficiently, and the incident confirmed that procurement was possible on very short timescales |
Stability problems with the (EMI-2) top level BDII | Trying an upgrade to SL6 to see if that addresses the issue |
Tarballs | We are still waiting for suitable glexec and UI tarballs |
(Fed Clouds) Relative lack of capabilities exposed by the available implementations of the OCCI management interfaces | Establishing a close relationship with the development communities of the OCCI implementations |
(Fed Clouds) Supporting established user communities with a federation test bed as opposed to a production-grade federation of clouds | Managing user expectations and, more importantly, involving the EGI technical support unit in the task force operations |
(Glasgow) Ongoing issues with damaged Air Conditioning units | Full engineering review of data rooms 141 and 243d to redesign power delivery and air conditioning flows |