EGI-InSPIRE:Ibergrid-QR8

Latest revision as of 19:06, 9 January 2015




Quarterly Report Number | NGI Name | Partner Name | Author
QR8 | Ibergrid | LIP & CSIC | Álvaro Simón García, Esteban Freire García (CSIC)


1. MEETINGS AND DISSEMINATION

1.1. CONFERENCES/WORKSHOPS ORGANISED

Date | Location | Title | Participants | Outcome (short report & Indico URL)


1.2. OTHER CONFERENCES/WORKSHOPS ATTENDED

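Date | Location | Title | Participants | Outcome (short report & Indico URL)

28 January to 1 February | LBL Berkeley (USA) | LHCOPN and LHCONE joint meeting | 1
  • ifae/pic: Follow-up of the activities of the LHC networking group. The architecture of the new LHCONE dedicated network, extended to the Tier-2s, is currently being decided.
    https://indico.cern.ch/conferenceDisplay.py?confId=160533

13-14 February | Brussels | Cloudscape IV | 2
  • IAA-CSIC: With over 100 participants from the scientific community, industry, standards organizations and policy makers, representatives from 50 EC-funded projects, over 300 tweets, 45 new online community members since the event, and live streaming on the SIENA Channel, Cloudscape IV lived up to all expectations and continued its successful four-year run.
    http://www.sienainitiative.eu/Pages/Static.aspx?id_documento=2543465f-ee8c-472c-8720-de8f7f5204cb

15th February | EVO meeting | EGI Security Threat Risk Assessment meeting | 2
  • RedIRIS: The main purpose of the meeting was to establish the granularity of the threats and to get the threat selection under way, including the allocation of work.
    https://www.egi.eu/indico/conferenceDisplay.py?confId=816

23rd March | EVO meeting | EGI Security Threat Risk Assessment meeting | 2
  • RedIRIS: The main aim of this meeting was to finalize the list of threats as far as possible and to decide how to proceed.

26-30 March | Garching, Munich | EGI Community Forum 2012 | ~19
  • Several IBERGRID institutes participated, with contributions in different sessions:
  • BIFI-UNIZAR: Gateway development and relationships with the SCI-BUS project.
  • CESGA: Contributions "MPI in EGI" (https://www.egi.eu/indico/contributionDisplay.py?contribId=220&confId=679) and "Hybrid Cloud-based Grid Infrastructure: Experience & Future" (https://www.egi.eu/indico/contributionDisplay.py?contribId=77&confId=679).
  • CETA-Ciemat: Contribution "gridCake & gridCamp: Making the Grid easier".
    https://www.egi.eu/indico/contributionDisplay.py?sessionId=33&contribId=76&confId=679
  • IFISC-GRID: Contribution "Web4Grid, web interface for grid jobs", presenting Web4Grid, a web interface for generic grid jobs.
    https://www.egi.eu/indico/contributionDisplay.py?sessionId=33&contribId=59&confId=679
  • PIC: Contribution "VO Support WG proposal" in the Resource Centre Forum (workshop).
    https://www.egi.eu/indico/sessionDisplay.py?sessionId=6&tab=contribs&confId=679

10-13 April | Bern | EuroSys 2012 | 1
  • IAA-CSIC: EuroSys has become a premier forum for discussing systems software research and development, including its implications for hardware and applications.
    http://eurosys2012.unibe.ch/program/conference

17-19 April | DESY, Zeuthen (Germany) | 6th International dCache Workshop | 3
  • CIEMAT-LCG2: Learned a lot about the new dCache functionalities (2.2.0).
    http://indico.desy.de/conferenceDisplay.py?confId=5289
  • ifae/pic: Yearly meeting on dCache, the system used at PIC to provide Mass Storage Services. PIC gave a talk on monitoring, "PIC: Nagios probes".
    https://indico.desy.de/conferenceOtherViews.py?view=standard&confId=5289

20th April | EVO meeting | EGI Security Assessment group meeting | 2
  • RedIRIS: The main aim of this meeting was to discuss the threats on which opinions about the risk diverged, and the plans for the final report.

23-24 April | Bologna | EGI-CSIRT Face to Face meeting | 1
  • RedIRIS: Review of the actions and task forces in which EGI-CSIRT is involved. In addition, two hands-on sessions were given during the meeting: one on forensic analysis and one on incident handling with RTIR.
    https://www.egi.eu/indico/conferenceTimeTable.py?confId=812

23-27 April | Prague | HEPiX Spring 2012 Workshop | 1
  • ifae/pic: Contribution "PIC Site Report" and attendance at the whole HEPiX workshop.
    https://indico.cern.ch/contributionDisplay.py?contribId=16&sessionId=1&confId=160737

25 April | CERN | Many-core architectures for LHCb | 1
  • ifae/pic: Follow-up of the evolution of the LHC experiment software frameworks to make efficient use of many-core processor architectures.
    https://indico.cern.ch/conferenceDisplay.py?confId=184092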


1.3. PUBLICATIONS

Publication title | Journal / Proceedings title | Journal references (volume, issue, pages) | Authors
DRI: Data and Job Management on the Grid | EGI Community Forum 2012 (26-30 March 2012), poster | http://cf2012.egi.eu/exhibition/posters_and_demos.html | José Miguel Franco Valiente, César Suárez Ortega, Manuel Rubio Del Solar, Jorge Sevilla Cedillo
StoRM-ganglia monitoring module | EGI Community Forum 2012 (26-30 March 2012), poster | http://cf2012.egi.eu/exhibition/posters_and_demos.html | M. David et al.
Web Interface for Generic Grid Jobs | Computing and Informatics | Vol. 31, No. 1 (2012), pp. 173-186 | Antònia Tugores, Pere Colet


2. ACTIVITY REPORT

2.1. Progress Summary

  1. The Operations/Platform Deployment Survey was distributed among the Ibergrid sites; the results were collected and a summary was published on the EGI wiki: https://wiki.egi.eu/wiki/Operations/Platform_Deployment_Survey
  2. Added a third primary DNS server, hosted at RedIRIS, to the TopBDII HA mechanism. This server, elsuper.rediris.es, was put into production in March 2012.
  3. Compiled a report on the different Top-BDIIs used by the Ibergrid sites for their services.
  4. Closed the GGUS tickets that had been opened to the Ibergrid sites to configure VOMS redundancy for the Ibergrid VOs (GGUS tickets #76214 to #76232).
  5. Opened GGUS tickets #81617 to #81625 to the Ibergrid sites so that they correctly support the Ibergrid macro VOs and VOMS redundancy for these VOs.
  6. Work on the RT-GGUS integration and on the user support shifts, with GGUS, Security and General Support queues in the Ibergrid RT ticket system. Two new email addresses were created: helpdesk@ibergrid.eu, for submitting tickets to the Ibergrid RT ticket system, and ibergrid-support@listas.cesga.es, a mailing list for communication between the user support shift teams.
  7. Decommissioned the SWE helpdesk; GGUS no longer works with it. Now, when a ticket is opened for a site in the Ibergrid NGI, a notification is sent to the ibergrid-tickets@listas.cesga.es mailing list, and an additional notification is sent to the site administrators via the site support email declared in GOCDB.
  8. Supported CESGA in renaming its site from CESGA-EGEE to CESGA, and drafted a general site renaming procedure (https://wiki.egi.eu/wiki/Draft_PROC).

2.2. Main Achievements

  1. Enforcement of the TopBDII HA mechanism at local sites.
  2. 100% A/R in February, March and April 2012 for the TopBDII service (after the implementation of the TopBDII HA mechanism).
  3. A new site joined the Ibergrid infrastructure: CETA-GRID.

2.3. Issues and mitigation

Issue: There were several issues related to the Ibergrid Regional Nagios:
  • The request on the VOMS OPS server for the certificate used to submit the Ibergrid Nagios tests expired on the night of Friday 30 March and was renewed on the morning of Saturday 31 March. GGUS ticket #80793 was opened to exclude that period from the A/R metrics, but the recomputation was rejected because the impact was very low (less than 1.5%).
  • An external issue with the connection of the Spanish and Portuguese NRENs to the GÉANT network affected the Ibergrid Regional Nagios. The incident lasted from 12 April 2012 at 17:00 CEST to 23 April 2012 at 14:00 CEST. GGUS ticket #81227 was opened to exclude that period from the A/R metrics.
Mitigation: The recomputation has been processed, and the ticket is closed and verified.
  https://ggus.eu/tech/ticket_show.php?ticket=81227
  https://ggus.eu/tech/ticket_show.php?ticket=81235

Issue: Several crashes of the regional Operations Portal at the time part of the SAM infrastructure was decommissioned.
Mitigation: Reported to the Operations Portal staff. After some patches, the problem disappeared.

Issue: The IBERGRID VOMS was hit by the VOMS-Admin membership renewal bug.
Mitigation: The issue was solved with the help of the developers.
  https://ggus.eu/tech/ticket_show.php?ticket=77913
  https://ggus.eu/tech/ticket_show.php?ticket=78178
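
To give a sense of why such periods matter for the monthly A/R figures, the following Python sketch computes how much of a month an outage window covers, using the GÉANT connectivity incident from the issues table (12 April 2012, 17:00 CEST to 23 April 2012, 14:00 CEST). It assumes a plain time-fraction model: the actual EGI A/R computation also accounts for scheduled downtimes and periods of unknown status, which this sketch ignores, and the helper `fraction_of_month` is hypothetical.

```python
from datetime import datetime, timedelta


def fraction_of_month(start: datetime, end: datetime,
                      month_start: datetime, month_end: datetime) -> float:
    """Share of [month_start, month_end) covered by the outage [start, end).

    Hypothetical helper: a simple time-fraction model, not the EGI A/R algorithm.
    """
    # Clip the outage window to the month being evaluated.
    overlap_start = max(start, month_start)
    overlap_end = min(end, month_end)
    overlap = max(overlap_end - overlap_start, timedelta(0))
    # timedelta / timedelta yields a float ratio.
    return overlap / (month_end - month_start)


# GÉANT connectivity incident, in local CEST time. Both ends share the same
# UTC offset, so naive datetimes give the correct duration.
outage_start = datetime(2012, 4, 12, 17, 0)
outage_end = datetime(2012, 4, 23, 14, 0)
april = (datetime(2012, 4, 1), datetime(2012, 5, 1))

share = fraction_of_month(outage_start, outage_end, *april)
hours = (outage_end - outage_start) / timedelta(hours=1)
print(f"{hours:.0f} h, {share:.2%} of April")  # prints "261 h, 36.25% of April"
```

Under this simplified model the window spans 261 of April's 720 hours, roughly 36% of the month; by contrast, the certificate issue of 30-31 March was assessed at under 1.5%.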