Agenda-13-06-2016
General information
- the Operations meeting will be on the 2nd Monday of the month
- the EGI Operations Meeting schedule for the first half of 2016 is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting
UMD/CMD
Staged rollout updates
Preview repository
on 2016-05-17 released:
- Preview 1.2.0
- LCMAPS-plugins-vo-ca-ap 0.0.1-1
- StoRM 1.11.11
- Preview 2.1.0
- NorduGrid ARC 15.03 update 6
- LCMAPS-plugins-vo-ca-ap 0.0.1-1
Generic information about Preview repository: https://wiki.egi.eu/wiki/Preview_Repository
Note: EGI provides the preview repository without any additional quality assurance process; the products are released as provided by the product teams. EGI recommends the use of the UMD repositories, which contain software verified through the UMD quality assurance process.
Operations
EGI Operations Support activities stopped
- the Operations Support core activity has not been re-bid in phase 2 of the EGI core activities
- all Operations Support activities have been moved to the EGI.eu Operations
- all the operational procedures involving Operations Support have been updated to point to EGI Operations. Please let us know if we missed any documents.
- the "Operations Support" support unit in GGUS has been decommissioned. Please use the "Operations" support unit from now on.
Monthly Availability/Reliability
- AfricaArabia https://ggus.eu/?mode=ticket_info&ticket_id=117094: main problems with the monitoring system, waiting for the release of the central one
- ASRT
- DZ-01-ARN (recovered)
- EG-ZC-T3: unresponsive for two months, must be suspended
- ZA-UJ
- AsiaPacific: (since February) https://ggus.eu/index.php?mode=ticket_info&ticket_id=121222
- IN-DAE-VECC-02 (miscellaneous issues)
- MY-UPM-BIRUNI-01
- NGI_DE: https://ggus.eu/?mode=ticket_info&ticket_id=121975
- UNI-SIEGEN-HEP
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=120573
- egee.fesb.hr: an issue with the SE affected the whole NGI; the situation has improved, and they are planning to decommission it during this year.
- NGI_IL: (since last month) https://ggus.eu/index.php?mode=ticket_info&ticket_id=121223
- IL_IUCC_IG: suspended on June 6th
- NGI_MARGI: https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465 no monitoring data since January
- NGI_MD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=120578
- the only site, MD-02-IMI, was suspended in March for security reasons; we asked for news
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=121985
- EENet problem with the probe
Decommissioning SL5
NGIs argus server not properly configured
FedCloud status
A/R Profile | March | April | May |
improvements | 2 | 6 | 5 |
unchanged | 11 | 7 | 5 |
worsening | 9 | 10 | 12 |
- CYFRONET-CLOUD (+100%): in the old profile it fails the accounting test
- GoeGRID (+80.7%): in the old profile it fails the cdmi test
- TR-FC1-ULAKBIM (+47.59%): it was failing the accounting test in the old profile
- HG-09-Okeanos-Cloud: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122012 (SOLVED, updated the cert)
- failures with the probes:
- eu.egi.cloud.OCCI-Context-ops: CATEGORIES CRITICAL - SSL_connect returned=1 errno=0 state=error: certificate verify failed
- eu.egi.cloud.OCCI-VM-ops: CRITICAL - SSL connection with "https://okeanos-occi2.hellasgrid.gr:9000/" could not be established! SSL_connect
- MK-04-FINKICLOUD unreachable
- NCG-INGRID-PT (+26.74%): https://ggus.eu/index.php?mode=ticket_info&ticket_id=122013 (a new server is going to be put into production and the old one decommissioned)
- failures mainly with the cloud probes:
- eu.egi.cloud.OCCI-VM-ops (sometimes warning, sometimes critical): WARNING - "http://aurora.ncg.ingrid.pt:8787" failed to instantiate a COMPUTE instance in the given timeframe! Timeout: 300s
- eu.egi.cloud.OpenStack-VM-ops: Critical: could not fetch flavor ID, endpoint does not correctly exposes available flavors: 110 Connection timed out
- SCAI (-21.61%) https://ggus.eu/index.php?mode=ticket_info&ticket_id=122015 (CAs not completely updated)
- some repeated failures with the CA probes
- also eu.egi.cloud.OCCI-VM-ops CRITICAL - Unexpected response from https://fc.scai.fraunhofer.de:8787/! Net::HTTP::Post failed! HTTP Response status: [500] Internal Server Error : The server has either erred or is incapable of performing the requested operation.
- UPV-GRyCAP (-24.56%) https://ggus.eu/index.php?mode=ticket_info&ticket_id=122014 (SOLVED, CAs updated)
- it is still failing the eu.egi.OCCI-IGTF probe
- org.nagios.OCCI-TCP: 05-11-2016 17:56:27 Connection refused
AOB
Next meeting
- 11 Jul 2016 https://indico.egi.eu/indico/event/3003/