Agenda-14-03-2016
Revision as of 14:04, 14 March 2016
General information
- the Operations meeting will be on the 2nd Monday of the month
- the EGI Operations Meeting schedule for the first half of 2016 is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting
News from URT
- A critical bug causing file loss has been discovered in the new drain command of the DPM dmlite-shell (https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Shell#Newfunctionality:Drain), released in DPM 1.8.10. One production site has been affected.
  - broadcast sent on March 10th
  - if you have run the new drain commands at your site, contact the DPM development team through GGUS (a data consistency check is needed)
  - DO NOT use the new drain commands (documented at https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Shell#Newfunctionality:Drain); until the fixed components are released, please continue to use the old dpm-drain command
- UMD 3.14.1 release notes updated
Staged rollout updates
- frontier-squid 2.7.25
- voms-admin 3.4.1
- storm 1.8.10
Next releases
Operational issues
Globus GSI clients moving to STRICT_RFC2818 by default
- the release of the update that will change the default name compatibility mode from "HYBRID" to "STRICT_RFC2818" is planned for April 1, 2016.
- an EGI broadcast sent in August already warned about the change, advising "site managers to make sure that all the hostnames and aliases used to connect to a service are included in its host certificate Subject Alternative Name field, at the latest by the end of the year"
- the sites that could be affected by this change are those running services whose clients may use globus-gssapi-gsi for authentication (CE, FTS, SRM, GridFTP, MyProxy, WMS) and that are reached through DNS aliases not included in the SAN (Subject Alternative Name) field of the certificate; note that the host name itself must also be listed in the SAN
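The SAN requirement above can be checked offline once the dNSName entries of a host certificate are known (for example, as listed under "X509v3 Subject Alternative Name" in the output of `openssl x509 -noout -text -in <host certificate>`). The Python sketch below reports which of a service's names would not be covered. It uses a simplified RFC 2818-style match (case-insensitive, a bare `*` accepted only as the whole left-most label, matching exactly one label); this is an assumption for illustration, not the globus-gssapi-gsi implementation, and the function names and host names are hypothetical.

```python
def san_matches(hostname: str, san_entry: str) -> bool:
    """Return True if hostname is matched by one SAN dNSName entry.

    Simplified matching (assumption): case-insensitive, and a wildcard is
    honoured only when the entire left-most label is "*".
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san_entry.lower().rstrip(".").split(".")
    # A wildcard covers exactly one label, so label counts must agree.
    if len(host_labels) != len(san_labels):
        return False
    first, *rest = san_labels
    if first != "*" and first != host_labels[0]:
        return False
    return rest == host_labels[1:]


def uncovered_names(names: list[str], san_dns_names: list[str]) -> list[str]:
    """Return the names (host name and aliases) not matched by any SAN entry."""
    return [n for n in names if not any(san_matches(n, s) for s in san_dns_names)]
```

For example, with hypothetical names, `uncovered_names(["se01.example.org", "srm.example.org", "gridftp.example.org"], ["se01.example.org", "srm.example.org"])` returns `["gridftp.example.org"]`, meaning that alias would have to be added to the SAN before the switch to STRICT_RFC2818.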
Aligning FedCloud sites with the A/R procedures
- EGI Operations proposal to align FedCloud sites with the A/R-related procedures used for grid sites
- based on the availability and reliability of the services monitored in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for all grid sites
- sites will NOT be suspended for A/R performance until at least the end of May
- in parallel, EGI Operations will start PROC08 to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)
The proposed timeline is:
- February 2016:
  - EGI Operations will check the status of the production cloud services in order to understand which issues (if any) each site has, and will provide help to NGIs and sites;
  - start integrating cloud probes into the EGI_CRITICAL profile (current set + OpenStack): to be agreed with the ARGO team, following PROC08
- June 2016:
  - start notifying sites eligible for suspension
FedCloud status
Issues at cloud sites
Issues are grouped by NGI; please follow up with the sites.
- NGI_UK
  - 100IT (OpenStack)
    - vmcatcher issues https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358#update#19 IN PROGRESS
    - BDII and GOCDB have different endpoint URLs https://ggus.eu/index.php?mode=ticket_info&ticket_id=119002#update#5 FIXED
- NGI_PL
  - CYFRONET-CLOUD (OpenStack)
    - VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=116363#update#29 IN PROGRESS
- NGI_DE
  - GoeGrid (OpenNebula)
    - OCCI, VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=119003 https://ggus.eu/index.php?mode=ticket_info&ticket_id=116365 IN PROGRESS
- NGI_GRNET
  - HG-09-Okeanos-Cloud (Synnefo)
    - VMCatcher: issue with large metadata, on hold as it requires some development https://ggus.eu/index.php?mode=ticket_info&ticket_id=116368 ON HOLD
- NGI_TR
  - TR-FC1-ULAKBIM (OpenStack)
    - Missing GLUE2DomainID and image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15 IN PROGRESS
Getting help on issues
- the whole FedCloud wiki has been reviewed: redundancies were removed and links and instructions updated
- from the operations point of view: https://wiki.egi.eu/wiki/Federated_Cloud_resource_providers_support
- see in particular the manual for the installation of a cloud site
- TBD: review the support units associated with FedCloud (in progress)
Updating Federated_Cloud_Operation wiki
- Renamed to https://wiki.egi.eu/wiki/Federated_Cloud_infrastructure_status
- Information collection finished
Decommissioning Debian
- End of support (LTS) for Debian squeeze (6.0) has been reached (February 2016): https://www.debian.org/News/2016/20160212
Decommissioning SL5
- Tracked on SL5_retirement wiki
- No checks for dCache, DPM, ARC and UNICORE: action on NGIs/ROCs to follow up directly with the sites
Decommissioning dCache 2.6
- DONE.
AOB
Monthly Availability/Reliability
List of the underperforming RCs for (at least) 3 consecutive months:
- AfricaArabia (https://ggus.eu/?mode=ticket_info&ticket_id=117094):
  - EG-ZC-T3: unresponsive for months, must be suspended
  - ZA-UJ
- AsiaPacific:
  - MY-UM-SIFIR
- NGI_DE (https://ggus.eu/?mode=ticket_info&ticket_id=117099):
  - LRZ-LMU
  - UNI-DORTMUND
- NGI_GRNET:
  - GR-04-FORTH-ICS
- NGI_IT (https://ggus.eu/index.php?mode=ticket_info&ticket_id=118846):
  - INFN-NAPOLI-PAMELA
- NGI_MARGI (https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465): no monitoring data since January
- ROC_LA:
  - UFAL: new site, but monitoring data are missing
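The list above is the outcome of a monthly A/R check. As a rough illustration, the Python sketch below computes availability and reliability for each month and flags RCs that underperform in each of their last three months. It assumes the commonly used ARGO-style definitions (availability = up time / (total time - unknown time); reliability additionally excludes scheduled downtime), treats missing monitoring data as underperforming, and uses placeholder thresholds `A_MIN` and `R_MIN`. The record type, function names and threshold values are hypothetical and do not reproduce the official EGI procedure or the ARGO code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MonthStatus:
    """Hours spent in each status by one RC during one month (hypothetical record)."""
    total_hours: float
    up_hours: float
    unknown_hours: float = 0.0
    scheduled_downtime_hours: float = 0.0


def _ratio(num: float, den: float) -> Optional[float]:
    # No usable monitoring data: the metric is undefined.
    return num / den if den > 0 else None


def availability(m: MonthStatus) -> Optional[float]:
    # Up time over the time for which the status is known.
    return _ratio(m.up_hours, m.total_hours - m.unknown_hours)


def reliability(m: MonthStatus) -> Optional[float]:
    # As availability, but scheduled downtime is not counted against the RC.
    return _ratio(m.up_hours,
                  m.total_hours - m.unknown_hours - m.scheduled_downtime_hours)


# Hypothetical placeholder thresholds, NOT the official EGI values.
A_MIN = 0.80
R_MIN = 0.85


def underperforming(m: MonthStatus) -> bool:
    a, r = availability(m), reliability(m)
    # Assumption: a month without usable monitoring data counts as underperforming.
    if a is None or r is None:
        return True
    return a < A_MIN or r < R_MIN


def flag_rcs(history: Dict[str, List[MonthStatus]], months: int = 3) -> List[str]:
    """Return RCs whose last `months` monthly records are all underperforming."""
    flagged = []
    for rc, series in history.items():
        recent = series[-months:]
        if len(recent) == months and all(underperforming(m) for m in recent):
            flagged.append(rc)
    return flagged
```

Requiring a full window of records avoids flagging RCs with fewer than three months of history, such as a newly certified site.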