General information
- the Operations meeting will be on the 2nd Monday of the month
- the EGI Operations Meeting schedule for the first half of 2016 is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting
News from URT
UMD release
- Preparation of the UMD-4 SL6 release
Staged rollout updates
- dcache 2.13.17
- voms-admin 3.4.0 (soon)
- storm 1.11.10 (soon)
Next releases
Operational issues
Aligning Fedcloud sites to the A/R procedures
- EGI Operations proposal to align Fedcloud sites to the A/R related procedures used for the grid sites
- based on the availability and reliability of the services monitored in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for grid sites
- sites will NOT be suspended for A/R performance until at least the end of May
- in parallel EGI Operations will start PROC08 to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)
The proposed timeline is:
- February 2016:
- EGI Operations will check the status of the production cloud services in order to understand which issues (if any) each site has, and will provide help to NGIs and sites;
- Start of the integration of cloud probes in the EGI_CRITICAL profile (current set + OpenStack): to be agreed with the ARGO team; PROC08 will be followed
- June 2016:
- Starting notification of sites eligible for suspension
FedCloud status
Issues at cloud sites
Grouped by NGI, please follow up with sites.
- NGI_UK
- 100IT (OpenStack)
- vmcatcher issues https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358#update#19
- BDII and GOCDB have different Endpoint URLs https://ggus.eu/index.php?mode=ticket_info&ticket_id=119002#update#5
- NGI_PL
- CYFRONET-CLOUD (OpenStack)
- NGI_DE
- GoeGrid (OpenNebula)
- NGI_GRNET
- HG-09-Okeanos-Cloud (Synnefo)
- VMCatcher, issue with large metadata, on hold (it requires some development) https://ggus.eu/index.php?mode=ticket_info&ticket_id=116368
- NGI_IBERGRID
- IFCA-LCG2 (OpenStack)
- OCCI, endpoint published on sBDII is missing "/occi1.1/" https://ggus.eu/index.php?mode=ticket_info&ticket_id=119004
- NGI_TR
- TR-FC1-ULAKBIM (OpenStack)
- Missing GLUE2DomainID and image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15
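Two of the issues above are endpoint URL inconsistencies (BDII and GOCDB publishing different URLs, and an OCCI endpoint missing "/occi1.1/"). A minimal sketch of how a site admin might compare the two published values is given below. It assumes the URLs have already been copied by hand from the BDII and GOCDB; the host name, the port and the helper names are hypothetical and are not part of any EGI tool.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Return scheme://host[:port]/path with host lower-cased and no trailing slash."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    port = f":{parts.port}" if parts.port else ""
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{host}{port}{path}"


def endpoints_match(bdii_url: str, gocdb_url: str) -> bool:
    """True if both sources publish the same endpoint after normalization."""
    return normalize(bdii_url) == normalize(gocdb_url)


def has_path_suffix(url: str, suffix: str = "/occi1.1") -> bool:
    """True if the URL path ends with the given suffix (trailing slash ignored)."""
    return urlsplit(url).path.rstrip("/").endswith(suffix)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    bdii = "https://cloud.example.org:8787/"
    gocdb = "https://cloud.example.org:8787/occi1.1/"
    print("BDII and GOCDB agree:", endpoints_match(bdii, gocdb))
    print("BDII endpoint has /occi1.1:", has_path_suffix(bdii))
```

The comparison deliberately ignores only case in the host name and a trailing slash, so a missing path component such as "/occi1.1" is still reported as a mismatch.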
Getting help on issues
- VMcatcher issues
- This page shows a small number at the bottom right of each site, giving the number of images available at the site. If the number is missing, the site very likely has issues with vmcatcher.
- ACTION: Please check this documentation: https://wiki.egi.eu/wiki/MAN10#EGI_Image_Management_2 and https://github.com/hepix-virtualisation/vmcatcher. If you cannot resolve the problem, please contact EGI Operations through the ticket and we will forward it to the vmcatcher developers.
Updating Federated_Cloud_Operation wiki
- Sites, please review your information on the Federated_Cloud_Operation wiki and reply as soon as possible!
- GoeGrid https://ggus.eu/?mode=ticket_info&ticket_id=118882
- MK-04-FINKICLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118890
- CYFRONET-CLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118878
Decommissioning SL5
- Tracked on SL5_retirement wiki
- Tests available https://midmon.egi.eu/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28
- eu.egi.sec.Argus-SL5
- eu.egi.sec.CREAM-SL5
- eu.egi.sec.LB-SL5
- eu.egi.sec.LFC-SL5
- eu.egi.sec.MyProxy-SL5
- eu.egi.sec.QCG.Computing-SL5
- eu.egi.sec.QCG.Notification-SL5
- eu.egi.sec.Site-BDII-SL5
- eu.egi.sec.Top-BDII-SL5
- eu.egi.sec.VOMS-SL5
- eu.egi.sec.WMS-SL5
- eu.egi.sec.StoRM-SL5
- No checks for dCache, DPM, ARC --> NGIs/ROCs to follow up directly with sites
- Documentation https://wiki.egi.eu/wiki/MW_SAM_tests#SL5_tests
Decommissioning dCache 2.6
- almost done, last server is se0002.m45.ihep.su @ RU-Protvino-IHEP https://ggus.eu/?mode=ticket_info&ticket_id=118256 (IN PROGRESS)
AOB
Monthly Availability/Reliability
- Report for the last three months available on ARGO
- Problems follow-up:
- AfricaArabia: ticket
- Overall A/R: 12.67/12.67
- RCs eligible for suspension: EG-ZC-T3, ZA-CHPC, ZA-UJ
- CERN: ticket
- Overall A/R: 33.22/33.22
- there were problems on the regional SAM instances, solved in January
- NGI_ARMGRID: ticket
- Overall A/R: 77.43/77.43
- NGI_DE: ticket
- the underperforming RCs (SCAI, UNI-DORTMUND) are recovering from the issues
- NGI_GRNET: ticket
- RC eligible for suspension: GR-04-FORTH-ICS
- NGI_IT: ticket
- the underperforming RC INFN-NAPOLI-PAMELA seems to be recovering, waiting for a confirmation
- NGI_MARGI: ticket
- no monitoring data available since January
- RC eligible for suspension: MK-03-FINKI
- NGI_MD:
- Overall A/R: 61.89/61.89
- the underperforming RC MD-02-IMI is recovering
- ROC_LA: ticket
- no monitoring data available for CBPF
- RC eligible for suspension: UFAL