Agenda-08-02-2016
General information
- the Operations Meeting is held on the second Monday of each month
- the EGI Operations Meeting schedule for the first half of 2016 is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting
News from URT
UMD release
- Preparation of the UMD-4 SL6 release
Staged rollout updates
- dCache 2.13.17
- voms-admin 3.4.0 (soon)
- StoRM 1.11.10 (soon)
Next releases
Operational issues
Aligning Fedcloud sites to the A/R procedures
- EGI Operations proposes to align FedCloud sites with the A/R-related procedures already used for grid sites
- based on the availability and reliability of the services monitored by cloudmon, EGI Operations will start following up with underperforming sites, as is already done for all grid sites
- sites will NOT be suspended for A/R performance until at least the end of May
- in parallel, EGI Operations will start PROC08 to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)
The proposed timeline is:
- February 2016:
- EGI Operations will check the status of the production cloud services to understand which issues (if any) each site has, and will provide help to NGIs and sites;
- Start integrating cloud probes into the EGI_CRITICAL profile (current set + OpenStack): to be agreed with the ARGO team, following PROC08
- June 2016:
- Starting notification of sites eligible for suspension
FedCloud status
Issues at cloud sites
Grouped by NGI, please follow up with sites.
- NGI_UK
- 100IT (OpenStack)
- vmcatcher issues https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358#update#19
- BDII and GOCDB publish different endpoint URLs https://ggus.eu/index.php?mode=ticket_info&ticket_id=119002#update#5
- NGI_PL
- CYFRONET-CLOUD (OpenStack)
- NGI_DE
- GoeGrid (OpenNebula)
- NGI_GRNET
- HG-09-Okeanos-Cloud (Synnefo)
- vmcatcher: issue with large metadata, on hold (requires some development) https://ggus.eu/index.php?mode=ticket_info&ticket_id=116368
- NGI_IBERGRID
- IFCA-LCG2 (OpenStack)
- OCCI: the endpoint published in the site BDII is missing "/occi1.1/" https://ggus.eu/index.php?mode=ticket_info&ticket_id=119004
- NGI_TR
- TR-FC1-ULAKBIM (OpenStack)
- Missing GLUE2DomainID, and the image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15
Getting help on issues
- VMcatcher issues
- This page shows a small number at the bottom right of each site, giving the number of images available at that site. If the number is missing, the site very likely has issues with vmcatcher.
- ACTION: Please check this documentation: https://wiki.egi.eu/wiki/MAN10#EGI_Image_Management_2 and https://github.com/hepix-virtualisation/vmcatcher. If you cannot solve the problem, please contact EGI Operations through the ticket; we will forward it to the vmcatcher developers.
Updating Federated_Cloud_Operation wiki
- Sites: please review your site's information on the Federated_Cloud_Operation wiki and reply to the tickets below as soon as possible!
- GoeGrid https://ggus.eu/?mode=ticket_info&ticket_id=118882
- MK-04-FINKICLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118890
- CYFRONET-CLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118878
Decommissioning SL5
- Tracked on the SL5_retirement wiki page
- Available tests: https://midmon.egi.eu/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28
- eu.egi.sec.Argus-SL5
- eu.egi.sec.CREAM-SL5
- eu.egi.sec.LB-SL5
- eu.egi.sec.LFC-SL5
- eu.egi.sec.MyProxy-SL5
- eu.egi.sec.QCG.Computing-SL5
- eu.egi.sec.QCG.Notification-SL5
- eu.egi.sec.Site-BDII-SL5
- eu.egi.sec.Top-BDII-SL5
- eu.egi.sec.VOMS-SL5
- eu.egi.sec.WMS-SL5
- eu.egi.sec.StoRM-SL5
- No checks are available for dCache, DPM and ARC: NGIs/ROCs should follow up directly with the sites
- Documentation https://wiki.egi.eu/wiki/MW_SAM_tests#SL5_tests
Decommissioning dCache 2.6
- almost done: the last server is se0002.m45.ihep.su @ RU-Protvino-IHEP https://ggus.eu/?mode=ticket_info&ticket_id=118256 (IN PROGRESS)
AOB
Monthly Availability/Reliability
- The report for the last three months is available on ARGO
- Problems follow-up:
- AfricaArabia: ticket
- Overall A/R: 12.67/12.67
- RCs eligible for suspension: EG-ZC-T3, ZA-CHPC, ZA-UJ
- CERN: ticket
- Overall A/R: 33.22/33.22
- there were problems with the regional SAM instances, solved in January
- NGI_ARMGRID: ticket
- Overall A/R: 77.43/77.43
- NGI_DE: ticket
- the underperforming RCs (SCAI, UNI-DORTMUND) are recovering from the issues
- NGI_GRNET: ticket
- RC eligible for suspension: GR-04-FORTH-ICS
- NGI_IT: ticket
- the underperforming RC INFN-NAPOLI-PAMELA seems to be recovering, waiting for confirmation
- NGI_MARGI: ticket
- no monitoring data available since January
- RC eligible for suspension: MK-03-FINKI
- NGI_MD:
- Overall A/R: 61.89/61.89
- the underperforming RC MD-02-IMI is recovering
- ROC_LA: ticket
- no monitoring data available for CBPF
- RC eligible for suspension: UFAL
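As a reminder of how the paired A/R figures above relate to each other, here is a minimal Python sketch. It assumes the usual ARGO-style definitions: availability is the up time divided by the known period (total time minus time in UNKNOWN status), and reliability additionally removes scheduled downtime from the denominator. The `MonthlyStatus` class, its field names and the hour values are hypothetical illustrations, not part of any ARGO API or report.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStatus:
    """Hypothetical input: hours a service spent in each status during one month."""
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(s: MonthlyStatus) -> float:
    """Assumed definition: percentage of the known (non-UNKNOWN) period the service was up."""
    known = s.total - s.unknown
    # With no known period there is nothing to measure; report 0 in this sketch.
    return 100.0 * s.up / known if known > 0 else 0.0


def reliability(s: MonthlyStatus) -> float:
    """Assumed definition: as availability, but scheduled downtime is also excluded."""
    known_unscheduled = s.total - s.unknown - s.scheduled_downtime
    return 100.0 * s.up / known_unscheduled if known_unscheduled > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 30-day month (720 h) without scheduled downtime.
    no_sd = MonthlyStatus(up=576.0, down=144.0, scheduled_downtime=0.0, unknown=0.0)
    # Hypothetical month with 60 h of scheduled downtime.
    with_sd = MonthlyStatus(up=600.0, down=60.0, scheduled_downtime=60.0, unknown=0.0)
    for label, month in (("no scheduled downtime", no_sd), ("with scheduled downtime", with_sd)):
        print(f"{label}: A/R {availability(month):.2f}/{reliability(month):.2f}")
```

Under these assumed definitions, a site with no scheduled downtime in the period gets identical availability and reliability values, which would explain why most of the figures above read as x/x.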