Agenda-08-02-2016
Next meeting
- 14 Mar 2016 https://indico.egi.eu/indico/conferenceDisplay.py?confId=2736
Latest revision as of 15:19, 8 February 2016
General information
- the Operations meeting will be on the 2nd Monday of the month
- the EGI Operations Meeting schedule for the first half of 2016 is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting
News from URT
UMD release
- Preparation of the UMD-4 SL6 release
Staged rollout updates
- dcache 2.13.17
- voms-admin 3.4.0 (soon)
- storm 1.11.10 (soon)
Next releases
Operational issues
Aligning Fedcloud sites to the A/R procedures
- EGI Operations proposal to align Fedcloud sites to the A/R related procedures used for the grid sites
- based on the availability and reliability of the services monitored in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for grid sites
- sites will NOT be suspended for A/R performance at least until the end of May
- in parallel, EGI Operations will start PROC08 to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)
The proposed timeline is:
- February 2016:
  - EGI Operations will check the status of the production cloud services to understand which issues (if any) each site has, and will provide help to NGIs and sites;
  - Start of the integration of cloud probes in the EGI_CRITICAL profile (current set + OpenStack): to be agreed with the ARGO team, following PROC08
- June 2016:
  - Start of notification of sites eligible for suspension
FedCloud status
Issues at cloud sites
Grouped by NGI, please follow up with sites.
- NGI_UK
  - 100IT (OpenStack)
    - vmcatcher issues https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358#update#19
    - BDII and GOCDB have different endpoint URLs https://ggus.eu/index.php?mode=ticket_info&ticket_id=119002#update#5
- NGI_PL
  - CYFRONET-CLOUD (OpenStack)
    - VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=116363#update#29
- NGI_DE
  - GoeGrid (OpenNebula)
    - OCCI, VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=119003 https://ggus.eu/index.php?mode=ticket_info&ticket_id=116365
- NGI_GRNET
  - HG-09-Okeanos-Cloud (Synnefo)
    - VMCatcher, issue with large metadata, on hold (it requires some development) https://ggus.eu/index.php?mode=ticket_info&ticket_id=116368
- NGI_IBERGRID
  - IFCA-LCG2 (OpenStack)
    - OCCI, endpoint published on sBDII is missing "/occi1.1/" https://ggus.eu/index.php?mode=ticket_info&ticket_id=119004
- NGI_TR
  - TR-FC1-ULAKBIM (OpenStack)
    - Missing GLUE2DomainID and image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15
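Two of the issues listed above concern published endpoint URLs (BDII and GOCDB disagreeing, and an sBDII endpoint missing "/occi1.1/"). As a rough illustration only, the following Python sketch shows how such a mismatch could be detected once both URLs are at hand. The function names and the example host are hypothetical, and the sketch does not query BDII or GOCDB; it only compares two strings supplied by the caller.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to (scheme, host, port, path), ignoring case and trailing slashes."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return (parts.scheme.lower(), (parts.hostname or "").lower(), parts.port, path)


def compare_endpoints(bdii_url, gocdb_url, required_path="/occi1.1"):
    """Return a list of human-readable problems; an empty list means no mismatch found.

    required_path is the OCCI path suffix mentioned in the IFCA-LCG2 item;
    everything else here is an assumption made for illustration.
    """
    problems = []
    bdii = normalize(bdii_url)
    gocdb = normalize(gocdb_url)
    if bdii != gocdb:
        problems.append(f"BDII and GOCDB differ: {bdii_url} vs {gocdb_url}")
    if not bdii[3].endswith(required_path):
        problems.append(f"BDII endpoint is missing {required_path}/")
    return problems


if __name__ == "__main__":
    # Hypothetical endpoint values, not taken from any real site.
    for problem in compare_endpoints(
        "https://cloud.example.org:8787",
        "https://cloud.example.org:8787/occi1.1/",
    ):
        print(problem)
```

Normalizing before comparing avoids false alarms from differences in letter case or a trailing slash, which do not change where the endpoint points.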
Getting help on issues
- VMcatcher issues
  - The AppDB cloud sites page (https://appdb.egi.eu/browse/sites/cloud) shows a small number at the bottom right of each site, giving the number of images available at that site. If the number is missing, the site very likely has issues with vmcatcher.
  - ACTION: Please check this documentation: https://wiki.egi.eu/wiki/MAN10#EGI_Image_Management_2 and https://github.com/hepix-virtualisation/vmcatcher. If you cannot resolve the problem, please contact EGI Operations through the ticket; we will forward it to the vmcatcher developers.
Updating Federated_Cloud_Operation wiki
- Review your site's information on the Federated_Cloud_Operation wiki (https://wiki.egi.eu/wiki/Federated_Cloud_Operation). Sites with open tickets, please reply as soon as possible:
  - GoeGrid https://ggus.eu/?mode=ticket_info&ticket_id=118882
  - MK-04-FINKICLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118890
  - CYFRONET-CLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118878
Decommissioning SL5
- Tracked on the SL5_retirement wiki: https://wiki.egi.eu/wiki/SL5_retirement
- Tests available: https://midmon.egi.eu/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28
  - eu.egi.sec.Argus-SL5
  - eu.egi.sec.CREAM-SL5
  - eu.egi.sec.LB-SL5
  - eu.egi.sec.LFC-SL5
  - eu.egi.sec.MyProxy-SL5
  - eu.egi.sec.QCG.Computing-SL5
  - eu.egi.sec.QCG.Notification-SL5
  - eu.egi.sec.Site-BDII-SL5
  - eu.egi.sec.Top-BDII-SL5
  - eu.egi.sec.VOMS-SL5
  - eu.egi.sec.WMS-SL5
  - eu.egi.sec.StoRM-SL5
- No checks for dCache, DPM, ARC --> NGIs/ROCs to follow up directly with sites
- Documentation: https://wiki.egi.eu/wiki/MW_SAM_tests#SL5_tests
Decommissioning dCache 2.6
- almost done; the last remaining server is se0002.m45.ihep.su @ RU-Protvino-IHEP https://ggus.eu/?mode=ticket_info&ticket_id=118256 (IN PROGRESS)
AOB
Monthly Availability/Reliability
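- Last three months' report available on ARGO: http://argo.egi.eu/lavoisier/ngi_reports?month=2016-01
- Problems follow-up:
  - AfricaArabia: ticket https://ggus.eu/?mode=ticket_info&ticket_id=117094
    - Overall A/R: 12.67/12.67
    - RCs eligible for suspension: EG-ZC-T3, ZA-CHPC, ZA-UJ
  - CERN: ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=118843
    - Overall A/R: 33.22/33.22
    - there were problems on the regional SAM instances, solved in January
  - NGI_ARMGRID: ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=119415
    - Overall A/R: 77.43/77.43
  - NGI_DE: ticket https://ggus.eu/?mode=ticket_info&ticket_id=117099
    - the underperforming RCs (SCAI, UNI-DORTMUND) are recovering from the issues
  - NGI_GRNET: ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=119414
    - RC eligible for suspension: GR-04-FORTH-ICS
  - NGI_IT: ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=118846
    - the underperforming RC INFN-NAPOLI-PAMELA seems to be recovering, waiting for confirmation
  - NGI_MARGI: ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465
    - no monitoring data available since January
    - RC eligible for suspension: MK-03-FINKI
  - NGI_MD:
    - Overall A/R: 61.89/61.89
    - the underperforming RC MD-02-IMI is recovering
  - ROC_LA: ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=119416
    - no monitoring data available for CBPF
    - RC eligible for suspension: UFAL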
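The "Overall A/R" figures above are written as availability/reliability pairs. As an illustration only, this Python sketch parses lines in that form and lists the entries that fall below thresholds supplied by the caller. The function names are hypothetical, and the thresholds used in the example are arbitrary values chosen for demonstration; they are not targets stated in this agenda.

```python
import re

# Matches lines such as "Overall A/R: 12.67/12.67".
AR_LINE = re.compile(r"Overall A/R:\s*([\d.]+)/([\d.]+)")


def parse_overall_ar(line):
    """Return (availability, reliability) as floats, or None if the line has no A/R pair."""
    match = AR_LINE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def below_threshold(entries, min_availability, min_reliability):
    """Return (name, availability, reliability) tuples under either threshold, lowest availability first."""
    flagged = []
    for name, line in entries.items():
        parsed = parse_overall_ar(line)
        if parsed is None:
            continue
        availability, reliability = parsed
        if availability < min_availability or reliability < min_reliability:
            flagged.append((name, availability, reliability))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Figures copied from the list above.
    entries = {
        "AfricaArabia": "Overall A/R: 12.67/12.67",
        "CERN": "Overall A/R: 33.22/33.22",
        "NGI_ARMGRID": "Overall A/R: 77.43/77.43",
        "NGI_MD": "Overall A/R: 61.89/61.89",
    }
    # The 70.0 thresholds are arbitrary demonstration values.
    for name, availability, reliability in below_threshold(entries, 70.0, 70.0):
        print(name, availability, reliability)
```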