The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Difference between revisions of "Agenda-08-02-2016"

From EGIWiki
 
(38 intermediate revisions by 3 users not shown)
= News from URT =


== Middleware releases and staged rollout ==
== UMD release  ==


* Preparation of the UMD-4 SL6 release  


== Staged rollout updates  ==


== Under Staged Rollout ==
* dcache 2.13.17
* voms-admin 3.4.0 (soon)
* storm 1.11.10 (soon)
 
== Next releases  ==
 
= Operational issues  =
 
== Aligning Fedcloud sites to the A/R procedures ==
 
* EGI Operations proposal to align Fedcloud sites to the A/R related procedures used for the grid sites
 
** based on the availability/reliability of the services monitored in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for every grid site
** sites will NOT be suspended for A/R performance at least until the end of May
* in parallel EGI Operations will start [https://wiki.egi.eu/wiki/PROC08 PROC08] to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)
 
The proposed timeline is:
 
* February 2016:
** EGI Operations will check the status of the production cloud services in order to understand which issues (if any) each site has, and to provide help to NGIs and sites;
** Start of the integration of cloud probes in the EGI_CRITICAL profile (current set + OpenStack), to be agreed with the ARGO team; [https://wiki.egi.eu/wiki/PROC08 PROC08] will be followed
* June 2016:
** Start notifying sites eligible for suspension
 
== FedCloud status ==
 
=== Issues at cloud sites ===
 
Grouped by NGI, please follow up with sites.
 
* NGI_UK
** 100IT (OpenStack)
*** vmcatcher issues https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358#update#19
*** BDII and GOCDB have different Endpoint URLs https://ggus.eu/index.php?mode=ticket_info&ticket_id=119002#update#5
 
* NGI_PL
** CYFRONET-CLOUD (OpenStack)
*** VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=116363#update#29


* NGI_DE
** GoeGrid (OpenNebula)
*** OCCI, VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=119003 https://ggus.eu/index.php?mode=ticket_info&ticket_id=116365


* NGI_GRNET
** HG-09-Okeanos-Cloud (Synnefo)
*** VMCatcher, issue with large metadata, on hold (it requires some development) https://ggus.eu/index.php?mode=ticket_info&ticket_id=116368


* NGI_IBERGRID
** IFCA-LCG2 (OpenStack)
*** OCCI, endpoint published on sBDII is missing "/occi1.1/" https://ggus.eu/index.php?mode=ticket_info&ticket_id=119004


* NGI_TR
** TR-FC1-ULAKBIM (OpenStack)
*** Missing GLUE2DomainID and image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15
 
=== Getting help on issues ===  


* VMcatcher issues
** [https://appdb.egi.eu/browse/sites/cloud This page] shows a small number at the bottom right of each site entry, giving the number of images available at that site. If the number is missing, the site very likely has issues with vmcatcher.
** '''ACTION''': Please check this documentation: https://wiki.egi.eu/wiki/MAN10#EGI_Image_Management_2 and https://github.com/hepix-virtualisation/vmcatcher. '''If you cannot figure it out, please contact EGI Operations through the ticket and we will forward it to the vmcatcher developers.'''


=== Updating Federated_Cloud_Operation wiki ===
* Review your site's information on the [https://wiki.egi.eu/wiki/Federated_Cloud_Operation Federated_Cloud_Operation] wiki; sites, please reply as soon as possible!
** GoeGrid https://ggus.eu/?mode=ticket_info&ticket_id=118882
** MK-04-FINKICLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118890
** CYFRONET-CLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118878


== Decommissioning SL5 ==
* Tracked on [https://wiki.egi.eu/wiki/SL5_retirement SL5_retirement wiki]
* Tests available https://midmon.egi.eu/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28
** eu.egi.sec.Argus-SL5
** eu.egi.sec.CREAM-SL5
** eu.egi.sec.LB-SL5
** eu.egi.sec.LFC-SL5
** eu.egi.sec.MyProxy-SL5
** eu.egi.sec.QCG.Computing-SL5
** eu.egi.sec.QCG.Notification-SL5
** eu.egi.sec.Site-BDII-SL5
** eu.egi.sec.Top-BDII-SL5
** eu.egi.sec.VOMS-SL5
** eu.egi.sec.WMS-SL5
** eu.egi.sec.StoRM-SL5
* No checks for dCache, DPM, ARC --> '''NGIs/ROCs to follow up directly with sites'''
* Documentation https://wiki.egi.eu/wiki/MW_SAM_tests#SL5_tests


== Decommissioning dCache 2.6 ==
* almost done, last server is se0002.m45.ihep.su @ RU-Protvino-IHEP https://ggus.eu/?mode=ticket_info&ticket_id=118256 (IN PROGRESS)


= AOB  =


== Distributing middleware as Docker images ==
* releasing UMD4 products as Docker images in addition to RPMs
** pros: can run on hardware, no virtualization platform needed
** cons: may be hard to create/maintain
* to be provided by TPs and/or volunteer sites
* possible profiles: site/top BDII, CEs


== Monthly Availability/Reliability ==
* Report for the last three months available on [http://argo.egi.eu/lavoisier/ngi_reports?month=2016-01 ARGO]
* Problems follow-up:
** AfricaArabia: [https://ggus.eu/?mode=ticket_info&ticket_id=117094 ticket]
*** Overall A/R: 12.67/12.67
*** RCs eligible for suspension: EG-ZC-T3, ZA-CHPC, ZA-UJ
** CERN: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=118843 ticket]
*** Overall A/R: 33.22/33.22
*** there were problems on the regional SAM instances, solved in January
** NGI_ARMGRID: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=119415 ticket]
*** Overall A/R: 77.43/77.43
** NGI_DE: [https://ggus.eu/?mode=ticket_info&ticket_id=117099 ticket]
*** the underperforming RCs (SCAI, UNI-DORTMUND) are recovering from the issues
** NGI_GRNET: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=119414 ticket]
*** RC eligible for suspension: GR-04-FORTH-ICS
** NGI_IT: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=118846 ticket]
*** the underperforming RC INFN-NAPOLI-PAMELA seems to be recovering, waiting for a confirmation
** NGI_MARGI: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465 ticket]
*** no monitoring data available since January
*** RC eligible for suspension: MK-03-FINKI
** NGI_MD:
*** Overall A/R: 61.89/61.89
*** the underperforming RC MD-02-IMI is recovering
** ROC_LA: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=119416 ticket]
*** no monitoring data available for CBPF
*** RC eligible for suspension: UFAL


== Next meeting ==


* '''8 Feb 2016''' https://indico.egi.eu/indico/conferenceDisplay.py?confId=2736
* '''14 Mar 2016''' https://indico.egi.eu/indico/conferenceDisplay.py?confId=2736

Latest revision as of 14:19, 8 February 2016


General information

News from URT

UMD release

  • Preparation of the UMD-4 SL6 release

Staged rollout updates

  • dcache 2.13.17
  • voms-admin 3.4.0 (soon)
  • storm 1.11.10 (soon)

Next releases

Operational issues

Aligning Fedcloud sites to the A/R procedures

  • EGI Operations proposal to align Fedcloud sites to the A/R related procedures used for the grid sites
    • based on the availability/reliability of the services monitored in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for every grid site
    • sites will NOT be suspended for A/R performance at least until the end of May
  • in parallel EGI Operations will start PROC08 to include cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)

The proposed timeline is:

  • February 2016:
    • EGI Operations will check the status of the production cloud services in order to understand which issues (if any) each site has, and to provide help to NGIs and sites;
    • Start of the integration of cloud probes in the EGI_CRITICAL profile (current set + OpenStack), to be agreed with the ARGO team; PROC08 will be followed
  • June 2016:
    • Start notifying sites eligible for suspension

FedCloud status

Issues at cloud sites

Grouped by NGI, please follow up with sites.

Getting help on issues

Updating Federated_Cloud_Operation wiki

Decommissioning SL5

Decommissioning dCache 2.6

AOB

Monthly Availability/Reliability

  • Report for the last three months available on ARGO
  • Problems follow-up:
    • AfricaArabia: ticket
      • Overall A/R: 12.67/12.67
      • RCs eligible for suspension: EG-ZC-T3, ZA-CHPC, ZA-UJ
    • CERN: ticket
      • Overall A/R: 33.22/33.22
      • there were problems on the regional SAM instances, solved in January
    • NGI_ARMGRID: ticket
      • Overall A/R: 77.43/77.43
    • NGI_DE: ticket
      • the underperforming RCs (SCAI, UNI-DORTMUND) are recovering from the issues
    • NGI_GRNET: ticket
      • RC eligible for suspension: GR-04-FORTH-ICS
    • NGI_IT: ticket
      • the underperforming RC INFN-NAPOLI-PAMELA seems to be recovering, waiting for a confirmation
    • NGI_MARGI: ticket
      • no monitoring data available since January
      • RC eligible for suspension: MK-03-FINKI
    • NGI_MD:
      • Overall A/R: 61.89/61.89
      • the underperforming RC MD-02-IMI is recovering
    • ROC_LA: ticket
      • no monitoring data available for CBPF
      • RC eligible for suspension: UFAL

Next meeting