
Agenda-08-02-2016

= General information =

= News from URT =


== UMD release ==
* Preparation of the UMD-4 SL6 release


== Staged rollout updates ==
* dCache 2.13.17
* voms-admin 3.4.0 (soon)
* StoRM 1.11.10 (soon)
 
== Next releases  ==
 
= Operational issues  =
 
== Aligning Fedcloud sites to the A/R procedures ==
 
* EGI Operations proposes to align FedCloud sites with the A/R-related procedures already used for grid sites:
** based on the availability/reliability of the services monitored in cloudmon, EGI Operations will start following up with underperforming sites, as is already done for all grid sites
** sites will NOT be suspended for poor A/R performance at least until the end of May
* in parallel, EGI Operations will start [https://wiki.egi.eu/wiki/PROC08 PROC08] to include the cloud probes in the EGI_CRITICAL and EGI profiles used for A/R computations (IN PROGRESS)

The proposed timeline is:

* February 2016:
** EGI Operations will check the status of the production cloud services to identify the issues (if any) at each site, and will provide help to NGIs and sites;
** start of the integration of cloud probes in the EGI_CRITICAL profile (current set + OpenStack), to be agreed with the ARGO team following [https://wiki.egi.eu/wiki/PROC08 PROC08]
* June 2016:
** start of notifications to sites eligible for suspension
 
== FedCloud status ==
 
=== Issues at cloud sites ===
 
Issues are grouped by NGI; please follow up with the sites.
 
* NGI_UK
** 100IT (OpenStack)
*** vmcatcher issues https://ggus.eu/index.php?mode=ticket_info&ticket_id=116358#update#19
*** the endpoint URLs published in the BDII and in GOCDB differ https://ggus.eu/index.php?mode=ticket_info&ticket_id=119002#update#5
 
* NGI_PL
** CYFRONET-CLOUD (OpenStack)
*** VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=116363#update#29


* NGI_DE
** GoeGrid (OpenNebula)
*** OCCI, VMCatcher https://ggus.eu/index.php?mode=ticket_info&ticket_id=119003 https://ggus.eu/index.php?mode=ticket_info&ticket_id=116365

* NGI_GRNET
** HG-09-Okeanos-Cloud (Synnefo)
*** VMCatcher: issue with large metadata, on hold (requires some development) https://ggus.eu/index.php?mode=ticket_info&ticket_id=116368

* NGI_IBERGRID
** IFCA-LCG2 (OpenStack)
*** OCCI: the endpoint published in the sBDII is missing "/occi1.1/" https://ggus.eu/index.php?mode=ticket_info&ticket_id=119004

* NGI_TR
** TR-FC1-ULAKBIM (OpenStack)
*** missing GLUE2DomainID, and the image description looks wrong https://ggus.eu/index.php?mode=ticket_info&ticket_id=119005#update#15


=== Getting help on issues ===
* VMcatcher issues
** [https://appdb.egi.eu/browse/sites/cloud This page] shows a small number at the bottom right of each site, giving the number of images available at that site. If the number is missing, the site very likely has issues with vmcatcher.
** '''ACTION''': please check this documentation: https://wiki.egi.eu/wiki/MAN10#EGI_Image_Management_2 and https://github.com/hepix-virtualisation/vmcatcher. '''If you cannot solve the problem, please contact EGI Operations through the ticket; we will forward it to the vmcatcher developers.'''

=== Updating Federated_Cloud_Operation wiki ===
* Review your site's information on the [https://wiki.egi.eu/wiki/Federated_Cloud_Operation Federated_Cloud_Operation] wiki; sites, please reply as soon as possible!
** GoeGrid https://ggus.eu/?mode=ticket_info&ticket_id=118882
** MK-04-FINKICLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118890
** CYFRONET-CLOUD https://ggus.eu/?mode=ticket_info&ticket_id=118878


== Decommissioning SL5 ==
* Tracked on [https://wiki.egi.eu/wiki/SL5_retirement SL5_retirement wiki]
* Tests are available at https://midmon.egi.eu/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28
** eu.egi.sec.Argus-SL5
** eu.egi.sec.CREAM-SL5
** eu.egi.sec.LB-SL5
** eu.egi.sec.LFC-SL5
** eu.egi.sec.MyProxy-SL5
** eu.egi.sec.QCG.Computing-SL5
** eu.egi.sec.QCG.Notification-SL5
** eu.egi.sec.Site-BDII-SL5
** eu.egi.sec.Top-BDII-SL5
** eu.egi.sec.VOMS-SL5
** eu.egi.sec.WMS-SL5
** eu.egi.sec.StoRM-SL5
* No checks are available for dCache, DPM and ARC: '''NGIs/ROCs should follow up directly with sites'''
* Documentation: https://wiki.egi.eu/wiki/MW_SAM_tests#SL5_tests
== Decommissioning dCache 2.6 ==
* almost done, last server is se0002.m45.ihep.su @ RU-Protvino-IHEP https://ggus.eu/?mode=ticket_info&ticket_id=118256 (IN PROGRESS)


= AOB  =


== Monthly Availability/Reliability ==
* Report for the last three months available on [http://argo.egi.eu/lavoisier/ngi_reports?month=2016-01 ARGO]
* Problems follow-up:
** AfricaArabia: [https://ggus.eu/?mode=ticket_info&ticket_id=117094 ticket]
*** Overall A/R: 12.67/12.67
*** RCs eligible for suspension: EG-ZC-T3, ZA-CHPC, ZA-UJ
** CERN: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=118843 ticket]
*** Overall A/R: 33.22/33.22
*** there were problems on the regional SAM instances, solved in January
** NGI_ARMGRID: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=119415 ticket]
*** Overall A/R: 77.43/77.43
** NGI_DE: [https://ggus.eu/?mode=ticket_info&ticket_id=117099 ticket]
*** the underperforming RCs (SCAI, UNI-DORTMUND) are recovering from their issues
** NGI_GRNET: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=119414 ticket]
*** RC eligible for suspension: GR-04-FORTH-ICS
** NGI_IT: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=118846 ticket]
*** the underperforming RC INFN-NAPOLI-PAMELA seems to be recovering; waiting for confirmation
** NGI_MARGI: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=118465 ticket]
*** no monitoring data available since January
*** RC eligible for suspension: MK-03-FINKI
** NGI_MD:
*** Overall A/R: 61.89/61.89
*** the underperforming RC MD-02-IMI is recovering
** ROC_LA: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=119416 ticket]
*** no monitoring data available for CBPF
*** RC eligible for suspension: UFAL


== Next meeting ==

* '''14 Mar 2016''' https://indico.egi.eu/indico/conferenceDisplay.py?confId=2736

Latest revision as of 15:19, 8 February 2016

