
Difference between revisions of "Agenda-13-02-2017"

 
(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{TOC right}}  
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}
[[Category:Grid Operations Meetings]]


= General information =
Line 6: Line 7:
* the EGI Operations Meeting schedule for '''first half of 2016''' is available on Indico: https://indico.egi.eu/indico/categoryDisplay.py?categId=32 and on the new summary page: https://wiki.egi.eu/wiki/Operations_Meeting


= UMD/CMD/Preview  =
= Middleware  =
 
== CMD ==


* CMD-OS (OpenStack) released http://repository.egi.eu/category/os-distribution/cmd-os-1/
** a [https://wiki.egi.eu/wiki/KEDB#.5B2017-02-10.5D_apt_returns_.22Unable_to_find_expected_entry_.27main.2Fbinary-i386.2FPackages.27on_CMD-OS_for_Trusty_.28OPEN.29 known issue] has been documented with respective workaround, to be fixed in the next update
** update planned by March
** Keystone-VOMS 9.0.3
** ooi 0.3.2
** gridsite 2.3.3
** Cloud BDII Information provider 0.6.12
* Xrootd 4.5.0 is in EPEL-testing; looking for sites to test it
* starting work on CMD for OpenNebula
 
== UMD ==
 
* working on UMD 4.4.0 (February release)
** FTS 3.5.7, ARC 15.03.10, DPM 1.8.11 (SL6) and 1.9 (C7), more coming
* Xrootd 4.5.0 is in EPEL-stable; looking for sites to test it
* Update to frontier-squid-3 in UMD4
** major upgrade and it has some incompatibilities with frontier-squid-2 based versions, as detailed here: https://twiki.cern.ch/twiki/bin/view/Frontier/InstallSquid#Upgrading
** https://ggus.eu/index.php?mode=ticket_info&ticket_id=125691
** the two versions will have different package names


== Preview repository ==
Line 35: Line 46:


== IPv6 readiness plans ==
* December OMB presentation (https://indico.egi.eu/indico/event/2815/): WLCG is going to deploy its services in dual-stack mode (v4+v6) by April 2018
* EGI Operations started checking the core services for IPv6 compatibility
* EGI Operations is going to assess the IPv6 readiness of the EGI infrastructure
* early draft plan
** Technology Providers: assess the IPv6 readiness of the middleware products
** UMD team: perform verification of all services under IPv6 setup
** EGI core services developers: assess the IPv6 readiness of the products
** EGI core services hosts: assess the IPv6 readiness of the services
** '''Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)'''
*** '''NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan'''


== Decommissioning of dCache 2.10 ==  


* start decommissioning campaign
* support for dCache 2.10 ended in December 2016
* instructions on how to migrate
* according to EGI policies, dCache 2.10 must be decommissioned https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
* broadcast sent on Feb 2 https://operations-portal.egi.eu/broadcast/archive/1631 + email sent on Feb 7 to the noc-managers mailing list
* sites should upgrade their 2.10 endpoints to a newer "golden release" of dCache
** 2.13, whose support ends in July 2017 (about 7 months from now), or
** 2.16, whose support ends in May 2018; note that the dCache team does not support upgrading directly from 2.10 to 2.16: only the 2.10->2.13 and 2.13->2.16 transitions are supported.
* decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.10 instances and follow up with the NGIs/sites; '''tickets will be opened this week'''
* deadline is '''end of April''', giving all sites more than 2 months to plan and perform the upgrade
* probe will return WARNING for two months until April 17th, when it will switch to CRITICAL
* in May EGI Operations will open tickets against sites still publishing dCache 2.10 and follow up on the upgrade plans
* reference: https://www.dcache.org/downloads/1.9/index.shtml


== Testing of the storage accounting ==
Line 50: Line 80:
[[Storage accounting testing| List of sites]] available for test.


== Software upgrades for OpenStack cloud RCs (TO BE UPDATED) ==
== Proposal to modify the declaration of scheduled interventions ==
 
* keystone-VOMS and cloud-info-provider updates available, need to be installed on '''all OpenStack sites'''
* as the latest keystone-VOMS version is only compatible with Liberty and Mitaka, '''sites running OpenStack Kilo (or older) have been asked for an OpenStack upgrade plan'''
* according to EGI policies  https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software '''OpenStack Kilo or older should NOT be running on the infrastructure!''' We are asking '''to discuss this point at the next OMB (October 27)'''
* as many sites find it difficult to plan upgrades against the very tight OpenStack release cycle, '''please come with suggestions and reply with details in the tickets''' in order to shape the best (shared) proposal
* '''ticket campaign ONGOING for all OpenStack sites''', asked to upgrade to keystone-VOMS >=8.0.3, cloud-info-provider >=0.6, and plans for the future (OpenStack version currently deployed, plans for upgrades, usual specific RC upgrade schedule), '''UPDATE''':
** INDIGO-CATANIA-STACK and INFN-CATANIA-STACK moving to Mitaka (no plan)
** IISAS-GPUCloud Liberty
** FZJ user isolation bug fixed in Newton, not in Mitaka (investigating about a backport to Mitaka), waiting for solution
** IN2P3-IRES Mitaka
** CETA-GRID using Icehouse, planning mid-term upgrade (Newton?)
** IISAS-FedCloud Mitaka from Ubuntu 16.04 LTS installed
** BIFI upgrading to Mitaka
** SCAI upgraded to Mitaka
** INFN-PADOVA-STACK FIXED using Liberty
** IFCA-LCG2 using Liberty
** CYFRONET-CLOUD running Juno, evaluating Mitaka
** TR-FC1-ULAKBIM FIXED using Liberty
** NCG-INGRID-PT, using Mitaka, up to date
** RECAS-BARI preparing upgrade to Mitaka from Ubuntu 16.04 LTS (deadline by  end of year)
** 100IT Liberty (evaluating Mitaka)


== Proposal to modify the declaration of scheduled downtimes ==
Currently (see [[MAN02 Service intervention management]]) scheduled interventions (of any duration) MUST be declared at least 24 hours in advance, specifying reason and duration; any intervention declared less than 24 hours in advance will be considered unscheduled.


Currently (see [[MAN02 Service intervention management]])
WLCG proposed the [https://indico.cern.ch/event/607744/contributions/2449767/subcontributions/218703/attachments/1402467/2141097/LongDowntimes-170126.pdf following modification]:
* a scheduled intervention shorter than 5 days must be declared at least 24 hours in advance
* a scheduled intervention longer than 5 days must be declared at least 1 month in advance
* any other intervention that does not fulfil the rules above will be considered unscheduled


== Monthly Availability/Reliability ==
Line 124: Line 136:
== Next meeting ==


* '''Feb 13th, 2016''' https://indico.egi.eu/indico/event/3140/
* EGI is working on bringing up a pilot EGI-branded dockerhub service
* '''Mar 13th, 2017''' https://indico.egi.eu/indico/event/3141/
* '''new calendar available until June 2017''' https://indico.egi.eu/indico/category/32/

Latest revision as of 14:27, 25 October 2017





General information

Middleware

CMD

UMD

  • working on UMD 4.4.0 (February release)
    • FTS 3.5.7, ARC 15.03.10, DPM 1.8.11 (SL6) and 1.9 (C7), more coming
  • Xrootd 4.5.0 is in EPEL-stable; looking for sites to test it
  • Update to frontier-squid-3 in UMD4

Preview repository

Released on 2017-01-19:

Note: EGI provides the preview repository without any additional quality assurance process; the products are released as provided by the product teams. EGI recommends the use of the UMD repositories, which contain software verified through the UMD quality assurance process.

Operations

Feedback from Helpdesk

  • [2016-12-13] Services using JGlobus fail with RFC proxies from certificates from some CAs

IPv6 readiness plans

  • December OMB presentation (https://indico.egi.eu/indico/event/2815/): WLCG is going to deploy its services in dual-stack mode (v4+v6) by April 2018
  • EGI Operations started checking the core services for IPv6 compatibility
  • EGI Operations is going to assess the IPv6 readiness of the EGI infrastructure
  • early draft plan
    • Technology Providers: assess the IPv6 readiness of the middleware products
    • UMD team: perform verification of all services under IPv6 setup
    • EGI core services developers: assess the IPv6 readiness of the products
    • EGI core services hosts: assess the IPv6 readiness of the services
    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan
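
As a first, very rough step in such an assessment, a site could check whether a service hostname resolves to both IPv4 and IPv6 addresses. The Python sketch below is only an illustration and not an EGI tool: the function names are hypothetical, and name resolution alone does not prove that the service actually answers over IPv6.

```python
import socket


def address_families(host, resolver=socket.getaddrinfo):
    """Return the set of IP families ("IPv4", "IPv6") that `host` resolves to.

    `resolver` is injectable so the function can be exercised without
    network access; by default the standard socket.getaddrinfo is used.
    """
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    try:
        results = resolver(host, None)
    except socket.gaierror:
        # Name does not resolve at all.
        return set()
    # Each result is a 5-tuple whose first element is the address family.
    return {names[family] for family, *_ in results if family in names}


def is_dual_stack(host, resolver=socket.getaddrinfo):
    """True if `host` resolves to at least one IPv4 and one IPv6 address."""
    return address_families(host, resolver) == {"IPv4", "IPv6"}
```

Calling `is_dual_stack("some.service.example.org")` (a placeholder hostname) on a machine with working DNS gives a quick indication only; a real readiness check would also need to connect to the service over IPv6.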

Decommissioning of dCache 2.10

  • support for dCache 2.10 ended in December 2016
  • according to EGI policies, dCache 2.10 must be decommissioned https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
  • broadcast sent on Feb 2 https://operations-portal.egi.eu/broadcast/archive/1631 + email sent on Feb 7 to the noc-managers mailing list
  • sites should upgrade their 2.10 endpoints to a newer "golden release" of dCache
    • 2.13, whose support ends in July 2017 (about 7 months from now), or
    • 2.16, whose support ends in May 2018; note that the dCache team does not support upgrading directly from 2.10 to 2.16: only the 2.10->2.13 and 2.13->2.16 transitions are supported.
  • decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.10 instances and follow up with the NGIs/sites; tickets will be opened this week
  • deadline is end of April, giving all sites more than 2 months to plan and perform the upgrade
  • probe will return WARNING for two months until April 17th, when it will switch to CRITICAL
  • in May EGI Operations will open tickets against sites still publishing dCache 2.10 and follow up on the upgrade plans
  • reference: https://www.dcache.org/downloads/1.9/index.shtml
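
The upgrade constraint and the probe timeline above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual monitoring probe: the function names are hypothetical, the year 2017 is inferred from the meeting date, and returning "OK" for sites that no longer publish 2.10 is an assumption.

```python
from datetime import date

# Supported single-step transitions, as stated in the agenda.
SUPPORTED_STEPS = {"2.10": "2.13", "2.13": "2.16"}

# Assumption: "April 17th" refers to 2017, the year of this meeting.
CRITICAL_FROM = date(2017, 4, 17)


def upgrade_path(current, target):
    """Return the chain of supported upgrades from `current` to `target`."""
    path = [current]
    while path[-1] != target:
        next_version = SUPPORTED_STEPS.get(path[-1])
        if next_version is None:
            raise ValueError(f"no supported upgrade from {current} to {target}")
        path.append(next_version)
    return path


def probe_status(publishes_2_10, today):
    """WARNING before the switch date, CRITICAL from it onwards.

    Assumption: a site not publishing dCache 2.10 is reported as OK.
    """
    if not publishes_2_10:
        return "OK"
    return "CRITICAL" if today >= CRITICAL_FROM else "WARNING"
```

For example, `upgrade_path("2.10", "2.16")` walks through 2.13 rather than jumping directly, which mirrors the two-step upgrade the dCache team supports.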

Testing of the storage accounting

As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.

More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage

List of sites available for test.

Proposal to modify the declaration of scheduled interventions

Currently (see MAN02 Service intervention management) scheduled interventions (of any duration) MUST be declared at least 24 hours in advance, specifying reason and duration; any intervention declared less than 24 hours in advance will be considered unscheduled.

WLCG proposed the following modification:

  • a scheduled intervention shorter than 5 days must be declared at least 24 hours in advance
  • a scheduled intervention longer than 5 days must be declared at least 1 month in advance
  • any other intervention that does not fulfil the rules above will be considered unscheduled
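
To make the proposed rules concrete, the following Python sketch classifies an intervention as scheduled or unscheduled. It is only an illustration of the proposal, not GOCDB logic: the names are hypothetical, "1 month" is assumed to mean 30 days, and an intervention of exactly 5 days (a case the proposal leaves open) is assumed to fall under the stricter rule.

```python
from datetime import datetime, timedelta

SHORT_NOTICE = timedelta(hours=24)
LONG_NOTICE = timedelta(days=30)      # assumption: "1 month" = 30 days
LONG_THRESHOLD = timedelta(days=5)    # assumption: exactly 5 days counts as long


def is_scheduled(declared_at, start, end):
    """Return True if the intervention counts as scheduled under the proposal."""
    duration = end - start
    notice = start - declared_at
    required = LONG_NOTICE if duration >= LONG_THRESHOLD else SHORT_NOTICE
    return notice >= required
```

For example, a one-day intervention declared 24 hours ahead is scheduled, while a seven-day intervention declared only ten days ahead would be considered unscheduled.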

Monthly Availability/Reliability

  • Underperformed sites in the past A/R reports with issues not yet fixed:
    • AsiaPacific GGUS 125427
      • TW-NCUHEP: site-bdii unstable
    • NGI_DE GGUS 125430
      • UNI-SIEGEN-HEP: waiting for the fix for the CREAM probe.
    • NGI_NL: GGUS 123532
      • BelGrid-UCL: UNKNOWN status returned by CREAM probes, waiting for the fix for the CREAM probe.
    • NGI_UA:
      • UA-NSCMBR GGUS 125839: the ARC-CE tests are OK on Nagios, but ARGO reports an UNKNOWN status
  • Sites suspended after past A/R reports:
    • TUDresden-ZIH (NGI_DE)
  • Underperformed sites after 3 consecutive months and underperformed NGIs:

ARGO proposal to use GOCDB as the only source of topology information

  • Timescale:
    • New GOC-DB release on Dec 7th including a boolean ‘monitored’ flag for the service endpoints
    • Then creation of a web UI view for uncertified sites in ARGO
    • Uncertified sites will be asked to fill in the service endpoint information, following the guide "How to add URL service endpoint information into GOC-DB"
    • As information is added to GOCDB, uncertified sites/services will be picked up by the ARGO Monitoring Engine and will start to be monitored
    • By Q2 2017: support for multiple service endpoints

VAPOR

AOB

Next meeting