Agenda-07-11-2016



General information

UMD/CMD/Preview

  • UMD 4.3.0 'October' release: the release candidate is ready and will be released by the end of this week, including:
    • ARC, GFAL2, XROOT, Davix, dCache, ARGUS, Gridsite, edg-mkgrid, umd-release for CentOS7
    • ARC, GFAL2, XROOT, Gridsite, edg-mkgrid, umd-release, GRAM5, DPM, Globus GridFTP, globus-default-security, MyProxy, Davix, dCache, VOMS, YAIM core, lcas-lcmaps for SL6
  • please start using UMD4/SL6 or UMD4/CentOS7 instead of UMD3/SL6
    • Debian is no longer used, SL5 receives only security fixes, and SL6 is also available in UMD4
    • UMD4/SL6 contains the UMD3/SL6 products that will be supported for at least the next year; unsupported products are not included in UMD4/SL6 (please let us know if we have skipped a specific product that you need!)
      • for some unsupported products, we are investigating how to replace them with equivalent products in UMD4/SL6 (see WMS)
      • a list of all the products that are in UMD3 but have not been migrated to UMD4 is available (to be improved): https://wiki.egi.eu/wiki/UMD3_UMD4_products

Preview repository

Note: EGI provides the preview repository without any additional quality assurance process; the products are released as they are provided by the product teams. EGI recommends the use of the UMD repositories, which contain software verified through the quality assurance process of UMD.

Operations

Downtimes due to the vulnerability CVE-2016-5195: request an A/R recomputation

All the resource centres that were affected by the vulnerability CVE-2016-5195 and that declared a downtime between 2016-10-20 16:00 UTC and 2016-10-31 18:00 UTC are invited to request a recomputation of A/R figures for the days in which the downtime was ongoing.

According to the procedure https://wiki.egi.eu/wiki/PROC10_Recomputation_of_SAM_results_or_availability_reliability_statistics you need to fill in this form: http://argo.egi.eu/lavoisier/recomputation

and indicate:

  • Your name and email
  • the site(s) affected by the problem
  • a description of the problem
  • the profile affected
  • the starting and ending time of the problem (including day and hour in UTC)

In case of problems with the web form, please submit a GGUS ticket to ARGO/SAM support unit providing the same information.
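
As a rough aid for filling in the starting and ending times, the sketch below clips a declared downtime to the window quoted above and lists the UTC days it touches. It assumes that "declared a downtime between" means any overlap with the window; the function name `affected_days` is hypothetical and not part of ARGO or GOC-DB.

```python
from datetime import datetime, timedelta, timezone

# Eligible window as stated in the agenda (UTC).
WINDOW_START = datetime(2016, 10, 20, 16, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2016, 10, 31, 18, 0, tzinfo=timezone.utc)


def affected_days(downtime_start, downtime_end):
    """Return the UTC dates on which a downtime overlaps the window.

    Both arguments must be timezone-aware datetimes in UTC.
    Assumption: any overlap with the window counts as eligible.
    """
    start = max(downtime_start, WINDOW_START)
    end = min(downtime_end, WINDOW_END)
    if start >= end:
        return []  # the downtime does not overlap the window
    # Step back one microsecond so that a downtime ending exactly at
    # midnight does not count the following day.
    last_day = (end - timedelta(microseconds=1)).date()
    days = []
    day = start.date()
    while day <= last_day:
        days.append(day)
        day += timedelta(days=1)
    return days
```

The returned dates are only a starting point for the form; the recomputation request itself still needs the exact start and end hour in UTC.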

Software upgrades for OpenStack cloud RCs

  • keystone-VOMS and cloud-info-provider updates are available and need to be installed on all OpenStack sites
  • as the latest keystone-VOMS version is only compatible with Liberty and Mitaka, sites running OpenStack Kilo (or older) have been asked for an OpenStack upgrade plan
  • according to EGI policies https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software OpenStack Kilo or older should NOT be running on the infrastructure! We are asking to discuss this point at the next OMB (October 27)
  • as many sites find it difficult to plan upgrades against the very tight OpenStack release cycle, please come with suggestions and reply with details in the tickets, so that we can shape the best (shared) proposal
  • ticket campaign ONGOING for all OpenStack sites: they were asked to upgrade to keystone-VOMS >=8.0.3 and cloud-info-provider >=0.6, and to report their plans for the future (OpenStack version currently deployed, plans for upgrades, usual specific RC upgrade schedule). Some results:
      • TR-FC1-ULAKBIM FIXED, using FedCloud Appliance (updated version), using Liberty
      • IN2P3-IRES FIXED, using Liberty, upgrading to Mitaka
      • INFN-PADOVA-STACK FIXED, using Liberty, following Indigo indications about the version (staying with Liberty at the moment)
      • NCG-INGRID-PT, using Mitaka, up to date
      • IISAS-GPUCloud up to date, using Liberty, upgrading to Mitaka
      • FZJ using Kilo, not up to date, no plans for migrations due to user isolation bug https://ggus.eu/index.php?mode=ticket_info&ticket_id=121685 -> fixed in Newton, not in Mitaka (investigating about a backport to Mitaka)
      • BIFI using Grizzly, upgrading to Mitaka by mid-November
      • RECAS-BARI using Juno, no upgrades so far waiting for resolution of the user isolation bug, preparing upgrade to Mitaka from Ubuntu 16.04 LTS
      • IFCA-LCG2 FIXED, using Liberty
      • IISAS-FedCloud, downtime, upgrading to Mitaka from Ubuntu 16.04 LTS
      • SCAI FIXED, using Kilo, planning short term upgrade (no version specified yet)
      • CETA-GRID using Icehouse, planning mid-term upgrade
      • INDIGO-CATANIA-STACK, INFN-CATANIA-STACK, CYFRONET-CLOUD have not replied yet
  • feedback/issues reported from sites:
    • we set up a pre-production environment that replicates the production environment and use this testbed to implement/test/verify the upgrade procedure
    • suggestion that pilot RCs provide and share docs and guidelines when CMFs approach their EOL (NCG-INGRID-PT)
    • New OpenStack versions are often buggy, need some grace time before upgrading
    • Unavailability of EGI Components for new versions
    • OpenStack shared with other projects and integrated with other tools
    • User identity auth issue still unsolved https://ggus.eu/index.php?mode=ticket_info&ticket_id=121685
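
The requirements of the ticket campaign above (keystone-VOMS >=8.0.3, cloud-info-provider >=0.6, OpenStack Liberty or Mitaka) can be expressed as a small check. This is a minimal sketch: the function `check_site` and its input format are hypothetical, and it assumes purely numeric dotted version strings.

```python
# Releases that the latest keystone-VOMS is compatible with (from the agenda).
SUPPORTED_RELEASES = {"liberty", "mitaka"}

# Minimum component versions requested in the ticket campaign.
MIN_VERSIONS = {
    "keystone-voms": (8, 0, 3),
    "cloud-info-provider": (0, 6),
}


def parse_version(text):
    """Turn '8.0.3' into (8, 0, 3); assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def check_site(release, installed):
    """Return a list of problems; an empty list means the site is up to date.

    release:   OpenStack release name, e.g. "Liberty"
    installed: dict mapping component name to its version string
    """
    problems = []
    if release.lower() not in SUPPORTED_RELEASES:
        problems.append(f"OpenStack {release} is not Liberty or Mitaka")
    for name, minimum in MIN_VERSIONS.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name} is not installed")
        elif parse_version(version) < minimum:
            required = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {version} is older than {required}")
    return problems
```

For example, `check_site("Kilo", {"keystone-voms": "8.0.3", "cloud-info-provider": "0.5"})` reports two problems: the OpenStack release and the cloud-info-provider version.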

Monthly Availability/Reliability

  • October A/R figures not definitive yet
  • Underperforming sites according to the August A/R report:
    • CERN: SRM servers overloaded, low A/R figures since June. Last week they removed the EGI scope tag from the SRM services in GOC-DB, the statistics are improving: GGUS 122596
    • AfricaArabia: GGUS 123806
      • ZA-UJ: they solved the CREAM issues, statistics during the first days of November are good
    • AsiaPacific: GGUS 124368
      • IR-IPM-HEP GGUS 124391 gridmapfile problems, statistics are improving
      • KR-KISTI-GSDC-01 GGUS 124392 not aware of the new ARGO framework
      • PK-CIIT GGUS 124393 network configuration changes, statistics are improving
      • TW-eScience GGUS 124394 miscellaneous issues, then network problems not easy to solve (it would require a long downtime), suspended this morning
    • NGI_DE GGUS 123836, GGUS 124370
      • UNI-DORTMUND (NGI_DE): migration to new site-bdii and CREAM-CE; UNKNOWN status returned by CREAM probes
      • TUDresden-ZIH: set up a new CREAM-CE, the CA probes were failing. Issues on SRM service
      • SCAI: decommissioning the HTC services, issues with OCCI probes
      • LRZ: the nagios GRAM probes were contacting the wrong port
      • mainzgrid: CREAM and network issues, powercuts producing problems with GPFS cluster
    • NGI_IT: GGUS 123531
      • INFN-CAGLIARI: underperforming for more than 3 months, no further feedback provided, eligible for suspension
    • NGI_IBERGRID: GGUS 124371
      • BIFI: upgrade of the cloud infrastructure during August
      • CIEMATIC-TIC: decommissioning the storage element
    • NGI_NL: GGUS 123532
      • BelGrid-UCL: UNKNOWN status returned by CREAM probes, asked for a recomputation
    • NGI_PL: GGUS 124374
      • ICM: SRM issues solved, the statistics are improving
    • NGI_MARGI: unresponsive, we suspended the sites MK-03-FINKI and MK-03-FINKICLOUD

ARGO proposal to use GOCDB as the only source of topology information

  • slides in October Operations Meeting agenda
  • ARGO Proposal (September OMB)
  • The plan:
    • Develop new features on GOC-DB and ARGO:
      • GOC-DB: create the new boolean attribute "Monitored" on the ServiceEndpoints
      • ARGO: change the GOCDB connectors in order to take into account also this attribute
    • Start with the uncertified sites, then all the others
    • Interim period: consuming both BDII and GOC-DB
      • If no issues, only GOC-DB will be kept as topology source
  • Proposed timescale:
    • November
      • Complete developments on the ARGO/GOCDB sides
      • New report will be configured on ARGO for uncertified sites
    • December:
      • Uncertified sites will be requested to update their information on GOCDB
      • As information is added in the GOCDB, uncertified sites/services will be picked up by the ARGO Monitoring Engine and they will start to be monitored
    • End of February:
      • Present the results of the pilot at the OMB and decide on the next step regarding the production infrastructure
  • For site-admins: use the GOC-DB test instance for testing the procedure
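
To illustrate the intent of the proposed "Monitored" attribute, the sketch below filters a list of service endpoints on that flag, roughly as a topology connector might. The record layout, field names, and hostnames are hypothetical and do not reflect the real GOC-DB schema or the ARGO connector code.

```python
# Hypothetical endpoint records; the field names are illustrative only.
endpoints = [
    {"site": "SITE-A", "hostname": "ce01.example.org",
     "service_type": "CREAM-CE", "monitored": True},
    {"site": "SITE-A", "hostname": "se01.example.org",
     "service_type": "SRM", "monitored": False},
    {"site": "SITE-B", "hostname": "bdii.example.org",
     "service_type": "Site-BDII"},  # flag not set yet
]


def select_monitored(records):
    """Keep only the endpoints whose "monitored" flag is explicitly True.

    Assumption: an endpoint without the flag is treated as not monitored.
    """
    return [ep for ep in records if ep.get("monitored") is True]


monitored = select_monitored(endpoints)
```

Treating a missing flag as "not monitored" is a design choice made for this sketch; the actual default behaviour is up to the GOC-DB and ARGO developments described in the plan.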

VAPOR

AOB

Next meeting