
Agenda-2020-11-16

From EGIWiki
Latest revision as of 14:11, 16 November 2020

General information

Middleware

UMD

  • https://wiki.egi.eu/wiki/Next_middleware_release
  • UMD4 release in preparation
    • StoRM, VOMS, BDII update, dCache
    • VERY URGENT
  • feedback on software automation from the EGI Conference

Preview repository

  • released on 2020-10-09
    • Preview 1.29.0 AppDB info (sl6): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
    • Preview 2.29.0 AppDB info (CentOS 7): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
  • included in the upcoming release: DPM, VOMS

Operations

ARGO/SAM

  • (14th Sept) 70 endpoints, 14 CRITICAL, success rate about 80%
  • Oct 1st: included in the ARGO_MON_CRITICAL profile (A/R computation)
    • (Nov 16th) 76 endpoints, success rate (including WARNING) 84.2%
  • working on the probe for the host certificate validity check: GGUS 147386

FedCloud

Feedback from DMSU

Upgrade of central argus node

Message sent to the administrators of the NGI argus servers:

  • The central argus servers (lcgargus03.cern.ch and lcgargus04.cern.ch), which sit behind the argus.cern.ch and lcgargus.cern.ch aliases, will be replaced on Tuesday 17th November 2020 between 10:00 and 12:00.
  • The replacement should be transparent and require no configuration change on your side. Please report any issue you have with your NGI argus server.
  • The two new hosts, lcargus21.cern.ch and lcgargus22.cern.ch, are already production-ready and can be tested remotely if you wish. Next week's operation is simply an alias change.

Monthly Availability/Reliability

      • egee.irb.hr: undergoing a major upgrade from CentOS 6 to CentOS 7; some delays.
    • NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148957
      • INFN-CATANIA: SRM problems
      • INFN-CATANIA-STACK: recovered
      • INFN-PADOVA: decommissioning process
    • NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147311
      • WCSS64: failures on QCG and CREAM CEs
    • NGI_UK:
      • UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146455 ARC-CE reinstalled; some Condor problems still to fix; improving.
      • UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 migration from CREAM to ARC and WN migration to CentOS 7; SRM to be decommissioned; the ARC-CE was failing the IGTF test (since solved); site-BDII failures; new failures on the ARC-CE.
    • ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
      • ATLAND: downtime due to a power cut and quarantine
    • ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
      • CBPF: SRM failures because information is not properly published. Physical access to the facilities is restricted due to COVID measures; a DPM update is planned for December.
    • NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148958
      • UA-NSCMBR: IGTF outdated; improving.
  • Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (October 2020):
    • AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149353
      • JP-KEK-CRC-02: migration from CREAM-CE to ARC-CE; problems with the new ARC-CE, which was then marked as "not production"
    • CERN-PROD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351 webdav failures
    • NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149352
      • INFN-LECCE
      • TRIGRID-INFN-CATANIA
    • NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149356
      • UA_BITP_ARC: BDII freshness failures
    • ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
      • SUPERCOMPUTO-UNAM
  • sites suspended:

IPv6 readiness plans

CREAM-CE Decommission

ARC Middleware 5 end of support, migration to ARC 6

  • PROC16 Decommission of unsupported software
  • deadline: end of July
  • Status

Date | Number of endpoints in BDII | Number of GGUS tickets | Issues
2020-06-08 | 75 | 42 | Some ARC endpoints publish a timestamp instead of a version such as 5.X.Y. They are most likely ARC 6 nightly builds, but the corresponding tickets will only be closed after explicit confirmation from the site admin.
2020-07-13 | 53 | 29 | -
2020-09-14 | 34 | 18 | -
2020-10-12 | 32 | 19 | -
2020-11-16 | 26 | 16 | -

Storage accounting

Many sites have stopped publishing storage accounting records; 57 tickets have been opened to fix this.

AOB

Next meeting

In 2021