General information
EGI Core Services delivered on a best-effort basis - New bidding announced
- Broadcast circulated on July 6th
- Gap between the EGI-ACE project (ended in June 2023) and the EOSC procurement that is going to fund them (starting in Jan 2024)
- Delivery in maintenance mode, ensuring continuous operation and security system maintenance
- bug fixing and application of security patches
- no implementation of new features
- no major upgrades
- expected slower response time to the tickets
- The bidding for the EGI Services covering 2024-01 - 2026-12 has been announced
Middleware
UMD
- testing the mechanism for signing the packages.
- new UMD update to be released soon.
Operations
ARGO/SAM
- Monitoring of xrootd endpoints
- some endpoints are exposed outside the site in read-only mode
- the new service type "eu.egi.readonly.xrootd" was created for this purpose (see GGUS 160848)
- new version of the xrootd probe executing only "read" tests: to be added in UMD and deployed in ARGO (GGUS 163071)
- New version of srm probe to be deployed (GGUS 162411) and to be included in UMD (GGUS 162424)
- support for py3 only
- support for SRM+HTTPS
- updated default Top-BDII endpoint
FedCloud
- Some sites were missing from the Accounting Portal even though they were properly publishing the accounting records - Fixed
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=163460
- NCP-LCG2: issues during the migration from DPM to dCache; network issues.
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=162630
- UNI-SIEGEN-HEP: SRM failures
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=163457
- FZJ: the SURL information should be registered on GOCDB
- wuppertalprod: SRM tests were failing because SURL information was missing
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=163463
- USC-LCG2: the site was in downtime due to a security incident that occurred at the institute; frequent failures due to a misconfiguration of the ARGUS server; other failures occurred during the upgrade to HTCondor 10.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=162187
- INFN-COSENZA: the recent performance loss and downtime were all caused by UPS system failures; the UPS batteries have been replaced. Migration of the storage element completed; new CE failures.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=162999
- INFN-CLOUD-BARI: recovering after the hardware problems
- INFN-MILANO-ATLASC: CE and SRM failures: old CEs were disabled, SRM protocol disabled; new failures occurred on CE and webdav endpoints.
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=163954
- UNICPH-NBI: IGTF failures: the cause of the error is under investigation.
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: (Nov 2023):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=164542
- Australia-T2: SRM failures due to SURL information missing
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=164544
- GRNET-OPENSTACK:
- ROC_CANADA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=164543
- CA-WATERLOO-T2: SRM failures with retrieving the SURL information, fixed.
sites suspended:
Documentation
- MediaWiki in read-only mode
- content moved to different locations (Confluence and https://docs.egi.eu/)
- Confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
- Guidelines for providers to join EGI: https://docs.egi.eu/providers/joining/
- Tutorial on submitting HTC jobs: https://docs.egi.eu/users/tutorials/htc-job-submission/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
Campaign to upgrade HTCondor to version 10 with SSL authentication enabled
- The campaign to decommission HTCondor <= 9 was started
- Upgrade to HTCondor 10 (or 23) with SSL authentication enabled
- Tickets to sites created at the beginning of November 2023
Enabling SSL authentication on HTCondor 9 and 10
The HTCondor team set up an upgrade procedure to help sites and VOs with the migration from X509 personal certificates to tokens.
Essentially, an intermediate step was created in which plain SSL authentication can be used to authenticate a client's proxy, in addition to GSI or tokens.
In summary, the steps are:
- update to HTCondor 9.0.20
- enable SSL authentication (with priority over GSI)
- map the users' DNs
- test SSL authentication successfully
- update to HTCondor 10.7.0 or later
- install and configure the Check-in plugin
Note that the last step uses the HTCondor Feature channel, since it is the one supporting the EGI Check-in plugin (from version 10.4.0).
- In this way, sites can accept clients' proxies and tokens at the same time while waiting for the supported VOs to move completely to tokens.
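The upgrade path above can be read as a decision on the installed version. The following is a minimal sketch, assuming a plain dotted version number and a hypothetical `ssl_tested` flag that a site would track itself; neither function is part of HTCondor, and release channels (LTS vs Feature) are not modelled.

```python
def parse_version(version):
    """Turn a dotted version string such as '9.0.20' into a 3-tuple of ints."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad '10.7' to (10, 7, 0)
    return tuple(parts)


def next_upgrade_step(installed, ssl_tested=False):
    """Suggest the next step of the procedure for a given installed version.

    `ssl_tested` is a hypothetical flag: True once SSL authentication has been
    enabled, the users' DNs mapped, and the setup tested successfully.
    """
    version = parse_version(installed)
    if version < (9, 0, 20):
        return "update to HTCondor 9.0.20"
    if version < (10, 7, 0):
        if not ssl_tested:
            return "enable SSL authentication, map the users' DNs and test it"
        return "update to HTCondor 10.7.0 or later"
    return "install and configure the Check-in plugin"


if __name__ == "__main__":
    for version, tested in [("8.8.10", False), ("9.0.20", False),
                            ("9.0.20", True), ("10.7.0", True)]:
        print(version, tested, "->", next_upgrade_step(version, tested))
```

The version thresholds are the ones listed in the steps; everything else is illustrative and should be checked against the ticket sent to each site.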
The new HTCondor version is not yet included in UMD (GGUS 162689). WLCG kindly set up a dedicated repository for HTCondor 9.0.20.
Early next year WLCG is going to start a new campaign for updating to HTCondor 23
- in the ticket, sites will receive further information on whether to update to HTCondor 10 first (right away) or to wait for that campaign.
Important for the sites:
- Please start collecting information from the VOs you support about the DNs that should be mapped on your endpoints
- Mapping for the ops VO - at least the following certificates:
- EGI Monitoring Service:
- "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-egi@grnet.gr"
- "/DC=EU/DC=EGI/C=HR/O=Robots/O=SRCE/CN=Robot:argo-egi@cro-ngi.hr"
- EGI Security monitoring:
- "/DC=EU/DC=EGI/C=GR/O=Robots/O=Greek Research and Technology Network/CN=Robot:argo-secmon@grnet.gr"
Important for the VOs:
- update the condor-client as well in coordination with the sites
Monitoring:
- CE client updated also on ARGO (GGUS 163583)
- To be clarified with the developers if the current version of the probe can work also with Check-in tokens.
Issues:
- some issues with LHCb clients (v. 8.8.10) when SSL is used as the primary authentication method. It works fine when SEC_CLIENT_AUTHENTICATION_METHODS=GSI,SSL is set on the CE.
- fixed with HTCondor 9.0.20 version.
New server for dteam VO
- The current VOMS server voms2.hellasgrid.gr is going to be decommissioned at the end of the month
- CERN provided an Indigo IAM server to replace it: https://dteam-auth.cern.ch/
- Users have been imported from the VOMS server
- for the time being, new memberships will still be handled with the VOMS server
- The sites need to update their configuration as soon as possible
- Created the rpm wlcg-iam-lsc-dteam containing the .lsc file of the new server
- Follow the instructions at https://twiki.cern.ch/twiki/bin/view/LCG/VOMSLSCfileConfiguration
Configuration example for dteam VO:
----------------------------------------------------------------------
# ls -l /etc/grid-security/vomsdir/dteam/
total 8
-rw-r--r--. 1 root root 102 Dec 6 22:04 voms-dteam-auth.cern.ch.lsc
-rw-r--r--. 1 root root 129 Jan 19 2017 voms2.hellasgrid.gr.lsc
----------------------------------------------------------------------
# cat /etc/grid-security/vomsdir/dteam/voms-dteam-auth.cern.ch.lsc
/DC=ch/DC=cern/OU=computers/CN=dteam-auth.cern.ch
/DC=ch/DC=cern/CN=CERN Grid Certification Authority
----------------------------------------------------------------------
# cat /etc/grid-security/vomsdir/dteam/voms2.hellasgrid.gr.lsc
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016
----------------------------------------------------------------------
- The information about the "vomses" file for the UI will be added to the wiki mentioned above within a few days, once more sites have updated their configuration.
- Broadcast circulated to the sites on Dec 7th
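For sites that manage the vomsdir by hand or with their own tooling rather than the rpm, the .lsc files shown in the configuration example can be generated with a few lines of Python. This is a minimal sketch: the helper names `lsc_content` and `write_lsc` are hypothetical, and the DNs and file name are the ones from the example above.

```python
from pathlib import Path


def lsc_content(subject_dn, issuer_dn):
    """Build the two-line body of an .lsc file: server certificate DN, then CA DN."""
    for dn in (subject_dn, issuer_dn):
        if not dn.startswith("/"):
            raise ValueError(f"DN should start with '/': {dn!r}")
    return f"{subject_dn}\n{issuer_dn}\n"


def write_lsc(vomsdir, vo, filename, subject_dn, issuer_dn):
    """Write the .lsc file under <vomsdir>/<vo>/ and return its path."""
    target_dir = Path(vomsdir) / vo
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(lsc_content(subject_dn, issuer_dn))
    return target


if __name__ == "__main__":
    # Writes into a scratch directory; on a real host the base directory
    # would be /etc/grid-security/vomsdir (root privileges required).
    path = write_lsc(
        "./vomsdir-test",
        "dteam",
        "voms-dteam-auth.cern.ch.lsc",
        "/DC=ch/DC=cern/OU=computers/CN=dteam-auth.cern.ch",
        "/DC=ch/DC=cern/CN=CERN Grid Certification Authority",
    )
    print(path.read_text(), end="")
```

The existing voms2.hellasgrid.gr.lsc file is left untouched, so both servers stay trusted until the old one is decommissioned.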
DPM Decommission and migration
- Support for DPM ended in June 2023
- CERN IT will provide minimal support for DPM until the EOL of CentOS 7, with very little effort:
- only critical issues will be looked into
- DPM provides a migration script to dCache (migration guide)
- In September 2022, tickets were opened to the sites to plan the migration and decommission:
- tickets list (39 out of 57 were solved, 1 unsolved)
- Migrations still pending
- Australia-T2
- BEIJING-LCG2
- BG05-SUGrid (EOS)
- CYFRONET-LCG2 (EOS)
- GRIF (EOS)
- INFN-COSENZA (dCache)
- INFN-FRASCATI (dCache)
- INFN-ROMA1 (dCache)
- NCP-LCG2 (dCache)
- UKI-LT2-Brunel (XrootD/CEPHFS)
- UNIBE-LHEP (dCache)
- By Q3 2023
- UKI-SCOTGRID-DURHAM (XrootD/CEPHFS)
- By Q4 2023
- PSNC (EOS)
- By Q1 2024
- UKI-NORTHGRID-MAN-HEP (XrootD/CEPHFS)
- not clear/no reply
- ATLAND
- GR-07-UOI-HEPLAB
- Please note that after June 30th no support is provided for issues with the migration to dCache.
New benchmark HEPscore23
The HEPscore23 benchmark is replacing the old HEP-SPEC06
Recent activities:
- Some tests were performed, in particular with sites sending normalised reports.
- APEL client 1.9.2 released that adds basic HEPscore23 publishing using existing message format
- It needs to be added to UMD
- APEL server release candidate in testing
- Liaising with Portal on setting up testing with them
- this new version allows the aggregation of the accounting records by benchmark to monitor the move to the new benchmark over time
- When the tests are successful, final release of APEL server update and of the Portal
- Information for testing the publication of accounting records with the new benchmark:
- Testing a fix in ARC-CE for the proper configuration of HEPscore23
- Please contact us if you'd like to make tests with the new benchmark
HEPSCORE application:
- link to the gitlab page: https://gitlab.cern.ch/hep-benchmarks/hep-score
April GDB:
June WLCG Operations Coordination meeting:
Monitoring of webdav and xrootd protocols/endpoints
- 93 tickets were created requesting to update the information for monitoring webdav and xrootd endpoints
- Extension Properties to set:
- webdav:
- Name: ARGO_WEBDAV_OPS_URL
- Value: webdav URL containing also the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- xrootd:
- Name: ARGO_XROOTD_OPS_URL
- Value: XRootD base SURL to test (the path where ops VO has write access, for example: root://eosatlas.cern.ch//eos/atlas/opstest/egi/, root://recas-se-01.cs.infn.it:1094/dpm/cs.infn.it/home/ops/, root://dcache-atlas-xrootd-ops.desy.de:2811/pnfs/desy.de/ops or similar)
- Reference: https://docs.egi.eu/internal/configuration-database/adding-service-endpoint/#webdav
- Link to the broadcast circulated in October 2022
- 77 tickets were solved (5 Unsolved)
- From Nov 6th ARGO retrieves the endpoint URL information only from the extension properties.
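Since ARGO now relies only on these extension properties, it can be worth sanity-checking the values before registering them. The following is a minimal sketch: the property names come from the notes above, while the expected URL schemes and the "path mentions ops" heuristic are assumptions derived from the example values, not rules enforced by ARGO or GOCDB.

```python
from urllib.parse import urlparse

# Property names are from the notes; the expected schemes are an assumption
# based on the example values given for each property.
EXPECTED_SCHEMES = {
    "ARGO_WEBDAV_OPS_URL": {"https"},
    "ARGO_XROOTD_OPS_URL": {"root"},
}


def check_extension_property(name, value):
    """Return a list of problems found in one extension property (empty if none)."""
    schemes = EXPECTED_SCHEMES.get(name)
    if schemes is None:
        return [f"unknown property name: {name}"]
    problems = []
    url = urlparse(value)
    if url.scheme not in schemes:
        problems.append(f"{name}: unexpected scheme '{url.scheme}'")
    if not url.hostname:
        problems.append(f"{name}: missing host")
    # Heuristic only: the examples all point at a folder for the ops VO.
    if "ops" not in url.path.lower():
        problems.append(f"{name}: path does not appear to contain an ops VO folder")
    return problems


if __name__ == "__main__":
    examples = {
        "ARGO_WEBDAV_OPS_URL": "https://darkstorm.cnaf.infn.it:8443/webdav/ops",
        "ARGO_XROOTD_OPS_URL": "root://eosatlas.cern.ch//eos/atlas/opstest/egi/",
    }
    for name, value in examples.items():
        print(name, check_extension_property(name, value) or "looks fine")
```

The check is purely syntactic; it does not contact the endpoint or verify that the ops VO actually has write access to the path.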
AOB
Next meeting
December