General information
Middleware
UMD
- CentOS Stream 8 now the recommended OS for new installations
- C8->CS8 migrations recommended
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will skip CS8)
Operations
ARGO/SAM
- Integration of EOS storage element: GGUS 154335
- Monitoring probe status:
We are testing the monitoring probe for the EOS Storage endpoints (GGUS 156251) which uses the XRootD interface (see https://github.com/EGI-Federation/nagios-plugins-xrootd )
On GOCDB the EOS endpoints are registered as XRootD service endpoints.
To allow the probe to run properly, we would like you to:
- enable the ops VO on your endpoints
- for each EOS (XRootD) service endpoint, add the following Extension Property:
- Name: XROOTD_URL
- Value: XRootD base SURL to test (a path where the ops VO has write access, for example root://eospps.cern.ch:1094/eos/pps/ or similar)
Please do the same even if you provide an XRootD interface with a different type of storage element.
- Test results on the devel instance
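Before adding the XROOTD_URL Extension Property on GOCDB, a quick check of the value can catch simple mistakes. The following is a minimal Python sketch and not part of the probe: the function name `check_xrootd_url` and the accepted schemes (`root`, plus `roots` for TLS) are assumptions made for illustration. It only checks the shape of the URL, not whether the ops VO can actually write to the path.

```python
from urllib.parse import urlsplit


def check_xrootd_url(value: str) -> list:
    """Return a list of problems found in an XROOTD_URL value (empty if none)."""
    problems = []
    parts = urlsplit(value.strip())
    # Assumption: only XRootD schemes are acceptable for this property.
    if parts.scheme not in ("root", "roots"):
        problems.append(f"unexpected scheme {parts.scheme!r}, expected root://")
    if not parts.hostname:
        problems.append("missing host name")
    # The value must point at a base path, not just at the server.
    if parts.path.strip("/") == "":
        problems.append("missing base path (for example /eos/pps/)")
    return problems


if __name__ == "__main__":
    # Example value taken from the text above.
    print(check_xrootd_url("root://eospps.cern.ch:1094/eos/pps/"))
```

An empty list means the value looks like a usable base SURL; any other result lists what should be corrected before saving the property.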
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
- Under-performing sites from past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156745
- TW-NCUHEP: webdav failures
- NGI_BG: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155859
- BG05-SUGrid: instability with webdav and HTCondorCE; some idle jobs prevent the correct execution of the tests. In downtime until 20th March. In April: batch system issues and an unscheduled downtime due to power supply maintenance.
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156741
- CETA-GRID: issues due to an increase of utilisation. Re-installing the infrastructure.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156742
- RAL-LCG2: authz issues with the webdav endpoint
Sites under-performing for 3 consecutive months, under-performing NGIs, QoS violations (April 2022):
- NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=157127
- NIHAM: webdav failures: xrootd@dpmdisk service failed to start at system startup after a kernel update. Fixed.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=157128
- UKI-SOUTHGRID-BRIS-HEP: webdav issues also affecting production VOs; slow progress in fixing them due to lack of effort.
sites suspended: ITEP (security reasons)
New CERN Grid CA
- IGTF 1.116 was released on April 25th introducing a new CERN Grid CA
- The new certificate was put into production on May 2nd
- This change affects middleware products relying on older versions of the "canl-java" library: they need a service restart to make use of the new CA (and the new IGTF release in general)
- Argus
- dCache
- StoRM
- VOMS-Admin
- On May 2nd a broadcast was circulated asking to restart the NGI Argus endpoints which were failing the tests
- the following versions of dCache do not need a restart:
myproxy-6.2.9-8 restores backward compatibility
- Last week WLCG found out that the version of MyProxy released in EPEL (6.2.9-7) worked only with 6.2.9-7 clients
- Issue reported to Grid CF.
- A fix was released, and a backward-compatible version (6.2.9-8) is now in EPEL 7 and 8.
Documentation
- MediaWiki in read-only mode
- content to be moved to different locations (Confluence and https://docs.egi.eu/)
- Confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- relevant information, if any, will be summarised at the OMB
Transition from X509 to federated identities (AARC profile token)
- WLCG is testing AAI tokens (WLCG profile) as the authz system for accessing the middleware, with Indigo IAM as a replacement for VOMS
- In Feb 2022 OSG will fully move to token-based AAI, abandoning X509 certificates
- HTCondorCE: replacement of Grid Community Toolkit
- The long-term support series (9.0.x) from the CHTC repositories will support X509/VOMS authentication through Jan 2023 (previously Sep 2022)
- Starting in 9.3.0 (released in October), the HTCondor feature releases do NOT contain this support
- EGI sites are recommended to stay with the long-term support series for the time being
What we need to know in preparation of the transition:
Checking the middleware compliance with the AARC Profile token:
- ARC-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154958
- So far focusing on the WLCG profile, which is built upon the AARC profile, so this should cover everything.
- Argus: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154959
- no clear plans yet
- dCache: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154960
- dCache does support authorisation statements, as described by WLCG AuthZ-WG's JWT profile.
- supporting AARC-style group membership statements is on the TODO list
- DPM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154961
- The DPM is in maintenance mode to be phased out by ~2024. There is no effort for implementing new functionality, which furthermore would be short-lived.
- HTCondor-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154962
- HTCondor-CE supports WLCG tokens, so it should also work with the AARC profile token. Some tests are needed.
- STORM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154963
- Only the StoRM-WebDAV component supports token-based authorization.
- At the moment only scopes and groups foreseen by the WLCG Token Profile are recognized by the authorization policy engine, but adding support for the AARC profile is planned.
- At the moment there is no plan to add token support to the SRM component.
- Finalizing a specification for a Tape REST API to replace the functionally-equivalent SRM one.
- That implementation will have token support.
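Several of the replies above distinguish between WLCG-style authorisation statements and AARC-style group membership statements. When testing a service, it can help to look at which kind of claims a given token actually carries. The sketch below decodes a JWT payload without verifying the signature, so it is suitable for inspection only and never for authorisation decisions. The claim names are assumptions: `wlcg.*` claims are taken to indicate the WLCG profile and `eduperson_entitlement` is taken as the claim carrying AARC-style entitlements. The helper names (`jwt_claims`, `profile_hints`, `make_unsigned_token`) are hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # base64url strings in JWTs drop the '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def profile_hints(claims: dict) -> list:
    """Guess which profile(s) the claims follow, based on claim names only."""
    hints = []
    # Assumption: WLCG-profile tokens carry claims such as wlcg.ver or wlcg.groups.
    if any(name.startswith("wlcg.") for name in claims):
        hints.append("wlcg")
    # Assumption: AARC-style entitlements are carried in eduperson_entitlement.
    if "eduperson_entitlement" in claims:
        hints.append("aarc")
    return hints


def make_unsigned_token(claims: dict) -> str:
    """Build a throwaway, unsigned token for local experiments only."""
    def enc(obj) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none'})}.{enc(claims)}."


if __name__ == "__main__":
    token = make_unsigned_token({"wlcg.ver": "1.0", "wlcg.groups": ["/dteam"]})
    print(profile_hints(jwt_claims(token)))
```

A service that only recognises one style of claim will need either a token with both sets of claims or a mapping step in front of its policy engine; which one applies depends on the product and is not covered by this sketch.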
Circulated a survey to check the awareness and readiness of user communities:
- which GRID services do they use
- Compute: ARC-CE
- Compute: HTCondorCE
- Storage: SRM
- Storage: webdav/http
- Storage: GridFTP
- do you interact directly with Compute and Storage services (e.g., through command line) or do you use a tool (e.g., DIRAC, data transfer tools, data management tools, etc.) available to your VO?
- do you own and need a personal X509 certificate to access the services or can you use a federated identity (e.g., institutional identity, social account, etc.)
- are they familiar with AAI identities
- are they ready for the switch
Broadcast sent to the VOs on Jan 28th (login required): https://operations-portal.egi.eu/broadcast/archive/2896
- replies so far from:
- atlas
- biomed
- enea
- eiscat.se
- glast.org (srm, gfal-utils)
- ildg (srm, gridftp; direct access with x509)
- Km3Net
- lhcb
- project.nl
- vo.france-grilles.fr
- vo.grapevine.eu
- vo.hess-experiment.eu
- vo.complex-systems.eu
- VOCE
- DIRAC is used in general; a few VOs access the services directly
- training on federated identities for users (and sys-admins) could be useful
- VO frameworks are based on either X509 or AAI (because of the usage of DIRAC)
Migration of the VOs from VOMS to Check-in
- transition period where both X509 and tokens can be used
- delays in updating the GRID elements to the latest version compliant with tokens
- not all of the middleware products can be compliant with tokens at the same time
- the same VO has to interact with elements supporting different authentication methods
Testing HTCondorCE and AARC Profile token
- INFN-T1 is going to test the AARC Profile token with its HTCondorCE endpoints
- Configuring authentication on HTCondorCE
- HTCondor CE token configuration tips (by INFN-T1)
- dteam VO registered in Check-in/Comanage:
- Entitlements:
- urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu
- urn:mace:egi.eu:group:dteam:role=vm_operator#aai.egi.eu
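The dteam entitlements listed above follow the pattern namespace, `:group:`, group path, optional `:role=`, then `#` and the issuing authority. For anyone mapping these values on the service side, the following Python sketch splits an entitlement into its parts. It is an illustration under assumptions: the names `Entitlement` and `parse_entitlement` are hypothetical, and URL-decoding of group names and other details of the AARC guidelines are not handled.

```python
from typing import NamedTuple, Optional, Tuple


class Entitlement(NamedTuple):
    namespace: str            # e.g. "urn:mace:egi.eu"
    groups: Tuple[str, ...]   # group followed by any subgroups
    role: Optional[str]       # None when no ":role=" part is present
    authority: str            # part after '#', e.g. "aai.egi.eu"


def parse_entitlement(value: str) -> Entitlement:
    """Split a group entitlement URN into namespace, groups, role, authority."""
    body, hash_sep, authority = value.partition("#")
    if not hash_sep or not authority:
        raise ValueError("missing '#<authority>' suffix")
    namespace, group_sep, group_path = body.partition(":group:")
    if not group_sep or not group_path:
        raise ValueError("missing ':group:' part")
    segments = group_path.split(":")
    role = None
    # The role, when present, is the last segment and starts with "role=".
    if segments[-1].startswith("role="):
        role = segments.pop()[len("role="):]
    if not segments:
        raise ValueError("missing group name")
    return Entitlement(namespace, tuple(segments), role, authority)


if __name__ == "__main__":
    print(parse_entitlement("urn:mace:egi.eu:group:dteam:role=vm_operator#aai.egi.eu"))
```

Keeping the role separate from the group path makes it straightforward to express rules such as "any member of dteam" versus "only dteam vm_operator" in a local mapping.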
New benchmark replacing HEP-SPEC06
The HEPSCORE benchmark is going to replace the old HEP-SPEC06
- preparing plans with WLCG and the EGI Accounting team for deploying the new benchmark
- transition period where both benchmarks will be published and used to normalise the data
- to allow comparison between the two kinds of data
- APEL is working on a version where the accounting records contain 2 benchmarks
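To make the transition period concrete, the sketch below shows one way an accounting record carrying two benchmark values could be normalised twice, so that the two kinds of data can be compared side by side. This is not the APEL record format: the class `UsageRecord`, its field names and the numbers in the example are all hypothetical, and normalisation is assumed to be a simple product of CPU hours and the per-core benchmark score.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical accounting record carrying both benchmark values."""
    site: str
    cpu_hours: float
    hs06_per_core: float       # old benchmark (HEP-SPEC06)
    hepscore_per_core: float   # new benchmark (HEPSCORE)


def normalised_hours(record: UsageRecord) -> dict:
    """Normalise the same raw usage with each benchmark (assumed: hours x score)."""
    return {
        "HS06": record.cpu_hours * record.hs06_per_core,
        "HEPSCORE": record.cpu_hours * record.hepscore_per_core,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any site.
    record = UsageRecord("EXAMPLE-SITE", cpu_hours=100.0,
                         hs06_per_core=10.0, hepscore_per_core=12.0)
    print(normalised_hours(record))
```

Publishing both values from the same raw usage is what allows the two series to be compared during the transition.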
AOB
- DPM migration
Next meeting
Apr