General information
Middleware
UMD
- CentOS7 is the recommended OS until UMD5+CS9 are released
- If you already have machines running C8, migration to CentOS Stream 8 is mandatory
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will skip CS8)
- UMD 4.17.0 released https://repository.egi.eu/UMD/4.17.0.html
- UMD 4.17.1 released: https://repository.egi.eu/UMD/4.17.1.html
- dCache 7.2.15 new major release
- Argus 1.7.5: bug-fix release that prevents the Argus pepd from crashing when non-standard characters are found in the DN of Certification Authorities released by the EGI Trust Anchor team.
- glite-infoprovider-ldap 1.5.0: bug-fix release that stops software and job information from being added to the topbdii01 LDAP, preventing huge memory consumption.
- UMD Infrastructure:
- testbed and scripts to test the new workflows
- working on the jenkins + nexus interactions
- in more detail: using the Nexus API we can create, upload and sign RPM and DEB packages
- working on the implementation of this into jenkins
Operations
- Creation of NGI_IE Operation Centre: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159433
ARGO/SAM
- New configuration to set on GOCDB to monitor webdav and xrootd endpoints (GGUS 158585)
- webdav: the endpoint url information used for monitoring purposes should be set in the extension property (check the documentation):
- Name: ARGO_WEBDAV_OPS_URL
- Value: webdav URL containing also the VO ops folder, for example:
https://darkstorm.cnaf.infn.it:8443/webdav/ops
or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- xrootd: the endpoint URL information used for monitoring purposes should be set in the extension properties section (check the documentation)
Name: ARGO_XROOTD_OPS_URL
Value: XRootD base SURL to test (the path where ops VO has write access, for example: root://eosatlas.cern.ch//eos/atlas/opstest/egi/, root://recas-se-01.cs.infn.it:1094/dpm/cs.infn.it/home/ops/, root://dcache-atlas-xrootd-ops.desy.de:2811/pnfs/desy.de/ops or similar)
- Information circulated through the monthly broadcast
- we are going to open tickets to the sites that haven't implemented the new settings yet.
- request to enable the notification of failures for the APEL metrics:
- the APEL metrics are included in the ARGO_MON_OPERATORS
- notifications will be enabled for all of the metrics in this profile instead of ARGO_MON_CRITICAL
- a new report associated to this profile needs to be created in ARGO
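As a rough illustration of the ARGO_WEBDAV_OPS_URL and ARGO_XROOTD_OPS_URL extension properties described above, the following Python sketch checks that a value has the expected URL scheme, a host, and a path that mentions the ops VO. The property names and example URLs come from the notes. The function name check_extension_property and the checks themselves (requiring https for WebDAV, looking for "ops" in the path) are assumptions made for illustration, not rules enforced by GOCDB or ARGO.

```python
from typing import List
from urllib.parse import urlparse

# Property names are taken from the notes; the accepted schemes are an
# assumption based on the example URLs given there.
EXPECTED_SCHEMES = {
    "ARGO_WEBDAV_OPS_URL": {"https"},
    "ARGO_XROOTD_OPS_URL": {"root"},
}


def check_extension_property(name: str, value: str) -> List[str]:
    """Return a list of problems; an empty list means the value looks plausible."""
    schemes = EXPECTED_SCHEMES.get(name)
    if schemes is None:
        return [f"unknown property name: {name}"]

    problems = []
    parsed = urlparse(value.strip())
    if parsed.scheme not in schemes:
        problems.append(f"unexpected scheme '{parsed.scheme}'")
    if not parsed.hostname:
        problems.append("missing host name")
    # Heuristic only (assumption): the path should point at the ops VO area.
    if "ops" not in parsed.path.lower():
        problems.append("path does not mention the ops VO")
    return problems


if __name__ == "__main__":
    examples = [
        ("ARGO_WEBDAV_OPS_URL", "https://darkstorm.cnaf.infn.it:8443/webdav/ops"),
        ("ARGO_XROOTD_OPS_URL", "root://eosatlas.cern.ch//eos/atlas/opstest/egi/"),
    ]
    for prop, url in examples:
        print(prop, url, check_extension_property(prop, url) or "looks plausible")
```

A check of this kind only catches malformed values before they are entered; whether the ops VO can actually write to the path is still verified by the monitoring probes.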
FedCloud
- EGI Check-in migration from MitreID to Keycloak:
- testing and coordination of the sites for implementing the new settings
- a few sites haven't completed the change yet: tickets list
- guide for migration of generic clients to Keycloak
- example of changes in the OpenStack configuration
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
Under-performed sites in the past A/R reports with issues not yet fixed:
- AfricaArabia:
- DZ-01-ARN: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159107 SE issues fixed
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159223
- Taiwan-LCG2: SRM failures
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159116
- MPPMU: webdav configuration issues
- UNI-FREIBURG: webdav configuration issues
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=158231
- GR-07-UOI-HEPLAB: SRM failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159108
- INFN-COSENZA: recovering from hardware issues
- INFN-GENOVA: HTCondorCE configuration issues have been fixed
- UNINA-EGEE: hardware problems resolved.
- NGI_TR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=158710
- AZ-IFAN: maintenance works in the data centre
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156742
- RAL-LCG2: authz issues with the webdav endpoint
- they provide an object store endpoint, and the commands executed by the webdav probe don't work (GGUS 157748). The webdav probe is no longer maintained, but we are discussing a possible modification.
- ROC_Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159117
- RU-Protvino-IHEP
- RU-SARFTI: power outages and faulty hard drives, working on resolving the issues.
- RU-SPbSU: problem with the air-conditioning system in the computer room, site in long downtime, SUSPENDED by NGI Operators.
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (Sept 2022):
- NGI_CH: https://ggus.eu/index.php?mode=ticket_info&ticket_id=159399
- T3_CH_PSI: webdav failures: need to create a webdav door for the ops VO
sites suspended:
Documentation
- MediaWiki in read-only mode
- content to be moved to different locations (confluence and https://docs.egi.eu/)
- confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
- Guidelines for providers to join EGI: https://docs.egi.eu/providers/joining/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- relevant information, if any, will be summarised at the OMB
Transition from X509 to federated identities (AARC profile token)
- WLCG is testing AAI tokens (WLCG profile) as the authz system for accessing the middleware, with Indigo IAM as a replacement for VOMS
- In Feb 2022 OSG will fully move to token-based AAI, abandoning X509 certificates
- HTCondorCE: replacement of Grid Community Toolkit
- The long-term support series (9.0.x) from the CHTC repositories will support X509/VOMS authentication through March 2023
- Starting in 9.3.0 (released in October 2021), the HTCondor feature releases do NOT contain this support
- EGI sites are recommended to stay with the long-term support series for the time being
What we need to know in preparation for the transition:
Checking the middleware compliance with the AARC Profile token:
- ARC-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154958
- So far focusing on the WLCG profile, which is built upon the AARC profile, so this should cover everything.
- Argus: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154959
- no clear plans yet
- dCache: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154960
- dCache does support authorisation statements, as described by WLCG AuthZ-WG's JWT profile.
- supporting AARC-style group membership statements is on the TODO list
- DPM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154961
- The DPM is in maintenance mode to be phased out by ~2024. There is no effort for implementing new functionality, which furthermore would be short-lived.
- HTCondor-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154962
- HTCondor-CE supports WLCG tokens, so it should work also with the AARC profile token. Some tests are needed.
- STORM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154963
- Only the StoRM-WebDAV component supports token-based authorization.
- At the moment only scopes and groups foreseen by the WLCG Token Profile are recognized by the authorization policy engine, but adding support for the AARC profile is planned.
- At the moment there is no plan to add token support to the SRM component.
- Finalizing a specification for a Tape REST API to replace the functionally-equivalent SRM one.
- That implementation will have token support.
Circulated a survey to check the awareness and readiness of user communities:
- see details in the notes of past meetings
Migration of the VOs from VOMS to Check-in
- transition period where both X509 and tokens can be used
- delays in updating the GRID elements to the latest version compliant with tokens
- not all of the middleware products can be compliant with tokens at the same time
- the same VO has to interact with elements supporting different authentication methods
Testing HTCondorCE and AARC Profile token
- INFN-T1 did some tests with the AARC Profile token using its HTCondorCE endpoints
- Configuring authentication on HTCondorCE
- HTCondor CE token configuration tips (by INFN-T1)
- dteam VO registered in Check-in/Comanage:
- Entitlements:
- urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu
- urn:mace:egi.eu:group:dteam:role=vm_operator#aai.egi.eu
- The HTCondorCE expects to find the scope claim in the token in order to authorise job submission
- at the moment Check-in doesn't release this claim: it will after the migration to Keycloak, which replaces MitreID
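To make the entitlement strings and the scope claim mentioned above more concrete, here is a minimal Python sketch. It splits an entitlement of the form shown for dteam into namespace, group path, role and group authority, and it decodes the payload of a JWT to see whether a scope claim is present. The function names are hypothetical, the parser handles only the simple layout seen in the examples (no URL-decoding of group names), and the token is decoded without any signature verification, so this is for inspection only and is not how a CE validates tokens.

```python
import base64
import json
from typing import Optional


def parse_entitlement(value: str) -> dict:
    """Split '<namespace>:group:<group>[:<subgroup>...][:role=<role>]#<authority>'."""
    body, _, authority = value.partition("#")
    namespace, sep, rest = body.partition(":group:")
    if not sep or not authority or not rest:
        raise ValueError(f"not a group entitlement: {value!r}")
    parts = rest.split(":")
    role: Optional[str] = None
    # The role, when present, is the last component and starts with 'role='.
    if parts[-1].startswith("role="):
        role = parts.pop()[len("role="):]
    return {
        "namespace": namespace,
        "groups": parts,
        "role": role,
        "authority": authority,
    }


def decode_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def has_scope_claim(claims: dict) -> bool:
    """True if the token carries a non-empty 'scope' claim."""
    return bool(claims.get("scope"))


if __name__ == "__main__":
    print(parse_entitlement("urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu"))
```

A token that carries the entitlements but no scope claim would still be rejected by a CE that authorises on scope, which is the situation described in the notes.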
WLCG Campaign
- WLCG started the CE Token support campaign
- https://twiki.cern.ch/twiki/bin/view/LCG/CEtokenSupportCampaign
- sites should upgrade the CEs to the version supporting tokens, and configure tokens
Timeline
- CEs to fully support tokens by the end of the year (Dirac for the job submission)
- data management services (dCache, STORM, FTS, Rucio, Dirac) might need a few years before dropping X509/VOMS support
Hackathon events
- 15th - 16th September ARC/HTCondor CE Hackathon, organised by WLCG, with HTCondorCE and ARC-CE, mostly to investigate data staging issues (see GDB introduction)
- agreed to enable the support of the several token profiles through plugins
- same plugin for the several CEs
- plugins provided by the "creators" of the token profiles
- CE teams to provide specifics to the AAI teams and to release a new CE version supporting the plugins
DPM Decommission and migration
- DPM supported until June 2023
- Sites are encouraged to start the migration to a different storage element since the process will take time
- choosing the new storage solution depends on the expertise/experience of the sites and on the needs of the supported VOs
- See the slides presented by Petr Vokac at the EGI Conference 2022 about the migration tools to dCache
- DPM provides a migration script to dCache (migration guide)
- Transparent migration
- Migrate just catalog (database) and keep files untouched
- both SEs store files on a POSIX filesystem
- Migration in three steps
- verify the DPM data consistency
- no downtime needed
- the operation can last several days or some weeks
- DPM dump and dCache import
- downtime lasting about 1 day
- In September opened tickets to the sites to plan the migration and decommission:
- tickets list
- Please let us know your plans for the DPM EOL. If you decide to use the dCache migration tools, the tickets will be used to support you with this migration method.
- dCache migration should be done by June 2023
New benchmark replacing HEP-SPEC06
The HEPSCORE benchmark is going to replace the old HEP-SPEC06
- preparing plans with WLCG and the EGI Accounting team for deploying the new benchmark
- transition period where both benchmarks will be published and used to normalise the data
- to allow comparison between the two kinds of data
- APEL is working on a version where the accounting records contain two benchmarks
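The idea of a record carrying two benchmarks during the transition can be sketched as follows. This is an illustrative Python example only: the class and field names are hypothetical and do not reflect the actual APEL record schema, and normalisation is assumed to be CPU hours multiplied by a per-core benchmark score.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UsageRecord:
    """Hypothetical accounting record; field names are not the APEL schema."""
    cpu_hours: float
    hs06_per_core: float                        # old benchmark score per core
    hepscore_per_core: Optional[float] = None   # new benchmark, if published


def normalised_hours(record: UsageRecord) -> Dict[str, float]:
    """Normalise CPU time with every benchmark available in the record.

    Assumption: normalised usage = CPU hours x benchmark score per core.
    """
    result = {"HS06": record.cpu_hours * record.hs06_per_core}
    if record.hepscore_per_core is not None:
        result["HEPscore"] = record.cpu_hours * record.hepscore_per_core
    return result


if __name__ == "__main__":
    both = UsageRecord(cpu_hours=10.0, hs06_per_core=12.0, hepscore_per_core=11.5)
    old_only = UsageRecord(cpu_hours=10.0, hs06_per_core=12.0)
    print(normalised_hours(both))
    print(normalised_hours(old_only))
```

Keeping both values side by side in one record is what allows the two kinds of normalised data to be compared during the transition period.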
AOB
Next meeting
December