General information
Middleware
UMD
- CentOS Stream 8 now the recommended OS for new installations
- C8->CS8 migrations recommended
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will probably skip CS8)
- new release https://repository.egi.eu/UMD/4.15.1.html
- ARC-CE 6.13.0 bug fixes release
- Xrootd 5.3.1 bug fixes release
- CERN EOS 5.0.2 new release of EOS Open Storage, which provides a storage solution for large amounts of physics data and user files, with a focus on interactive and batch analysis.
- dCache 6.2.31 security vulnerability fix
- Infrastructure Manager Nagios probe 1.3.1
- GridFTP 13.21.1 minor bug fix of some Globus packages
- gfal2 2.19.2 regular update of the gfal clients
- gfal2-utils 1.6.0 regular update of the gfal2-utils clients
- EGI CVMFS 3.3.16 new release of the default CVMFS configuration meta-package for EGI.
- CVMFS 2.8.2 patch release containing bug fixes for clients and new diagnostics commands for the client.
- HTCondor 9.0.1 New major release of HTCondor
- HTCondor-CE 5.1.3 New major release of the HTCondor-CE
Operations
ARGO/SAM
- probe for checking the HTCondorCE host certificate added to the critical profile for A/R computation (GGUS 155733):
- checks on expiration date, CN, and CA:
- it is working fine (very few failures)
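The three checks listed above (expiration date, CN, and CA) can be illustrated with a minimal Python sketch. This is not the ARGO probe itself: the function name `check_host_cert`, its parameters, and the 14-day warning window are assumptions made for illustration, and the certificate is expected as a dictionary in the format returned by Python's `ssl.SSLSocket.getpeercert()`.

```python
import ssl
import time


def check_host_cert(cert, expected_cn, trusted_issuer_cns, now=None, warn_days=14):
    """Return a list of problems found in a certificate dictionary.

    `cert` follows the layout of ssl.SSLSocket.getpeercert().
    All names and thresholds here are illustrative assumptions.
    """
    now = time.time() if now is None else now
    problems = []

    # 1. Expiration date: already expired, or expiring within the warning window.
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    if not_after <= now:
        problems.append("expired")
    elif not_after - now < warn_days * 86400:
        problems.append(f"expires within {warn_days} days")

    # 2. CN: the subject common name must match the host being tested.
    subject = dict(item for rdn in cert.get("subject", ()) for item in rdn)
    if subject.get("commonName") != expected_cn:
        problems.append("CN mismatch")

    # 3. CA: the issuer common name must belong to the set of trusted CAs.
    issuer = dict(item for rdn in cert.get("issuer", ()) for item in rdn)
    if issuer.get("commonName") not in trusted_issuer_cns:
        problems.append("CA not trusted")

    return problems
```

An empty list means all three checks passed; a real probe would map the result to the usual OK/WARNING/CRITICAL states.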
- Memory limits set by the ARC-CE probe: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155081
- the default is 512MB, but the limits were increased because of failures at some sites
- 1GB for normal test jobs, 1.5GB for security jobs
- these limits seem too high for simple test jobs that are expected to run fast and with low resource demand
- request to revert to the default limits and let the probe use site-specific settings for individual CEs, if any
- a proposal could be:
- sites with particular environment settings can define the values on GOCDB using the extension properties
- the probe is executed with its default values unless there is something else defined on GOCDB
- there is already an option for setting a value different from the one in the configuration files; to be verified whether it is suitable for our case
- in the monthly broadcast the sites have been informed to register the information on GOCDB: https://operations-portal.egi.eu/broadcast/archive/2897
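The proposal above (use the probe default unless a site has registered its own value on GOCDB) could be sketched as follows. This is only an illustration under stated assumptions: the extension property key `ARC_TEST_MEMORY_MB` and the function name are hypothetical, and the extension properties are assumed to have already been fetched from GOCDB into a plain dictionary.

```python
DEFAULT_MEMORY_MB = 512  # probe default mentioned in the notes


def resolve_memory_limit(extension_properties, key="ARC_TEST_MEMORY_MB",
                         default=DEFAULT_MEMORY_MB):
    """Return the memory limit (in MB) to use for a test job.

    `extension_properties` is a dict of the GOCDB extension properties
    of the service endpoint; `key` is a hypothetical property name.
    """
    raw = extension_properties.get(key)
    if raw is None:
        # Nothing registered on GOCDB: fall back to the probe default.
        return default
    try:
        value = int(str(raw).strip())
    except ValueError:
        # Malformed value: ignore it rather than fail the test job.
        return default
    # Non-positive values make no sense as a memory limit.
    return value if value > 0 else default
```

With this approach, only sites with a particular environment need to take any action; all other sites are tested with the default limit.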
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: (January 2022):
- NGI_BG: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155859
- BG05-SUGrid: instability with webdav and HTCondorCE, fixed.
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155862
- SCAI: replacing the hardware
- NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155861
- IN2P3-CPPM: SRM failures with retrieving the SURL: the SRM endpoint will be disabled.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155863
- TRIGRID-INFN-CATANIA
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=155860
- FI_HIP_T2: several disk-related problems on the CE led to unscheduled downtimes and low availability of the system; in addition, an extensive network reconfiguration scheduled for December 2021 took, for various reasons, much longer than initially anticipated. The disk issues have been solved and the latest IGTF CAs are installed on the CE.
- sites suspended:
Documentation
- plan to decommission MediaWiki
- content to be moved to different locations (Confluence and https://docs.egi.eu/)
- Confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- if any relevant, information will be summarised at OMB
Transition from X509 to federated identities (AARC profile token)
- WLCG is testing AAI tokens (WLCG profile) as the authz system for accessing the middleware, with INDIGO IAM as a replacement for VOMS
- In Feb 2022 OSG will fully move to token-based AAI, abandoning X509 certificates
- HTCondorCE: replacement of Grid Community Toolkit
- The long-term support series (9.0.x) from the CHTC repositories will support X509/VOMS authentication through Sep 2022
- Starting in 9.3.0 (released in October), the HTCondor feature releases do NOT contain this support
- EGI sites are recommended to stay with the long-term support series for the time being
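As a rough aid for site admins, the version rule stated above can be written as a small Python helper. It encodes only what these notes say (X509/VOMS support is dropped starting in 9.3.0, and the 9.0.x series keeps it); the function name is hypothetical, and the treatment of any version not mentioned in the notes is an assumption derived from the "starting in 9.3.0" wording.

```python
def supports_x509_voms(version):
    """Return True if the given HTCondor version is expected to still
    support X509/VOMS authentication, according to the notes above.

    Assumption: every release before 9.3.0 keeps the support, and
    every release from 9.3.0 onwards does not.
    """
    parts = tuple(int(p) for p in version.split("."))
    return parts < (9, 3, 0)


# Example: the long-term support series versus a feature release.
for v in ("9.0.1", "9.3.0"):
    status = "supports" if supports_x509_voms(v) else "does not support"
    print(f"HTCondor {v} {status} X509/VOMS")
```

The support in the 9.0.x series is itself time-limited (through Sep 2022 according to the notes), which this version check does not capture.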
What we need to know in preparation of the transition:
Checking the middleware compliance with the AARC Profile token:
- ARC-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154958
- So far focusing on the WLCG profile, which is built upon the AARC profile, so this should cover everything.
- Argus: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154959
- no clear plans yet
- dCache: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154960
- DPM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154961
- The DPM is in maintenance mode to be phased out by ~2024. There is no effort for implementing new functionality, which furthermore would be short-lived.
- HTCondor-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154962
- STORM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154963
- Only the StoRM-WebDAV component supports token-based authorization.
- At the moment only scopes and groups foreseen by the WLCG Token Profile are recognized by the authorization policy engine, but adding support for the AARC profile is planned.
- At the moment there is no plan to add token support to the SRM component.
- Finalizing a specification for a Tape REST API to replace the functionally-equivalent SRM one.
- That implementation will have token support.
Need to check the awareness and readiness of user communities:
- which GRID services do they use
- Compute: ARC-CE
- Compute: HTCondorCE
- Storage: SRM
- Storage: webdav/http
- Storage: GridFTP
- do you interact directly with Compute and Storage services (e.g., through command line) or do you use a tool (e.g., DIRAC, data transfer tools, data management tools, etc.) available to your VO?
- do you own and need a personal X509 certificate to access the services or can you use a federated identity (e.g., institutional identity, social account, etc.)
- are they familiar with AAI identities
- are they ready for the switch
Broadcast sent to the VO on Jan 28th (it requires login): https://operations-portal.egi.eu/broadcast/archive/2896
- reply so far from:
- biomed
- Km3Net
- vo.france-grilles.fr
- vo.hess-experiment.eu
- vo.complex-systems.eu
- VOCE
- usage of DIRAC in general
- training on federated identities for users (and sys-admins) could be useful
- VO frameworks based on either X509 or AAI (because of the usage of DIRAC)
Migration of the VOs from VOMS to Check-in
- transition period where both X509 and tokens can be used
- delays in updating the GRID elements to the latest version compliant with tokens
- not all of the middleware products can become compliant with tokens at the same time
- the same VO has to interact with elements supporting different authentication methods
AOB
- DPM migration
- New benchmark replacing HEP-SPEC06
Next meeting
Feb