General information
Middleware
UMD
- CentOS7 is the recommended OS until UMD5+CS9 are released
- If you already have machines running C8, migration to CentOS Stream 8 is mandatory
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will skip CS8)
- UMD 4.17.0 released
- ARC 6.1.5.1
- Myproxy 6.2.14
- gridFTP 13.26.1
- xroot 5.4.2
- condor 9.0.16
- htcondor-CE 5.1.3
- dCache 6.2.44
- gfal2 clients 2.20.5
- davix 0.8.2
- dmlite 1.15.2
- cvmfs 2.9.4
- egi-cvmfs 4.4.0
- new nagios probes for onedata and IM
- UMD Infrastructure:
- testbed and scripts to test the new workflows
- working on the jenkins + nexus interactions
- in more detail: using the nexus API we can create, upload and sign rpm and deb packages
- working on the implementation of this into jenkins
Operations
ARGO/SAM
- Integration of EOS storage element completed: GGUS 154335
- endpoints registered as xrootd service type and monitored through the XRootD interface (see https://github.com/EGI-Federation/nagios-plugins-xrootd)
In order to allow the proper execution of the probe, we would like you to:
enable the ops VO on your endpoints
for each EOS (XRootD) service endpoint add the following Extension Property (currently testing a new variable name, see below):
Name: ARGO_XROOTD_OPS_URL (replaces XROOTD_URL)
Value: XRootD base SURL to test (the path where the ops VO has write access, for example: root://eospps.cern.ch:1094/eos/pps/ or similar)
Please do the same even if you provide an XRootD interface with a different type of storage element.
- Test results
- Testing a new configuration of webdav and xrootd metrics: GGUS 158585
- webdav: the endpoint url information used for monitoring purposes should be set in the extension property:
- Name: ARGO_WEBDAV_OPS_URL
- Value: webdav URL containing also the VO ops folder, for example:
https://darkstorm.cnaf.infn.it:8443/webdav/ops
or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- xrootd: change the extension property name, from XROOTD_URL to ARGO_XROOTD_OPS_URL
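A quick sanity check of the values before registering them can catch the kind of incorrect URL mentioned elsewhere in these notes. The following is a minimal sketch, not part of the ARGO probes: the function name and the rules (expected URL scheme, a non-empty path, an `ops` folder in the webdav path) are assumptions derived only from the examples given above.

```python
from urllib.parse import urlparse

# Expected URL scheme per extension property. The property names come from
# the notes; the scheme rules are an assumption based on the example URLs.
EXPECTED_SCHEME = {
    "ARGO_WEBDAV_OPS_URL": "https",
    "ARGO_XROOTD_OPS_URL": "root",
}


def check_extension_property(name: str, value: str) -> list[str]:
    """Return a list of problems; an empty list means the value looks plausible."""
    expected = EXPECTED_SCHEME.get(name)
    if expected is None:
        return [f"unknown extension property name: {name}"]

    problems = []
    url = urlparse(value.strip())
    if url.scheme != expected:
        problems.append(f"expected scheme '{expected}', got '{url.scheme}'")
    if not url.hostname:
        problems.append("missing host name")

    segments = [s for s in url.path.split("/") if s]
    if not segments:
        problems.append("missing path to the area writable by the ops VO")
    # Assumption: the webdav URL must include the ops VO folder in its path.
    if name == "ARGO_WEBDAV_OPS_URL" and "ops" not in segments:
        problems.append("webdav URL does not contain the 'ops' folder")
    return problems


if __name__ == "__main__":
    print(check_extension_property(
        "ARGO_WEBDAV_OPS_URL", "https://darkstorm.cnaf.infn.it:8443/webdav/ops"))
    print(check_extension_property(
        "XROOTD_URL", "root://eospps.cern.ch:1094/eos/pps/"))
```

The check is purely syntactic: it does not contact the endpoint or verify that the ops VO can actually write to the path.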
FedCloud
- EGI Check-in migration from MitreID to Keycloak:
- testing and coordination of the sites for implementing the new settings
- a few sites haven't completed the change yet: tickets list
- guide for migration of generic clients to Keycloak
- example of changes in the OpenStack configuration
Feedback from DMSU
ARGUS not working with a certificate in IGTF 1.117 distribution
- A certificate subject containing a comma in its DN is wrongly processed by ARGUS, which breaks it:
- GGUS ticket: #158702
- KEDB entry: EGIKEDB-17
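To illustrate the general class of problem (this is not the ARGUS code, and the DN below is a made-up example rather than the certificate from the ticket), the sketch below shows how splitting a string-form DN on every comma breaks a value that itself contains an escaped comma, and how an escape-aware split avoids it. It assumes a comma-separated DN where a comma inside a value is escaped with a backslash.

```python
def split_rdns(dn: str) -> list[str]:
    """Split a comma-separated DN into its components, honouring backslash escapes.

    Simplified on purpose: quoted values and hex escapes are not handled.
    """
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            # Character after a backslash is always part of the value.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            # An unescaped comma separates two components.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


if __name__ == "__main__":
    # Hypothetical subject with a comma inside the CN value.
    dn = r"CN=Example CA\, Inc,O=Example Org,C=XX"
    print(dn.split(","))   # naive split: the CN is cut in two
    print(split_rdns(dn))  # escape-aware split: three components
```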
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156745
- TW-NCUHEP: webdav failures fixed
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=158231
- GR-07-UOI-HEPLAB
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=157943
- INFN-COSENZA: in downtime since April for repair and maintenance; recovered in July; some failures with the webdav metric due to incorrect url registered on GOCDB, SOLVED.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156742
- RAL-LCG2: authz issues with the webdav endpoint
- they provide an object store endpoint, and the commands executed by the webdav probe don't work (GGUS 157748). The webdav probe is no longer maintained, but a possible modification is under discussion.
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: (Aug 2022):
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=158712
- LRZ-LMU: webdav information fixed
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=158711
- GRNET-OPENSTACK
- NGI_TR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=158710
- AZ-IFAN: maintenance works in the data centre
sites suspended:
Verify configuration records
On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:
- NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- ROD E-Mail
- Security E-Mail
NGI Managers should also review the status of the "not certified" RCs, according to the RC Status Workflow;
- RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- telephone numbers
- CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.
The process should be completed by July 29th.
List of tickets on GGUS search page.
- 8 out of 28 tickets still open
Documentation
- MediaWiki in read-only mode
- content to be moved to different locations (confluence and https://docs.egi.eu/)
- confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
- Guidelines for providers to join EGI: https://docs.egi.eu/providers/joining/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- relevant information, if any, will be summarised at OMB
Transition from X509 to federated identities (AARC profile token)
- WLCG is testing aai tokens (WLCG profile) as authz system for accessing the middleware, with Indigo IAM as a replacement of VOMS
- In Feb 2022 OSG will fully move to token-based AAI, abandoning X509 certificates
- HTCondorCE: replacement of Grid Community Toolkit
- The long-term support series (9.0.x) from the CHTC repositories will support X509/VOMS authentication through Jan 2023 (previously Sep 2022)
- Starting in 9.3.0 (released in October), the HTCondor feature releases do NOT contain this support
- EGI sites are recommended to stay with the long-term support series for the time being
What we need to know in preparation of the transition:
Checking the middleware compliance with the AARC Profile token:
- ARC-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154958
- So far focusing on the WLCG profile, which is built upon the AARC profile, so this should cover everything.
- Argus: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154959
- no clear plans yet
- dCache: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154960
- dCache does support authorisation statements, as described by WLCG AuthZ-WG's JWT profile.
- supporting AARC-style group membership statements is on the TODO list
- DPM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154961
- The DPM is in maintenance mode to be phased out by ~2024. There is no effort for implementing new functionality, which furthermore would be short-lived.
- HTCondor-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154962
- HTCondor-CE supports WLCG tokens, so it should work also with the AARC profile token. Some tests are needed.
- STORM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154963
- Only the StoRM-WebDAV component supports token-based authorization.
- At the moment only scopes and groups foreseen by the WLCG Token Profile are recognized by the authorization policy engine, but adding support for the AARC profile is planned.
- At the moment there is no plan to add token support to the SRM component.
- Finalizing a specification for a Tape REST API to replace the functionally-equivalent SRM one.
- That implementation will have token support.
Circulated a survey to check the awareness and readiness of user communities:
- see details in the notes of past meetings
Migration of the VOs from VOMS to Check-in
- transition period where both X509 and tokens can be used
- delays in updating the Grid elements to the latest version compliant with tokens
- not all of the middleware products can be compliant with tokens at the same time
- the same VO has to interact with elements supporting different authentication methods
Testing HTCondorCE and AARC Profile token
- INFN-T1 did some tests with the AARC Profile token using its HTCondorCE endpoints
- Configuring authentication on HTCondorCE
- HTCondor CE token configuration tips (by INFN-T1)
- dteam VO registered in Check-in/Comanage:
- Entitlements:
- urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu
- urn:mace:egi.eu:group:dteam:role=vm_operator#aai.egi.eu
- The HTCondorCE expects to find in the token the scope claim to authorise the jobs submission
- at the moment Check-in doesn't release this claim: it will after the migration to the Keycloak technology replacing MitreID
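The entitlement values listed above follow the URN layout used for group membership (namespace, group, optional role, and the issuing authority after `#`). As a rough illustration of what a relying service has to extract from them, here is a minimal parsing sketch. The class and function names are hypothetical, and the parser only assumes the layout visible in the two dteam examples; it does not decode percent-encoded characters or validate the namespace.

```python
from typing import NamedTuple, Optional


class Entitlement(NamedTuple):
    namespace: str            # e.g. "mace:egi.eu"
    group: str                # group path, subgroups joined by ":"
    role: Optional[str]       # value of the trailing "role=" segment, if any
    authority: Optional[str]  # part after "#", if any


def parse_entitlement(value: str) -> Entitlement:
    """Parse 'urn:<namespace>:group:<group>[:role=<role>][#<authority>]'."""
    body, _, authority = value.partition("#")
    prefix, sep, group_part = body.partition(":group:")
    if not prefix.startswith("urn:") or not sep or not group_part:
        raise ValueError(f"not a group entitlement URN: {value!r}")

    namespace = prefix[len("urn:"):]
    segments = group_part.split(":")
    role = None
    if segments[-1].startswith("role="):
        role = segments.pop()[len("role="):]
    if not segments:
        raise ValueError(f"missing group name: {value!r}")
    return Entitlement(namespace, ":".join(segments), role, authority or None)


if __name__ == "__main__":
    print(parse_entitlement("urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu"))
```

Note that this covers only the entitlement claim; the scope claim that the HTCondorCE expects is a separate part of the token and is not handled here.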
WLCG Campaign
- WLCG started the CE Token support campaign
- https://twiki.cern.ch/twiki/bin/view/LCG/CEtokenSupportCampaign
- sites should upgrade the CEs to the version supporting tokens, and configure tokens
Timeline
- CEs to fully support tokens by the end of the year (Dirac for the job submission)
- data management services (dCache, STORM, FTS, Rucio, Dirac) might need a few years before dropping X509/VOMS support
Hackathon events
- 15th - 16th September: ARC/HTCondor CE Hackathon, organised by WLCG, with HTCondorCE and ARC-CE, mostly to investigate data staging issues (see GDB introduction)
DPM Decommission and migration
- DPM supported until June 2023
- Sites are encouraged to start the migration to a different storage element since the process will take time
- choosing the new storage solution depends on the expertise/experience of the sites and on the needs of the supported VOs
- DPM provides a migration script to dCache (migration guide)
- Transparent migration
- Migrate just catalog (database) and keep files untouched
- both SEs store files on a POSIX filesystem
- Migration in three steps
- verify the DPM data consistency
- no downtime needed
- the operation can last several days or some weeks
- DPM dump and dCache import
- downtime lasting about 1 day
- In September tickets were opened to the sites to plan the migration and decommission:
New benchmark replacing HEP-SPEC06
The HEPSCORE benchmark is going to replace the old HEP-SPEC06
- preparing plans with WLCG and the EGI Accounting team for deploying the new benchmark
- transition period where both benchmarks will be published and used to normalise the data
- to allow comparison between the two kinds of data
- APEL is working on a version where the accounting records contain 2 benchmarks
AOB
EGI Conference 2022: https://indico.egi.eu/event/5882/overview
Next meeting
October