General information
Middleware
UMD
- CentOS7 is the recommended OS until UMD5+CS9 are released
- If you already have some machines with C8, migration to CentOS Stream 8 is recommended
- CS9 will be supported by CERN and FNAL
- middleware: recommended path is C7->CS9 (we will skip CS8)
- UMD Infrastructure:
- testbed and scripts to test the new workflows
- working on the Jenkins + Nexus interactions
- in more detail: using the Nexus API we can create, upload and sign RPM and DEB packages
- working on implementing this in Jenkins
Operations
ARGO/SAM
- Integration of EOS storage element: GGUS 154335
We are testing the monitoring probe for the EOS Storage endpoints (GGUS 156251) which uses the XRootD interface (see https://github.com/EGI-Federation/nagios-plugins-xrootd )
On GOCDB the EOS endpoints are registered as XrootD service endpoints.
In order to allow the proper execution of the probe, we would like you to:
enable the ops VO on your endpoints
for each EOS (Xrootd) service endpoint add the following Extension Property:
Name: XROOTD_URL
Value: XRootD base SURL to test (the path where ops VO has write access, for example: root://eospps.cern.ch:1094/eos/pps/ or similar)
Please do the same even if you provide an XRootD interface with a different type of storage element.
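Before adding the Extension Property on GOCDB, the XROOTD_URL value can be sanity-checked locally. The following is a minimal sketch, not part of the probe: the function name `check_xrootd_url` and the checks it performs are assumptions for illustration, and it only accepts the `root://` scheme used in the example above.

```python
from urllib.parse import urlparse


def check_xrootd_url(value):
    """Return a list of problems found in an XROOTD_URL value (empty if none).

    Hypothetical helper: it only checks the shape of the URL, it does not
    contact the endpoint or verify that the ops VO can write to the path.
    """
    problems = []
    parsed = urlparse(value)
    if parsed.scheme != "root":
        problems.append("scheme should be root://")
    if not parsed.hostname:
        problems.append("missing host name")
    try:
        parsed.port  # accessing .port raises ValueError on a malformed port
    except ValueError:
        problems.append("port is not a valid number")
    if parsed.path.strip("/") == "":
        problems.append("missing path to the directory writable by the ops VO")
    return problems


if __name__ == "__main__":
    print(check_xrootd_url("root://eospps.cern.ch:1094/eos/pps/"))  # []
    print(check_xrootd_url("root://eospps.cern.ch:1094"))
```

An empty list means the value has the expected shape; whether the ops VO really has write access to that path still has to be verified on the endpoint itself.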
- Test results on the devel instance
- Released new ARGO UI (2.10) in production https://argo.egi.eu/egi/Critical
- NEW trends pages
- Trends about flapping services/groups https://argo.egi.eu/egi/Critical/trends/flapping
- Trends about flapping services/groups, grouped by tag https://argo.egi.eu/egi/Critical/trends/tags-flapping
- Trends about services statuses https://argo.egi.eu/egi/Critical/trends/status
- Trends about tags https://argo.egi.eu/egi/Critical/trends/tags
- Page reporting all the sites that fail the tests: https://argo.egi.eu/egi/issues/Critical
- Page reporting the status by metric: https://argo.egi.eu/egi/Critical/metrics
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow their evolution
- problems can be registered by DMSU staff and EGI Operations team
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156745
- TW-NCUHEP: webdav failures
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=156742
- RAL-LCG2: authz issues with the webdav endpoint
- they provide an object store endpoint, and the commands executed by the webdav probe don't work (GGUS 157748). The webdav probe is no longer maintained, but we are discussing a possible modification.
Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (May 2022):
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=157718
- SE-SNIC-T2: webdav failures due to permission problems in the ops directory, SOLVED.
- NGI_NL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=157716
- BEgrid-ULB-VUB: instability issues were solved.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=157717
- ICN-UNAM
sites suspended:
Documentation
- MediaWiki in read-only mode
- content to be moved to different locations (confluence and https://docs.egi.eu/)
- confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when it is moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
Transition from X509 to federated identities (AARC profile token)
- WLCG is testing AAI tokens (WLCG profile) as the authz system for accessing the middleware, with Indigo IAM as a replacement for VOMS
- In Feb 2022 OSG will fully move to token-based AAI, abandoning X509 certificates
- HTCondorCE: replacement of Grid Community Toolkit
- The long-term support series (9.0.x) from the CHTC repositories will support X509/VOMS authentication through Jan 2023 (originally Sep 2022)
- Starting in 9.3.0 (released in October), the HTCondor feature releases do NOT contain this support
- EGI sites are recommended to stay with the long-term support series for the time being
What we need to know in preparation of the transition:
Checking the middleware compliance with the AARC Profile token:
- ARC-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154958
- So far focusing on the WLCG profile, which is built upon the AARC profile, so this should cover everything.
- Argus: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154959
- no clear plans yet
- dCache: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154960
- dCache does support authorisation statements, as described by WLCG AuthZ-WG's JWT profile.
- supporting AARC-style group membership statements is on the TODO list
- DPM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154961
- The DPM is in maintenance mode to be phased out by ~2024. There is no effort for implementing new functionality, which furthermore would be short-lived.
- HTCondor-CE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154962
- HTCondor-CE supports WLCG tokens, so it should also work with the AARC profile token. Some tests are needed.
- STORM: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154963
- Only the StoRM-WebDAV component supports token-based authorization.
- At the moment only scopes and groups foreseen by the WLCG Token Profile are recognized by the authorization policy engine, but adding support for the AARC profile is planned.
- At the moment there is no plan to add token support to the SRM component.
- Finalizing a specification for a Tape REST API to replace the functionally-equivalent SRM one.
- That implementation will have token support.
Circulated a survey to check the awareness and readiness of user communities:
- see details in the notes of past meetings
Migration of the VOs from VOMS to Check-in
- transition period where both X509 and tokens can be used
- delays in updating the Grid elements to the latest version compliant with tokens
- not all of the middleware products can be compliant with tokens at the same time
- the same VO has to interact with elements supporting different authentication methods
Testing HTCondorCE and AARC Profile token
- INFN-T1 did some tests with the AARC Profile token using its HTCondorCE endpoints
- Configuring authentication on HTCondorCE
- HTCondor CE token configuration tips (by INFN-T1)
- dteam VO registered in Check-in/Comanage:
- Entitlements:
- urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu
- urn:mace:egi.eu:group:dteam:role=vm_operator#aai.egi.eu
- The HTCondorCE expects to find the scope claim in the token in order to authorise job submission
- at the moment Check-in doesn't release this claim: it will after the migration to Keycloak, which replaces MitreID
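To see what a given token actually carries, its payload can be decoded and inspected offline. The sketch below is illustrative only: it does NOT verify the signature, the claim name `eduperson_entitlement` is an assumption about how the entitlements are released, and the helper names (`decode_claims`, `parse_entitlement`, `summarise`, `make_unsigned_token`) are hypothetical. It splits the entitlement URNs listed above into namespace, group, role and authority, and reports whether a scope claim is present.

```python
import base64
import json


def decode_claims(token):
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def parse_entitlement(value):
    """Split '<namespace>:group:<group>[:role=<role>]#<authority>' into its parts."""
    body, sep, authority = value.partition("#")
    if not sep:
        raise ValueError("missing '#<authority>' suffix")
    namespace, sep, rest = body.partition(":group:")
    if not sep:
        raise ValueError("missing ':group:' component")
    parts = rest.split(":")
    role = None
    if parts[-1].startswith("role="):
        role = parts.pop()[len("role="):]
    return {"namespace": namespace, "groups": parts, "role": role, "authority": authority}


def summarise(claims):
    """Report whether a scope claim is present and list the parsed entitlements."""
    entitlements = claims.get("eduperson_entitlement", [])  # assumed claim name
    return {
        "has_scope": bool(claims.get("scope")),
        "entitlements": [parse_entitlement(e) for e in entitlements],
    }


def make_unsigned_token(claims):
    """Build an unsigned demo token (hypothetical helper, for local testing only)."""
    def enc(obj):
        return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}."


if __name__ == "__main__":
    demo = make_unsigned_token({
        "eduperson_entitlement": [
            "urn:mace:egi.eu:group:dteam:role=member#aai.egi.eu",
            "urn:mace:egi.eu:group:dteam:role=vm_operator#aai.egi.eu",
        ]
    })
    print(summarise(decode_claims(demo)))
```

With a token shaped like the demo one, `has_scope` is false, which corresponds to the situation described above where the scope claim is not yet released.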
WLCG Campaign
- WLCG started the CE Token support campaign
- https://twiki.cern.ch/twiki/bin/view/LCG/CEtokenSupportCampaign
- sites should upgrade the CEs to the version supporting tokens, and configure tokens
Timeline
- CEs to fully support tokens by the end of the year (Dirac for the job submission)
- data management services (dCache, STORM, FTS, Rucio, Dirac) might need a few years before dropping X509/VOMS support
Hackathon events
- in July between CE developers and Check-in
- In September, organised by WLCG, with HTCondorCE and ARC-CE, mostly to investigate data staging issues (see GDB introduction)
DPM Decommission and migration
- DPM supported until June 2023
- Sites are encouraged to start the migration to a different storage element since the process will take time
- choosing the new storage solution depends on the expertise/experience of the sites and on the needs of the supported VOs
- DPM provides a migration script to dCache (migration guide)
- Transparent migration
- Migrate just catalog (database) and keep files untouched
- both SEs store files on a POSIX filesystem
- Migration in three steps
- verify the DPM data consistency
- no downtime needed
- the operation can last several days or some weeks
- DPM dump and dCache import
- downtime lasting about 1 day
- In September we are going to open tickets to the sites
New benchmark replacing HEP-SPEC06
The HEPSCORE benchmark is going to replace the old HEP-SPEC06
- preparing plans with WLCG and the EGI Accounting team for deploying the new benchmark
- transition period where both benchmarks will be published and used to normalise the data
- to allow comparison between the two kinds of data
- APEL is working on a version where the accounting records contain two benchmarks
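As a rough illustration of what normalising with two benchmarks could look like during the transition period, the sketch below multiplies the raw CPU hours of a record by each published benchmark factor. The record layout, the field names and the numeric values are all made up for the example; they are not the APEL record format nor real benchmark scores.

```python
def normalise(cpu_hours, benchmarks):
    """Return the normalised CPU hours for every benchmark in the record.

    `benchmarks` maps a benchmark name to a per-core factor (hypothetical layout).
    """
    return {name: cpu_hours * factor for name, factor in benchmarks.items()}


if __name__ == "__main__":
    # Illustrative values only.
    record = {"cpu_hours": 100.0, "benchmarks": {"HS06": 10.0, "HEPSCORE": 12.5}}
    print(normalise(record["cpu_hours"], record["benchmarks"]))
    # {'HS06': 1000.0, 'HEPSCORE': 1250.0}
```

Keeping both normalised values side by side is what makes the comparison between the two kinds of data possible while both benchmarks are in use.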
AOB
EGI Conference 2022: https://indico.egi.eu/event/5882/overview
pre-registrations are open
Next meeting
Jul or Aug