Agenda-2021-01-11
General information
Middleware
UMD
- UMD4 schedule: https://wiki.egi.eu/wiki/UMD_Release_Schedule
- CentOS8 rebuild EOL in 2021 (was: May 2029); possible switch to CentOS8 Stream (maintained until August 2024), see https://blog.centos.org/2020/12/future-is-centos-stream/ ; discussion ongoing, especially in WLCG
- CentOS7 will be maintained until June 2024
- Moving UMD4/C7 to UMD5/C7
- SL6 is retired, URT will not accept updates (unless critical and agreed with EGI Operations)
- feedback on software automation from the EGI Conference
Preview repository
- 2020-11-30
- Preview 1.30.0 AppDB info (last release on sl6): CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5
- Preview 2.30.0 AppDB info (CentOS 7): APEL-SSM 3.0.1, CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5 and 5.0.3
Operations
ARGO/SAM
- HTCondor-CE probes
- working on the probe for the host certificate validity check: GGUS 147386
- integration with secmon and pakiti: GGUS 150006
- CREAM-CE metrics removed from ARGO_MON, ARGO_MON_OPERATIONS and ARGO_MON_CRITICAL (GGUS 149778)
- emi.cream.CREAMCE*
- eu.egi.CREAM*
FedCloud
Feedback from DMSU
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
- HK-HKU-CC-01: migrating DPM from sl6 to CentOS7
- TW-NCUHEP: ARC-CE failures due to outdated CAs package, performance is now good
- CERN-PROD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351
- webdav failures which required a fix in the EOS services https://its.cern.ch/jira/browse/EOS-4515 ; some instability with the site-bdii
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: in the process of a major upgrade from CentOS 6 to CentOS 7, some delays.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148957
- INFN-CATANIA: SRM problems; the SRM service will be decommissioned
- INFN-CATANIA-STACK: recovered
- INFN-PADOVA: decommissioning process
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149352
- INFN-LECCE: authz failures on SRM; CREAM-CE to decommission
- TRIGRID-INFN-CATANIA: CREAM-CE to decommission
- NGI_IT https://ggus.eu/index.php?mode=ticket_info&ticket_id=149798
- INFN-ROMA1-CMS: intermittent failures on SRM service; some failures on ARC-CE servers
- NGI_UK:
- UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 Migration from CREAM to ARC, WN migration to CentOS7; SRM to be decommissioned; ARC-CE was failing the IGTF test, since solved; site-bdii failures; new failures on ARC-CE.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148958
- UA-NSCMBR: IGTF outdated; new failures with ARC-CE and SRM/webdav
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
- ATLAND: downtime due to a power cut and quarantine
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: SRM failures due to information not being properly published. Physical access to the facilities is restricted due to COVID measures; a DPM update was planned for December.
- ROC_LA https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
- SUPERCOMPUTO-UNAM: scheduled a downtime for upgrading the site.
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (December 2020):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150109
- INDIACMS-TIFR
- KR-KNU-T3
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150108
- GARR-01-DIR
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150111
- SE-SNIC-T2: network issues. Planned a meeting with the internet provider.
- NGI_TR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150107
- AZ-IFAN: CREAM-CE and SRM decommissioned, HTCondorCE deployed; Site-BDII re-installed.
- Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150110
- ITEP: hardware problems with storage element, replacement of ARC-CE machine
- sites suspended:
- WCSS64 (NGI_PL)
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
Top-BDII problem affecting the publication of accounting records
- on 20th Dec 2020 the top-bdii at CERN (lcg-bdii.cern.ch) stopped working
- from then on, it was not possible to publish the accounting data
- the SSM script couldn't find the Message Brokers queue to send the messages to
- top-bdii fixed on 4th Jan 2021
- this problem affected all the sites because the default APEL SSM config file points to CERN's top-BDII
- each site can instead set the top-BDII of its region:
- Top-BDIIs service group on GOCDB
- Top-BDII servers monitored by ARGO
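As an illustration of the change suggested above, here is a minimal Python sketch that repoints the BDII option in an SSM-style config at a regional top-BDII. The `[broker]` section, the `bdii` option name, the `ldap://` scheme and port 2170 are assumptions about the file layout and should be checked against the site's actual SSM configuration; `topbdii.example.org` is a hypothetical host, and `set_bdii` is a helper invented for this example.

```python
import re

# Assumed layout of the relevant part of an SSM sender config (not copied from a real site).
SAMPLE = """\
[broker]
# BDII queried to discover the message brokers
bdii: ldap://lcg-bdii.cern.ch:2170
network: PROD
"""


def set_bdii(config_text: str, new_bdii: str) -> str:
    """Return config_text with the 'bdii' option of [broker] set to new_bdii.

    Works line by line so that comments and other options are preserved.
    """
    out = []
    section = None
    for line in config_text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"\[(.+)\]", stripped)
        if header:
            # Track which section we are currently in.
            section = header.group(1)
        elif section == "broker" and re.match(r"bdii\s*[:=]", stripped):
            # Replace only the active (uncommented) bdii option.
            line = f"bdii: {new_bdii}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical regional top-BDII; pick a real one from the GOCDB service group.
updated = set_bdii(SAMPLE, "ldap://topbdii.example.org:2170")
print(updated)
```

The rewrite is done on the text rather than through a config parser so that the comments in the file survive the edit.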
CREAM-CE Decommission
- End of Security Updates and Support: 31st Dec 2020 (Decommissioning deadline)
- Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
- PROC16 Decommission of unsupported software
- Decommissioning start date: Oct 1st 2020
- a probe detecting CREAM-CE endpoints will be run, returning WARNING status
- GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
- eu.egi.sec.CREAMCE
- Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
- 1st Feb 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
- By this time service end-points which couldn't be upgraded should be put into downtime by site admin or ROD
- 1st March 2021: Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
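The timeline above can be sketched as a small date lookup. This is only an illustration: the phase labels and the `decommission_phase` helper are hypothetical, the year 2020 for the "Nov 1st" step is assumed from context, and none of this is the actual logic of the eu.egi.sec.CREAMCE probe.

```python
from datetime import date

# (start date, phase label) pairs taken from the agenda; labels are hypothetical.
PHASES = [
    (date(2020, 10, 1), "WARNING"),           # probe returns WARNING
    (date(2020, 11, 1), "CRITICAL"),          # probe returns CRITICAL, ROD tickets
    (date(2021, 2, 1), "EGI_OPS_CHASING"),    # EGI Ops chases remaining sites
    (date(2021, 3, 1), "SUSPENSION_RISK"),    # sites risk suspension
]


def decommission_phase(today: date) -> str:
    """Return the label of the latest phase that has started by 'today'."""
    current = "NOT_STARTED"
    for start, label in PHASES:
        if today >= start:
            current = label
    return current


# Phase on the date of this meeting.
print(decommission_phase(date(2021, 1, 11)))
```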
VOMS upgrade to CentOS 7
- VOMS for CentOS 7 released Nov 23rd with UMD 4.12.13
- VOMS Admin 3.8.0, VOMS Server 2.0.15
- VOMS endpoints registered on GOCDB as production and monitored: 41
- Provided by 33 sites
- list of tickets opened: GGUS
- the VOMS servers need to be published in the BDII so that the deployed version can be collected easily
AOB
Next meeting
8th Feb 2021