Agenda-2021-01-11
General information
Middleware
UMD
- plans on CentOS8 ONGOING
- UMD4 release in preparation
- StoRM, VOMS, BDII update, dCache
- VERY URGENT
- feedback on software automation from the EGI Conference
Preview repository
- released on 2020-10-09
- Preview 1.29.0 AppDB info (sl6): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
- Preview 2.29.0 AppDB info (CentOS 7): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
- included in the upcoming release: DPM, VOMS
Operations
ARGO/SAM
- HTCondor-CE probes
- working on the probe for the host certificate validity check: GGUS 147386
- CREAM-CE metrics removed from ARGO_MON, ARGO_MON_OPERATIONS and ARGO_MON_CRITICAL (GGUS 149778)
- emi.cream.CREAMCE*
- eu.egi.CREAM*
FedCloud
Feedback from DMSU
Upgrade of central argus node
Message sent to the administrators of the NGI Argus servers:
- A replacement of the central argus servers (lcgargus03.cern.ch & lcgargus04.cern.ch), which are behind the argus.cern.ch & lcgargus.cern.ch aliases, is planned for Tuesday 17th November 2020 between 10:00 and 12:00.
- This replacement should be transparent, requiring no change of configuration on your side. Please report any issue you have with your NGI argus server.
- The two new hosts, lcargus21.cern.ch and lcgargus22.cern.ch, are ready for production; you can test them remotely if you want. The operation next week is simply a change of alias.
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
- HK-HKU-CC-01: migrating DPM from sl6 to CentOS7
- TW-NCUHEP: ARC-CE failures due to outdated CAs package, performance is now good
- CERN-PROD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351
- webdav failures which required a fix in the EOS services https://its.cern.ch/jira/browse/EOS-4515 ; some instability with the site-bdii
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: in the process of a major upgrade from CentOS 6 to CentOS 7, some delays.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148957
- INFN-CATANIA: SRM problems; the SRM service will be decommissioned
- INFN-CATANIA-STACK: recovered
- INFN-PADOVA: decommissioning process
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149352
- INFN-LECCE: authz failures on SRM; CREAM-CE to decommission
- TRIGRID-INFN-CATANIA: CREAM-CE to decommission
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149798
- INFN-ROMA1-CMS: intermittent failures on SRM service; some failures on ARC-CE servers
- NGI_UK:
- UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 Migration from CREAM to ARC and WN migration to CentOS7; SRM to be decommissioned; ARC-CE was failing the IGTF test (now solved); site-bdii failures; new failures on ARC-CE.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148958
- UA-NSCMBR: IGTF outdated; new failures with ARC-CE and SRM/webdav
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
- ATLAND: downtime due to powercut and quarantine
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: SRM failures due to information not being published properly. Physical access to the facilities is restricted due to COVID measures; a DPM update is planned for December.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
- SUPERCOMPUTO-UNAM: scheduled a downtime for upgrading the site.
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (December 2020):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150109
- INDIACMS-TIFR
- KR-KNU-T3
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150108
- GARR-01-DIR
- NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150111
- SE-SNIC-T2
- NGI_TR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150107
- AZ-IFAN: CREAM-CE and SRM decommissioned, HTCondorCE deployed; asked to redeploy the Site-BDII service
- Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150110
- ITEP
- sites suspended:
- WCSS64 (NGI_PL)
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- relevant information, if any, will be summarised at the OMB
CREAM-CE Decommission
- End of Security Updates and Support: 31st Dec 2020 (Decommissioning deadline)
- Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
- PROC16 Decommission of unsupported software
- Decommissioning start date: Oct 1st 2020
- a probe detecting CREAM-CE endpoints will be run, returning WARNING status
- GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
- eu.egi.sec.CREAMCE
- Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
- 1st Jan 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
- By this time, service endpoints that could not be upgraded should be put into downtime by the site admin or ROD.
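The escalation timeline above can be sketched as a small date check. This is only an illustration of the schedule, not the actual eu.egi.sec.CREAMCE probe: the function names and the "OK" result (for sites without CREAM-CE endpoints, or before the probe starts running) are assumptions made for the example.

```python
from datetime import date

# Dates taken from the timeline above.
WARNING_FROM = date(2020, 10, 1)   # decommissioning start, probe returns WARNING
CRITICAL_FROM = date(2020, 11, 1)  # probe returns CRITICAL, ROD teams open tickets
CHASING_FROM = date(2021, 1, 1)    # EGI Ops start chasing the remaining sites


def cream_ce_status(today: date, has_cream_endpoint: bool) -> str:
    """Status a site would see on a given day (hypothetical helper)."""
    # Assumption: no CREAM-CE endpoint, or a date before the probe runs, maps to "OK".
    if not has_cream_endpoint or today < WARNING_FROM:
        return "OK"
    if today < CRITICAL_FROM:
        return "WARNING"
    return "CRITICAL"


def chased_by_egi_ops(today: date, has_cream_endpoint: bool) -> bool:
    """True once EGI Ops would follow up directly with the site."""
    return has_cream_endpoint and today >= CHASING_FROM
```

For example, a site still publishing a CREAM-CE endpoint on 15 October 2020 falls in the WARNING window, while the same site on 11 January 2021 is both CRITICAL and followed up by EGI Ops.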
ARC Middleware 5 end of support, migration to ARC 6
- EGI Operations Broadcast
- PROC16 Decommission of unsupported software
- deadline: end of July
- Status
Date | Number of endpoints in BDII | Number of GGUS tickets | Issues |
---|---|---|---|
2020-06-08 | 75 | 42 | Some ARC endpoints publish a timestamp instead of a version like 5.X.Y; we can fairly assume they are ARC6 nightly builds, but we're going to close the corresponding tickets after explicit confirmation from the site admin. |
2020-07-13 | 53 | 29 | - |
2020-09-14 | 34 | 18 | - |
2020-10-12 | 32 | 19 | - |
2020-11-16 | 26 | 16 | - |
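To read the trend in the status table at a glance, the relative drop between the first and last rows can be computed as below. The figures are copied from the table; the variable and function names are illustrative only.

```python
# (date, ARC 5 endpoints in BDII, open GGUS tickets), copied from the table above.
rows = [
    ("2020-06-08", 75, 42),
    ("2020-07-13", 53, 29),
    ("2020-09-14", 34, 18),
    ("2020-10-12", 32, 19),
    ("2020-11-16", 26, 16),
]


def reduction_percent(first: int, last: int) -> float:
    """Percentage decrease from the first value to the last, to one decimal."""
    return round(100 * (first - last) / first, 1)


first_row, last_row = rows[0], rows[-1]
endpoint_drop = reduction_percent(first_row[1], last_row[1])
ticket_drop = reduction_percent(first_row[2], last_row[2])
print(f"Endpoints: {first_row[1]} -> {last_row[1]} ({endpoint_drop}% fewer)")
print(f"Tickets:   {first_row[2]} -> {last_row[2]} ({ticket_drop}% fewer)")
```

Between 2020-06-08 and 2020-11-16 the number of endpoints fell by 49 and the number of tickets by 26.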
Storage accounting
Many sites stopped publishing storage accounting records; 57 tickets were opened to fix that.
- 12 tickets not solved yet
- page for checking when the records were published: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
- Accounting Portal Prototype view
AOB
Next meeting
In 2021