Agenda-2020-11-16
Back to https://wiki.egi.eu/wiki/Operations_Meeting
General information
Middleware
UMD
- plans on CentOS8 ONGOING
- UMD4 release in preparation
- StoRM, VOMS, BDII update, dCache
- VERY URGENT
- feedback on software automation from the EGI Conference
Preview repository
- released on 2020-10-09
- Preview 1.29.0 AppDB info (sl6): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
- Preview 2.29.0 AppDB info (CentOS 7): ARC 6.8.0 and 6.8.1, BDII 5.5.26, CVMFS 2.7.4, dCache 5.2.31, DMLite/DPM 1.14.0, frontier-squid 4.13.1, glite-info-update-endpoints 3.0.2, lcg-info 1.12.5, STORM 1.11.18
- included in the upcoming release: DPM, VOMS
Operations
ARGO/SAM
- HTCondor-CE probes included in the ARGO_MON_OPERATORS profile on May 13th: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146949
- (14th Sept) 70 endpoints, 14 CRITICAL, success rate is about 80%
- Oct 1st: included in the ARGO_MON_CRITICAL profile (A/R computation)
- (Nov 16th) 76 endpoints, success rate (including WARNING) 84.2%
- working on the probe for the host certificate validity check: GGUS 147386
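The success rates quoted above are consistent with the share of endpoints that are not in CRITICAL state. A minimal sketch of that arithmetic, assuming the rate is simply the non-CRITICAL endpoints over the total. The function name is hypothetical, and the Nov 16th count of 12 failing endpoints is inferred from 84.2% of 76 rather than stated in the notes:

```python
def success_rate(total_endpoints: int, critical: int) -> float:
    """Percentage of endpoints not in CRITICAL state (assumed definition)."""
    if total_endpoints <= 0:
        raise ValueError("total_endpoints must be positive")
    return 100.0 * (total_endpoints - critical) / total_endpoints


# 14th Sept figures from the notes: 70 endpoints, 14 CRITICAL
sept = success_rate(70, 14)
# Nov 16th: 76 endpoints; 12 failing endpoints is an inferred value, not from the notes
nov = success_rate(76, 12)
print(f"Sept: {sept:.1f}%  Nov: {nov:.1f}%")
```

With these inputs the September figure comes out at 80.0% and the November figure at 84.2%, matching the numbers reported in the agenda.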
FedCloud
Feedback from DMSU
Upgrade of central argus node
Message sent to administrators of NGIs argus servers:
- A replacement of the central argus servers (lcgargus03.cern.ch & lcgargus04.cern.ch), which are behind the argus.cern.ch & lcgargus.cern.ch aliases, is planned for Tuesday 17th November 2020 between 10:00 and 12:00.
- This replacement should be transparent, requiring no change of configuration on your side. Please report any issue you have with your NGI argus server.
- The two new hosts, lcargus21.cern.ch and lcgargus22.cern.ch, are already ready for production; you can test them remotely if you want. The operation next week is simply a change of alias.
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
- HK-HKU-CC-01: migrating DPM from sl6 to CentOS 7
- TW-NCUHEP: ARC-CE failures due to outdated CAs package, performance is now good
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148519
- LRZ-LMU: CE had problems due to the decommission of SharedFS; the other CE returns UNKNOWN in the IGTF test.
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: in the process of a major upgrade from CentOS 6 to CentOS 7, some delays.
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148957
- INFN-CATANIA: SRM problems
- INFN-CATANIA-STACK: recovered
- INFN-PADOVA: decommissioning process
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147311
- WCSS64: failures on QCG and CREAM CEs
- NGI_UK:
- UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146455 ARC-CE re-installed, some condor problems to fix, improving...
- UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 Migration from CREAM to ARC, WN migration to CentOS 7; SRM to be decommissioned; ARC-CE was failing the IGTF test, since solved; site-bdii failures; new failures on ARC-CE.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
- ATLAND: downtime due to power cut and quarantine
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: SRM failures due to information not properly published. Physical access to facilities restricted due to COVID measures; planned a DPM update in December.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148958
- UA-NSCMBR: IGTF outdated; improving...
- Under-performing sites for 3 consecutive months, under-performing NGIs, QoS violations (October 2020):
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149353
- JP-KEK-CRC-02: migration from CREAM-CE to ARC-CE; some problems with the ARC-CE, which has since been marked as "not production"
- CERN-PROD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351 webdav failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149352
- INFN-LECCE
- TRIGRID-INFN-CATANIA
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=149356
- UA_BITP_ARC: bdii freshness failures
- ROC_LA https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
- SUPERCOMPUTO-UNAM
- sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
CREAM-CE Decommission
- End of Security Updates and Support: 31st Dec 2020 (Decommissioning deadline)
- Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
- PROC16 Decommission of unsupported software
- Decommissioning start date: Oct 1st 2020
- a probe detecting CREAM-CE endpoints will be run, returning WARNING status
- GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
- eu.egi.sec.CREAMCE
- Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
- 1st Jan 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
- By this time, service endpoints that could not be upgraded should be put into downtime by the site admin or ROD.
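The decommissioning timeline above can be read as a mapping from calendar date to the expected state of a remaining CREAM-CE endpoint. A rough sketch under that reading: the dates are the ones listed in the agenda, while the function name and the phase labels are illustrative and not part of PROC16 or the eu.egi.sec.CREAMCE probe:

```python
from datetime import date

# Milestones taken from the agenda; the phase labels below are illustrative wording.
WARNING_START = date(2020, 10, 1)   # decommissioning starts, probe returns WARNING
CRITICAL_START = date(2020, 11, 1)  # probe returns CRITICAL, ROD alarms and tickets
CHASING_START = date(2021, 1, 1)    # EGI Ops starts chasing remaining sites


def cream_ce_phase(day: date) -> str:
    """Return the expected handling of a CREAM-CE endpoint on a given day."""
    if day < WARNING_START:
        return "not yet probed"
    if day < CRITICAL_START:
        return "WARNING"
    if day < CHASING_START:
        return "CRITICAL"
    return "CRITICAL, chased by EGI Ops"


# On the day of this meeting the probe is already in its CRITICAL phase.
print(cream_ce_phase(date(2020, 11, 16)))
```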
ARC Middleware 5 end of support, migration to ARC 6
- EGI Operations Broadcast
- PROC16 Decommission of unsupported software
- deadline: end of July
- Status
Date | Number of endpoints in BDII | Number of GGUS tickets | Issues |
---|---|---|---|
2020-06-08 | 75 | 42 | Some ARC endpoints publish a timestamp instead of a version like 5.X.Y; we can fairly assume they are ARC6 nightly builds, but we're going to close the corresponding tickets after explicit confirmation from the site admin. |
2020-07-13 | 53 | 29 | - |
2020-09-14 | 34 | 18 | - |
2020-10-12 | 32 | 19 | - |
2020-11-16 | 26 | 16 | - |
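To put the status table in perspective, the relative decrease between the first and last rows can be computed directly. This is only a sketch of the arithmetic: the figures are copied from the table, the endpoint column is presumably the count of ARC 5 endpoints still published, and the helper name is hypothetical:

```python
# (date, endpoints in BDII, GGUS tickets), copied from the status table
STATUS = [
    ("2020-06-08", 75, 42),
    ("2020-07-13", 53, 29),
    ("2020-09-14", 34, 18),
    ("2020-10-12", 32, 19),
    ("2020-11-16", 26, 16),
]


def percent_drop(first: int, last: int) -> float:
    """Relative decrease from first to last, as a percentage of first."""
    return 100.0 * (first - last) / first


endpoint_drop = percent_drop(STATUS[0][1], STATUS[-1][1])
ticket_drop = percent_drop(STATUS[0][2], STATUS[-1][2])
print(f"endpoints: -{endpoint_drop:.1f}%, tickets: -{ticket_drop:.1f}%")
```

Between 2020-06-08 and 2020-11-16 this gives a drop of roughly 65% in endpoints and 62% in tickets.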
Storage accounting
Many sites stopped publishing storage accounting records. 57 tickets were opened to fix that.
- 12 tickets not solved yet
- page for checking when the records were published: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
- Accounting Portal Prototype view
AOB
Next meeting
In 2021