Difference between revisions of "Agenda-2020-09-14"
Revision as of 16:12, 31 August 2020
Back to https://wiki.egi.eu/wiki/Operations_Meeting
General information
Middleware
UMD
- plans on CentOS8 STARTED
Preview repository
- released on 2020-08-05
- Preview 1.28.0 AppDB info (sl6): dCache 5.2.25, frontier-squid 4.12.2, gfal2 2.18.1, xrootd 5.0.0
- Preview 2.28.0 AppDB info (CentOS 7): dCache 5.2.25, frontier-squid 4.12.2, gfal2 2.18.1, xrootd 5.0.0
Operations
ARGO/SAM
- HTCondor-CE probes included in the ARGO_MON_OPERATORS profile on May 13th: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146949
- (28th Aug) 65 endpoints, 15 CRITICAL, success rate is about 76.9%
- on Sept 1st they will be included in the ARGO_MON_CRITICAL profile (A/R computation)
- please fix the failures by that date
- working on the probe for the host certificate validity check: GGUS 147386
- CREAM-CE metrics in the ARGO_MON_OPERATORS profile on May 27th: eu.egi.CREAMCE-JobSubmit, eu.egi.CREAMCE.WN-Csh, eu.egi.CREAMCE.WN-Softver
- (24th Aug) results: 156 endpoints, 22 WARNING (Timeout occurred (900 sec)), 31 CRITICAL. Success rate 80.1% (66% if the WARNING results are also counted as failures)
- When eu.egi.CREAMCE.WN-Softver is successful:
CREAM JobOutput OK: retrieved outputSandbox: ['std.err', 'std.out'] **** std.err **** **** std.out **** egee01 has UMD 3.14.4
When it fails:
CREAM JobOutput ERROR [DONE-OK, exitCode=1 ]: retrieved outputSandbox: ['std.err', 'std.out'] **** std.err **** **** std.out **** ERROR: unable to find glite, EMI, LCG or UMD WN version on n1037-amd
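The success rates quoted above for the HTCondor-CE and CREAM-CE probes can be reproduced from the endpoint counts. The following is only a minimal sketch: the function name `success_rate` and its parameters are hypothetical and not part of ARGO, and it assumes that the success rate is simply the share of endpoints not in a failing state.

```python
def success_rate(total, critical, warning=0, count_warning_as_failure=False):
    """Return the percentage of endpoints passing a probe, rounded to 0.1.

    Assumption (not taken from ARGO itself): an endpoint counts as failed
    when it is CRITICAL, and optionally also when it is WARNING.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    failed = critical + (warning if count_warning_as_failure else 0)
    return round(100.0 * (total - failed) / total, 1)


# Counts taken from the agenda items above.
htcondor = success_rate(65, 15)                      # 28th Aug, HTCondor-CE
cream = success_rate(156, 31, warning=22)            # 24th Aug, CREAM-CE
cream_strict = success_rate(156, 31, warning=22,
                            count_warning_as_failure=True)

print(htcondor, cream, cream_strict)
```

With these counts the sketch gives 76.9, 80.1 and 66.0, matching the figures reported for the two dates.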
FedCloud
Feedback from DMSU
Verify configuration records
On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:
- NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- ROD E-Mail
- Security E-Mail
- NGI managers should also review the status of the "not certified" RCs, according to the RC Status Workflow;
- RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- telephone numbers
- CSIRT E-Mail
- RC administrators should also review the information related to the registered service endpoints.
The process should be completed by June 22nd.
- 30 tickets
- Not yet solved after 1 month: 16
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
- HK-HKU-CC-01
- TW-NCUHEP
- NGI_BG: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147747
- BG01-IPP
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146871
- GoeGRID: CREAM-CE intermittent failures not affecting ATLAS; failures with ARC-CE
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147313
- mainz: some problems in March and April that could not be fixed easily; in May, the HPC infrastructure was attacked and the whole computer center was shut down; in downtime.
- wuppertalprod: SRM failures due to a BDII issue, fixed
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147311
- WCSS64
- NGI_UK:
- UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=146455 ARC-CE re-installed, some condor problems to fix
- UKI-SOUTHGRID-SUSX: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144720 Migration from CREAM to ARC, WN migration to CentOS7; SRM to be decommissioned; ARC-CE was failing the IGTF test, then solved; site-bdii failures.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147750
- UA-ISMA: migration to ARC6 and other planned software updates
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: (July 2020):
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148171
- HG-02-IASA: problems with certificate renewal due to the COVID situation; the monthly figures are improving
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148170
- Hephy-Vienna
- INFN-PADOVA-STACK
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148167
- WUT: downtime for site update, production jobs can run.
- sites suspended:
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
ARC Middleware 5 end of support, migration to ARC 6
- EGI Operations Broadcast
- PROC16 Decommission of unsupported software
- deadline: end of July
- Catalin is in contact with the ARC team to arrange a webinar on ARC administration, scheduled (to be confirmed) for July 6th; please contact operations@ for information
- Status
Date | Number of endpoints in BDII | Number of GGUS tickets | Issues |
---|---|---|---|
2020-06-08 | 75 | 42 | Some ARC endpoints publish a timestamp instead of a version like 5.X.Y; we can fairly assume they are ARC6 nightly builds, but we're going to close the corresponding tickets after explicit confirmation from the site admin. |
2020-07-13 | 53 | 29 | - |
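The issue noted in the table (endpoints publishing a timestamp instead of a version like 5.X.Y) could be triaged with a small classifier. This is only a sketch under stated assumptions: the function `classify_arc_version`, the category labels, and in particular the timestamp format (a run of 8 to 14 digits) are hypothetical and not taken from BDII or ARC documentation. The sample strings are illustrative, not real published values.

```python
import re

# Hypothetical patterns: only the "5.X.Y" shape comes from the agenda;
# the ARC 6 and timestamp shapes are assumptions made for illustration.
ARC5_RE = re.compile(r"^5\.\d+\.\d+$")
ARC6_RE = re.compile(r"^6\.\d+\.\d+$")
TIMESTAMP_RE = re.compile(r"^\d{8,14}$")


def classify_arc_version(published):
    """Classify a published middleware version string."""
    value = published.strip()
    if ARC5_RE.match(value):
        return "arc5"        # unsupported, ticket stays open
    if ARC6_RE.match(value):
        return "arc6"        # already migrated
    if TIMESTAMP_RE.match(value):
        return "timestamp"   # possibly a nightly build, needs confirmation
    return "unknown"


# Illustrative inputs only.
for sample in ["5.4.4", "6.6.0", "20200601123456", "n/a"]:
    print(sample, "->", classify_arc_version(sample))
```

Endpoints classified as "timestamp" would still need explicit confirmation from the site admin before the corresponding ticket is closed, as stated in the table.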
Storage accounting
Many sites stopped publishing storage accounting records; 57 tickets were opened to fix that.
- page for checking when the records were published: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html
- Accounting Portal Prototype view
SECMON failures
Several CEs are failing the job submission tests, preventing Pakiti from checking the vulnerability fixes on the WNs.
- original ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=143837
- List of tickets to the sites
- https://ggus.eu/index.php?mode=ticket_info&ticket_id=144732
AOB
Next meeting
Sept 14th, 2020 https://indico.egi.eu/event/5098/