Agenda-2021-11-15
General information
Middleware
UMD
- including EOS in UMD
Preview repository
- released on 2021-06-10
- Preview 2.34.0 (CentOS 7): ARC 6.12.0, CVMFS 2.8.1, xrootd 5.2.0
- released on 2021-08-11
- Preview 2.35.0 (CentOS 7): APEL SSM 3.2.1, DPM/DMLite 1.15.0 and 1.15.1, frontier-squid 4.15.2, xrootd 5.3.0
- We plan to stop releasing Preview, since it does not seem to be widely used, and it is also easier to get the latest versions of the products from EPEL or the product teams' repositories before they are released in UMD.
Operations
ARGO/SAM
- a probe checking the validity of the HTCondorCE host certificate has been deployed in production (GGUS 147386):
- it checks the expiration date, CN, and CA
- it is working fine (very few failures)
- to be included in the A/R profile
FedCloud
Feedback from DMSU
New Known Error Database (KEDB)
The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home
- problems are tracked with Jira tickets to better follow up their evolution
- problems can be registered by DMSU staff and the EGI Operations team
Monthly Availability/Reliability
- Sites that under-performed in past A/R reports and whose issues are not yet fixed:
- AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154295
- MA-01-CNRST: ARC-CE failures
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150818
- INFN-PISA: HTCondorCE failures fixed; SRM failures not yet
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153659
- TASK: in the process of replacing QCG with ARC-CE
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152841
- UA-NSCMBR: problem during the DPM update caused by a conflict between xrootd 5 and dmlite 1.13. Unscheduled downtime due to a power failure in the computing centre. An NFS configuration issue affected the ARC-CE. Accounting data were republished using the ARC accounting functionalities.
- NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153660
- UKI-SOUTHGRID-SUSX: CE configuration issues; some other failures also occurred.
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153658
- SUPERCOMPUTO-UNAM: some network issues
- Sites under-performing for 3 consecutive months, under-performing NGIs, and QoS violations (October 2021):
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154745
- GoeGrid: relocation of the cluster to a different building on the campus and subsequent network issues; handover to new staff; problems fixed.
- NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154750
- UAM-LCG2
- NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154746
- GRIDIFIN
- NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154747
- PSNC: storage backend issues affecting the HPC cluster and DPM, also causing ARC-CE instability; the DPM issues were fixed, and work on the HPC cluster is ongoing
- Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154748
- RU-SARFTI: ARC-CE failures and a problem with hard drives; fixed
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=154749
- UA-KNU: failures with IGTF metric, now fixed.
- sites suspended:
Documentation
- plan to decommission MediaWiki
- content to be moved to different locations (Confluence and https://docs.egi.eu/)
- Confluence space hosting policies and procedures: EGI Policies and Procedures
- EGI Federation Operations
- Change Management, Release and Deployment Management, Incident and Service Request Management, Problem Management, Information Security Management
- Manuals, How-Tos, Troubleshooting, FAQs:
- a large amount of material needs to be reviewed and, where necessary, updated when moved to the new location
- location will be https://docs.egi.eu/providers/operations-manuals/
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
AOB
Next meeting
Dec