Difference between revisions of "Agenda-2021-03-08"
Revision as of 18:19, 5 March 2021
Back to https://wiki.egi.eu/wiki/Operations_Meeting
General information
Middleware
UMD
- CentOS8 discussion still ongoing
- migration of Software Provisioning infrastructure to IBERGRID still ongoing
- in particular, the administration portal used for release creation was migrated successfully
- February release planned (https://wiki.egi.eu/wiki/UMD_Release_Schedule), to be discussed at today's meeting
- problem: UMD-4 missing voms-clients-cpp-2.0.15: http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/
- to be fixed urgently
Preview repository
- released on 2021-02-22
- Preview 2.31.0 AppDB info (CentOS 7): APEL-SSM 3.1.1, ARC 6.10.1, CVMFS 2.8.0 and egi-cvmfs-3-1.13, davix 0.7.6, dCache 5.2.38, gfal2 2.18.2
Operations
ARGO/SAM
- Migration to CentOS 7 completed
- some probes not yet ready for CentOS 7 are temporarily executed by https://egi-mon-old.argo.grnet.gr/nagios/
- HTCondor-CE probes
- working on the probe for the host certificate validity check: GGUS 147386
- integration with secmon and pakiti: GGUS 150006
- CREAM-CE metrics removed from ARGO_MON, ARGO_MON_OPERATIONS and ARGO_MON_CRITICAL (GGUS 149778)
- emi.cream.CREAMCE*
- eu.egi.CREAM*
FedCloud
Feedback from DMSU
Monthly Availability/Reliability
- Under-performed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150109
- INDIACMS-TIFR failures with HTCondor-CE and webdav; additional failures with SRM tests
- KR-KNU-T3: migration from CREAM-CE to HTCondor-CE
- NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
- egee.irb.hr: in the process of a major upgrade from CentOS 6 to CentOS 7, some delays.
- NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150469
- UA-MHI: upgrade to CentOS7 and ARC 6, tests are ok
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
- ATLAND: ARC-CE misconfiguration
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
- CBPF: SRM failures due to information not being properly published. Physical access to the facilities is restricted due to COVID measures; a DPM update is planned.
- ROC_LA https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
- SUPERCOMPUTO-UNAM: scheduled a downtime for upgrading the site.
- Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations (February 2021):
- NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150816
- GoeGrid
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150818
- INFN-PISA
- ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150817
- ICN-UNAM
- sites suspended:
- GARR-01-DIR (NGI_IT)
IPv6 readiness plans
- please provide updates to the IPv6 assessment (ongoing) https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
- any relevant information will be summarised at the OMB
APEL migration from ActiveMQ to ARGO Message Service (AMS)
- Migration instructions: https://github.com/apel/ssm/blob/dev/migrating_to_ams.md
- ActiveMQ is going to be decommissioned at the end of March, with the end of EOSC-hub
- Currently an issue with the APEL client prevents SSM from sending records properly through AMS
- it doesn't affect cloud and storage accounting
- ARC-CE might not work if using an old bundled version of SSM - but new ARC versions may work if set to use standalone SSM
- With HTCondor-CE it may work; we will find some sites to test it
- by mid-March a fix will be released; then the sites with ARC-CE/HTCondorCE can implement the change
- starting the migration with FedCloud sites
ARC-CE probe failing due to UMD repositories being down
- The unavailability of the UMD repository caused the ARC-CE IGTF probes (org.nordugrid.ARC-CE-result-ops) to fail
Job terminated as Failed. - Failed in data staging: Failed checking source replica http://repository.egi.eu:80/sw/production/cas/1/current/meta/ca-policy-egi-core.list: Failed to obtain information about file: Failed to connect to repository.egi.eu(IPv4):80 - JID: gsiftp://alex4.nipne.ro:2811/jobs/yq0NDmskJcynuvw3Vp3UrRNqABFKDmABFKDm8hJKDmABFKDmxx7PPm
- Asked the ARC-CE developers to remove this dependency from the probe:
- A recomputation will be requested to exclude these failures from the A/R figures
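As a rough illustration of how the affected probe results could be picked out for the recomputation request, the sketch below scans probe output for the connection error quoted above. The function names and the regular expression are assumptions made for this example; they are not part of ARGO or of the ARC-CE probe itself.

```python
import re

# Matches the connection error seen in the probe output, e.g.
# "Failed to connect to repository.egi.eu(IPv4):80".
# The pattern is derived from the single message quoted in this agenda.
CONNECT_ERROR = re.compile(
    r"Failed to connect to (?P<host>[\w.-]+)\((?P<proto>IPv[46])\):(?P<port>\d+)"
)


def failed_endpoint(message):
    """Return (host, port) of the unreachable endpoint, or None if absent."""
    match = CONNECT_ERROR.search(message)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def caused_by_repository_outage(message, repo_host="repository.egi.eu"):
    """True if the probe failure mentions a failed connection to repo_host."""
    endpoint = failed_endpoint(message)
    return endpoint is not None and endpoint[0] == repo_host


sample = (
    "Job terminated as Failed. - Failed in data staging: Failed checking "
    "source replica http://repository.egi.eu:80/sw/production/cas/1/current/"
    "meta/ca-policy-egi-core.list: Failed to obtain information about file: "
    "Failed to connect to repository.egi.eu(IPv4):80"
)
print(failed_endpoint(sample))
print(caused_by_repository_outage(sample))
```

Failures flagged this way would still need a manual check against the outage window before being excluded from the A/R figures.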
CREAM-CE Decommission
- End of Security Updates and Support: 31st Dec 2020
- Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
- Decommissioning deadline: 31st Jan 2021
- PROC16 Decommission of unsupported software
- Decommissioning start date: Oct 1st 2020
- a probe detecting CREAM-CE endpoints will be run, returning WARNING status
- GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
- eu.egi.sec.CREAMCE
- Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
- 1st Feb 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
- By this time service end-points which couldn't be upgraded should be put into downtime by site admin or ROD
- 1st March 2021: Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
- Tickets opened: 49
- link to the list
- Please note that at least one CE endpoint should be associated to the APEL service type in order to monitor the publication of the accounting data, as explained here
- If the CE you are going to remove was also registered as APEL service type, do not forget to move the APEL service type to a different CE endpoint.
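The PROC16 timeline for the CREAM-CE decommissioning listed above can be summarised as a small date lookup, sketched here in Python. The dates are taken from this agenda; the function name, the tuple layout, and the assumption that the probe keeps returning CRITICAL after 1st Nov 2020 are illustrative only.

```python
from datetime import date

# (start date, assumed probe status, action) in chronological order.
# Dates come from the agenda; statuses after 1st Nov are assumed to stay CRITICAL.
STAGES = [
    (date(2020, 10, 1), "WARNING", "decommissioning started"),
    (date(2020, 11, 1), "CRITICAL", "ROD teams open tickets"),
    (date(2021, 2, 1), "CRITICAL", "EGI Ops chase remaining sites"),
    (date(2021, 3, 1), "CRITICAL", "sites risk suspension"),
]


def decommission_stage(day):
    """Return (probe status, action) in force on the given day."""
    current = (None, "before decommissioning start")
    for start, status, action in STAGES:
        if day >= start:
            current = (status, action)
    return current


print(decommission_stage(date(2021, 3, 8)))
```

For the date of this meeting, the lookup returns the last stage, in which sites still deploying CREAM-CE endpoints risk suspension.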
VOMS upgrade to CentOS 7
- VOMS for CentOS 7 released Nov 23rd with UMD 4.12.13
- VOMS Admin 3.8.0, VOMS Server 2.0.15
- VOMS endpoints registered on GOCDB as production and monitored: 41
- Provided by 33 sites
- list of tickets opened: GGUS
- the VOMS servers need to be published in the BDII in order to easily collect the deployed version
AOB
Next meeting
8th Mar 2021