
Difference between revisions of "Agenda-2021-02-08"

From EGIWiki
(32 intermediate revisions by 2 users not shown)
Line 9: Line 9:
== UMD ==

* UMD4 schedule: https://wiki.egi.eu/wiki/UMD_Release_Schedule
* CentOS8 discussion still ongoing
* CentOS8 rebuild EOL in 2021 (was: May 2029), '''possible switch to CentOS8 Stream''' (maintained until August 2024) https://blog.centos.org/2020/12/future-is-centos-stream/ '''discussion ongoing''', especially in WLCG
* migration of Software Provisioning infrastructure to IBERGRID still ongoing
** https://wiki.egi.eu/wiki/Next_middleware_release
** in particular, administration portal used for release creation done successfully
* '''CentOS7 will be maintained until June 2024'''
* February release planned https://wiki.egi.eu/wiki/UMD_Release_Schedule to be discussed at today's meeting
** Moving UMD4/C7 to UMD5/C7
* problem: UMD-4 missing voms-clients-cpp-2.0.15: http://repository.egi.eu/sw/production/umd/4/centos7/x86_64/updates/
* '''SL6 is retired''', URT will not accept updates (unless critical and agreed with EGI Operations)
** to be fixed urgently
 
* feedback on software automation from the EGI Conference


== Preview repository  ==
*2020-11-30
*released on 2020-11-30:
** '''[[Preview 1.30.0]]''' [https://appdb.egi.eu/store/software/preview.repository/releases/1.0/1.30.0/ AppDB info] '''(last release on sl6)''':  CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5
** '''[[Preview 2.30.0]]''' [https://appdb.egi.eu/store/software/preview.repository/releases/2.0/2.30.0/ AppDB info] (CentOS 7):  APEL-SSM 3.0.1, CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5 and 5.0.3
* collecting information for the next release


= Operations  =

== ARGO/SAM  ==
* Migration to CentOS 7 completed
** some probes not yet ready for CentOS 7 are temporarily executed by https://egi-mon-old.argo.grnet.gr/nagios/
* [https://argo-mon-fedcloud.cro-ngi.hr/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_org.opensciencegrid.htcondorce&style=detail HTCondor-CE probes]
** working on the probe for the host certificate validity check: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=147386 GGUS 147386]
Line 42: Line 43:
== Monthly Availability/Reliability ==
*Under-performed sites in the past A/R reports with issues not yet fixed:
** AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=147748
** AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150109
*** '''HK-HKU-CC-01''': migrating DPM from sl6 to CentOS7
*** '''INDIACMS-TIFR''' failures with HTCondor-CE and webdav
*** '''TW-NCUHEP''': ARC-CE failures due to outdated CAs package, performance is now good
*** '''KR-KNU-T3''': migration from CREAM-CE to HTCondor-CE
** '''CERN-PROD''': https://ggus.eu/index.php?mode=ticket_info&ticket_id=149351
*** webdav failures which required a fix in the EOS services https://its.cern.ch/jira/browse/EOS-4515 ; some instability with the site-bdii
Line 53: Line 54:
*** '''TRIGRID-INFN-CATANIA''': CREAM-CE to decommission
** NGI_IT https://ggus.eu/index.php?mode=ticket_info&ticket_id=149798
*** '''INFN-ROMA1-CMS''': problems with ARC-CE solved, intermittent failures on SRM service
*** '''INFN-ROMA1-CMS''': problems with ARC-CE solved; intermittent failures on SRM service, increased the storage to improve the stability
** NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150108
*** '''GARR-01-DIR''' the site will be decommissioned
** NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150111
*** '''SE-SNIC-T2''': network issues affecting the SE. Planned a meeting with the internet provider.
** ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148515
*** '''ATLAND''': downtime due to powercut and quarantine
*** '''ATLAND''': ARC-CE misconfiguration
** ROC_LA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148956
*** '''CBPF''': SRM failures due to information not properly published. Physical access to facilities restricted due to COVID measures; planned a DPM update in December.
*** '''CBPF''': SRM failures due to information not properly published. Physical access to facilities restricted due to COVID measures; planned a DPM update.
** ROC_LA https://ggus.eu/index.php?mode=ticket_info&ticket_id=149355
*** '''SUPERCOMPUTO-UNAM''': scheduled a downtime for upgrading the site.
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''December 2020'''):
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''January 2021'''):
** AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150109
** AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150466
*** '''INDIACMS-TIFR''' failures with HTCondor-CE and webdav
*** '''MA-01-CNRST''': migration from CREAM-CE to ARC-CE; job submission failures due to missing information (ApplicationEnvironment)
*** '''KR-KNU-T3''': migration from CREAM-CE to HTCondor-CE
** NGI_DE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150467
** NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150108
*** '''mainz''': problems with upgrading the STORM SE, now solved
*** '''GARR-01-DIR'''
*** '''RWTH-Aachen''': xrootd port doesn't allow ops VO
** NGI_NDGF: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150111
*** '''SCAI''': replacement of the cloud cluster
*** '''SE-SNIC-T2''': network issues. Planned a meeting with the internet provider.
** NGI_France: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150465
** NGI_TR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150107
*** '''IN2P3-CC-T2''': SRM failures
*** '''AZ-IFAN''': CREAM-CE and SRM decommissioned, HTCondorCE deployed; Site-BDII re-installed.
** NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150469
** Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150110
*** '''UA-MHI'''
*** '''ITEP''': hardware problems with storage element, replacement of ARC-CE machine
** NGI_UK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150470
*** '''UKI-SOUTHGRID-SUSX''': failures with the IGTF test
** Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150468
*** '''RU-SARFTI''': problems when migrating from CREAM-CE to ARC-CE


*sites suspended:
** WCSS64 (NGI_PL)
** HK-HKU-CC-01 (AsiaPacific)


== IPv6 readiness plans  ==
Line 93: Line 101:
*** [https://goc.egi.eu/portal/index.php?Page_Type=Service_Group&id=1205 Top-BDIIs service group] on GOCDB
*** [https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_Top-BDII&style=overview Top-BDII servers] monitored by ARGO
* CERN's top-BDII is going to be retired


== CREAM-CE Decommission ==

* End of Security Updates and Support: 31st Dec 2020 (Decommissioning deadline)
* End of Security Updates and Support: 31st Dec 2020
** Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
* Decommissioning deadline: 31st Jan 2021
* [https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software PROC16 Decommission of unsupported software]
* Decommissioning start date: Oct 1st 2020
Line 105: Line 115:
* Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
** https://ggus.eu/index.php?mode=ticket_info&ticket_id=149312
* 1st Feb 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
* '''1st Feb 2021''': EGI Ops will start chasing the sites still providing CREAM-CE endpoints
** By this time service end-points which couldn't be upgraded should be put into downtime by site admin or ROD
* 1st March 2021: Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
* '''1st March 2021''': Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
* '''Tickets opened''': 49
** link to the [https://ggus.eu/index.php?mode=ticket_search&status=open&user=paolini&date_type=creation+date&tf_radio=1&timeframe=any&keyword=CREAM-CE%20endpoints%20to%20be%20retired&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO! list]
* '''Please note that at least one CE endpoint should be associated with the APEL service type in order to monitor the publication of the accounting data''', as explained [https://wiki.egi.eu/wiki/APEL/Tests here]
** If the CE you are going to remove was also registered as APEL service type, do not forget to move the APEL service type to a different CE endpoint.


== VOMS upgrade to CentOS 7 ==
Line 122: Line 136:


== Next meeting  ==
8th Feb 2021
8th Mar 2021

Latest revision as of 14:55, 8 February 2021


General information

Middleware

UMD

Preview repository

  • released on 2020-11-30:
    • Preview 1.30.0 AppDB info (last release on sl6): CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5
    • Preview 2.30.0 AppDB info (CentOS 7): APEL-SSM 3.0.1, CVMFS 2.7.5 and egi-cvmfs-2-7.12, dCache 5.2.35, DMLite/DPM 1.14.2, Dynafed 1.6.0, STORM 1.11.19, VOMS 10-20 release, xrootd 4.12.5 and 5.0.3
  • collecting information for the next release

Operations

ARGO/SAM

FedCloud

Feedback from DMSU

Monthly Availability/Reliability

  • sites suspended:
    • HK-HKU-CC-01 (AsiaPacific)

IPv6 readiness plans

Top-BDII problem affecting the publication of accounting records

  • on 20th Dec 2020 the top-bdii at CERN lcg-bdii.cern.ch stopped working
  • since then, it wasn't possible to publish the accounting data
    • the SSM script couldn't find the Message Brokers queue to send the messages
  • top-bdii fixed on 4th Jan 2021
  • this problem affected all the sites because the APEL SSM config file points to CERN's top-BDII by default
  • CERN's top-BDII is going to be retired
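
Since CERN's top-BDII is going to be retired, sites may want to check whether their SSM configuration still points to it. The following is a minimal sketch only: it assumes the SSM sender configuration is an INI-style file with a `[broker]` section and a `bdii` key, which is not stated in these notes and should be verified against the installed file. The sample text and the function name `uses_cern_top_bdii` are hypothetical.

```python
import configparser

# Host name taken from the notes above.
CERN_TOP_BDII = "lcg-bdii.cern.ch"


def uses_cern_top_bdii(cfg_text: str) -> bool:
    """Return True if the (assumed) [broker]/bdii setting names CERN's top-BDII."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Section and key names are assumptions about the SSM config layout.
    bdii_url = parser.get("broker", "bdii", fallback="")
    return CERN_TOP_BDII in bdii_url


# Hypothetical sample resembling a default configuration.
sample = """
[broker]
bdii: ldap://lcg-bdii.cern.ch:2170
network: PROD
"""

print(uses_cern_top_bdii(sample))
```

To check a real file, read its contents and pass the text to `uses_cern_top_bdii`; the file location depends on the local installation.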

CREAM-CE Decommission

  • End of Security Updates and Support: 31st Dec 2020
  • Decommissioning deadline: 31st Jan 2021
  • PROC16 Decommission of unsupported software
  • Decommissioning start date: Oct 1st 2020
  • Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
  • 1st Feb 2021: EGI Ops will start chasing the sites still providing CREAM-CE endpoints
    • By this time service end-points which couldn't be upgraded should be put into downtime by site admin or ROD
  • 1st March 2021: Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
  • Tickets opened: 49
  • Please note that at least one CE endpoint should be associated with the APEL service type in order to monitor the publication of the accounting data, as explained here
    • If the CE you are going to remove was also registered as APEL service type, do not forget to move the APEL service type to a different CE endpoint.
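
The decommissioning timeline above can be sketched as a small date lookup. This is an illustration only: the phase labels are paraphrased from the list, the "Nov 1st" milestone is assumed to fall in 2020, and the names `MILESTONES` and `current_phase` are hypothetical.

```python
from datetime import date

# Milestones from the list above; labels are paraphrased.
# Assumption: "Nov 1st" refers to 1 November 2020.
MILESTONES = [
    (date(2020, 10, 1), "decommissioning started"),
    (date(2020, 11, 1), "probe CRITICAL, ROD teams open tickets"),
    (date(2021, 2, 1), "EGI Ops chase remaining sites"),
    (date(2021, 3, 1), "sites risk suspension"),
]


def current_phase(today: date) -> str:
    """Return the label of the latest milestone reached on the given date."""
    phase = "before decommissioning campaign"
    for start, label in MILESTONES:
        if today >= start:
            phase = label
    return phase


# Date of this meeting.
print(current_phase(date(2021, 2, 8)))
```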

VOMS upgrade to CentOS 7

  • VOMS for CentOS 7 released Nov 23rd with UMD 4.12.13
    • VOMS Admin 3.8.0, VOMS Server 2.0.15
  • VOMS endpoints registered on GOCDB as production and monitored: 41
    • Provided by 33 sites
  • list of tickets opened: GGUS
  • the VOMS servers need to be published in the BDII in order to easily collect the deployed version

AOB

Next meeting

8th Mar 2021