
Difference between revisions of "Agenda-2021-06-14"

m (Reverted edits by Apaolini (talk) to last revision by Spinoso)
Tag: Rollback
 
(24 intermediate revisions by 2 users not shown)
Line 9: Line 9:


* CentOS8 discussion still ongoing
* The UMD 4.12.6 release contains the following products:
* UMD 4 June update
** ARGO 0.5.4 - ams-library py2 RPM also packages py3 specific modules https://github.com/ARGOeu/argo-ams-library/releases/tag/V0.5.4-1
** ARC 6.12.0 will be included in the upcoming release (end of June)
** APEL-SSM 3.2.0 - bug fixes and improvements https://github.com/apel/ssm/releases/tag/3.2.0-1
** other products to be included: HTCondor, gfal2, lcmaps-plugins, xrootd 5.1.1, StoRM 1.11.21, DDNS probe
* The UMD 4.13.0 release includes the following products:
* repository frontend web pages restored as static pages
** APEL 1.9.0 - Added AMS support. Requires at least SSM version 3.2.0. Bug fixes and improvements. https://github.com/apel/apel/releases/tag/1.9.0-1
** VOMS admin server 3.8.1, VOMS Admin client 2.0.21, VOMS server 2.0.16, VOMS C/C++ APIs 2.0.16 - bug fixes and enhancements http://italiangrid.github.io/voms/release-notes/voms-admin-server/3.8.1/
** CVMFS 2.8.1 - Bug Fixes and Improvements https://cvmfs.readthedocs.io/en/2.8/cpt-releasenotes.html
* The UMD team is currently working on restoring the repository frontend that hosts the web pages; it should be available again soon


== Preview repository  ==
* released on 2021-05-20:
** '''[https://appdb.egi.eu/store/software/preview.repository/releases/2.0/2.33.0/ Preview 2.33.0]''' (CentOS 7): ARC 6.11.0, STORM 1.11.20 and 1.11.21, VOMS 04-21
* released on 2021-06-10
** '''[https://appdb.egi.eu/store/software/preview.repository/releases/2.0/2.34.0/ Preview 2.34.0]''' (CentOS 7): ARC 6.12.0, CVMFS 2.8.1, xrootd 5.2.0


= Operations  =
Line 32: Line 30:
  subject= /CN=hosted-ce10.opensciencegrid.org
  notAfter=Apr 26 12:26:42 2021 GMT
** a new version of HTCondor (9.0.0) will be added to the UMD Test repo and then the probe can be deployed on the testing instance of ARGO
** testing the new version of the probe available with HTCondor 9.0.0


== FedCloud  ==
Line 47: Line 45:
* problems are tracked with Jira tickets to better follow up on their evolution
* problems can be registered by DMSU staff and EGI Operations team
== Verify configuration records ==
On a yearly basis, the information registered in GOC-DB needs to be verified.
NGIs and RCs have been asked to check it. In particular:
# '''NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:'''
#* E-Mail
#* ROD E-Mail
#* Security E-Mail
:NGI Managers should also review the status of the "not certified" RCs, according to the [https://wiki.egi.eu/wiki/PROC09#Resource_Center_status_Workflow RC Status Workflow];
# '''RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:'''
#* E-Mail
#* telephone numbers
#* CSIRT E-Mail
: RC administrators should also review the information related to the registered service endpoints.
'''The process should be completed by July 2nd.'''
[https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=lastweek&from_date=02+Jun+2021&to_date=03+Jun+2021&ticket_category=all&typeofproblem=all&specattrib=none&keyword=yearly+review+of+the+information+registered+into+GOC-DB+-+2021&orderticketsby=REQUEST_ID&orderhow=asc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search List of tickets].


== Monthly Availability/Reliability ==
Line 67: Line 84:
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''May 2021'''):
** NGI_BY: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152255
*** '''BY-NCPHEP'''
*** '''BY-NCPHEP''' CE failures due to missing information; fixed.
** NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152253
*** '''AUVERGRID''': Long downtime connected to IN2P3-LPC site
Line 74: Line 91:
*** '''INFN-MIB'''
** NGI_NL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152254
*** '''BEgrid-ULB-VUB'''
*** '''BEgrid-ULB-VUB''': CE failures with only one Nagios server because of a wrong mapping of the user certificate used to submit jobs; solved.
** NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152258
*** UA-BITP
*** UA-BITP: authentication issues with one of the nagios servers, fixed.
*** UA-KNU
*sites suspended:
Line 87: Line 104:


== APEL migration from ActiveMQ to ARGO Message Service (AMS) ==
* Migration instructions: https://github.com/apel/ssm/blob/dev/migrating_to_ams.md
* '''ActiveMQ is going to be decommissioned at the end of June''': for security reasons it is not possible to maintain it any longer.
* ActiveMQ is going to be decommissioned at the end of May
* Migration instructions (HTCondorCE, Storage, and Cloud accounting): https://github.com/apel/ssm/blob/dev/migrating_to_ams.md
* Releasing a new version of Apel Client (1.9.0) compatible with the new AMS protocol when used to trigger the publication of the accounting records
* ARC 6.12.0 released, instructions:
** APEL SSM has worked fine since version 2.4.0
** http://www.nordugrid.org/arc/releases/6.12/release_notes_6.12.html
* The accounting component of ARC-CE still uses the STOMP protocol to send the message records
** all the sites with ARC-CE need to update to this version
** The developers are working on a new version compatible with AMS
* Recommended versions:
** some sites will be asked to test the new version when available
** APEL Client: 1.9.0
** APEL SSM: 3.2.0
* Cloud accounting campaign:
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=05+Mar+2021&to_date=06+Mar+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search list of tickets]
* HTCondorCE and Storage accounting campaign:
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=15+Mar+2021&to_date=16+Mar+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS+-+new+settings&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search list of tickets]
* ARC-CE and storage accounting campaign:
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=11+Jun+2021&to_date=12+Jun+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS+-+ARC-CE+new+settings&orderticketsby=REQUEST_ID&orderhow=asc&ticket_per_page=120&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_SITE&show_columns_check%5B2%5D=PRIORITY&show_columns_check%5B3%5D=RESPONSIBLE_UNIT&show_columns_check%5B4%5D=STATUS&show_columns_check%5B5%5D=DATE_OF_CHANGE&show_columns_check%5B6%5D=SHORT_DESCRIPTION&search_submit=Search list of tickets]
* Most common issues:
** mismatch between the host certificate subject registered in GOCDB and the real DN
Line 114: Line 134:
** For Storage accounting use 'eu.egi.storage.accounting'.
* The 'Host DN' listed in the GOCDB 'Service' entry must exactly match the certificate DN of the host used for accounting. Make sure there are no leading or trailing spaces in the 'Host DN' field.
== CREAM-CE Decommission ==
* End of Security Updates and Support: 31st Dec 2020
** Original broadcast: https://operations-portal.egi.eu/broadcast/archive/2293
* Decommissioning deadline: 31st Jan 2021
* [https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software PROC16 Decommission of unsupported software]
* Decommissioning start date: Oct 1st 2020
** a probe detecting CREAM-CE endpoints will be run, returning WARNING status
** GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148715
** [https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_CREAM-CE&style=detail eu.egi.sec.CREAMCE]
* Nov 1st: probe returns CRITICAL status, alarms created on the ROD dashboard, ROD teams start to create tickets
** https://ggus.eu/index.php?mode=ticket_info&ticket_id=149312
* '''1st Feb 2021''': EGI Ops will start chasing the sites still providing CREAM-CE endpoints
** By this time service end-points which couldn't be upgraded should be put into downtime by site admin or ROD
* '''1st March 2021''': Sites still deploying unsupported service endpoints risk suspension, unless documented technical reasons prevent a Site Admin from updating these endpoints.
* '''Tickets opened''': 49
** link to the [https://ggus.eu/index.php?mode=ticket_search&status=open&user=paolini&date_type=creation+date&tf_radio=1&timeframe=any&keyword=CREAM-CE%20endpoints%20to%20be%20retired&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO! list]
* '''Please note that at least one CE endpoint should be associated to the APEL service type in order to monitor the publication of the accounting data''', as explained [https://wiki.egi.eu/wiki/APEL/Tests here]
** If the CE you are going to remove was also registered as APEL service type, do not forget to move the APEL service type to a different CE endpoint.
== VOMS upgrade to CentOS 7 ==
* VOMS for CentOS 7 released Nov 23rd with [https://repository.egi.eu/2020/11/18/release-umd-4-12-3/ UMD 4.12.13]
** VOMS Admin 3.8.0, VOMS Server 2.0.15
* VOMS endpoints registered on GOCDB as production and monitored: 41
** Provided by 33 sites
* list of tickets opened: [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=11+Dec+2020&to_date=12+Dec+2020&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=upgrade+your+VOMS+server+to%3A+CentOS7%2C+VOMS+Admin+server+3.8.0%2C+VOMS+server+2.0.15&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search GGUS]
** total: 31. Solved: 20.
* the VOMS servers need to be published in the BDII in order to easily collect the deployed version


= AOB  =
Line 149: Line 139:


== Next meeting  ==
14th Jun 2021
Jul or Aug

Latest revision as of 15:00, 30 June 2021



Back to https://wiki.egi.eu/wiki/Operations_Meeting

General information

Middleware

UMD

  • CentOS8 discussion still ongoing
  • UMD 4 June update
    • ARC 6.12.0 will be included in the upcoming release (end of June)
    • other products to be included: HTCondor, gfal2, lcmaps-plugins, xrootd 5.1.1, StoRM 1.11.21, DDNS probe
  • repository frontend web pages restored as static pages

Preview repository

  • released on 2021-05-20:
    • Preview 2.33.0 (CentOS 7): ARC 6.11.0, STORM 1.11.20 and 1.11.21, VOMS 04-21
  • released on 2021-06-10
    • Preview 2.34.0 (CentOS 7): ARC 6.12.0, CVMFS 2.8.1, xrootd 5.2.0

Operations

ARGO/SAM

  • HTCondor-CE probes
    • deployed on secmon and pakiti: GGUS 150006
    • working on the probe for the host certificate validity check: GGUS 147386
      • With 8.9.12 installed (expected the week of Mar 15), you should be able to query remote HTCondor-CEs for their host certificate using the following:
$ python -c 'import htcondor; ad = htcondor.Collector("collector2.opensciencegrid.org:9619").locate(htcondor.DaemonTypes.Schedd, "hosted-ce10.opensciencegrid.org"); print(htcondor.SecMan().ping(ad, "READ")["ServerPublicCert"])' | openssl x509 -noout -subject -enddate
subject= /CN=hosted-ce10.opensciencegrid.org
notAfter=Apr 26 12:26:42 2021 GMT
    • testing the new version of the probe available with HTCondor 9.0.0
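
As a hedged illustration of what a host certificate validity check could do with the openssl output quoted above, the following Python sketch parses the notAfter line and computes the number of days left before expiry. The function names are hypothetical and are not taken from the actual ARGO probe; the sample text is the output shown above.

```python
from datetime import datetime, timezone


def parse_openssl_enddate(output):
    """Return the notAfter timestamp found in `openssl x509 -noout -enddate` output."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("notAfter="):
            value = line[len("notAfter="):]
            # openssl prints dates such as "Apr 26 12:26:42 2021 GMT".
            # %b depends on the locale; an English/C locale is assumed here.
            parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no notAfter line found in the openssl output")


def days_until_expiry(not_after, now=None):
    """Return whole days until expiry; negative if the certificate has expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (not_after - now).days


sample = (
    "subject= /CN=hosted-ce10.opensciencegrid.org\n"
    "notAfter=Apr 26 12:26:42 2021 GMT\n"
)
not_after = parse_openssl_enddate(sample)
print(not_after.isoformat(), days_until_expiry(not_after))
```

A real probe would map the remaining days to a monitoring status; the thresholds for that are not stated in this agenda, so they are left out of the sketch.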

FedCloud

Feedback from DMSU

New Known Error Database (KEDB)

The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home

  • problems are tracked with Jira tickets to better follow up on their evolution
  • problems can be registered by DMSU staff and EGI Operations team

Verify configuration records

On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:

  1. NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • ROD E-Mail
    • Security E-Mail
NGI Managers should also review the status of the "not certified" RCs, according to the RC Status Workflow;
  2. RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • telephone numbers
    • CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.

The process should be completed by July 2nd.

List of tickets.

Monthly Availability/Reliability

IPv6 readiness plans

APEL migration from ActiveMQ to ARGO Message Service (AMS)

Prerequisites for using AMS

  • A valid host certificate from an IGTF Accredited CA.
  • A GOCDB 'Site' entry flagged as 'Production'.
  • A GOCDB 'Service' entry of the correct service type flagged as 'Production'. The following service types are used:
    • For Grid accounting use 'gLite-APEL'.
    • For Cloud accounting use 'eu.egi.cloud.accounting'.
    • For Storage accounting use 'eu.egi.storage.accounting'.
  • The 'Host DN' listed in the GOCDB 'Service' entry must exactly match the certificate DN of the host used for accounting. Make sure there are no leading or trailing spaces in the 'Host DN' field.
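
The 'Host DN' prerequisite above can be checked before publishing. The following Python sketch is only an illustration under stated assumptions: the helper name check_host_dn and the example DNs are hypothetical, both DNs are assumed to already be in the same string format, and only the service type names are taken from the list above.

```python
# Service types quoted in the prerequisites list above.
SERVICE_TYPES = {
    "grid": "gLite-APEL",
    "cloud": "eu.egi.cloud.accounting",
    "storage": "eu.egi.storage.accounting",
}


def check_host_dn(gocdb_dn, cert_dn):
    """Return a list of problems found when comparing the GOCDB 'Host DN'
    with the DN of the host certificate (empty list means no problem)."""
    problems = []
    # Leading or trailing spaces in the GOCDB field break the exact match.
    if gocdb_dn != gocdb_dn.strip():
        problems.append("Host DN has leading or trailing whitespace")
    # Compare the stripped values so a real mismatch is reported separately.
    if gocdb_dn.strip() != cert_dn.strip():
        problems.append("Host DN does not match the certificate DN")
    return problems


# Hypothetical DNs, for illustration only.
cert_dn = "/DC=org/DC=example/CN=accounting.example.org"
print(SERVICE_TYPES["storage"], check_host_dn(" " + cert_dn, cert_dn))
```

Reporting the whitespace problem separately from a genuine mismatch mirrors the two failure modes named in the prerequisite.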

AOB

Next meeting

Jul or Aug