The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Agenda-2021-08-09"

From EGIWiki
 
(12 intermediate revisions by the same user not shown)
Line 8: Line 8:
== UMD ==
== UMD ==


* CentOS8 discussion still ongoing
* CentOS8 discussion still ongoing  
* UMD 4 June update
** ARC 6.12.0 will be included in the upcoming release (end of June)
** other products to be included: HTcondor, gfal2, lcmaps-plugins, xrootd 5.1.1, StoRM 1.11.21, DDNS probe
* repository frontend web pages restored as static pages
* repository frontend web pages restored as static pages
* UMD 4.15.0 has been released (https://repository.egi.eu/static/UMD/4.15.0.html) and includes several updates for CentOS7:
** StoRM 1.11.21 - several bug fixes and improvements
** lcmaps-plugins 1.8.1 - Update of lcmaps plugins
** CERN Frontier 4.15.2.1
** dmlite 1.15.0
** APEL SSM 3.2.1
** Dynamic DNS Nagios probe 1.0.1
** Infrastructure Manager Nagios probe 1.0.1
** dCache 6.2


== Preview repository  ==
== Preview repository  ==
Line 29: Line 35:
** checks on expiration date, CN, and CA:
** checks on expiration date, CN, and CA:
*** https://argo-mon-devel.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_org.opensciencegrid.htcondorce&style=detail
*** https://argo-mon-devel.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_org.opensciencegrid.htcondorce&style=detail
** to be deployed in production on second week of August
** to be deployed in production once new condor client is released in UMD


== FedCloud  ==
== FedCloud  ==
Line 67: Line 73:
*Under-performed sites in the past A/R reports with issues not yet fixed:
*Under-performed sites in the past A/R reports with issues not yet fixed:
** NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152253
** NGI_FRANCE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152253
*** '''AUVERGRID''': Long downtime connected to IN2P3-LPC site
*** '''AUVERGRID''': Long downtime connected to IN2P3-LPC site. Some problems with org.nordugrid.ARC-CE-result and org.nordugrid.ARC-CE-srm metrics: even if the sub-metrics complete successfully, the test jobs don't manage to get the "ending" status, producing an UNKNOWN status in the A/R computation
** NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
** NGI_HR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=148518
*** '''egee.irb.hr''': major upgrade from CentOS 6 to CentOS 7; tests currently fail due to UNKNOWN status returned
*** '''egee.irb.hr''': major upgrade from CentOS 6 to CentOS 7; tests currently fail due to UNKNOWN status returned
** NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150818
** NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=150818
*** '''INFN-PISA''': HTCondorCE and SRM failures
*** '''INFN-PISA''': HTCondorCE failures fixed; SRM failures not yet
** NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152258
** NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152258
*** '''UA-BITP''': authentication issues with one of the nagios servers, fixed; additionally, power supply issues at the resource center
*** '''UA-BITP''': authentication issues with one of the nagios servers, fixed; additionally, power supply issues at the resource center
Line 82: Line 88:
*** '''ICN-UNAM''': replaced CREAM-CE; SE certificate expired; new failures with HTCondorCE; problems disappeared after re-installation; further failures on the CE.
*** '''ICN-UNAM''': replaced CREAM-CE; SE certificate expired; new failures with HTCondorCE; problems disappeared after re-installation; further failures on the CE.
** Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152840
** Russia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=152840
*** '''RU-SARFTI'''
*** '''RU-SARFTI''': failures with org.nordugrid.ARC-CE-SRM-result
*** '''RU-SPbSU'''
*** '''RU-SPbSU''': failures with org.nordugrid.ARC-CE-SRM-result
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''July 2021'''):
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''July 2021'''):
* NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=153338
** INFN-CNAF-LHCB: SRM authentication failures


*sites suspended:
*sites suspended:
Line 106: Line 114:
* Cloud accounting campaign:
* Cloud accounting campaign:
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=05+Mar+2021&to_date=06+Mar+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search list of tickets]
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=05+Mar+2021&to_date=06+Mar+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search list of tickets]
** 2 tickets (out of 21) not solved yet
* HTCondorCE and Storage accounting campaign:
* HTCondorCE and Storage accounting campaign:
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=15+Mar+2021&to_date=16+Mar+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS+-+new+settings&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search list of tickets]
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=15+Mar+2021&to_date=16+Mar+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS+-+new+settings&orderticketsby=REQUEST_ID&orderhow=desc&ticket_per_page=50&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_VO&show_columns_check%5B2%5D=AFFECTED_SITE&show_columns_check%5B3%5D=PRIORITY&show_columns_check%5B4%5D=RESPONSIBLE_UNIT&show_columns_check%5B5%5D=STATUS&show_columns_check%5B6%5D=DATE_OF_CHANGE&show_columns_check%5B7%5D=SHORT_DESCRIPTION&show_columns_check%5B8%5D=SCOPE&search_submit=Search list of tickets]
** 6 tickets (out of 53) not solved yet
* ARC-CE and storage accounting campaign:
* ARC-CE and storage accounting campaign:
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=11+Jun+2021&to_date=12+Jun+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS+-+ARC-CE+new+settings&orderticketsby=REQUEST_ID&orderhow=asc&ticket_per_page=120&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_SITE&show_columns_check%5B2%5D=PRIORITY&show_columns_check%5B3%5D=RESPONSIBLE_UNIT&show_columns_check%5B4%5D=STATUS&show_columns_check%5B5%5D=DATE_OF_CHANGE&show_columns_check%5B6%5D=SHORT_DESCRIPTION&search_submit=Search list of tickets]
** [https://ggus.eu/index.php?mode=ticket_search&su_hierarchy=0&status=all&date_type=creation+date&tf_radio=1&timeframe=any&from_date=11+Jun+2021&to_date=12+Jun+2021&ticket_category=all&typeofproblem=all&specattrib=none&user=paolini&keyword=APEL+migration+from+ActiveMQ+to+AMS+-+ARC-CE+new+settings&orderticketsby=REQUEST_ID&orderhow=asc&ticket_per_page=120&show_columns_check%5B0%5D=TICKET_TYPE&show_columns_check%5B1%5D=AFFECTED_SITE&show_columns_check%5B2%5D=PRIORITY&show_columns_check%5B3%5D=RESPONSIBLE_UNIT&show_columns_check%5B4%5D=STATUS&show_columns_check%5B5%5D=DATE_OF_CHANGE&show_columns_check%5B6%5D=SHORT_DESCRIPTION&search_submit=Search list of tickets]
* Most common issues:
** 13 tickets (out of 112) not solved yet
** mismatch between the host certificate subject registered in GOCDB and the real DN
** SAN field missing / wrongly defined in the host certificate
** DNS entries not completely defined
** same host used to send different types of accounting records
* a new version of ARGO Message Service mitigates the problems related to DNS entries and the SAN field:
** https://ggus.eu/index.php?mode=ticket_info&ticket_id=151104


=== Prerequisites for using AMS ===
=== Prerequisites for using AMS ===
Line 136: Line 140:


== Next meeting  ==
== Next meeting  ==
Aug
Sept 13th

Latest revision as of 13:26, 9 August 2021

Back to https://wiki.egi.eu/wiki/Operations_Meeting

General information

Middleware

UMD

  • CentOS8 discussion still ongoing
  • repository frontend web pages restored as static pages
  • UMD 4.15.0 has been released (https://repository.egi.eu/static/UMD/4.15.0.html) and includes several updates for CentOS7:
    • StoRM 1.11.21 - several bug fixes and improvements
    • lcmaps-plugins 1.8.1 - Update of lcmaps plugins
    • CERN Frontier 4.15.2.1
    • dmlite 1.15.0
    • APEL SSM 3.2.1
    • Dynamic DNS Nagios probe 1.0.1
    • Infrastructure Manager Nagios probe 1.0.1
    • dCache 6.2

Preview repository

  • released on 2021-05-20:
    • Preview 2.33.0 (CentOS 7): ARC 6.11.0, STORM 1.11.20 and 1.11.21, VOMS 04-21
  • released on 2021-06-10

Operations

ARGO/SAM

FedCloud

Feedback from DMSU

New Known Error Database (KEDB)

The KEDB has been moved to Jira+Confluence: https://confluence.egi.eu/display/EGIKEDB/EGI+Federation+KEDB+Home

  • problems are tracked with Jira tickets to better follow up on their evolution
  • problems can be registered by DMSU staff and EGI Operations team

Verify configuration records

On a yearly basis, the information registered in GOCDB needs to be verified. NGIs and RCs have been asked to check it. In particular:

  1. NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • ROD E-Mail
    • Security E-Mail
NGI managers should also review the status of the "not certified" RCs, in accordance with the RC Status Workflow;
  2. RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • telephone numbers
    • CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.

The process should be completed by July 2nd.

List of tickets.

Monthly Availability/Reliability

  • sites suspended:

IPv6 readiness plans

APEL migration from ActiveMQ to ARGO Message Service (AMS)

Prerequisites for using AMS

  • A valid host certificate from an IGTF Accredited CA.
  • A GOCDB 'Site' entry flagged as 'Production'.
  • A GOCDB 'Service' entry of the correct service type flagged as 'Production'. The following service types are used:
    • For Grid accounting use 'gLite-APEL'.
    • For Cloud accounting use 'eu.egi.cloud.accounting'.
    • For Storage accounting use 'eu.egi.storage.accounting'.
  • The 'Host DN' listed in the GOCDB 'Service' entry must exactly match the certificate DN of the host used for accounting. Make sure there are no leading or trailing spaces in the 'Host DN' field.
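
The prerequisites above can be sketched as a small pre-flight check. This is only an illustration: the function names, the `SERVICE_TYPES` mapping keys ("grid", "cloud", "storage") and the example DN are hypothetical and are not part of GOCDB, APEL or AMS tooling. It assumes both DNs have already been obtained as plain strings.

```python
from __future__ import annotations

# Service types named in the prerequisites; the dictionary keys are
# hypothetical labels chosen for this sketch.
SERVICE_TYPES = {
    "grid": "gLite-APEL",
    "cloud": "eu.egi.cloud.accounting",
    "storage": "eu.egi.storage.accounting",
}


def expected_service_type(accounting_kind: str) -> str:
    """Return the GOCDB service type for a kind of accounting."""
    return SERVICE_TYPES[accounting_kind]


def check_host_dn(gocdb_host_dn: str, certificate_dn: str) -> list[str]:
    """Return the problems found; an empty list means the DNs match exactly."""
    problems = []
    # Leading or trailing spaces in the 'Host DN' field break the exact match.
    if gocdb_host_dn != gocdb_host_dn.strip():
        problems.append("Host DN has leading or trailing spaces")
    # The comparison is deliberately strict: no normalisation is applied.
    if gocdb_host_dn != certificate_dn:
        problems.append("Host DN does not exactly match the certificate DN")
    return problems


if __name__ == "__main__":
    # Hypothetical DN, used only to show the call.
    dn = "/DC=org/DC=example/CN=apel.example.org"
    for problem in check_host_dn(dn + " ", dn):
        print(problem)
```

The comparison is kept strict on purpose, since the prerequisite asks for an exact match rather than an equivalent DN.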

Monitoring of the accounting data

To ensure that the publication of the accounting data is monitored, one CE per site needs to be registered as an "APEL" service endpoint.

AOB

Next meeting

Sept 13th