The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Difference between revisions of "Agenda-2020-01-13"

Line 27: Line 27:
*Under-performed sites in the past A/R reports with issues not yet fixed:
**AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=142591
***INDIACMS-TIFR: SRM service not published in the BDII. No feedback yet
***INDIACMS-TIFR: SRM service not published in the BDII, they are working on it...
***TW-NTU-HEP: new SRM failures at the end of September due to DPM upgrade
***TW-NTU-HEP: new SRM failures at the end of September due to DPM upgrade, then it recovered
**NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=140556
***GR-12-TEIKAV: SRM failures
**NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=142592
***INFN-LECCE (recovered)
***INFN-MILANO-ATLASC: frequent SRM failures
***INFN-MILANO-ATLASC: frequent SRM failures; overload due to atlas usage, problem is being investigated…
**NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=140557
***TASK: still problems: SRM failures and QCG bug; The probe [https://poem.egi.eu/poem/admin/poem/public_probe/81/change/ eu.egi.QCG-Computing-CertValidity] doesn't manage to get the certificate information from the [https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_QCG.Computing&style=detail qcg computing hosts].
***TASK: QCG problem was solved, but the SRM issues are still there
**NGI_RO: https://ggus.eu/index.php?mode=ticket_info&ticket_id=142160
***RO-02-NIPNE: Cooling system problem, migration to CentOS7
***RO-02-NIPNE: Cooling system problem, migration to CentOS7, long downtime...
**NGI_UA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=142980
***UA-BITP: SRM not published in the BDII
***UA-BITP: SE will be re-installed as XrootD only, without SRM; asked to disable SRM monitoring...
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''Sept 2019'''):
**NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=143506
***WCSS64: qcg host certificate failures, CREAM-CE failures. The probe [https://poem.egi.eu/poem/admin/poem/public_probe/81/change/ eu.egi.QCG-Computing-CertValidity] doesn't manage to get the certificate information from the [https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_QCG.Computing&style=detail qcg computing hosts].
***the sites providing qcg computing can ask for a re-computation https://wiki.egi.eu/wiki/PROC10
 
*Under-performed sites after 3 consecutive months, under-performed NGIs, QoS violations: ('''Oct 2019'''):
 
**NGI_FI: https://ggus.eu/index.php?mode=ticket_info&ticket_id=143944
 
***FI_TUT: IGTF failures
suspended sites: AEGIS01-IPB-SCL (NGI_AEGIS)
**NGI_IBERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=143946
***UPorto: SRM failures
**NGI_PL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=143945
***ICM: qcg and SRM failures...
*sites suspended:
**GR-12-TEIKAV


== IPv6 readiness plans  ==

Revision as of 17:46, 2 December 2019





General information TO UPDATE

Middleware

UMD

CMD

Preview repository

  • released on 2019-08-16

Operations

ARGO/SAM

FedCloud

Feedback from DMSU

Monthly Availability/Reliability

IPv6 readiness plans

LCGDM end of support and migration to / enabling of DOME (TO UPDATE)

  • Deployment statistics (Oct 10th):
$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Manager)(GLUE2ManagerProductName=DPM))' GLUE2ManagerProductVersion GLUE2ManagerID | grep GLUE2ManagerProductVersion | sort | uniq -c
    11 GLUE2ManagerProductVersion: 1.10.0
     5 GLUE2ManagerProductVersion: 1.12.0
    47 GLUE2ManagerProductVersion: 1.13.0
     4 GLUE2ManagerProductVersion: 1.13.1
    10 GLUE2ManagerProductVersion: 1.8.10
     2 GLUE2ManagerProductVersion: 1.8.11
     1 GLUE2ManagerProductVersion: 1.8.7
     1 GLUE2ManagerProductVersion: 1.8.8
     4 GLUE2ManagerProductVersion: 1.8.9
    10 GLUE2ManagerProductVersion: 1.9.0
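
The `sort | uniq -c` output above is ordered lexically (1.8.10 appears before 1.8.7), so it does not show at a glance how many entries fall below the 1.12 threshold mentioned in the tickets. The following is a minimal sketch, not part of the original agenda, that parses the pasted output and compares versions numerically. The function names `parse_counts` and `older_than` are illustrative, and the figures count published GLUE2Manager entries, which is only assumed here to approximate the number of sites.

```python
import re

# Output of the ldapsearch pipeline above (Oct 10th snapshot), pasted verbatim.
RAW = """\
    11 GLUE2ManagerProductVersion: 1.10.0
     5 GLUE2ManagerProductVersion: 1.12.0
    47 GLUE2ManagerProductVersion: 1.13.0
     4 GLUE2ManagerProductVersion: 1.13.1
    10 GLUE2ManagerProductVersion: 1.8.10
     2 GLUE2ManagerProductVersion: 1.8.11
     1 GLUE2ManagerProductVersion: 1.8.7
     1 GLUE2ManagerProductVersion: 1.8.8
     4 GLUE2ManagerProductVersion: 1.8.9
    10 GLUE2ManagerProductVersion: 1.9.0
"""

LINE = re.compile(r"\s*(\d+)\s+GLUE2ManagerProductVersion:\s*([\d.]+)\s*$")


def parse_counts(raw):
    """Return {version tuple: count} from `uniq -c` style lines."""
    counts = {}
    for line in raw.splitlines():
        match = LINE.match(line)
        if match:
            # Turn "1.8.10" into (1, 8, 10) so comparison is numeric, not lexical.
            version = tuple(int(part) for part in match.group(2).split("."))
            counts[version] = counts.get(version, 0) + int(match.group(1))
    return counts


def older_than(counts, threshold):
    """Sum the counts of all versions strictly below the threshold tuple."""
    return sum(n for version, n in counts.items() if version < threshold)


counts = parse_counts(RAW)
total = sum(counts.values())
outdated = older_than(counts, (1, 12))
print(f"{outdated} of {total} published DPM entries are older than 1.12")
# prints "39 of 95 published DPM entries are older than 1.12"
```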


Liaising with WLCG to follow up on the upgrade. GGUS tickets were opened asking for the following:

  • all sites running a DPM version older than 1.12 are advised to upgrade to the latest DPM version, following the DPM upgrade guide (chapter 1, Upgrade to DPM 1.10.0 "Legacy Flavour", and chapter 2, Upgrade to DPM 1.10.0 "Dome Flavour")
    • DOME and the old LCGDM (srm protocol) will coexist
  • Monitoring: sites should enable the monitoring of the HTTP/WebDav and/or GridFTP endpoints
    • register the storage service endpoint with service type webdav and/or globus-GRIDFTP and with the production flag disabled, providing the URL field and the Extension Properties information, respectively, as explained in HOWTO21
    • check if the tests are ok
    • switch the production flag to "yes"
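
The three monitoring steps above (register with the production flag disabled, check the tests, then switch the flag) can be sketched as a small model. This is only an illustration of the ordering, assuming nothing about the actual service registry interface: the `Endpoint` class, the `register` and `promote` functions, and the example URL are all hypothetical and do not call any real EGI service.

```python
from dataclasses import dataclass, replace

# Service types named in the agenda; everything else here is hypothetical.
ALLOWED_TYPES = {"webdav", "globus-GRIDFTP"}


@dataclass(frozen=True)
class Endpoint:
    service_type: str
    url: str
    production: bool = False  # step 1: always registered as non-production


def register(service_type, url):
    """Step 1: create the endpoint record with the production flag disabled."""
    if service_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected service type: {service_type}")
    return Endpoint(service_type=service_type, url=url, production=False)


def promote(endpoint, tests_ok):
    """Steps 2 and 3: switch the production flag only once the tests are ok."""
    if not tests_ok:
        return endpoint  # leave the flag disabled until monitoring is green
    return replace(endpoint, production=True)


# Hypothetical endpoint URL, for illustration only.
ep = register("webdav", "https://se.example.org/dpm/example.org/home/ops")
still_testing = promote(ep, tests_ok=False)
in_production = promote(ep, tests_ok=True)
print(still_testing.production, in_production.production)
```

Keeping the record immutable and returning a new one from `promote` makes it explicit that the flag only changes as a result of the test check.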

List of tickets:

HTCondorCE integration (TO UPDATE)

Link to procedure: https://wiki.egi.eu/wiki/PROC19

GGUS ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=139377

Steps status:

AOB

Next meeting

November: https://indico.egi.eu/indico/event/4830/. There will be no meeting in December: https://indico.egi.eu/indico/category/32/