Difference between revisions of "Agenda-21-08-2017"

Line 103: Line 103:
*Underperformed sites in the past A/R reports with issues not yet fixed:
** '''AsiaPacific'''
*** TW-NCUHEP: site-BDII unstable due to network issues with ARGO; issues solved, figures are improving
*** TW-NCUHEP: still underperforming due to frequent failures
***KR-UOS-SSCC: there were srm problems, now also CREAM failures, proposed the suspension
**'''NGI_IL''': QoS violation: we are verifying the status of the Operations Centre
**'''NGI_PL''' (IFJ-PAN-BG): the site may be decommissioned for lack of manpower.
***CA-MCGILL-CLUMEQ-T2: still some failures
**NGI_BG (BG01-IPP): suggested marking the SE as non-production
***CA-MCGILL-CLUMEQ-T2 the figures are improving, but still some failures
***HEPHY-UIBK: recovered
***INFN-ROMA1-CMS: still underperforming, although the bug in the nagios probes for CREAM (GGUS ticket 128151) has since disappeared

*Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
**'''AfricaArabia''' (ZA-MERAKA, ZA-UJ):
**ROC_CERN QoS violation
**'''AsiaPacific''' (Taiwan-LCG2):
**'''ROC_CERN''': (QoS) (SOLVED)
**'''NGI_BG''' (BG01-IPP)
**'''NGI_CH''' (QoS)
**'''NGI_CZ''' (prague_cesnet_lcg2)
**'''NGI_FRANCE''' (QoS)
**'''NGI_IBERGRID''' (QoS)
**NGI_UK QoS violation (SOLVED)
***HEPHY-UIBK: problem with expired certificates and an unresponsive CA. A/R figures are now improving
***INFN-ROMA1-CMS: bug in the nagios probes for the CREAM, ticket GGUS 128151
**'''NGI_PL''' (QoS)
**'''NGI_UA''' (QoS)
**'''NGI_UK''' (UKI-SOUTHGRID-SUSX) there wasn't a reserved job slot for the ops VO

'''suspended sites: ZA-UCT-ICTS, MY-USM-GCL, UA-NSCMBR'''
suspended sites: IFJ-PAN-BG, ZA-MERAKA, ZA-UJ

== Decommissioning EMI WMS  ==

Revision as of 12:07, 21 August 2017

General information



Preview repository



Testing FedCloud sites

Feedback from Helpdesk

Yearly review of the information registered in GOC-DB


The information registered in GOC-DB needs to be verified every year. NGIs and RCs have been asked to check it. In particular:

  1. NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • ROD E-Mail
    • Security E-Mail
NGI managers should also review the status of the "not certified" RCs, according to the RC Status Workflow;
  2. RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • telephone numbers
    • CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.

The process should be completed by Apr 28th.

To track the process, a series of tickets has been opened.

2017-07-13 UPDATE:

  • AfricaArabia, NGI_IT, NGI_NL still checking;
  • no feedback yet from NGI_DE;
  • status of NGI_IL Operations centre is uncertain: we are verifying it

Monthly Availability/Reliability

suspended sites: IFJ-PAN-BG, ZA-MERAKA, ZA-UJ

Decommissioning EMI WMS

As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.

NGIs provided WMS usage statistics; in general the usage is relatively low and mainly for local testing.

Moderate usage by a few VOs:

  • NGI_CZ:
  • NGI_GRNET: see
  • NGI_IT:, compchem, theophys, virgo
  • NGI_PL: gaussian,, vo.nedm.cyfronet
  • NGI_UK: mice,

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:

  • compchem is already testing DIRAC
  • discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
  • mice: enabled on the GridPP DIRAC server

We need feedback from the VOs to better define the technical details and the timeline:

  • NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.

WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:

  • WMS will be removed from production starting from 1st January 2018.
    • VOs have 5 months to find alternatives or migrate to DIRAC
  • Since this is not an update, the decommissioning can be performed in a few weeks.

IPv6 readiness plans

    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan

Decommissioning of dCache 2.10 and 2.13

  • support for dCache 2.10 ended in December 2016; tickets were opened by EGI Operations to track the decommissioning
  • the dCache 2.13 decommissioning procedure has started: in June the probes will get CRITICAL, support from dCache ends in July, and upgrades are to be performed by August
  • please upgrade to 2.16, whose support ends in May 2018, or to 3.0
    • note that the dCache team does not support upgrading from 2.10 directly to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
  • decommissioning campaign started by EGI Operations
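
The upgrade constraint above can be sketched as a small lookup. This is only an illustration: the table encodes just the two transitions named in the notes (2.10->2.13 and 2.13->2.16), and the function name `upgrade_path` is hypothetical. The supported route to 3.0 is not stated here, so the sketch refuses to guess it.

```python
# Supported single-step upgrades, taken only from the notes above.
# Anything not listed here is treated as "not known to be supported".
SUPPORTED_UPGRADES = {
    "2.10": "2.13",
    "2.13": "2.16",
}


def upgrade_path(current, target="2.16"):
    """Return the chain of versions to install, starting at `current`.

    Raises ValueError when the notes above do not describe a supported
    route from `current` to `target`.
    """
    path = [current]
    while path[-1] != target:
        next_version = SUPPORTED_UPGRADES.get(path[-1])
        if next_version is None:
            raise ValueError(
                f"no supported upgrade step known from {path[-1]} towards {target}"
            )
        path.append(next_version)
    return path


if __name__ == "__main__":
    # A site still on 2.10 has to go through 2.13 first.
    print(" -> ".join(upgrade_path("2.10")))
```

A site already on 2.13 gets a single hop, and asking for a target the table does not cover (for example 3.0) raises an error instead of inventing a path.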

webdav probes in production

The webdav probes have been deployed in production. Some sites were already contacted for enabling the monitoring of their webdav endpoints:

Site Host GGUSID note
INFN-T1 removed SOLVED

link to nagios results:

Several sites are publishing webdav endpoints in the BDII:

  • AsiaPacific: JP-KEK-CRC-02
  • NGI_HR:,

Checked with:

$ ldapsearch -x -LLL -H ldap:// -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Endpoint)(GLUE2EndpointInterfaceName=webdav))' GLUE2EndpointImplementationName GLUE2EndpointURL
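
To turn the output of the query above into a per-endpoint list, a minimal Python sketch could parse the LDIF text. The sample data, host names and site name below are made up for illustration, and the helper names `parse_ldif` and `webdav_endpoints` are hypothetical; only the two attribute names come from the command above.

```python
import base64


def parse_ldif(text):
    """Parse `ldapsearch -LLL` output into a list of {attribute: [values]} dicts."""
    # Unfold wrapped lines: a line starting with a single space
    # continues the previous line.
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    entries, current = [], {}
    for line in unfolded:
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        if value.startswith(":"):
            # "attr:: ..." marks a base64-encoded value.
            value = base64.b64decode(value[1:].strip()).decode("utf-8")
        else:
            value = value.strip()
        current.setdefault(name, []).append(value)
    if current:
        entries.append(current)
    return entries


def webdav_endpoints(entries):
    """Return (URL, implementation name) pairs for entries that publish a URL."""
    result = []
    for entry in entries:
        impl = entry.get("GLUE2EndpointImplementationName", ["unknown"])[0]
        for url in entry.get("GLUE2EndpointURL", []):
            result.append((url, impl))
    return result


# Made-up sample output; real output comes from the ldapsearch command above.
SAMPLE_LDIF = """\
dn: GLUE2EndpointID=se01.example.org_webdav,GLUE2ServiceID=se01.example.org,
 GLUE2GroupID=resource,GLUE2DomainID=EXAMPLE-SITE,GLUE2GroupID=grid,o=glue
GLUE2EndpointURL: https://se01.example.org:443/dpm
GLUE2EndpointImplementationName: DPM

dn: GLUE2EndpointID=se02.example.org_webdav,GLUE2ServiceID=se02.example.org,
 GLUE2GroupID=resource,GLUE2DomainID=EXAMPLE-SITE,GLUE2GroupID=grid,o=glue
GLUE2EndpointURL: https://se02.example.org:2880/
GLUE2EndpointImplementationName: dCache
"""

if __name__ == "__main__":
    for url, impl in webdav_endpoints(parse_ldif(SAMPLE_LDIF)):
        print(f"{url} ({impl})")
```

In practice the text would be the saved output of the ldapsearch command rather than the embedded sample; the resulting list can then be compared by hand with the endpoints registered in GOC-DB.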

ACTIONS for NGIs and sites: The Operations Centres are asked to verify with their sites whether the webdav protocol is really (intentionally) enabled on their storage elements (if not, the information should be removed from the BDII), and to report to EGI Operations

  • The webdav service endpoint should be registered in GOC-DB for being properly monitored: the nagios probes are executed using the VO ops, so please ensure that the protocol is enabled for ops VO as well
  • the webdav probes are harmless: they are not in any critical profile, they don't raise any alarm in the operations dashboard, and the A/R figures are not affected. We need time and more sites for gathering statistics on their results before making them critical.

To register the webdav service endpoint in GOC-DB, follow HOWTO21 to fill in the proper information. In particular:

Testing of the storage accounting

As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.

More information can be found in the following wiki:

List of sites available for test.

2017-07-14 UPDATE (more details in the June OMB presentation):

  • 31 sites are sending storage accounting data (only from dCache and DPM SEs); the data validation is ongoing.
  • A new service type has been created in GOC-DB, which will be used for:
    • authorising the site/SE to publish the accounting data
    • making the site/SE appear in the portal
    • monitoring that the accounting data are regularly published
  • by September we should be ready for a wide roll-out of storage accounting
    • detailed instructions for the sites will be circulated


Next meeting