General information

Middleware

EMI repository shut down on June 15th https://operations-portal.egi.eu/broadcast/archive/1715

UMD/CMD

CMD-OS 1.1.2 (C7/Xenial) is out
- CentOS7 (bdii-infoprovider 0.7.0, rOCCI client 4.3.8, APEL SSM 2.1.7, Infrastructure Manager 1.5.1, Site BDII 1.2.1, ooi 1.1.1, keystone-VOMS 9.0.4, cASO 1.1.0)
- Ubuntu Xenial (bdii-infoprovider 0.7.0, rOCCI client 4.3.8, Infrastructure Manager 1.5.1, ooi 1.1.1, keystone-VOMS 9.0.4, cASO 1.1.0)
CMD-ONE dry run successful
- including products for OpenNebula 5 for CentOS7 (Ubuntu not requested by FedCloud)
- Staged-Rollout ongoing
UMD 4.5 (June, delayed to July) in progress
- WN and CREAM for C7
- ARGUS 1.7.2
- APEL, DynaFed, XROOTD, dCache, QCG, ARC

Preview repository

Released on 2017-07-07:

Preview 1.13.0 AppDB info (sl6): ARC 15.03 u15, dCache 2.16.40, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0
Preview 2.13.0 AppDB info (CentOS 7): ARC 15.03 u15, ARGUS 1.7.1, CREAM 1.16.5, dCache 3.1.9 & SRM client 3.0.11, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0

Operations

ARGO/SAM

ARGO shows the last 3 days as unknown; this should be fixed within hours. No data was lost: the results only need to be recalculated from Friday onwards.

Testing FedCloud sites

Credits to Baptiste Grenier (EGI Operations). The tests were executed using fedcloud.egi.eu, https://appdb.egi.eu/store/vappliance/egi.centos.6, and https://github.com/EGI-Foundation/sscmon-occi.

Site	Status
BEgrid-BELNET	OK
CESGA	OK
CESNET-MetaCloud	OK
IISAS-FedCloud	OK
IN2P3-IRES	OK
INFN-CATANIA-STACK	OK
INFN-PADOVA-STACK	OK
RECAS-BARI	OK
TR-FC1-ULAKBIM	OK
BIFI	errors about floating IP pool
CLOUDIFIN	no default network, some VAs not synced
CYFRONET-CLOUD	Closed ports on public IP. Using old version of OCCI-OS and OpenStack Juno, site upgrade in progress.
HG-09-Okeanos-Cloud	cloudkeeper was installed, missing appliance. Site BDII updated but almost empty, hence very difficult to use.
FZJ	Server unavailable (OCCI endpoint), upgrade of OS to mitaka and OOI ongoing with troubles, openstack image list fails (but openstack flavor list succeeds). Working from time to time, unstable. Downtime published in GOCDB. Waiting for site admin to confirm that upgrade is over and troubles were fixed.
100IT	No default network, need to link the net1 network on VM creation
GoeGrid	On hold, reinstalling with ONE5 to use cloudkeeper with no downtime in GocDB, 9 GGUS tickets open.
IFCA-LCG2	Cannot list networks.
SCAI	No more works manually and not with scripts as there is no default network and endpoint is Critical in ARGO, moving to cloudkeeper-OS
UPV-GRyCAP	Moved to cloudkeeper and to cloud-info-provider 0.8.3. Able to create VM manually, but it is not possible to link the public network, Carlos is working on it
IISAS-Nebula	site does not support fedcloud.egi.eu
IISAS-GPUCloud	GP-GPU-specific site, does not support fedcloud.egi.eu
NCG-INGRID-PT	keystone v3 with OpenID Connect (experimental).

Feedback from Helpdesk

yearly review of the information registered into GOC-DB

2017-04-07

On a yearly basis, the information registered into GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:

NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
- E-Mail
- ROD E-Mail
- Security E-Mail

NGI managers should also review the status of the "not certified" RCs, according to the RC Status Workflow.

RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
- E-Mail
- telephone numbers
- CSIRT E-Mail

RC administrators should also review the information related to the registered service endpoints.

The process should be completed by Apr 28th.

To track the process, a series of tickets have been opened.

2017-07-13 UPDATE:

AfricaArabia, NGI_IT, NGI_NL are still checking;
no feedback yet from NGI_DE;
the status of the NGI_IL Operations Centre is uncertain: we are verifying it

Monthly Availability/Reliability

Underperformed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific
  - TW-NCUHEP: site-bdii was unstable because of network issues with ARGO; the issues are solved and the figures are improving https://ggus.eu/index.php?mode=ticket_info&ticket_id=128083
  - KR-UOS-SSCC: there were SRM problems, and now also CREAM failures; suspension has been proposed https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024
- NGI_IL: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128886 QoS violation: we are verifying the status of the Operations Centre
- NGI_PL (IFJ-PAN-BG) https://ggus.eu/index.php?mode=ticket_info&ticket_id=128889 perhaps the site will be decommissioned, no manpower.
- ROC_Canada: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128097
  - CA-MCGILL-CLUMEQ-T2 the figures are improving, but still some failures

Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
- AfricaArabia (ZA-MERAKA, ZA-UJ): https://ggus.eu/index.php?mode=ticket_info&ticket_id=129364
- AsiaPacific (Taiwan-LCG2): https://ggus.eu/index.php?mode=ticket_info&ticket_id=129367
- ROC_CERN: https://ggus.eu/index.php?mode=ticket_info&ticket_id=129368 (QoS) (SOLVED)
- NGI_AEGIS: https://ggus.eu/index.php?mode=ticket_info&ticket_id=129369 (SOLVED)
- NGI_BG (BG01-IPP) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129370
- NGI_CH https://ggus.eu/index.php?mode=ticket_info&ticket_id=129373 (QoS)
- NGI_CZ (prague_cesnet_lcg2) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129372
- NGI_FRANCE https://ggus.eu/index.php?mode=ticket_info&ticket_id=129375 (QoS)
- NGI_IBERGRID https://ggus.eu/index.php?mode=ticket_info&ticket_id=129376 (QoS)
- NGI_IT https://ggus.eu/index.php?mode=ticket_info&ticket_id=129381
  - HEPHY-UIBK: problem with expired certificates and an unresponsive CA. The A/R figures are now increasing
  - INFN-ROMA1-CMS: bug in the nagios probes for the CREAM, ticket GGUS 128151
- NGI_PL https://ggus.eu/index.php?mode=ticket_info&ticket_id=129382 (QoS)
- NGI_UA https://ggus.eu/index.php?mode=ticket_info&ticket_id=129468 (QoS)
- NGI_UK (UKI-SOUTHGRID-SUSX) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129383 there wasn't a reserved job slot for the ops VO

suspended sites: ZA-UCT-ICTS, MY-USM-GCL, UA-NSCMBR

Decommissioning EMI WMS

As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.

NGIs provided WMS usage statistics, and in general the usage is relatively low, mainly for local testing

Moderate usage by a few VOs:

NGI_CZ: eli-beams.eu
NGI_GRNET: see
NGI_IT: calet.org, compchem, theophys, virgo
NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
NGI_UK: mice, t2k.org

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:

compchem is already testing DIRAC
calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
mice: enabled on the GridPP DIRAC server

We need feedback from the VOs to better define the technical details and the timeline:

NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.

WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:

WMS will be removed from production starting from 1st January 2018.
- VOs have 5 months to find alternatives or migrate to DIRAC
Considering that this is not an update, the decommissioning can be performed in a few weeks.

IPv6 readiness plans

- Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
  - NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan

Decommissioning of dCache 2.10 and 2.13

Support for dCache 2.10 ended in December 2016; EGI Operations has opened tickets to track the decommissioning.
The dCache 2.13 decommissioning procedure has started: the probes turn CRITICAL in June, support from dCache ends in July, and upgrades are to be performed by August.
Please upgrade to 2.16, whose support ends in May 2018, or to 3.0.
- note that the dCache team does not support upgrading directly from 2.10 to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
EGI Operations will start a decommissioning campaign to monitor the upgrade of the dCache 2.13 instances and follow up with the NGIs/sites at the beginning of August.
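
The upgrade constraint in this item can be written down as a small lookup. The following is a minimal Python sketch, assuming only the two transitions named here (2.10->2.13 and 2.13->2.16); the names `SUPPORTED_STEPS` and `upgrade_path` are hypothetical, and the sketch knows nothing about the steps needed to reach 3.0 because this item does not list them.

```python
# Single-step upgrades named in this agenda item. Anything not listed
# here is treated as unsupported by this sketch (an assumption, not a
# statement from the dCache team).
SUPPORTED_STEPS = {
    "2.10": "2.13",
    "2.13": "2.16",
}


def upgrade_path(current, target="2.16"):
    """Return the chain of releases to pass through, from current to target."""
    path = [current]
    while path[-1] != target:
        next_release = SUPPORTED_STEPS.get(path[-1])
        if next_release is None:
            raise ValueError(
                f"no supported upgrade step known from {path[-1]} towards {target}"
            )
        path.append(next_release)
    return path


if __name__ == "__main__":
    print(" -> ".join(upgrade_path("2.10")))
    print(" -> ".join(upgrade_path("2.13")))
```

A site on 2.10 therefore gets a two-hop plan, while a site on 2.13 gets a single hop; asking for a target the table does not cover raises an error instead of guessing.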

webdav probes in production

The webdav probes have been deployed in production. Some sites have already been contacted to enable the monitoring of their webdav endpoints:

Site	Host	GGUSID	note
CYFRONET-LCG2	se01.grid.cyfronet.pl	https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325	SOLVED
GRIF	node12.datagrid.cea.fr	https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329
IGI-BOLOGNA	darkstorm.cnaf.infn.it	https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930	SOLVED
INFN-T1	removed	https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326	SOLVED
NCG-INGRID-PT	gftp01.ncg.ingrid.pt	https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327	SOLVED
UKI-NORTHGRID-LIV-HEP	hepgrid11.ph.liv.ac.uk	https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328	SOLVED
egee.irb.hr	lorienmaster.irb.hr

link to nagios results: https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_webdav&style=detail

Several sites are publishing webdav endpoints in the BDII:

AsiaPacific: JP-KEK-CRC-02
NGI_AEGIS: AEGIS01-IPB-SCL
NGI_CH: UNIGE-DPNC, UNIBE-LHEP
NGI_DE: UNI-SIEGEN-HEP
NGI_GRNET: GR-01-AUTH, HG-03-AUTH
NGI_HR: egee.irb.hr, egee.srce.hr
NGI_IBERGRID: CETA-GRID, NCG-INGRID-PT
NGI_FRANCE: GRIF-IPNO, GRIF-LAL, GRIF-LPNHE
NGI_IL: IL-TAU-HEP, TECHNION-HEP, WEIZMANN-LCG2
NGI_IT: IGI-BOLOGNA, INFN-GENOVA, INFN-MILANO-ATLASC, INFN-ROMA3, INFN-T1
NGI_PL: CYFRONET-LCG2, WUT
NGI_RO: NIHAM
NGI_UK: UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP
ROC_CANADA: CA-MCGILL-CLUMEQ-T2

Checked with:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Endpoint)(GLUE2EndpointInterfaceName=webdav))' GLUE2EndpointImplementationName GLUE2EndpointURL
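
The query above returns LDIF, which wraps long lines and separates entries with blank lines, so it helps to unfold the output before reading off the endpoints. A minimal Python sketch of that post-processing follows; the function names `parse_ldif` and `webdav_endpoints` and the sample records (hosts under `example.org`, site `EXAMPLE-SITE`) are illustrative assumptions, not real BDII content.

```python
import base64


def parse_ldif(text):
    """Parse `ldapsearch -LLL` output into a list of {attribute: [values]} dicts."""
    # Unfold continuation lines: LDIF marks them with one leading space.
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    entries, current = [], {}
    for line in unfolded:
        if not line.strip():
            # A blank line closes the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        if value.startswith(":"):
            # "attr:: <base64>" marks a base64-encoded value.
            value = base64.b64decode(value[1:].strip()).decode("utf-8", "replace")
        else:
            value = value.strip()
        current.setdefault(name, []).append(value)
    if current:
        entries.append(current)
    return entries


def webdav_endpoints(entries):
    """Yield (URL, implementation name) pairs for entries that publish a URL."""
    for entry in entries:
        impl = entry.get("GLUE2EndpointImplementationName", ["unknown"])[0]
        for url in entry.get("GLUE2EndpointURL", []):
            yield url, impl


# Made-up sample in the shape of the query output (not real BDII data).
SAMPLE = "\n".join([
    "dn: GLUE2EndpointID=se.example.org_webdav,GLUE2ServiceID=se.example.org,GLUE2",
    " DomainID=EXAMPLE-SITE,GLUE2GroupID=grid,o=glue",
    "GLUE2EndpointImplementationName: DPM",
    "GLUE2EndpointURL: https://se.example.org:443/dpm/example.org/home",
    "",
    "dn: GLUE2EndpointID=other.example.org_webdav,GLUE2GroupID=grid,o=glue",
    "GLUE2EndpointURL: https://other.example.org:8443/webdav",
    "",
])

if __name__ == "__main__":
    for url, impl in webdav_endpoints(parse_ldif(SAMPLE)):
        print(f"{impl}\t{url}")
```

To use it on real data, the sample string would be replaced by the captured output of the `ldapsearch` command.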

ACTIONS for NGIs and sites: The Operations Centres are asked to verify with their sites whether the webdav protocol is really (intentionally) enabled on their storage elements (if not, the information should be removed from the BDII), and to report to EGI Operations.

The webdav service endpoint should be registered in GOC-DB to be properly monitored: the nagios probes are executed using the ops VO, so please ensure that the protocol is enabled for the ops VO as well.
The webdav probes are harmless: they are not in any critical profile, they do not raise any alarm in the operations dashboard, and the A/R figures are not affected. We need time and more sites to gather statistics on their results before making them critical.

To register the webdav service endpoint in GOC-DB, follow HOWTO21 to fill in the proper information. In particular:

on GOC-DB fill in the webdav URL containing also the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- apart from the VO folder, it corresponds to the value of the GLUE2 attribute GLUE2EndpointURL (which contains the port in use but not the VO folder)
verify that the webdav url (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible
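
The relation between the GOC-DB URL and the published GLUE2EndpointURL can be checked mechanically before registering an endpoint. The sketch below is one possible way to do it in Python, under stated assumptions: the function name `check_gocdb_webdav_url` is hypothetical, a missing port is taken to mean the scheme default (443 for https), and no network access is attempted, so it does not replace verifying that the URL is actually reachable.

```python
from urllib.parse import urlsplit


def _host_port(parts):
    # Assumption of this sketch: a missing port means the scheme default.
    default = {"https": 443, "http": 80}.get(parts.scheme)
    return parts.hostname, parts.port or default


def check_gocdb_webdav_url(gocdb_url, glue2_url, vo="ops"):
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    goc, glue = urlsplit(gocdb_url), urlsplit(glue2_url)

    if _host_port(goc) != _host_port(glue):
        problems.append("host or port differs from GLUE2EndpointURL")

    goc_segments = [s for s in goc.path.split("/") if s]
    glue_segments = [s for s in glue.path.split("/") if s]

    # The GOC-DB URL is expected to end with the VO folder ...
    if not goc_segments or goc_segments[-1] != vo:
        problems.append(f"path does not end with the '{vo}' folder")
    # ... and to extend the path published in the BDII.
    if goc_segments[:len(glue_segments)] != glue_segments:
        problems.append("path does not start with the GLUE2EndpointURL path")
    return problems


if __name__ == "__main__":
    # URLs taken from the examples in this section.
    print(check_gocdb_webdav_url(
        "https://darkstorm.cnaf.infn.it:8443/webdav/ops",
        "https://darkstorm.cnaf.infn.it:8443/webdav",
    ))
```

Returning a list of problems rather than a boolean lets a site admin see every mismatch at once, for example a missing `ops` folder together with a wrong port.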

Testing of the storage accounting

As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.

More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage

List of sites available for test.

2017-07-14 UPDATE (more details in the June OMB presentation):

31 sites are sending storage accounting data (only from dCache and DPM SEs); the data validation is ongoing.
A new service type, eu.egi.storage.accounting, was created on GOC-DB; it will be used for:
- authorising the site/SE to publish the accounting data
- making the site/SE appear in the portal
- monitoring that the accounting data are regularly published
by September we should be ready for a wide roll-out of storage accounting
- detailed instructions for the sites will be circulated

AOB

Next meeting

Aug 7th, 2017 https://indico.egi.eu/indico/event/3351/
do we move to Aug 21st, 2017? (previous meeting is today, far enough)
switching to GoToMeeting from the next meeting on (could not make it for today due to technical issues with the plugin)

Agenda-17-07-2017
