General information

Middleware

UMD/CMD

Next UMD 4.6.0 regular release IN PROGRESS

- to be included: ARGUS, GFAL, CVMFS, gridsite, VOMS-ADMIN-server
- also CREAM, ARC, dCache in progress
- UMD 4.5.1 work to be merged with UMD 4.6.0

UMD3 deprecation: WMS is to be decommissioned by the end of the year; EGI Operations plans to send a note about the end of support for UMD3, in line with the WMS deprecation

CMD-OS update in preparation (cloudkeeper, cloudkeeper-os, ooi, gridsite, rocci-cli, cloud-bdii-info-provider)

CMD-ONE first release to be fixed by adding the site BDII

Preview repository

released on 2017-11-15
- Preview 1.15.0 AppDB info (sl6): ARC 15.03 update 17, dCache 2.16.53, XRootD 4.7.1
- Preview 2.15.0 AppDB info (CentOS 7): ARC 15.03 update 17, dCache 3.1.21, XRootD 4.7.1

Operations

ARGO/SAM

FedCloud

cASO upgrade campaign ongoing: https://wiki.egi.eu/wiki/Federated_Cloud_siteconf#cASO_upgrade CLOSED

Feedback from Helpdesk

Monthly Availability/Reliability

Underperformed sites in the past A/R reports with issues not yet fixed:
- AsiaPacific
  - T2-TH-SUT: CAs upgraded, A/R figures are improving https://ggus.eu/index.php?mode=ticket_info&ticket_id=130558
- NGI_BG: BG05-SUGrid SE taken out of production https://ggus.eu/index.php?mode=ticket_info&ticket_id=130561
- NGI_GRNET: https://ggus.eu/index.php?mode=ticket_info&ticket_id=130908
  - HG-05-FORTH: problems with some worker nodes, recovering
Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
- AsiaPacific: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131661
  - IN-DAE-VECC-02, PK-CIIT
- NGI_IT: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131664
  - CNR-ILC-PISA

New weights for the NGIs average A/R values, based on Computation Power

We would like to implement a new way of computing the weights for the NGIs average A/R values, introducing the concept of a CE's "computation power":

computation power = hep-spec * LogicalCPUs

This quantity can be summed over the CEs of a site (and over sites). Until now, the hep-spec values of the CEs have simply been added up to obtain a site-wide value, but this is not correct: a hep-spec value refers to a particular CE (to the cluster behind that CE) and is not additive. That is why we first asked VAPOR to implement the "computation power" as well as the site/NGI "average hep-spec". See for example the "figures" section: http://operations-portal.egi.eu/vapor/resources/GL2ResSummary
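As a rough illustration of why the product can be summed while the raw benchmark cannot, the following Python sketch computes the computation power of a site from per-CE values. The two benchmark values are the ones published by ce07.pic.es and ce01.pic.es in the query output on this page; the logical CPU counts are made-up numbers, and defining the site "average hep-spec" as total computation power divided by total logical CPUs is an assumption of this sketch, not a description of what VAPOR does.

```python
# Per-CE values: (hep-spec06 benchmark, logical CPUs).
# Benchmarks come from the PIC example on this page;
# the logical CPU counts are hypothetical.
ces = {
    "ce07.pic.es": (12.1205, 1000),
    "ce01.pic.es": (13.4856, 3000),
}

def computation_power(hep_spec, logical_cpus):
    """computation power = hep-spec * LogicalCPUs"""
    return hep_spec * logical_cpus

def site_totals(ces):
    """Return (site computation power, total logical CPUs, average hep-spec)."""
    power = sum(computation_power(h, n) for h, n in ces.values())
    cpus = sum(n for _, n in ces.values())
    # Assumed definition: CPU-weighted average of the per-CE benchmarks.
    return power, cpus, power / cpus

power, cpus, avg = site_totals(ces)
naive_sum = sum(h for h, _ in ces.values())
print(f"site computation power: {power:.1f}")
print(f"average hep-spec:       {avg:.4f}")
print(f"naive hep-spec sum:     {naive_sum:.4f}")
```

The average always lies between the smallest and the largest per-CE benchmark, whereas the naive sum grows with the number of CEs regardless of how many CPUs sit behind them.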

In the ARGO development instance the new weights have been used for computing the September average A/R values: http://web-egi-devel.argo.grnet.gr/lavoisier/ngi_reports?accept=html

We compared these values with the official ones: http://argo.egi.eu/lavoisier/ngi_reports?accept=html

As expected, some values improved and some worsened, the changes being perhaps more pronounced for NGIs with few sites. With the new method, sites providing more than one CE (with either the same or different hep-spec values) weigh less than before (for better or for worse), because we compute an average hep-spec rather than a simple sum of the benchmark values. Moreover, several sites are still missing the information needed to compute the weights with either method: check on VAPOR the values published by your sites, and make sure that the number of logical CPUs and the HEP-SPEC06 benchmark are properly published in the GLUE2 schema.
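The effect of the weights on an NGI average can be sketched as a weighted mean. The snippet below assumes that the NGI value is the sum of each site's A/R figure multiplied by its weight, divided by the sum of the weights, and that sites for which no weight can be computed are skipped. Site names and figures are invented, and the exact formula used by ARGO is not stated on this page.

```python
def weighted_average(sites):
    """Weighted mean of per-site A/R values.

    `sites` maps a site name to (ar_percent, weight). A weight of None
    means the site does not publish what is needed to compute it;
    such sites are skipped here (an assumption of this sketch).
    """
    usable = [(ar, w) for ar, w in sites.values() if w]
    total = sum(w for _, w in usable)
    if total == 0:
        return None
    return sum(ar * w for ar, w in usable) / total

# Hypothetical NGI with three sites: (availability %, computation power).
ngi = {
    "SITE-A": (99.0, 50000.0),
    "SITE-B": (80.0, 10000.0),
    "SITE-C": (95.0, None),   # benchmark or logical CPUs not published
}

print(weighted_average(ngi))
```

With few sites, a single large or badly performing site dominates the result, which is consistent with the larger swings seen for small NGIs.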

Example of ldap query for checking if a site is publishing the HepSpec-06 benchmark:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Benchmark)(GLUE2BenchmarkType=hep-spec06))'

dn: GLUE2BenchmarkID=ce07.pic.es_hep-spec06,GLUE2ResourceID=ce07.pic.es,GLUE2ServiceID=ce07.pic.es_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce07.pic.es
GLUE2BenchmarkID: ce07.pic.es_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2BenchmarkValue: 12.1205
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.1
GLUE2EntityOtherInfo: InfoProviderHost=ce07.pic.es
GLUE2BenchmarkComputingManagerForeignKey: ce07.pic.es_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06
GLUE2EntityCreationTime: 2017-06-20T16:50:48Z

dn: GLUE2BenchmarkID=ce01.pic.es_hep-spec06,GLUE2ResourceID=ce01.pic.es,GLUE2ServiceID=ce01.pic.es_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=pic,GLUE2GroupID=grid,o=glue
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce01.pic.es
GLUE2BenchmarkID: ce01.pic.es_hep-spec06
GLUE2BenchmarkType: hep-spec06
objectClass: GLUE2Entity
objectClass: GLUE2Benchmark
GLUE2BenchmarkValue: 13.4856
GLUE2EntityOtherInfo: InfoProviderName=glite-ce-glue2-benchmark-static
GLUE2EntityOtherInfo: InfoProviderVersion=1.1
GLUE2EntityOtherInfo: InfoProviderHost=ce01.pic.es
GLUE2BenchmarkComputingManagerForeignKey: ce01.pic.es_ComputingElement_Manager
GLUE2EntityName: Benchmark hep-spec06
GLUE2EntityCreationTime: 2017-09-05T07:34:26Z
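To check many CEs at once, the output of the query can be post-processed. The following Python sketch parses `ldapsearch -LLL` output and extracts the hep-spec06 value per CE. The function names are illustrative, the sample is a shortened copy of the PIC output above, and only plain `attribute: value` lines and wrapped continuation lines are handled.

```python
def parse_ldif(text):
    """Split `ldapsearch -LLL` output into a list of {attribute: [values]}."""
    entries, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():                 # blank line ends an entry
            if current:
                entries.append(current)
            current, last = {}, None
        elif line.startswith(" ") and last:  # continuation of a wrapped line
            current[last][-1] += line[1:]
        elif ": " in line:
            last, value = line.split(": ", 1)
            current.setdefault(last, []).append(value)
    if current:
        entries.append(current)
    return entries

def benchmarks(entries):
    """Map each CE host to its published hep-spec06 value."""
    return {
        e["GLUE2BenchmarkExecutionEnvironmentForeignKey"][0]:
            float(e["GLUE2BenchmarkValue"][0])
        for e in entries
        if e.get("GLUE2BenchmarkType") == ["hep-spec06"]
    }

# Shortened copy of the PIC output shown above.
sample = """\
dn: GLUE2BenchmarkID=ce07.pic.es_hep-spec06,GLUE2ResourceID=ce07.pic.es
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce07.pic.es
GLUE2BenchmarkType: hep-spec06
GLUE2BenchmarkValue: 12.1205

dn: GLUE2BenchmarkID=ce01.pic.es_hep-spec06,GLUE2ResourceID=ce01.pic.es
GLUE2BenchmarkExecutionEnvironmentForeignKey: ce01.pic.es
GLUE2BenchmarkType: hep-spec06
GLUE2BenchmarkValue: 13.4856
"""
print(benchmarks(parse_ldif(sample)))
```

An empty result for a site means that no hep-spec06 benchmark object was found under its GLUE2DomainID.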

Example of ldap query for getting the number of LogicalCPUs published by an ARC-CE (due to a bug in the info-provider, CREAM-CEs publish the total number under the ExecutionEnvironment class):

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2DomainID=UA_ILTPE_ARC,GLUE2GroupID=grid,o=glue" 'objectClass=GLUE2ComputingManager' GLUE2ComputingManagerTotalLogicalCPUs

dn: GLUE2ManagerID=urn:ogf:ComputingManager:ds4.ilt.kharkov.ua:pbs,GLUE2ServiceID=urn:ogf:ComputingService:ds4.ilt.kharkov.ua:arex,GLUE2GroupID=services,GLUE2DomainID=UA_ILTPE_ARC,GLUE2GroupID=grid,o=glue
GLUE2ComputingManagerTotalLogicalCPUs: 168

Example of ldap query for getting the number of LogicalCPUs published by a CREAM-CE:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2DomainID=UKI-SOUTHGRID-SUSX,GLUE2GroupID=grid,o=glue" 'objectClass=GLUE2ExecutionEnvironment' GLUE2ExecutionEnvironmentLogicalCPUs GLUE2ExecutionEnvironmentPhysicalCPUs GLUE2ExecutionEnvironmentTotalInstances

dn: GLUE2ResourceID=grid-cream-02.hpc.susx.ac.uk,GLUE2ServiceID=grid-cream-02.hpc.susx.ac.uk_ComputingElement,GLUE2GroupID=resource,GLUE2DomainID=UKI-SOUTHGRID-SUSX,GLUE2GroupID=grid,o=glue
GLUE2ExecutionEnvironmentTotalInstances: 71
GLUE2ExecutionEnvironmentLogicalCPUs: 568
GLUE2ExecutionEnvironmentPhysicalCPUs: 71
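Because the total is published under a different attribute depending on the CE flavour, a script collecting logical CPU counts has to look in both places. A minimal sketch, assuming the entries have already been parsed into attribute dictionaries; the `logical_cpus` helper and the dictionary layout are illustrative and not part of any EGI tool.

```python
def logical_cpus(entry):
    """Return the total logical CPUs published in one GLUE2 entry, or None.

    ARC-CEs publish the total in GLUE2ComputingManagerTotalLogicalCPUs;
    CREAM-CEs publish it in GLUE2ExecutionEnvironmentLogicalCPUs
    (see the note about the info-provider bug above).
    """
    for attr in ("GLUE2ComputingManagerTotalLogicalCPUs",
                 "GLUE2ExecutionEnvironmentLogicalCPUs"):
        if attr in entry:
            return int(entry[attr])
    return None

# Values copied from the two query results on this page.
arc_entry = {"GLUE2ComputingManagerTotalLogicalCPUs": "168"}
cream_entry = {
    "GLUE2ExecutionEnvironmentTotalInstances": "71",
    "GLUE2ExecutionEnvironmentLogicalCPUs": "568",
    "GLUE2ExecutionEnvironmentPhysicalCPUs": "71",
}

print(logical_cpus(arc_entry), logical_cpus(cream_entry))
```

A return value of None flags a CE for which the weight cannot be computed.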

Manual for Hepspec06 benchmark.

In December the new method will be moved into production, so if many sites fix the published information during October, the new NGI average A/R values will improve.

Decommissioning EMI WMS

As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.

NGIs provided WMS usage statistics; in general the usage is relatively low and mainly for local testing.

Moderate usage by a few VOs:

NGI_CZ: eli-beams.eu
NGI_GRNET: see
NGI_IT: calet.org, compchem, theophys, virgo
NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
NGI_UK: mice, t2k.org

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:

compchem is already testing DIRAC4EGI
calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
mice: enabled on the GridPP DIRAC server
eli-beams.eu enabled on DIRAC4EGI for performing tests

We need feedback from the VOs to better define the technical details and the timeline:

NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.

The plan

WMS servers can be decommissioned as soon as the supported VOs do not need them any more; please follow the procedure PROC13. The proposal is:

Starting from January 2018, put the WMS servers in draining: this will block the submission of new jobs and will allow the jobs previously submitted to finish
- inform your users in advance that you are going to drain and then decommission the WMS servers (as per PROC13)
- there might be several VOs enabled on your WMS servers: if only a few of them need the service for a few more weeks, you might disable the other VOs
EGI Operations will send a new broadcast to the VOs, reminding users of the forthcoming WMS decommissioning
After the end of February, EGI Operations will open a ticket to the sites that haven't started the decommissioning process yet

VOs have about 1 month to find alternatives or migrate to DIRAC:

HOWTO22 explains how a VO can request access to DIRAC4EGI and how to interact with it via the CLI

IPv6 readiness plans

assessment ongoing https://wiki.egi.eu/w/index.php?title=IPV6_Assessment
still missing NGIs/ROCs

Decommissioning of dCache 2.10 and 2.13

decommissioning campaign started by EGI Operations http://go.egi.eu/decommdcache213
- still left: CA-VICTORIA-WESTGRID-T2, CA-SCINET-T2, INFN-ROMA1-CMS

webdav probes in production

The webdav probes have been deployed in production. Several sites publish the webdav protocol in the BDII: they have been asked to register the endpoint on GOC-DB and to enable the monitoring, if it wasn't already done.

webdav endpoints registered in GOC-DB: https://goc.egi.eu/gocdbpi/public/?method=get_service&&service_type=webdav
link to nagios results: https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_webdav&style=detail

List of sites and tickets:

AsiaPacific: JP-KEK-CRC-02 https://ggus.eu/index.php?mode=ticket_info&ticket_id=131031 (SOLVED)
NGI_AEGIS: AEGIS01-IPB-SCL https://ggus.eu/index.php?mode=ticket_info&ticket_id=131033 (in progress...)
NGI_CH: UNIGE-DPNC https://ggus.eu/index.php?mode=ticket_info&ticket_id=131034 (on hold, currently understaffed)
NGI_DE: UNI-SIEGEN-HEP https://ggus.eu/index.php?mode=ticket_info&ticket_id=131036
NGI_GRNET:
- GR-01-AUTH https://ggus.eu/index.php?mode=ticket_info&ticket_id=131037 (disabling...)
- HG-03-AUTH https://ggus.eu/index.php?mode=ticket_info&ticket_id=131038 (disabling...)
NGI_HR: egee.irb.hr, egee.srce.hr https://ggus.eu/index.php?mode=ticket_info&ticket_id=131041 (in progress...)
NGI_IBERGRID:
- CETA-GRID https://ggus.eu/index.php?mode=ticket_info&ticket_id=131042 (disabled, SOLVED)
- IFIC-LCG2 https://ggus.eu/index.php?mode=ticket_info&ticket_id=131043 (SOLVED)
- NCG-INGRID-PT
NGI_FRANCE: GRIF-IPNO, GRIF-LAL, GRIF-LPNHE
NGI_IL:
- IL-TAU-HEP https://ggus.eu/index.php?mode=ticket_info&ticket_id=131044 (SOLVED)
- TECHNION-HEP https://ggus.eu/index.php?mode=ticket_info&ticket_id=131045 (SOLVED)
- WEIZMANN-LCG2 https://ggus.eu/index.php?mode=ticket_info&ticket_id=131047 (SOLVED)
NGI_IT:
- IGI-BOLOGNA, INFN-T1
- INFN-GENOVA https://ggus.eu/index.php?mode=ticket_info&ticket_id=131049 (SOLVED)
- INFN-MILANO-ATLASC https://ggus.eu/index.php?mode=ticket_info&ticket_id=131050 (enabled...)
- INFN-ROMA3 https://ggus.eu/index.php?mode=ticket_info&ticket_id=131051 (done)
NGI_PL: CYFRONET-LCG2, WUT https://ggus.eu/index.php?mode=ticket_info&ticket_id=131052 (disabling...)
NGI_UK: UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP https://ggus.eu/index.php?mode=ticket_info&ticket_id=131053 (SOLVED)
ROC_CANADA: CA-MCGILL-CLUMEQ-T2 https://ggus.eu/index.php?mode=ticket_info&ticket_id=131054 (SOLVED)

To register the webdav service endpoint on GOC-DB, follow HOWTO21 and fill in the proper information. In particular:

register a new service endpoint, separated from the SRM one;
on GOC-DB fill in the webdav URL containing also the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
- it corresponds to the value of GLUE2 attribute GLUE2EndpointURL (containing the used port and without the VO folder);
verify that the webdav url (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible.
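The URL conventions above can be checked mechanically before registering an endpoint. The following Python sketch is only an illustration: the function names are hypothetical, requiring the https scheme is an assumption drawn from the two example URLs, and no attempt is made to test that the endpoint is actually accessible, which needs a client with valid credentials.

```python
from urllib.parse import urlsplit

def check_webdav_url(gocdb_url, vo="ops"):
    """Return a list of problems found in a webdav URL meant for GOC-DB."""
    problems = []
    parts = urlsplit(gocdb_url)
    if parts.scheme != "https":
        problems.append("scheme is not https")
    if not parts.hostname:
        problems.append("missing host name")
    segments = [s for s in parts.path.split("/") if s]
    if not segments or segments[-1] != vo:
        problems.append(f"path does not end with the '{vo}' folder")
    return problems

def endpoint_url(gocdb_url, vo="ops"):
    """Strip the VO folder to get the URL expected in GLUE2EndpointURL."""
    parts = urlsplit(gocdb_url)
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[-1] == vo:
        segments.pop()
    return f"{parts.scheme}://{parts.netloc}/" + "/".join(segments)

url = "https://darkstorm.cnaf.infn.it:8443/webdav/ops"
print(check_webdav_url(url))
print(endpoint_url(url))
```

The second function gives the URL to compare with the GLUE2EndpointURL value published in the BDII.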

Storage accounting deployment

During the September meeting, the OMB approved the full-scale deployment of storage accounting. The APEL team has tested it with a group of early-adopter sites, and the results show that storage accounting is now production-ready.

Storage accounting is currently supported only for the DPM and dCache storage elements; therefore only the resource centres deploying these kinds of storage elements are requested to publish storage accounting data.

In order to properly install and configure the storage accounting scripts, please follow the instructions reported in the wiki: https://wiki.egi.eu/wiki/APEL/Storage

IMPORTANT: be sure to have installed the star-accounting.py script v1.0.4 (http://svnweb.cern.ch/world/wsvn/lcgdm/lcg-dm/trunk/scripts/StAR-accounting/star-accounting.py)

After setting up a daily cron job and running the accounting software, look for your data in the Accounting Portal: http://goc-accounting.grid-support.ac.uk/storagetest/storagesitesystems.html. If it does not appear within 24 hours, or there are other errors, please open a GGUS ticket to APEL who will help debug the process.

List of sites already publishing and of tickets opened is reported here.

PROBLEM: several (DPM) sites are using an old version of the star-accounting.py script. This leads to records having an EndTime 30 days in the future. The star-accounting.py script version to use is v1.0.4 (http://svnweb.cern.ch/world/wsvn/lcgdm/lcg-dm/trunk/scripts/StAR-accounting/star-accounting.py).

The APEL team opened tickets for this issue:

AEGIS02-RCUB: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131892 (SOLVED)
AEGIS03-ELEF-LEDA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131893 (SOLVED)
AUVERGRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131894
CAMK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131895 (SOLVED)
CETA-GRID: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131896
GARR-01-DIR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131897 (SOLVED)
IN2P3-LPC: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131917
RO-02-NIPNE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131918 (SOLVED)
RO-07-NIPNE: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131920
TOKYO-LCG2: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131921
TW-NTU-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131923
UA-ISMA: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131925
UKI-NORTHGRID-SHEF-HEP: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131926
TASK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=131928
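Sites can spot the EndTime problem described above before publishing by scanning the generated records. The following Python sketch flags any EndTime that lies in the future. It matches elements by local name so that the XML namespace does not matter; the sample records are simplified and hypothetical (a real StAR record has more fields), and the UTC timestamp format ending in "Z" is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def future_end_times(xml_text, now, tolerance=timedelta(days=1)):
    """Return the EndTime values that lie more than `tolerance` after `now`."""
    bad = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) == "EndTime" and elem.text:
            end = datetime.strptime(elem.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
            end = end.replace(tzinfo=timezone.utc)
            if end - now > tolerance:
                bad.append(elem.text.strip())
    return bad

# Simplified, hypothetical records (a real StAR record has more fields).
sample = """\
<records>
  <record><StartTime>2017-11-19T00:00:00Z</StartTime>
          <EndTime>2017-11-20T00:00:00Z</EndTime></record>
  <record><StartTime>2017-11-19T00:00:00Z</StartTime>
          <EndTime>2017-12-19T00:00:00Z</EndTime></record>
</records>
"""
now = datetime(2017, 11, 20, 12, 0, tzinfo=timezone.utc)
print(future_end_times(sample, now))
```

A non-empty result suggests that an old version of the script is still in use.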

AOB

Next meeting

Dec 11th, 2017 https://indico.egi.eu/indico/event/3355/

Agenda-2017-11-20
