
Difference between revisions of "Agenda-21-08-2017"

From EGIWiki
(Created page with "{{TOC right}} = General information = = Middleware = * EMI repository shut down on June 15th https://operations-portal.egi.eu/broadcast/archive/1715 == UMD/CMD == * CMD-O...")
 
 
(15 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{TOC right}}  
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}
[[Category:Grid Operations Meetings]]


= General information  =
= General information  =


= Middleware  =
= Middleware  =
* EMI repository shut down on June 15th https://operations-portal.egi.eu/broadcast/archive/1715


== UMD/CMD ==
== UMD/CMD ==


* CMD-OS 1.1.2 (C7/Xenial) is out
* CMD-OS 1 (Mitaka) update
** CentOS7 (bdii-infoprovider 0.7.0, rOCCI client 4.3.8, APEL SSM 2.1.7, Infrastructure Manager 1.5.1, Site BDII 1.2.1, ooi 1.1.1, keystone-VOMS 9.0.4, cASO 1.1.0)
** inclusion of cloudkeeper-os/cloudkeeper ongoing https://ggus.eu/?mode=ticket_info&ticket_id=129660
** Ubuntu Xenial (bdii-infoprovider 0.7.0, rOCCI client 4.3.8, Infrastructure Manager 1.5.1, ooi 1.1.1, keystone-VOMS 9.0.4, cASO 1.1.0,
** planned inclusion of user id isolation patch for Mitaka
* CMD-ONE dry run successful
** APEL team asked to include cASO 1.1.1
** including products for OpenNebula 5 for CentOS7 (Ubuntu not requested by FedCloud)
* CMD-ONE 1 (ONE5/C7) first major release
** Staged-Rollout ongoing
** products included and verified, Staged-Rollout ongoing
* UMD 4.5 (June, delayed to July) in progress
* UMD 4.5 released http://repository.egi.eu/2017/08/10/release-umd-4-5-0/
** WN and CREAM for C7
** SL6
** ARGUS 1.7.2
*** APEL 1.5.1 - Add support for Torque 5.1.2 time duration format; change dirq call to use absolute path to support versions of dirq >= 1.7; fix crash when StAR loader encounters a valid XML file with no records in it.
** APEL, DynaFed, XROOTD, dCache, QCG, ARC
*** APEL-SSM 2.1.7 - Added a delay when receiver is reconnecting to improve reliability. Improved the log output for SSM receivers so that there are fewer trivial entries and so it's more useful in tracking messages on the filesystem.
*** CVMFS 2.3.5 - various bug fixes (see release notes http://cvmfs.readthedocs.io/en/2.3/cpt-releasenotes.html)
*** DynaFed 1.3.1 - various bug fixes and improvements (see http://lcgdm.web.cern.ch/dynafed-131-available-epel-testing)
*** ARC 15.03.14 - bugfix release, previous version 15.03u13 suffering from a memory leak causing crashes
*** Globus GridFTP 11.8.4 - new build with 2 patches from DPM applied
*** dpm-dsi 1.9.13 - bug fix (dpm-gsiftp startup script seen to stop and then report daemon already running); use the new gridftp api to remove hack for gridftp redirection.
*** Davix 0.6.6 - bug fixes http://dmc.web.cern.ch/release/davix-060
*** dmlite 0.8.6 - bug fix release bringing small changes http://lcgdm.web.cern.ch/dmlite-086-released-epel
*** gfal2-utils 1.5.0 - bug fixes/improvements http://dmc.web.cern.ch/release/gfal2-util-1.5.0
*** CERN Frontier 3.5.25 - see release notes http://frontier.cern.ch/dist/rpms-debug/frontier-squidRELEASE_NOTES
*** ARC nagios-probes 1.9.1 - bug fixes, see release notes http://www.nordugrid.org/arc/releases/15.03u14/release_notes_15.03u14.html
*** QCG Broker 4.2.0 - added support for Array Jobs; instant information about resources for the new qcg-resources command; bug fixes
*** gfal2-python 1.9.2 - see release notes http://dmc.web.cern.ch/release/gfal2-python-1.9.2
*** XROOTD 4.6.1 - several new features and bug fixes http://xrootd.org/download/ReleaseNotes.html
*** dCache 3.0.25 - see release notes https://www.dcache.org/downloads/1.9/release-notes-3.0.shtml#25
*** FTS 3.6.8 - several new features and bug fixes http://fts3-service.web.cern.ch/documentation/releases#qt-release-ui-tabs3
*** lcgdm 0.18.2 - bug fixes
*** gfal2 2.13.4 - Default checksum for local copy was removed
** CentOS7
*** APEL 1.5.1 - first release in UMD4/C7. Add support for Torque 5.1.2 time duration format; change dirq call to use absolute path to support versions of dirq >= 1.7; fix crash when StAR loader encounters a valid XML file with no records in it.
*** APEL-SSM 2.1.7 - first release in UMD4/C7. Added a delay when receiver is reconnecting to improve reliability. Improved the log output for SSM receivers so that there are fewer trivial entries and so it's more useful in tracking messages on the filesystem.
*** CVMFS 2.3.5 - first release in UMD4/C7. Various bug fixes (see release notes http://cvmfs.readthedocs.io/en/2.3/cpt-releasenotes.html)
*** WN 4.0.5 - first release in UMD4/C7
*** DynaFed 1.3.1 - first release in UMD4/C7
*** ARC 15.03.14 - bugfix release, previous version 15.03u13 suffering from a memory leak causing crashes
*** Globus GridFTP 11.8.4 - new build with 2 patches from DPM applied
*** dpm-dsi 1.9.13 - bug fix (dpm-gsiftp startup script seen to stop and then report daemon already running); use the new gridftp api to remove hack for gridftp redirection.
*** Davix 0.6.6 - bug fixes http://dmc.web.cern.ch/release/davix-060
*** dmlite 0.8.6 - bug fix release bringing small changes http://lcgdm.web.cern.ch/dmlite-086-released-epel
*** gfal2-utils 1.5.0 - bug fixes/improvements http://dmc.web.cern.ch/release/gfal2-util-1.5.0
*** CERN Frontier 3.5.25 - see release notes http://frontier.cern.ch/dist/rpms-debug/frontier-squidRELEASE_NOTES
*** ARC nagios-probes 1.9.1 - bug fixes, see release notes http://www.nordugrid.org/arc/releases/15.03u14/release_notes_15.03u14.html
*** QCG Broker 4.2.0 - first release in UMD4/C7; added support for Array Jobs; instant information about resources for the new qcg-resources command; bug fixes
*** gfal2-python 1.9.2 - see release notes http://dmc.web.cern.ch/release/gfal2-python-1.9.2
*** XROOTD 4.6.1 - several new features and bug fixes http://xrootd.org/download/ReleaseNotes.html
*** dCache 3.0.25 - see release notes https://www.dcache.org/downloads/1.9/release-notes-3.0.shtml#25
*** ARGUS 1.7.3 - this release introduces support for different X.509 CA Authentication profiles via the new Authentication Profile Policy Information Point (PIP); see release notes http://argus-documentation.readthedocs.io/en/stable/release_notes/v_1_7_1.html (1.7.3 refers to UMD version numbering, so reference to 1.7.1 release notes is correct)
*** lcgdm 0.18.2 - bug fixes
*** gfal2 2.13.4 - Default checksum for local copy was removed
* UMD 3.14.10 released
** yaim-core.sl6.x86_64-5.1.4 updated


== Preview repository  ==
== Preview repository  ==
Released on 2017-07-07:
Released on 2017-07-07:
* '''[[Preview 1.13.0]]''' [https://appdb.egi.eu/store/software/preview.repository/releases/1.0/1.13.0/ AppDB info] (sl6): ARC 15.03 u15, dCache 2.16.40, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0
* '''[[Preview 1.13.0]]''' [https://appdb.egi.eu/store/software/preview.repository/releases/1.0/1.13.0/ AppDB info] (sl6): ARC 15.03 u15, dCache 2.16.40, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0
Line 32: Line 72:


== Testing FedCloud sites  ==
== Testing FedCloud sites  ==
Credits to Baptiste Grenier (EGI Operations). The tests were executed using fedcloud.egi.eu, https://appdb.egi.eu/store/vappliance/egi.centos.6, and https://github.com/EGI-Foundation/sscmon-occi.<br>
{| width="1208" cellspacing="1" cellpadding="1" border="1"
|-
! scope="col" | Site
! scope="col" | Status
|-
| BEgrid-BELNET
| OK
|-
| CESGA
| OK
|-
| CESNET-MetaCloud
| OK
|-
| IISAS-FedCloud
| OK
|-
| IN2P3-IRES
| OK
|-
| INFN-CATANIA-STACK
| OK
|-
| INFN-PADOVA-STACK
| OK
|-
| RECAS-BARI
| OK
|-
| TR-FC1-ULAKBIM
| OK
|-
| BIFI
| Errors about floating IP pool
|-
| CLOUDIFIN
| No default network, some VAs not synced
|-
| CYFRONET-CLOUD
| Closed ports on public IP. Using an old version of OCCI-OS and OpenStack Juno; site upgrade in progress.
|-
| HG-09-Okeanos-Cloud
| cloudkeeper was installed, but the appliance is missing. Site BDII updated but almost empty, hence very difficult to use.
|-
| FZJ
| Server unavailable (OCCI endpoint). The upgrade of OpenStack to Mitaka and of ooi is ongoing with problems: openstack image list fails (but openstack flavor list succeeds). Works intermittently, unstable. Downtime published in GOCDB. Waiting for the site admin to confirm that the upgrade is over and the problems are fixed.
|-
| 100IT
| No default network, need to link the net1 network on VM creation
|-
| GoeGrid
| On hold: reinstalling with ONE5 to use cloudkeeper, with no downtime in GOCDB; 9 GGUS tickets open.
|-
| IFCA-LCG2
| Cannot list networks.
|-
| SCAI
| No longer works, either manually or with scripts, because there is no default network and the endpoint is Critical in ARGO; moving to cloudkeeper-OS
|-
| UPV-GRyCAP
| Moved to cloudkeeper and to cloud-info-provider 0.8.3. A VM can be created manually, but it is not possible to link the public network; Carlos is working on it
|-
| IISAS-Nebula
| Site does not support fedcloud.egi.eu
|-
| IISAS-GPUCloud
| GP-GPU-specific site, does not support fedcloud.egi.eu
|-
| NCG-INGRID-PT
| keystone v3 with OpenID Connect (experimental).
|}


== Feedback from Helpdesk  ==
== Feedback from Helpdesk  ==
== yearly review of the information registered into GOC-DB  ==
'''2017-04-07'''
On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:
#'''NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:'''
#*E-Mail
#*ROD E-Mail
#*Security E-Mail
:NGI Managers should also review the status of the "not certified" RCs, according to the [https://wiki.egi.eu/wiki/PROC09#Resource_Center_status_Workflow RC Status Workflow];
#'''RCs administrators should review the people registered and the roles assigned to them, and in particular check the following information:'''
#*E-Mail
#*telephone numbers
#*CSIRT E-Mail
:RC administrators should also review the information related to the registered service endpoints.
'''The process should be completed by Apr 28th.'''
To track the process, a [https://wiki.egi.eu/wiki/Verify_Configuration_Records series of tickets] has been opened.
'''2017-07-13 UPDATE''':
*AfricaArabia, NGI_IT, NGI_NL still checking;
*no feedback yet by: NGI_DE;
*status of NGI_IL Operations centre is uncertain: we are verifying it


== Monthly Availability/Reliability ==
== Monthly Availability/Reliability ==
Line 170: Line 79:
*Underperformed sites in the past A/R reports with issues not yet fixed:
*Underperformed sites in the past A/R reports with issues not yet fixed:
** '''AsiaPacific'''
** '''AsiaPacific'''
*** TW-NCUHEP: site-bdii unstable for network issues with ARGO, issues solved, figures are improving https://ggus.eu/index.php?mode=ticket_info&ticket_id=128083
*** TW-NCUHEP: still underperforming due to frequent failures https://ggus.eu/index.php?mode=ticket_info&ticket_id=128083
***KR-UOS-SSCC: there were srm problems, now also CREAM failures, proposed the suspension https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024
***KR-UOS-SSCC: there were srm problems, now also CREAM failures, proposed the suspension https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024
**'''NGI_IL''': https://ggus.eu/index.php?mode=ticket_info&ticket_id=128886 QoS violation: we are verifying the status of the Operations Centre
**ROC_Canada: https://ggus.eu/index.php?mode=ticket_info&ticket_id=128097
**'''NGI_PL''' (IFJ-PAN-BG) https://ggus.eu/index.php?mode=ticket_info&ticket_id=128889 perhaps the site will be decommissioned, no manpower.
***CA-MCGILL-CLUMEQ-T2: still some failures
**'''ROC_Canada''': https://ggus.eu/index.php?mode=ticket_info&ticket_id=128097
**NGI_BG (BG01-IPP) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129370 : suggested to mark the SE as not production
***CA-MCGILL-CLUMEQ-T2 the figures are improving, but still some failures
**NGI_IT https://ggus.eu/index.php?mode=ticket_info&ticket_id=129381
***HEPHY-UIBK: recovered
***INFN-ROMA1-CMS: still underperforming, although the bug in the nagios probes for CREAM (GGUS ticket 128151) has since disappeared


*Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
*Underperformed sites after 3 consecutive months, underperformed NGIs, QoS violations:
**'''AfricaArabia''' (ZA-MERAKA, ZA-UJ): https://ggus.eu/index.php?mode=ticket_info&ticket_id=129364
**ROC_CERN https://ggus.eu/index.php?mode=ticket_info&ticket_id=129957 QoS violation
**'''AsiaPacific''' (Taiwan-LCG2): https://ggus.eu/index.php?mode=ticket_info&ticket_id=129367
**NGI_AEGIS https://ggus.eu/index.php?mode=ticket_info&ticket_id=129959
**'''ROC_CERN''': https://ggus.eu/index.php?mode=ticket_info&ticket_id=129368 (QoS) (SOLVED)
**NGI_CH https://ggus.eu/index.php?mode=ticket_info&ticket_id=129960
**'''NGI_AEGIS''': https://ggus.eu/index.php?mode=ticket_info&ticket_id=129369 (SOLVED)
***T3_CH_PSI
**'''NGI_BG''' (BG01-IPP) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129370
**NGI_DE https://ggus.eu/index.php?mode=ticket_info&ticket_id=129961
**'''NGI_CH''' https://ggus.eu/index.php?mode=ticket_info&ticket_id=129373 (QoS)
***FZK-LCG2
**'''NGI_CZ''' (prague_cesnet_lcg2) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129372
**NGI_GRNET https://ggus.eu/index.php?mode=ticket_info&ticket_id=129962
**'''NGI_FRANCE''' https://ggus.eu/index.php?mode=ticket_info&ticket_id=129375 (QoS)
**NGI_UA https://ggus.eu/index.php?mode=ticket_info&ticket_id=129963
**'''NGI_IBERGRID''' https://ggus.eu/index.php?mode=ticket_info&ticket_id=129376 (QoS)
***UA_IFBG
**'''NGI_IT''' https://ggus.eu/index.php?mode=ticket_info&ticket_id=129381
**NGI_UK https://ggus.eu/index.php?mode=ticket_info&ticket_id=129964 QoS violation (SOLVED)
***HEPHY-UIBK: problem with Expired certificates and unresponsive CA. Now A/R figures are increasing
***INFN-ROMA1-CMS: bug in the nagios probes for the CREAM, ticket GGUS 128151
**'''NGI_PL''' https://ggus.eu/index.php?mode=ticket_info&ticket_id=129382 (QoS)
**'''NGI_UA''' https://ggus.eu/index.php?mode=ticket_info&ticket_id=129468 (QoS)
**'''NGI_UK''' (UKI-SOUTHGRID-SUSX) https://ggus.eu/index.php?mode=ticket_info&ticket_id=129383 there wasn't a reserved job slot for the ops VO


'''suspended sites: ZA-UCT-ICTS, MY-USM-GCL, UA-NSCMBR'''
'''suspended sites''': IFJ-PAN-BG, ZA-MERAKA, ZA-UJ


== Decommissioning EMI WMS  ==
== Decommissioning EMI WMS  ==
Line 223: Line 129:


*'''WMS will be removed from production starting from 1st January 2018'''.  
*'''WMS will be removed from production starting from 1st January 2018'''.  
**VOs have '''5 months''' to find alternatives or migrate to DIRAC  
**VOs have '''4 months''' to find alternatives or migrate to DIRAC  
*Considering that this is not an update, the decommission can be performed in a few weeks.
*Considering that this is not an update, the decommission can be performed in a few weeks.
'''2017-08-21 UPDATE''': eli-beams.eu is interested in testing DIRAC; the process for enabling the VO on the DIRAC4EGI server has started.


== IPv6 readiness plans  ==
== IPv6 readiness plans  ==
Line 237: Line 145:
*please upgrade to 2.16, whose support ends in May 2018, or to 3.0
*please upgrade to 2.16, whose support ends in May 2018, or to 3.0
**take care that the dCache team does not support the upgrade from 2.10 directly to 2.16; only 2.10-&gt;2.13 and 2.13-&gt;2.16 transitions are supported.  
**take care that the dCache team does not support the upgrade from 2.10 directly to 2.16; only 2.10-&gt;2.13 and 2.13-&gt;2.16 transitions are supported.  
*decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.13 instances and follow up with the NGIs/sites '''at the beginning of August'''
*'''decommissioning campaign started by EGI Operations''' http://go.egi.eu/decommdcache213


== webdav probes in production  ==
== webdav probes in production  ==
Line 300: Line 208:
*'''NGI_IT''': IGI-BOLOGNA, INFN-GENOVA, INFN-MILANO-ATLASC, INFN-ROMA3, INFN-T1
*'''NGI_IT''': IGI-BOLOGNA, INFN-GENOVA, INFN-MILANO-ATLASC, INFN-ROMA3, INFN-T1
*'''NGI_PL''': CYFRONET-LCG2, WUT
*'''NGI_PL''': CYFRONET-LCG2, WUT
*'''NGI_RO''': NIHAM
*'''NGI_UK''': UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP
*'''NGI_UK''': UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP
*'''ROC_CANADA''': CA-MCGILL-CLUMEQ-T2
*'''ROC_CANADA''': CA-MCGILL-CLUMEQ-T2
Line 327: Line 234:
[[Storage accounting testing|List of sites]] available for test.  
[[Storage accounting testing|List of sites]] available for test.  


'''2017-07-14 UPDATE''' (more details in the [https://indico.egi.eu/indico/event/3238/ June OMB presentation]):  
'''2017-07-27 UPDATE''' (more details in the [https://indico.egi.eu/indico/event/3239/ July OMB presentation]):  


*31 sites are sending storage accounting data (only from dCache and DPM SEs); The data validation is on-going.
*23 sites have verified their numbers and 3 are in progress
*A new service type, ''eu.egi.storage.accounting'', was created in GOC-DB; it will be used for:
*for the deployment in production we need to:
** authorising the site/SE to publish the accounting data
**Get sites to add new GOCDB service type
** making the site/SE appear in the portal
**Change broker queue name and get sites to swap
** monitoring that the accounting data are regularly published
**Update documentation
**Add storage system scripts to UMD
**Migrate storage view to new development Portal
*by September we should be ready for a wide roll-out of storage accounting
*by September we should be ready for a wide roll-out of storage accounting
**detailed instructions for the sites will be circulated
**detailed instructions for the sites will be circulated
Line 341: Line 250:
== Next meeting  ==
== Next meeting  ==


*'''Aug 7th, 2017''' https://indico.egi.eu/indico/event/3351/  
*'''Sept 11th, 2017''' https://indico.egi.eu/indico/event/3352/
*do we move to '''Aug 21st, 2017? '''(previous meeting is today, far enough)
*switching to '''GoToMeeting '''from next meeting on (cannot make it for today due to technical issues with the plugin)

Latest revision as of 15:25, 25 October 2017



General information

Middleware

UMD/CMD

Preview repository

Released on 2017-07-07:

  • Preview 1.13.0 AppDB info (sl6): ARC 15.03 u15, dCache 2.16.40, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0
  • Preview 2.13.0 AppDB info (CentOS 7): ARC 15.03 u15, ARGUS 1.7.1, CREAM 1.16.5, dCache 3.1.9 & SRM client 3.0.11, frontier-squid 3.5.24-3.1, LCGdm-dav 0.18.2, QCG Broker 4.2.0

Operations

ARGO/SAM


Testing FedCloud sites

Feedback from Helpdesk

Monthly Availability/Reliability

suspended sites: IFJ-PAN-BG, ZA-MERAKA, ZA-UJ

Decommissioning EMI WMS

As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.

NGIs provided WMS usage statistics, and in general the usage is relatively low, mainly for local testing

Moderate usage by a few VOs:

  • NGI_CZ: eli-beams.eu
  • NGI_GRNET: see
  • NGI_IT: calet.org, compchem, theophys, virgo
  • NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
  • NGI_UK: mice, t2k.org

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:

  • compchem is already testing DIRAC
  • calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
  • mice: enabled on the GridPP DIRAC server

We need the VO feedback for better defining technical details and timeline:

  • NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.

WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:

  • WMS will be removed from production starting from 1st January 2018.
    • VOs have 4 months to find alternatives or migrate to DIRAC
  • Considering that this is not an update, the decommission can be performed in a few weeks.

2017-08-21 UPDATE: eli-beams.eu is interested in testing DIRAC; the process for enabling the VO on the DIRAC4EGI server has started.

IPv6 readiness plans

    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan

Decommissioning of dCache 2.10 and 2.13

  • support for dCache 2.10 ended in December 2016; tickets were opened by EGI Operations to track decommissioning
  • dCache 2.13 decommissioning procedure started: in June the probes will get CRITICAL, support from dCache ends in July, upgrades to be performed by August
  • please upgrade to 2.16, whose support ends in May 2018, or to 3.0
    • take care that the dCache team does not support the upgrade from 2.10 directly to 2.16; only 2.10->2.13 and 2.13->2.16 transitions are supported.
  • decommissioning campaign started by EGI Operations http://go.egi.eu/decommdcache213
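
The upgrade constraint above (no direct 2.10 to 2.16 jump) can be illustrated with a small sketch. This is only an illustration, not a dCache tool: the table `SUPPORTED_UPGRADES` and the function `upgrade_path` are hypothetical names, and the table contains only the two transitions stated in this agenda. Transitions towards 3.0 are not described here and are therefore left out.

```python
from collections import deque

# Direct upgrades stated in the agenda: 2.10 -> 2.13 and 2.13 -> 2.16.
# Anything else (including paths to 3.0) is not covered by this sketch.
SUPPORTED_UPGRADES = {
    "2.10": ["2.13"],
    "2.13": ["2.16"],
}


def upgrade_path(current, target, supported=SUPPORTED_UPGRADES):
    """Return the shortest chain of supported upgrades, or None if there is none.

    Uses a breadth-first search over the table of direct transitions.
    """
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_version in supported.get(path[-1], []):
            if next_version not in seen:
                seen.add(next_version)
                queue.append(path + [next_version])
    return None


if __name__ == "__main__":
    # A site still on 2.10 has to go through 2.13 first.
    print(" -> ".join(upgrade_path("2.10", "2.16")))
```

A site on 2.13 gets a single-step path, while a request with no supported chain (for example a downgrade) returns `None`.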

webdav probes in production

The webdav probes have been deployed in production. Some sites were already contacted for enabling the monitoring of their webdav endpoints:

{| cellspacing="1" cellpadding="1" border="1"
|-
! scope="col" | Site
! scope="col" | Host
! scope="col" | GGUS ticket
! scope="col" | Note
|-
| CYFRONET-LCG2
| se01.grid.cyfronet.pl
| https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325
| SOLVED
|-
| GRIF
| node12.datagrid.cea.fr
| https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329
|
|-
| IGI-BOLOGNA
| darkstorm.cnaf.infn.it
| https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930
| SOLVED
|-
| INFN-T1
| removed
| https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326
| SOLVED
|-
| NCG-INGRID-PT
| gftp01.ncg.ingrid.pt
| https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327
| SOLVED
|-
| UKI-NORTHGRID-LIV-HEP
| hepgrid11.ph.liv.ac.uk
| https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328
| SOLVED
|-
| egee.irb.hr
| lorienmaster.irb.hr
|
|
|}

link to nagios results: https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_webdav&style=detail

Several sites are publishing in the BDII the webdav endpoints:

  • AsiaPacific: JP-KEK-CRC-02
  • NGI_AEGIS: AEGIS01-IPB-SCL
  • NGI_CH: UNIGE-DPNC, UNIBE-LHEP
  • NGI_DE: UNI-SIEGEN-HEP
  • NGI_GRNET: GR-01-AUTH, HG-03-AUTH
  • NGI_HR: egee.irb.hr, egee.srce.hr
  • NGI_IBERGRID: CETA-GRID, NCG-INGRID-PT
  • NGI_FRANCE: GRIF-IPNO, GRIF-LAL, GRIF-LPNHE
  • NGI_IL: IL-TAU-HEP, TECHNION-HEP, WEIZMANN-LCG2
  • NGI_IT: IGI-BOLOGNA, INFN-GENOVA, INFN-MILANO-ATLASC, INFN-ROMA3, INFN-T1
  • NGI_PL: CYFRONET-LCG2, WUT
  • NGI_UK: UKI-NORTHGRID-LIV-HEP, UKI-NORTHGRID-MAN-HEP
  • ROC_CANADA: CA-MCGILL-CLUMEQ-T2

Checked with:

$ ldapsearch -x -LLL -H ldap://egee-bdii.cnaf.infn.it:2170 -b "GLUE2GroupID=grid,o=glue" '(&(objectClass=GLUE2Endpoint)(GLUE2EndpointInterfaceName=webdav))' GLUE2EndpointImplementationName GLUE2EndpointURL
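
To turn the output of the query above into a per-site list, the LDIF text can be post-processed. The following is a minimal sketch, assuming that the site name appears in the entry DN as a `GLUE2DomainID=` component (this matches the usual GLUE2 layout in a top-level BDII, but it is an assumption here). The sample data, host names and function names are hypothetical and only serve the illustration; no query is sent to any BDII.

```python
from collections import defaultdict

# Hypothetical sample of "ldapsearch -LLL" output; hosts and sites are invented.
# Lines starting with one space are LDIF continuation lines.
SAMPLE_LDIF = "\n".join([
    "dn: GLUE2EndpointID=se01.example.org_webdav,GLUE2ServiceID=se01.example.org,",
    " GLUE2GroupID=resource,GLUE2DomainID=EXAMPLE-SITE-A,GLUE2GroupID=grid,o=glue",
    "GLUE2EndpointImplementationName: DPM",
    "GLUE2EndpointURL: https://se01.example.org/dpm/example.org/home",
    "",
    "dn: GLUE2EndpointID=dc.example.net_webdav,GLUE2ServiceID=dc.example.net,GLUE",
    " 2GroupID=resource,GLUE2DomainID=EXAMPLE-SITE-B,GLUE2GroupID=grid,o=glue",
    "GLUE2EndpointImplementationName: dCache",
    "GLUE2EndpointURL: https://dc.example.net:2880/",
    "",
])


def unfold(ldif_text):
    """Join LDIF continuation lines (a leading space continues the previous line)."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_entries(ldif_text):
    """Split LDIF text into entries, each a dict of attribute name -> list of values."""
    entries, current = [], {}
    for line in unfold(ldif_text):
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:
            current.setdefault(key, []).append(value)
    if current:
        entries.append(current)
    return entries


def site_of(dn):
    """Extract the site name from the GLUE2DomainID component of a DN (assumed layout)."""
    for part in dn.split(","):
        if part.startswith("GLUE2DomainID="):
            return part.split("=", 1)[1]
    return "unknown"


def endpoints_by_site(ldif_text):
    """Group (implementation, URL) pairs of the returned endpoints by site."""
    by_site = defaultdict(list)
    for entry in parse_entries(ldif_text):
        dn = entry.get("dn", [""])[0]
        impl = entry.get("GLUE2EndpointImplementationName", ["unknown"])[0]
        for url in entry.get("GLUE2EndpointURL", []):
            by_site[site_of(dn)].append((impl, url))
    return dict(by_site)


if __name__ == "__main__":
    for site, endpoints in sorted(endpoints_by_site(SAMPLE_LDIF).items()):
        for impl, url in endpoints:
            print(site, impl, url)
```

In practice the text passed to `endpoints_by_site` would be the saved output of the `ldapsearch` command; base64-encoded values (written as `attribute:: value` in LDIF) are not decoded by this sketch.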

ACTIONS for NGIs and sites: The Operations Centres are asked to verify with their sites whether the webdav protocol is really (intentionally) enabled on their storage elements (if not, the information should be removed from the BDII), and to report to EGI Operations

  • The webdav service endpoint should be registered in GOC-DB in order to be properly monitored: the nagios probes are executed using the ops VO, so please ensure that the protocol is enabled for the ops VO as well
  • the webdav probes are harmless: they are not in any critical profile, they don't raise any alarm in the operations dashboard, and the A/R figures are not affected. We need time and more sites for gathering statistics on their results before making them critical.


To register the webdav service endpoint in GOC-DB, follow HOWTO21 to fill in the proper information. In particular:

Testing of the storage accounting

As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.

More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage

List of sites available for test.

2017-07-27 UPDATE (more details in the July OMB presentation):

  • 23 sites have verified their numbers and 3 are in progress
  • for the deployment in production we need to:
    • Get sites to add new GOCDB service type
    • Change broker queue name and get sites to swap
    • Update documentation
    • Add storage system scripts to UMD
    • Migrate storage view to new development Portal
  • by September we should be ready for a wide roll-out of storage accounting
    • detailed instructions for the sites will be circulated

AOB

Next meeting