The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Agenda-13-03-2017

From EGIWiki

Latest revision as of 14:27, 25 October 2017


General information

Middleware

CMD

  • still working on CMD-OS updates
  • CMD-ONE: first major release, for OpenNebula 5
    • CESGA will update to OpenNebula 5 and test, in particular, the new cloudkeeper (formerly vmcatcher)

UMD

UMD 4.4.0 is almost ready:

  • CentOS 7: Davix 0.6.4, GFAL 2.12.2, GFAL Utils 1.4.0, CGSI gSOAP 1.3.10, gfalFS 1.5.1, srm-ifce 1.24.1, FTS3 3.5.7, GRAM5 13.16.0, yaim core 5.1.4, GridFTP 11.8.1, MyProxy 6.1.25, globus-default-security 6.4.0, dCache SRM client 3.09.1, ARC 15.03.12, canL 2.2.8

  • SL6: VOMS Admin server 3.5.1, GFAL 2.12.2, GFAL Utils 1.4.0, CGSI gSOAP 1.3.10, gfalFS 1.5.1, FTS3 3.5.7, GRAM5 13.16.0, yaim core 5.1.4, Davix 0.6.4, GridFTP 11.8.1, MyProxy 6.1.25, globus-default-security 6.4.0, ARC 15.03.12, canL 2.2.8

  • pending: XrootD 4.6.0, fix for DPM 1.9.0 on CentOS 7, Frontier
  • UMD 4.5 (May) will contain the WN/UI for CentOS 7

Preview repository

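released on 2017-02-15:

  • Preview 1.9.0 (SL6): ARC 15.03 update 12, APEL Client/Server 1.5.1-1, APEL SSM 2.1.7-1; AppDB info: https://appdb.egi.eu/store/software/preview.repository/releases/1.0/1.9.0/
  • Preview 2.9.0 (CentOS 7): ARC 15.03 update 12, APEL Client/Server 1.5.1-1, APEL SSM 2.1.7-1, emi-WN-4.0.1-1; AppDB info: https://appdb.egi.eu/store/software/preview.repository/releases/2.0/2.9.0/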

Operations

Decommissioning EMI WMS

As discussed at the last OMB, we are making plans to decommission the WMS and move to DIRAC. We asked the NGIs (GGUS 126787) to provide statistics about WMS usage, in order to understand how much it is used and which VOs would be affected by (and potentially interested in) this transition. Several NGIs have already provided data, and we are preparing a list of VOs to contact.
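As a rough illustration of turning the NGI replies into a VO contact list, the sketch below merges per-NGI WMS job counts into per-VO totals, busiest VO first. The input shape (NGI name mapped to VO name and job count) and the function name `vos_to_contact` are hypothetical; GGUS 126787 does not prescribe any data format.

```python
from collections import Counter


def vos_to_contact(reports: dict[str, dict[str, int]], min_jobs: int = 1) -> list[tuple[str, int]]:
    """Merge per-NGI WMS usage into per-VO totals, sorted by decreasing usage.

    `reports` maps an NGI name to {VO name: jobs submitted through the WMS}.
    This input format is an assumption made for the example.
    """
    totals = Counter()
    for per_vo in reports.values():
        totals.update(per_vo)  # adds the counts of each VO across NGIs
    # Keep only the VOs that actually used the WMS.
    return [(vo, jobs) for vo, jobs in totals.most_common() if jobs >= min_jobs]
```

VOs reported by several NGIs are summed, so a VO that is modest everywhere but present in many NGIs still ranks high on the list.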

Feedback from Helpdesk

IPv6 readiness plans

  • December OMB presentation: https://indico.egi.eu/indico/event/2815/
  • WLCG is going to deploy its services in dual-stack mode (IPv4+IPv6) by April 2018
  • EGI Operations has started checking the core services for IPv6 compatibility
  • EGI Operations is going to assess the IPv6 readiness of the EGI infrastructure
  • early draft plan:
    • Technology Providers: assess the IPv6 readiness of the middleware products
    • UMD team: perform verification of all services under an IPv6 setup
    • EGI core services developers: assess the IPv6 readiness of the products
    • EGI core services hosts: assess the IPv6 readiness of the services
    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs: please start discussing with the sites and provide suggestions for the overall plan
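A first, DNS-level step for the readiness assessment could be checking whether an endpoint resolves to both an IPv4 and an IPv6 address. The sketch below is an assumption-laden starting point, not an EGI procedure: `is_dual_stack` only inspects addresses, so a host can pass this check and still not serve traffic over IPv6.

```python
import ipaddress
import socket


def address_families(addresses) -> set[int]:
    """Return the IP versions (4 and/or 6) present in `addresses`."""
    return {ipaddress.ip_address(a).version for a in addresses}


def is_dual_stack(addresses) -> bool:
    """True if the endpoint has at least one IPv4 and one IPv6 address."""
    return address_families(addresses) == {4, 6}


def resolve(host: str, port: int = 443) -> list[str]:
    """Collect every address the local resolver returns for `host` (needs network)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Element 4 of each tuple is the sockaddr; its first field is the address string.
    return sorted({info[4][0] for info in infos})
```

Usage, with a hypothetical host name: `is_dual_stack(resolve("service.example.org"))`. A real assessment would additionally open a connection over each address family.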

Decommissioning of dCache 2.10

  • support for dCache 2.10 ended in December 2016
  • according to EGI policies, dCache 2.10 must be decommissioned: https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
  • broadcast sent on Feb 2 (https://operations-portal.egi.eu/broadcast/archive/1631) and email sent on Feb 7 to the noc-managers mailing list
  • sites must upgrade their 2.10 endpoints to a newer "golden release" of dCache:
    • 2.13, whose support ends in July 2017, or
    • 2.16, whose support ends in May 2018. Note that the dCache team does not support upgrading directly from 2.10 to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
  • EGI Operations will start a decommissioning campaign to monitor the upgrade of the dCache 2.10 instances and follow up with the NGIs/sites; tickets will be opened this week
  • the deadline is the end of April, giving all sites more than 2 months to plan and perform the upgrade
  • the probe will return WARNING for two months, until April 17th, when it will switch to CRITICAL
  • in May EGI Operations will open tickets against sites still publishing dCache 2.10 and follow up on their upgrade plans
  • reference: https://www.dcache.org/downloads/1.9/index.shtml
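The upgrade rule above (no direct 2.10 to 2.16 jump) and the probe timeline can be written down as a small sketch. The transition table contains only the two supported transitions listed above. The assumption that the probe reports CRITICAL from April 17th itself onwards is an interpretation of "until April 17th, when it will switch"; `upgrade_path` and `probe_status` are hypothetical helpers.

```python
from datetime import date

# Only the direct transitions stated as supported by the dCache team.
SUPPORTED_UPGRADES = {"2.10": "2.13", "2.13": "2.16"}


def upgrade_path(current: str, target: str) -> list[str]:
    """Return the chain of supported upgrades from `current` to `target`."""
    path = [current]
    while path[-1] != target:
        nxt = SUPPORTED_UPGRADES.get(path[-1])
        if nxt is None:
            raise ValueError(f"no supported upgrade path from {current} to {target}")
        path.append(nxt)
    return path


def probe_status(today: date, switch: date = date(2017, 4, 17)) -> str:
    """Severity for an endpoint still publishing dCache 2.10 (assumed switch day)."""
    return "WARNING" if today < switch else "CRITICAL"
```

For a 2.10 site targeting 2.16, `upgrade_path("2.10", "2.16")` yields the two-step chain through 2.13, which is why planning for 2.16 means scheduling two upgrades.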

Testing the new webdav probes

Site (host): GGUS ticket, note

  • CYFRONET-LCG2 (se01.grid.cyfronet.pl): GGUS 126776, Registered
  • GR-01-AUTH: GGUS 126777, Disabled
  • GRIF: GGUS 126778
  • IGI-BOLOGNA (darkstorm.cnaf.infn.it): GGUS 126779, Registered
  • INFN-T1 (storm-fe-lhcb.cr.cnaf.infn.it, storm-fe.cr.cnaf.infn.it, storm-fe-archive.cr.cnaf.infn.it): GGUS 126780, Registered
  • NCG-INGRID-PT (gftp01.ncg.ingrid.pt): GGUS 126781, Registered
  • UKI-NORTHGRID-LIV-HEP (hepgrid11.ph.liv.ac.uk): GGUS 126782, Registered
  • egee.irb.hr (lorienmaster.irb.hr): GGUS 126783, Registered

Testing of the storage accounting

As discussed during the January OMB, the APEL team needs one site per NGI to test the storage accounting. Eligible sites are those providing either dCache or DPM storage elements.

More information is available on the wiki: https://wiki.egi.eu/wiki/APEL/Storage

List of sites available for testing: see the wiki page "Storage accounting testing".
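The selection rule (one test site per NGI, restricted to dCache or DPM storage elements) can be sketched as below. The tuple layout and the "first eligible volunteer wins" policy are assumptions for illustration; the APEL team may choose sites differently.

```python
ELIGIBLE_IMPLEMENTATIONS = {"dCache", "DPM"}


def pick_test_sites(candidates) -> dict[str, str]:
    """Pick the first eligible volunteer site for each NGI.

    `candidates` is an iterable of (ngi, site, storage_implementation) tuples,
    a hypothetical format chosen for this example.
    """
    chosen: dict[str, str] = {}
    for ngi, site, implementation in candidates:
        if implementation in ELIGIBLE_IMPLEMENTATIONS and ngi not in chosen:
            chosen[ngi] = site
    return chosen
```

For instance, with hypothetical volunteers `("NGI_A", "SITE-1", "StoRM")` and `("NGI_A", "SITE-2", "DPM")`, only SITE-2 qualifies for NGI_A.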

Monthly Availability/Reliability

  • Underperformed sites in the past A/R reports with issues not yet fixed:
    • AsiaPacific: GGUS 125427
      • TW-NCUHEP: site-bdii unstable
    • NGI_BG: GGUS 125826
      • BG05-SUGrid: failures in CAs and SRM probes, fixed some days ago
    • NGI_DE: GGUS 125430
      • UNI-SIEGEN-HEP: waiting for the fix of the CREAM probe
    • NGI_NL: GGUS 123532
      • BelGrid-UCL: UNKNOWN status returned by CREAM probes, waiting for the fix of the CREAM probe
    • NGI_UA: GGUS 125839
      • UA-NSCMBR: ARC-CE failures, no progress yet
  • Underperformed sites after 3 consecutive months and underperformed NGIs:
    • AsiaPacific: GGUS 127024
      • KR-UOS-SSCC: there were SRM problems, statistics are improving
    • NGI_AEGIS: GGUS 127025
      • AEGIS11-MISANU: low A/R figures due to a bug in the emi.cream.CREAMCE-JobCancel probe, recomputation requested
    • NGI_DE: GGUS 127026
      • wuppertalprod: issues with some ARC-CE passive probes that are not up to date; this could affect many sites
    • NGI_HR: GGUS 127027 (SOLVED)
      • egee.irb.hr: it was partially decommissioned in December and management was transferred to a new team; statistics are improving.
    • NGI_PL: GGUS 127028 (SOLVED)
      • WUT: UNKNOWN status was solved by adding a WN to the ops queue
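For reference when reading these figures, the sketch below assumes the usual EGI/ARGO definitions: availability = UP / (total − UNKNOWN) and reliability = UP / (total − UNKNOWN − scheduled downtime). The three-month check treats a month as underperforming when either figure misses its target; the targets are passed in as parameters because this agenda does not state them, and the input record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStatus:
    """Hours spent in each state during one month (hypothetical input record)."""
    up: float
    unknown: float
    scheduled_downtime: float
    total: float


def availability(m: MonthlyStatus) -> float:
    # UNKNOWN time is excluded, so missing monitoring data does not count as downtime.
    return m.up / (m.total - m.unknown)


def reliability(m: MonthlyStatus) -> float:
    # Scheduled downtime is also excluded, so announced interventions are not penalised.
    return m.up / (m.total - m.unknown - m.scheduled_downtime)


def underperformed_three_months(months, a_min: float, r_min: float) -> bool:
    """True if each of the last three months misses the availability or reliability target."""
    last = months[-3:]
    return len(last) == 3 and all(
        availability(m) < a_min or reliability(m) < r_min for m in last
    )
```

Note how a long UNKNOWN period shrinks the denominator: this is why sites such as BelGrid-UCL can show misleading figures when probes return UNKNOWN, and why a whole month of UNKNOWN would make the ratio undefined (here it raises ZeroDivisionError).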

Monitoring of the UNCERTIFIED sites

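Information about the proposal for using GOCDB as the only source of topology information for ARGO:

  • slides in the October Operations Meeting agenda: https://indico.egi.eu/indico/event/3006/material/slides/0.pdf
  • ARGO Proposal (September OMB): https://indico.egi.eu/indico/event/2810/contribution/3/material/0/

  • Timescale:
    • New GOC-DB release on Dec 7th including a boolean ‘monitored’ flag for the service endpoints: DONE
    • Creation of a web UI view for uncertified sites in ARGO: DONE
    • Uncertified sites will be asked to fill in the service endpoint information, following "How to add URL service endpoint information into GOC-DB" (https://wiki.egi.eu/wiki/HOWTO21): IN PROGRESS
      • (OPTIONAL) use the GOC-DB test instance (https://gocdb-test.esc.rl.ac.uk/portal/index.php) to test the procedure
    • As information is added to GOCDB, uncertified sites/services are picked up by the ARGO Monitoring Engine and start to be monitored: DONE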
    • By Q2 2017: support for multiple service endpoints


Nagios server for the uncertified sites: https://argo-mon-uncert.cro-ngi.hr/nagios/

  • Configuration is regenerated every hour
  • uncertified sites report on the ARGO development instance: http://web-egi-devel.argo.grnet.gr/lavoisier/status_report-site?report=CriticalUncert&accept=html
  • IMPORTANT: to be correctly monitored, uncertified sites have to fill in the proper service information in GOC-DB: please follow HOWTO21

PROC09 modified accordingly.
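To make the role of the ‘monitored’ flag concrete, the sketch below filters a GOCDB-style service endpoint listing and keeps only flagged endpoints, which is roughly what each hourly configuration regeneration has to do. The XML element names (`SERVICE_ENDPOINT`, `HOSTNAME`, `SERVICE_TYPE`, `SITENAME`, `NODE_MONITORED`) are modelled on the GOCDB programmatic interface output but should be treated as assumptions, and the sample data is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed GOCDB-like layout.
SAMPLE = """<results>
  <SERVICE_ENDPOINT>
    <HOSTNAME>ce.site-a.example.org</HOSTNAME>
    <SERVICE_TYPE>CREAM-CE</SERVICE_TYPE>
    <SITENAME>SITE-A</SITENAME>
    <NODE_MONITORED>Y</NODE_MONITORED>
  </SERVICE_ENDPOINT>
  <SERVICE_ENDPOINT>
    <HOSTNAME>se.site-a.example.org</HOSTNAME>
    <SERVICE_TYPE>SRM</SERVICE_TYPE>
    <SITENAME>SITE-A</SITENAME>
    <NODE_MONITORED>N</NODE_MONITORED>
  </SERVICE_ENDPOINT>
</results>"""


def monitored_endpoints(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (site, service type, hostname) for endpoints flagged as monitored."""
    root = ET.fromstring(xml_text)
    selected = []
    for ep in root.iter("SERVICE_ENDPOINT"):
        if ep.findtext("NODE_MONITORED") == "Y":
            selected.append(
                (ep.findtext("SITENAME"), ep.findtext("SERVICE_TYPE"), ep.findtext("HOSTNAME"))
            )
    return selected
```

An endpoint that an uncertified site has not registered (or has not flagged) never appears in this list, which is why filling in the information per HOWTO21 is a precondition for being monitored.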

VAPOR

AOB

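Next meeting

  • Apr 10th, 2017: https://indico.egi.eu/indico/event/3142/
  • new calendar available until June 2017: https://indico.egi.eu/indico/category/32/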