Difference between revisions of "Agenda-10-04-2017"

UMD

  • UMD 4.4.2 (April 4th): http://repository.egi.eu/2017/04/04/release-umd-4-4-2/
  • UMD 4.4.1 (March 24th): http://repository.egi.eu/2017/03/24/release-umd-4-4-1/
  • UMD 4.4.0 (March 23rd): http://repository.egi.eu/2017/03/23/release-umd-4-4-0/
    • CentOS7: Davix 0.6.4, GFAL 2.12.2, GFAL Utils 1.4.0, CGSI gSOAP 1.3.10, gfalFS 1.5.1, srm-ifce 1.24.1, FTS3 3.5.7, GRAM5 13.16.0, yaim core 5.1.4, GridFTP 11.8.1, MyProxy 6.1.25, globus-default-security 6.4.0, dCache SRM client 3.09.1, ARC 15.03.12, canL 2.2.8
    • SL6: VOMS Admin server 3.5.1, GFAL 2.12.2, GFAL Utils 1.4.0, CGSI gSOAP 1.3.10, gfalFS 1.5.1, FTS3 3.5.7, GRAM5 13.16.0, yaim core 5.1.4, Davix 0.6.4, GridFTP 11.8.1, MyProxy 6.1.25, globus-default-security 6.4.0, ARC 15.03.12, canL 2.2.8


Monthly Availability/Reliability

  • Underperformed sites in the past A/R reports with issues not yet fixed:
    • AsiaPacific: GGUS 125427 https://ggus.eu/index.php?mode=ticket_info&ticket_id=125427
      • TW-NCUHEP: site-bdii unstable
      • KR-UOS-SSCC: there were SRM problems https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024
    • NGI_AEGIS: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127025
      • AEGIS11-MISANU: low A/R figures due to a bug in the emi.cream.CREAMCE-JobCancel probe; a recomputation was requested
    • NGI_DE: GGUS 125430 https://ggus.eu/index.php?mode=ticket_info&ticket_id=125430
      • UNI-SIEGEN-HEP: waiting for the fix for the CREAM probe
      • wuppertalprod: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127026 issues with some ARC-CE passive probes that are not up to date, which could affect many sites
    • NGI_NL: GGUS 123532 https://ggus.eu/?mode=ticket_info&ticket_id=123532
      • BelGrid-UCL: UNKNOWN status returned by the CREAM probes; waiting for the fix for the CREAM probe
    • NGI_UA: GGUS 125839 https://ggus.eu/index.php?mode=ticket_info&ticket_id=125839
      • UA-NSCMBR: bug in the ARC-CE probes

  • Underperformed sites after 3 consecutive months and underperformed NGIs:
    • AfricaArabia: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127502 ZA-UCT-ICTS has not yet updated its CA version
    • NGI_FI: https://ggus.eu/index.php?mode=ticket_info&ticket_id=127505 ARC-CE Nagios probes bug
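ARGO computes the A/R figures discussed above from probe results. The sketch below assumes the commonly documented EGI definitions: UNKNOWN time is excluded from both figures, and scheduled downtime is also excluded from reliability. The `PeriodStatus` type and its fields are hypothetical. The ARGO documentation remains the authority on how per-service results are aggregated into site figures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodStatus:
    """Hours a site spent in each state during the reporting period.

    Hypothetical input format for illustration; ARGO derives these
    figures from its own status timelines.
    """
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(s: PeriodStatus) -> Optional[float]:
    """Up time divided by the time in which the status was known."""
    known = s.total - s.unknown
    return s.up / known if known > 0 else None


def reliability(s: PeriodStatus) -> Optional[float]:
    """As availability, but scheduled downtime is not held against the site."""
    known_unscheduled = s.total - s.unknown - s.scheduled_downtime
    return s.up / known_unscheduled if known_unscheduled > 0 else None


if __name__ == "__main__":
    # A 30-day month (720 h) with one day each of outage, scheduled downtime and UNKNOWN.
    month = PeriodStatus(up=648, down=24, scheduled_downtime=24, unknown=24)
    print(f"A = {availability(month):.3f}, R = {reliability(month):.3f}")
```

Running it prints `A = 0.931, R = 0.964`. The unscheduled outage lowers both figures, while the scheduled downtime lowers only availability.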


Next meeting

  • May 8th, 2017: https://indico.egi.eu/indico/event/3143/
  • New calendar available until June 2017: https://indico.egi.eu/indico/category/32/

Latest revision as of 15:26, 25 October 2017


General information

Middleware

CMD

  • still working on CMD-OS updates
  • CMD-ONE: the first major release, targeting OpenNebula 5, is going to be released
    • CESGA will update to OpenNebula 5 and, in particular, test the new cloudkeeper (formerly vmcatcher)

UMD


  • pending: XrootD 4.6.0
  • UMD 4.5 (May/June) will contain WN/UI for C7

Preview repository

released on:

  • 2017-03-28
    • Preview 1.10.1 AppDB info (sl6): VOMS-admin 3.6.0 (emergency release that fixes several vulnerabilities concerning voms-admin)

Operations

Yearly review of the information registered in GOC-DB

2017-07-04

Once a year, the information registered in GOC-DB needs to be verified, and NGIs and RCs have been asked to check it. In particular:

  1. NGI managers should review the people registered and the roles assigned to them, checking in particular the following information:
    • E-Mail
    • ROD E-Mail
    • Security E-Mail
NGI managers should also review the status of the "not certified" RCs, in accordance with the RC Status Workflow.
  2. RC administrators should review the people registered and the roles assigned to them, checking in particular the following information:
    • E-Mail
    • telephone numbers
    • CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.
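The field checks requested above can be scripted once the records have been exported. The sketch below assumes that records are plain dictionaries with hypothetical key names (`email`, `rod_email`, `csirt_email`, ...). It does not use the GOC-DB API. A real check would first need to map the actual export format onto these keys.

```python
from __future__ import annotations

# Fields to check, per entity type, as listed in the review request.
# The key names are hypothetical; they do not mirror the GOC-DB schema or API.
REQUIRED_FIELDS = {
    "NGI": ("email", "rod_email", "security_email"),
    "RC": ("email", "telephone", "csirt_email"),
}


def missing_fields(entity_type: str, record: dict) -> list[str]:
    """Return the required fields that are absent, None or blank in `record`."""
    return [
        field
        for field in REQUIRED_FIELDS[entity_type]
        if not str(record.get(field) or "").strip()
    ]


def review(records: list[dict]) -> dict[str, list[str]]:
    """Map each incomplete record (by name) to its missing fields."""
    report = {}
    for record in records:
        missing = missing_fields(record["type"], record)
        if missing:
            report[record["name"]] = missing
    return report


if __name__ == "__main__":
    records = [
        {"name": "NGI_EXAMPLE", "type": "NGI", "email": "ngi@example.org",
         "rod_email": "rod@example.org", "security_email": "sec@example.org"},
        {"name": "EXAMPLE-SITE", "type": "RC", "email": "admin@example.org",
         "telephone": "", "csirt_email": None},
    ]
    print(review(records))  # {'EXAMPLE-SITE': ['telephone', 'csirt_email']}
```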

The process should be completed by Apr 28th.

To track the process, a series of tickets have been opened.

Decommissioning EMI WMS

As discussed at the last OMB, we are making plans for decommissioning the WMS and moving to DIRAC. We asked the NGIs (GGUS 126787) to provide statistics about WMS usage, in order to understand how much it is used and which VOs would be affected by (and potentially interested in) this transition. Several NGIs have already provided some data; we are preparing a list of VOs to contact.

2017-04-10 UPDATE: VOs that we can try to contact:

  • NGI_CZ: eli-beams.eu
  • NGI_GRNET: see
  • NGI_IT: calet.org, compchem, theophys, virgo
  • NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
  • NGI_UK: mice, t2k.org

Feedback from Helpdesk

IPv6 readiness plans

    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan
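A first, DNS-level step of such an assessment could be for a site to list which of its service host names resolve to an IPv6 address. The sketch below uses only the Python standard library (`socket.getaddrinfo`, `ipaddress`). It is not an EGI tool. Having an IPv6 address says nothing about whether the services, firewalls and middleware actually work over IPv6.

```python
from __future__ import annotations

import ipaddress
import socket


def ipv6_addresses(addresses: list[str]) -> list[str]:
    """Keep only the IPv6 entries of a list of address strings."""
    # Drop any zone index (e.g. "fe80::1%eth0") before parsing.
    return [a for a in addresses if ipaddress.ip_address(a.split("%")[0]).version == 6]


def resolve(host: str) -> list[str]:
    """All IPv4 and IPv6 addresses that `host` resolves to via the system resolver."""
    infos = socket.getaddrinfo(host, None)
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})


def has_ipv6_address(host: str) -> bool:
    """True if the host name resolves to at least one IPv6 address."""
    try:
        return bool(ipv6_addresses(resolve(host)))
    except socket.gaierror:
        return False


if __name__ == "__main__":
    import sys

    for host in sys.argv[1:]:
        print(host, "IPv6 address found" if has_ipv6_address(host) else "no IPv6 address")
```

The script prints one line per host name given on the command line.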

Decommissioning of dCache 2.10 and 2.13

  • support for dCache 2.10 ended in December 2016
  • according to EGI policies, dCache 2.10 must be decommissioned https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
  • broadcast sent on Feb 2 (https://operations-portal.egi.eu/broadcast/archive/1631) and email sent on Feb 7 to the noc-managers mailing list
  • sites have to upgrade their 2.10 endpoints to a newer "golden release" of dCache:
    • 2.13, whose support ends in July 2017, or
    • 2.16, whose support ends in May 2018: note that the dCache team does not support upgrading from 2.10 directly to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
  • a decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.10 instances and follow up with the NGIs/sites; tickets will be opened this week
  • deadline: end of April
  • the probe will return WARNING for two months, until April 17th, when it will switch to CRITICAL
  • in May EGI Operations will open tickets against sites still publishing dCache 2.10 and follow up on their upgrade plans
  • reference: https://www.dcache.org/downloads/1.9/index.shtml
  • STATUS: 8 instances still publishing 2.10
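The upgrade constraint and the probe timeline above can be expressed compactly. In the sketch below, the supported transitions come from the text: 2.10->2.13 and 2.13->2.16, with no direct 2.10->2.16. A breadth-first search finds the shortest supported chain. `probe_status` is a hypothetical reconstruction of the described WARNING/CRITICAL switch, not the actual ARGO probe code.

```python
from __future__ import annotations

from collections import deque
from datetime import date

# Upgrade transitions supported by the dCache team, as stated above.
SUPPORTED_UPGRADES = {
    "2.10": ["2.13"],
    "2.13": ["2.16"],
    "2.16": [],
}

# Date from which the (hypothetical) probe reports CRITICAL instead of WARNING for 2.10.
CRITICAL_FROM = date(2017, 4, 17)


def series(version: str) -> str:
    """Reduce a full version such as '2.10.59' to its release series '2.10'."""
    return ".".join(version.split(".")[:2])


def upgrade_path(current: str, target: str) -> list[str] | None:
    """Shortest chain of supported upgrades (breadth-first search), or None."""
    start, goal = series(current), series(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in SUPPORTED_UPGRADES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def probe_status(published_version: str, today: date) -> str:
    """Hypothetical probe logic: only flags the unsupported 2.10 series."""
    if series(published_version) != "2.10":
        return "OK"
    return "CRITICAL" if today >= CRITICAL_FROM else "WARNING"


if __name__ == "__main__":
    print(upgrade_path("2.10.59", "2.16"))  # ['2.10', '2.13', '2.16']
    print(probe_status("2.10.59", date(2017, 4, 10)))  # WARNING
```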


  • support for dCache 2.13 will end in July 2017
  • campaign start: May 1st (about 3 months before the end of support)
  • campaign end: Aug 31st (about 1 month after the end of support)
  • to be announced at the OMB and in the April EGI Monthly Broadcast

Testing the new webdav probes

Site | Host | GGUS ticket | Note
CYFRONET-LCG2 | se01.grid.cyfronet.pl | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126776 | Registered
GR-01-AUTH | - | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126777 | Disabled
GRIF | - | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126778 | -
IGI-BOLOGNA | darkstorm.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126779 | Registered
INFN-T1 | storm-fe-lhcb.cr.cnaf.infn.it, storm-fe.cr.cnaf.infn.it, storm-fe-archive.cr.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126780 | Registered
NCG-INGRID-PT | gftp01.ncg.ingrid.pt | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126781 | Registered
UKI-NORTHGRID-LIV-HEP | hepgrid11.ph.liv.ac.uk | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126782 | Registered
egee.irb.hr | lorienmaster.irb.hr | https://ggus.eu/index.php?mode=ticket_info&ticket_id=126783 | Registered

UPDATE 2017-04-10: this week the probes should be deployed on the ARGO test instance

Testing of the storage accounting

As discussed during the January OMB, the APEL team needs one site per NGI to test storage accounting. Eligible sites are those providing either dCache or DPM storage elements.

More information can be found on the following wiki page: https://wiki.egi.eu/wiki/APEL/Storage

List of sites available for testing.

2017-04-10 UPDATE:

  • 25 sites are sending storage accounting data (only from dCache and DPM SEs). The data has to be verified before deploying the script in production.
  • After the discussion at the last OMB, we are evaluating the creation of a new service type for monitoring the publication of storage accounting data.
Currently the accounting service types are:
  1. glite-APEL: for authorizing the sending of the messages
  2. APEL: to monitor the accounting data publication

Monthly Availability/Reliability

Monitoring of the UNCERTIFIED sites

Information about the proposal for using GOCDB as the only source of topology information for ARGO:

  • Timescale:
    • New GOC-DB release on Dec 7th including a boolean ‘monitored’ flag for the service endpoints: DONE
    • Then creation of a web UI view for uncertified sites in ARGO: DONE
    • Uncertified sites will be asked to fill in the service endpoint information, following the "How to add URL service endpoint information into GOC-DB" guide: DONE
    • As information is added to GOC-DB, uncertified sites/services will be picked up by the ARGO Monitoring Engine and will start to be monitored: DONE
    • By Q2 2017: support for multiple service endpoints


Nagios server for the uncertified sites: https://argo-mon-uncert.cro-ngi.hr/nagios/

  • Configuration is regenerated every hour
  • uncertified sites report on the ARGO development instance
  • IMPORTANT: to be correctly monitored, uncertified sites have to fill in the proper service information in GOC-DB; please follow HOWTO21
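The selection described above combines the boolean `monitored` flag with the endpoint information filled in by uncertified sites. It can be sketched as a simple filter. The `Endpoint` type, the status strings and the rule that an uncertified endpoint needs a URL are assumptions for illustration; this is not the ARGO Monitoring Engine's actual logic. Statuses other than "Certified" and "Uncertified" are simply ignored here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    """Minimal, hypothetical view of a GOC-DB service endpoint."""
    site: str
    site_status: str          # certification status, e.g. "Certified", "Uncertified"
    service_type: str
    url: Optional[str]
    monitored: bool           # the boolean 'monitored' flag


def monitoring_targets(endpoints: list[Endpoint]) -> dict[str, list[Endpoint]]:
    """Split monitored endpoints between the production and the uncertified instance."""
    targets: dict[str, list[Endpoint]] = {"production": [], "uncertified": []}
    for ep in endpoints:
        if not ep.monitored:
            continue
        if ep.site_status == "Certified":
            targets["production"].append(ep)
        elif ep.site_status == "Uncertified" and ep.url:
            # Assumed rule: uncertified endpoints are only picked up once their URL is filled in.
            targets["uncertified"].append(ep)
    return targets
```

The Nagios configuration above is regenerated every hour. Re-running a selection like this on that schedule would let newly registered endpoints be picked up without manual intervention.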

PROC09 modified accordingly.

VAPOR

  • VAPOR 2.2 released on March 16th
  • important for presenting the amount of computing and storage resources of the infrastructure
  • There are several improvements and new features: the computation of CPU and storage values has been thoroughly reviewed; nevertheless, some values are still not in line with reality.

The next version will focus on these computations in order to provide better figures.

  • Please have a look at the information displayed and report to us any inconsistencies you spot.

Proposal to modify the declaration of scheduled interventions

Currently (see MAN02 Service intervention management) scheduled interventions (of any duration) MUST be declared at least 24 hours in advance, specifying reason and duration; any intervention declared less than 24 hours in advance will be considered unscheduled.

WLCG proposed the following modification:

  • a scheduled intervention shorter than 5 days must be declared at least 24 hours in advance
  • a scheduled intervention longer than 5 days must be declared at least 1 month in advance
  • any other intervention that does not fulfil the rules above will be considered unscheduled

We are going to take a decision by the next OMB.
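To compare the current rule with the WLCG proposal, the classification can be sketched as below. The proposal leaves two details open, and the sketch makes an assumption for each. An intervention of exactly 5 days is treated as "shorter than 5 days". "1 month" of notice is approximated as 30 days.

```python
from __future__ import annotations

from datetime import datetime, timedelta

DAY_NOTICE = timedelta(hours=24)
# Assumptions: an intervention of exactly 5 days counts as "shorter",
# and "1 month" of notice is approximated as 30 days.
LONG_INTERVENTION = timedelta(days=5)
MONTH_NOTICE = timedelta(days=30)


def classify_current(declared: datetime, start: datetime) -> str:
    """Current rule (MAN02): any duration needs at least 24 hours' notice."""
    return "scheduled" if start - declared >= DAY_NOTICE else "unscheduled"


def classify_proposed(declared: datetime, start: datetime, end: datetime) -> str:
    """WLCG proposal: interventions longer than 5 days need about 1 month's notice."""
    required = MONTH_NOTICE if end - start > LONG_INTERVENTION else DAY_NOTICE
    return "scheduled" if start - declared >= required else "unscheduled"


if __name__ == "__main__":
    declared = datetime(2017, 4, 10, 9, 0)
    start = declared + timedelta(days=2)
    end = start + timedelta(days=7)  # a one-week intervention with 2 days' notice
    print(classify_current(declared, start))        # scheduled
    print(classify_proposed(declared, start, end))  # unscheduled
```

The example shows the practical effect of the proposal. A one-week intervention declared two days ahead is scheduled under the current rule but would become unscheduled.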

AOB

Next meeting