Agenda-15-05-2017
{{Template:Op menubar}} {{Template:Doc_menubar}} {{TOC_right}}
[[Category:Grid Operations Meetings]]


= General information =


= Middleware =


== CMD ==
* still working on CMD-OS updates
* CMD-ONE first major to be released for OpenNebula 5
** CESGA will update to OpenNebula 5 and test in particular the new cloudkeeper (former vmcatcher)




== UMD ==


* UMD 4.5 (June)
** WN/UI for C7
** CREAM for C7


== Preview repository ==
Released on:
* 2017-04-26
** '''[[Preview 1.11.0]]''' [https://appdb.egi.eu/store/software/preview.repository/releases/1.0/1.11.0/ AppDB info] (sl6): ARC 15.03 update 12, CernVM-FS 2.3.5, gfal2 2.13.3, gfal2-utils 1.5.0, srm-ifce 1.24.2, XRootD 4.6.0
** '''[[Preview 2.11.0]]''' [https://appdb.egi.eu/store/software/preview.repository/releases/2.0/2.11.0/ AppDB info] (CentOS 7): ARC 15.03 update 12, CernVM-FS 2.3.5, dCache 3.0.12, emi-UI 4.0.2, gfal2 2.13.3, gfal2-utils 1.5.0, glite-ce-cream-client 1.15, glite-yaim-clients 5.2.1-1, srm-ifce 1.24.2, XRootD 4.6.0

= Operations =


== Testing FedCloud sites ==
{| width="200" cellspacing="1" cellpadding="1" border="1"
|-
! Resource Centre
! Status
|-
| IFCA-LCG2
| OK
|-
| IN2P3-IRES
| OK
|-
| 100IT
| OK
|-
| RECAS-BARI
| OK
|-
| CESNET-MetaCloud
| OK
|-
| FZJ
| OK
|-
| BEgrid-BELNET
| OK
|-
| INFN-CATANIA-STACK
| OK
|-
| TR-FC1-ULAKBIM
| OK
|-
| INFN-PADOVA-STACK
| OK
|-
| IISAS-FedCloud
| OK
|-
| UPV-GRyCAP
| OK
|-
| IISAS-Nebula
| OK (but not supporting the fedcloud VO)
|-
| CLOUDIFIN
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128104 GGUS 128104]
|-
| SCAI
| [https://ggus.eu/?mode=ticket_info&ticket_id=127821 GGUS 127821]
|-
| CYFRONET-CLOUD
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128100 GGUS 128100]
|-
| CESGA
| [https://ggus.eu/?mode=ticket_info&ticket_id=127815 GGUS 127815]
|-
| BIFI
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128096 GGUS 128096]
|-
| HG-09-Okeanos-Cloud
| replacing VMCaster with Cloudkeeper
|-
| CETA-GRID
| [https://ggus.eu/?mode=ticket_info&ticket_id=124224 GGUS 124224]
|-
| GoeGrid
| [https://ggus.eu/?mode=ticket_info&ticket_id=128101 GGUS 128101]
|-
| NCG-INGRID-PT
| keystone v3 with OpenID Connect (experimental)
|}

== Feedback from Helpdesk  ==
 
== yearly review of the information registered into GOC-DB  ==
 
'''2017-04-07'''
 
On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:
 
# '''NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:'''
#* E-Mail
#* ROD E-Mail
#* Security E-Mail
#: NGI managers should also review the status of the "not certified" RCs, according to the [https://wiki.egi.eu/wiki/PROC09#Resource_Center_status_Workflow RC Status Workflow].
# '''RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:'''
#* E-Mail
#* telephone numbers
#* CSIRT E-Mail
#: RC administrators should also review the information related to the registered service endpoints.
 
'''The process should be completed by Apr 28th.'''  
 
To track the process, a [https://wiki.egi.eu/wiki/Verify_Configuration_Records series of tickets] has been opened.
 
'''2017-05-15 UPDATE''':
 
* no feedback yet from: AfricaArabia, NGI_DE, NGI_FI, NGI_IL, NGI_NL, NGI_UA;
* still reviewing: NGI_GRNET, NGI_HR, NGI_IBERGRID, NGI_IT, NGI_PL, NGI_RO, ROC_LA.


== Failures with the updated CREAM probes  ==


After the release of the updated CREAM probes on May 4th, several sites have been failing the JobCancel and/or JobPurge probes ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=128151 GGUS 128151]):
* the error message is: "'''Received timeout while fetching results'''".


The main reason is that those CEs do not have a job slot reserved for the ops tests.

As explained in the [https://wiki.italiangrid.it/twiki/bin/view/CREAM/DjsCreamProbeNew CREAM probes wiki]:
* JobCancel: cancel an active job
** This metric submits a job directly to the selected CREAM CE, waits until the job reaches the IDLE, RUNNING or REALLY-RUNNING state and then tries to cancel it. Finally, it checks whether the job has been correctly cancelled.
* JobPurge: purge a terminated job
** This metric is analogous to cream_jobCancel.py. It submits a short job (e.g. hostname.jdl), waits for its termination (e.g. DONE-OK) and then tries to purge it. Finally, to verify that the purge was successful, the probe checks the job status with the glite-ce-job-status command, which in this scenario must fail because the job no longer exists.


Both probes have a timeout of 15 minutes, so if the test job is not executed within that time, the probes return a failure. '''Please assign the ops jobs a higher priority and reserve 1 job slot for them; they only take a few seconds to execute'''.

These failures were not visible before May 4th because the first version of the probes returned "''UNKNOWN''" instead of the more appropriate "''CRITICAL''".
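To illustrate why a CE without a free slot for ops jobs now shows up as CRITICAL, the following Python sketch polls a job status until it reaches a target state or the 15-minute timeout expires. It is a simplified model under stated assumptions, not the code of the CREAM probes: `wait_for_state`, the 30-second polling interval and the injectable `clock`/`sleep` functions are hypothetical, and the status strings are only examples. Return values follow the standard Nagios plugin exit codes.

```python
import time
from typing import Callable, Iterable

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def wait_for_state(
    get_status: Callable[[], str],
    targets: Iterable[str],
    timeout: float = 15 * 60,  # 15-minute probe timeout, in seconds
    poll: float = 30.0,        # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll a job until it reaches one of `targets` or the timeout expires.

    Hypothetical sketch, not the actual CREAM probe implementation.
    """
    wanted = set(targets)
    deadline = clock() + timeout
    while True:
        if get_status() in wanted:
            return OK
        if clock() >= deadline:
            # The updated probes report a timeout as CRITICAL;
            # the first version returned UNKNOWN instead.
            return CRITICAL
        sleep(poll)


if __name__ == "__main__":
    # Simulated clock so that the example runs instantly.
    now = [0.0]

    def fake_clock() -> float:
        return now[0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    # A JobPurge-like check: the job stays queued and never terminates.
    result = wait_for_state(lambda: "IDLE", {"DONE-OK"},
                            clock=fake_clock, sleep=fake_sleep)
    print(result)  # 2 (CRITICAL)
```

Injecting the clock keeps the example instantaneous; a real probe would use the wall clock and would also have to handle submission and cancel/purge errors, which this sketch omits.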


List of [https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_CREAM-CE&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 failing CREAM-CEs] from Nagios (not all of them are affected by this problem):
* 45 CREAM-CEs affected (13% of the total)
* sites can ask for a recomputation of the May statistics

== Monthly Availability/Reliability ==
*Underperformed sites in the past A/R reports with issues not yet fixed:
**AfricaArabia: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=127502 GGUS 127502] ZA-UCT-ICTS: CAs updated at the end of April, statistics seem to be improving
**'''AsiaPacific'''
***TW-NCUHEP: site-bdii unstable due to network issues between the site and one of the Nagios servers ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=128083 GGUS 128083])
***KR-UOS-SSCC: there were SRM problems, statistics are improving ([https://ggus.eu/index.php?mode=ticket_info&ticket_id=127024 GGUS 127024])
**NGI_AEGIS: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=127025 GGUS 127025]
***AEGIS11-MISANU: low A/R figures due to a bug in the emi.cream.CREAMCE-JobCancel probe, recomputation requested
**'''NGI_DE''' [https://ggus.eu/index.php?mode=ticket_info&ticket_id=125430 GGUS 125430]
***UNI-SIEGEN-HEP: after the release of the CREAM probes, the CE is OK; the SRM service has been unstable for more than a month.
***wuppertalprod: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=127026 GGUS 127026] issues with some ARC-CE passive probes that are not up to date, which could affect many sites; waiting for the new ARC release
**NGI_UA: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=125839 GGUS 125839]
***UA-NSCMBR: bug in the ARC-CE probes
*Underperformed sites after 3 consecutive months and underperformed NGIs:
**'''NGI_DE''':
***LRZ: [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128087 GGUS 128087] site-bdii unreachable
**'''NGI_IT''': [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128090 GGUS 128090]
***INFN-CATANIA-STACK
***INFN-TORINO
**'''NGI_NDGF''': [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128091 GGUS 128091]
***T2_Estonia: bug in the ARC-CE probes
**'''NGI_TR''': [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128094 GGUS 128094] NGI underperforming: SOLVED
**'''ROC_Canada''': [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128097 GGUS 128097]
***CA-MCGILL-CLUMEQ-T2: several downtimes, statistics are improving


== Proposal to modify the declaration of scheduled interventions ==
 
Currently (see [[MAN02 Service intervention management]]) scheduled interventions (of any duration) MUST be declared at least 24 hours in advance, specifying reason and duration; any intervention declared less than 24 hours in advance will be considered unscheduled.
 
WLCG proposed the [https://indico.cern.ch/event/607744/contributions/2449767/subcontributions/218703/attachments/1402467/2141097/LongDowntimes-170126.pdf following modification]:
 
*a scheduled intervention shorter than 5 days must be declared at least 24 hours in advance
*a scheduled intervention longer than 5 days must be declared at least 1 month in advance
*any other intervention that does not fulfil the rules above will be considered unscheduled
 
'''2017-05-15 UPDATE'''
 
At the [https://indico.egi.eu/indico/event/3237/ last OMB], the WLCG proposal was rejected:
 
*the declaration rule for downtimes longer than 5 days is too strict
*it is difficult to plan a downtime one month in advance (at least for non-WLCG sites)

It was then proposed to extend the advance notice time from 24 hours to 5 days, but the NGIs were not in favour of this either.
 
'''New proposal''': Would you be in favour of extending the advance notice time to 3 days for scheduled downtimes of any duration?
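The three declaration rules discussed above (current MAN02 rule, rejected WLCG proposal, new 3-day proposal) can be compared with a small Python sketch. The function names and the rule labels (`current`, `wlcg`, `3days`) are hypothetical. It also assumes that "1 month" means 30 days and that an intervention of exactly 5 days falls under the 24-hour rule of the WLCG proposal, since the proposal does not specify that case.

```python
from datetime import datetime, timedelta


def required_notice(duration: timedelta, rule: str) -> timedelta:
    """Minimum advance notice for a scheduled intervention under a rule.

    Assumptions (not part of the rules themselves): "1 month" = 30 days,
    and exactly 5 days falls under the 24-hour case of the WLCG proposal.
    """
    if rule == "current":  # MAN02: 24 hours, any duration
        return timedelta(hours=24)
    if rule == "wlcg":     # rejected WLCG proposal
        if duration > timedelta(days=5):
            return timedelta(days=30)
        return timedelta(hours=24)
    if rule == "3days":    # new proposal: 3 days, any duration
        return timedelta(days=3)
    raise ValueError(f"unknown rule: {rule!r}")


def is_scheduled(declared: datetime, start: datetime, end: datetime,
                 rule: str = "current") -> bool:
    """True if the intervention was declared early enough to be scheduled."""
    return start - declared >= required_notice(end - start, rule)


if __name__ == "__main__":
    declared = datetime(2017, 5, 15, 9, 0)
    start = declared + timedelta(days=2)  # declared 2 days in advance
    end = start + timedelta(days=6)       # 6-day intervention
    for rule in ("current", "wlcg", "3days"):
        print(rule, is_scheduled(declared, start, end, rule))
```

Under these assumptions, a 6-day intervention declared two days ahead counts as scheduled under the current MAN02 rule, but as unscheduled under both the WLCG proposal and the 3-day proposal.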
 
== Decommissioning EMI WMS  ==
 
As discussed at the [https://indico.egi.eu/indico/event/3234/ February] and [https://indico.egi.eu/indico/event/3237/ April/May] OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.


NGIs provided WMS usage statistics; in general usage is relatively low, mainly for local testing.

Moderate usage by a few VOs:
* NGI_CZ: eli-beams.eu
* NGI_GRNET: see
* NGI_IT: calet.org, compchem, theophys, virgo
* NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
* NGI_UK: mice, t2k.org

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:
* compchem is already testing DIRAC
* calet.org: discussing the migration to DIRAC with the users; interested in a webinar on DIRAC.
* mice: enabled on the GridPP DIRAC server

We need feedback from the VOs to better define the technical details and the timeline:
* NGIs with VOs using WMS (not necessarily limited to the VOs above): please contact them to ensure that these VOs have a back-up plan.

WMS servers can be decommissioned as soon as the supported VOs no longer need them. The proposal is:
* WMS will be removed from production starting from 1st January 2018.
** VOs have 8 months to find alternatives or migrate to DIRAC
* Since this is not an update, the decommissioning can be performed in a few weeks.

== IPv6 readiness plans ==
* '''Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)'''
** '''NGIs/ROCs: please start discussing with the sites and provide suggestions for the overall plan'''


== Decommissioning of dCache 2.10 and 2.13 (to modify) ==

* support for '''dCache 2.10''' ended in December 2016
* according to EGI policies, dCache 2.10 must be decommissioned: https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
* broadcast sent on Feb 2nd (https://operations-portal.egi.eu/broadcast/archive/1631) and email sent on Feb 7th to the noc-managers mailing list
* sites should upgrade their 2.10 endpoints to a newer "golden release" of dCache:
** 2.13, whose support ends in July 2017, or
** 2.16, whose support ends in May 2018: note that the dCache team does not support upgrading from 2.10 directly to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
* a decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.10 instances and follow up with the NGIs/sites; '''tickets will be opened this week'''
* the deadline is '''end of April'''
* the probe will return WARNING for two months, until April 17th, when it will switch to CRITICAL
* in May, EGI Operations will open tickets against sites still publishing dCache 2.10 and follow up on the upgrade plans
* reference: https://www.dcache.org/downloads/1.9/index.shtml
* '''STATUS:''' 8 instances still publishing 2.10

* support for '''dCache 2.13''' will end in July 2017
* campaign start date: May 1st (-3m)
* campaign end date: Aug 31st (+1m)
* to be announced at the OMB and in the April EGI Monthly Broadcast
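The supported upgrade transitions listed above form a small graph, so the chain of upgrades a site must follow can be found with a breadth-first search. This Python sketch is illustrative only: `SUPPORTED_UPGRADES` and `upgrade_path` are hypothetical names, and the graph contains just the transitions stated in this section, not the full dCache upgrade matrix.

```python
from collections import deque
from typing import Dict, List, Optional

# Upgrade transitions supported by the dCache team, as stated above.
# A direct 2.10 -> 2.16 upgrade is deliberately absent.
SUPPORTED_UPGRADES: Dict[str, List[str]] = {
    "2.10": ["2.13"],
    "2.13": ["2.16"],
}


def upgrade_path(current: str, target: str) -> Optional[List[str]]:
    """Shortest chain of supported upgrades from `current` to `target`
    (breadth-first search), or None if no supported chain exists."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in SUPPORTED_UPGRADES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(upgrade_path("2.10", "2.16"))  # ['2.10', '2.13', '2.16']
```

A site still running 2.10 that targets 2.16 therefore has to pass through 2.13 first.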


== Testing the new webdav probes  ==


{| class="wikitable sortable"
|-
! Site
! Host
! GGUS ticket
! Note
|-
| CYFRONET-LCG2
| se01.grid.cyfronet.pl
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325 GGUS 128325]
| SOLVED
|-
| GRIF
| node12.datagrid.cea.fr
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329 GGUS 128329]
|
|-
| IGI-BOLOGNA
| darkstorm.cnaf.infn.it
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930 GGUS 127930]
| SOLVED
|-
| INFN-T1
| storm-fe-lhcb.cr.cnaf.infn.it, storm-fe.cr.cnaf.infn.it, storm-fe-archive.cr.cnaf.infn.it
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326 GGUS 128326]
|
|-
| NCG-INGRID-PT
| gftp01.ncg.ingrid.pt
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327 GGUS 128327]
| SOLVED
|-
| UKI-NORTHGRID-LIV-HEP
| hepgrid11.ph.liv.ac.uk
| [https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328 GGUS 128328]
| SOLVED
|-
| egee.irb.hr
| lorienmaster.irb.hr
|
|
|}


'''UPDATE 2017-04-10''': this week the probes should be deployed on the ARGO test instance
Missing steps:
* in GOC-DB, fill in the webdav URL including the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
** without the VO folder, this URL corresponds to the value of the GLUE2 attribute GLUE2EndpointURL (which includes the port in use)
** follow [[HOWTO21]] for filling in the information in GOC-DB
* verify that the webdav URL (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible
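Following the rule above (the GOC-DB URL is the GLUE2EndpointURL value plus the VO folder), this Python sketch builds the URL to register. `goc_db_webdav_url` is a hypothetical helper that only joins strings and checks the scheme; it does not contact the endpoint, and the correct VO folder path depends on the storage implementation, as the two examples above show.

```python
from urllib.parse import urlsplit


def goc_db_webdav_url(endpoint_url: str, vo_folder: str = "ops") -> str:
    """Append the VO folder to a GLUE2EndpointURL-style WebDAV URL.

    Hypothetical helper: it performs only basic checks and string joining;
    reachability of the URL must still be verified separately.
    """
    parts = urlsplit(endpoint_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https URL, got {endpoint_url!r}")
    # Avoid double slashes when the endpoint URL ends with "/".
    return endpoint_url.rstrip("/") + "/" + vo_folder.strip("/")


if __name__ == "__main__":
    print(goc_db_webdav_url("https://darkstorm.cnaf.infn.it:8443/webdav"))
    # https://darkstorm.cnaf.infn.it:8443/webdav/ops
```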
 
== Testing of the storage accounting ==

As discussed during the [https://indico.egi.eu/indico/event/3233/ January OMB], the APEL team needs one site per NGI for testing the storage accounting. The eligible sites are those providing either dCache or DPM storage elements.

More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage

[[Storage accounting testing|List of sites]] available for testing.

'''2017-05-15 UPDATE''':
* 26 sites are sending storage accounting data (only from dCache and DPM SEs). The data has to be verified before deploying the script in production.
* After the discussion at the March [https://indico.egi.eu/indico/event/3235/ OMB], we are evaluating the creation of a new service type for monitoring the publication of storage accounting data.
: Currently the accounting service types are:
# glite-APEL: for [https://wiki.egi.eu/wiki/APEL/UsingAuth authorizing] the sending of the messages
# APEL: to [https://wiki.egi.eu/wiki/APEL/Tests monitor] the accounting data publication


== Monitoring of the UNCERTIFIED sites ==

Information about the proposal for using GOCDB as the only source of topology information for ARGO:
*[https://indico.egi.eu/indico/event/3006/material/slides/0.pdf slides in October Operations Meeting agenda]
*[https://indico.egi.eu/indico/event/2810/contribution/3/material/0/ ARGO Proposal (September OMB)]
*[https://indico.egi.eu/indico/event/2814/contribution/6/material/slides/ ARGO] and [https://indico.egi.eu/indico/event/2814/contribution/7/material/slides/ GOC-DB] updates from November OMB

*Timescale:
**New GOC-DB release on Dec 7th including a boolean ‘monitored’ flag for the service endpoints: '''DONE'''
**Then creation of a web UI view for uncertified sites in ARGO: '''DONE'''
**Uncertified sites will be asked to fill in the service endpoints information. Follow the [https://wiki.egi.eu/wiki/HOWTO21 How to add URL service endpoint information into GOC-DB] '''DONE'''
***('''OPTIONAL''') use the [https://gocdb-test.esc.rl.ac.uk/portal/index.php GOC-DB test instance] for testing the procedure
**As information is added to GOCDB, uncertified sites/services will be picked up by the ARGO Monitoring Engine and will start to be monitored: '''DONE'''
**By Q2 2017: support for multiple service endpoints

'''Nagios server for the uncertified sites: https://argo-mon-uncert.cro-ngi.hr/nagios/'''
*Configuration is regenerated every hour
*[http://web-egi-devel.argo.grnet.gr/lavoisier/status_report-site?report=CriticalUncert&accept=html uncertified sites report] on the ARGO development instance
*'''IMPORTANT''': to be monitored correctly, the uncertified sites have to fill in the proper service information in GOC-DB: please follow the [[HOWTO21]]

[[PROC09]] modified accordingly.
 
== VAPOR ==

*[http://operations-portal.egi.eu/vapor/releases VAPOR 2.2] released on March 16th
*important for presenting the amount of computing and storage resources of the infrastructure
*There are several improvements and new features: the computation of CPU and storage values has been thoroughly reviewed; nevertheless, some values are still not in line with reality.
The next version will focus on these computations to provide better figures.
*Please have a look at the information displayed and report any inconsistencies you spot.


= AOB =

== Next meeting ==

*'''June 12th, 2017''' https://indico.egi.eu/indico/event/3144/
*'''new calendar available until June 2017''' https://indico.egi.eu/indico/category/32/

Latest revision as of 14:26, 25 October 2017



General information

Middleware

CMD

  • still working on CMD-OS updates
  • CMD-ONE: the first major release will be for OpenNebula 5
    • CESGA will upgrade to OpenNebula 5 and, in particular, test the new cloudkeeper (formerly vmcatcher)

UMD

  • UMD 4.5 (June) will contain:
    • WN/UI for CentOS 7 (C7)
    • CREAM for CentOS 7 (C7)

Preview repository

released on:

  • 2017-04-26
    • Preview 1.11.0 AppDB info (sl6): ARC 15.03 update 12, CernVM-FS 2.3.5, gfal2 2.13.3, gfal2-utils 1.5.0, srm-ifce 1.24.2, XRootD 4.6.0
    • Preview 2.11.0 AppDB info (CentOS 7): ARC 15.03 update 12, CernVM-FS 2.3.5, dCache 3.0.12, emi-UI 4.0.2, gfal2 2.13.3, gfal2-utils 1.5.0, glite-ce-cream-client 1.15, glite-yaim-clients 5.2.1-1, srm-ifce 1.24.2, XRootD 4.6.0

Operations

Testing FedCloud sites

Resource Centre | Status
IFCA-LCG2 | OK
IN2P3-IRES | OK
100IT | OK
RECAS-BARI | OK
CESNET-MetaCloud | OK
FZJ | OK
BEgrid-BELNET | OK
INFN-CATANIA-STACK | OK
TR-FC1-ULAKBIM | OK
INFN-PADOVA-STACK | OK
IISAS-FedCloud | OK
UPV-GRyCAP | OK
IISAS-Nebula | OK (but not supporting fedcloud VO)
CLOUDIFIN | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128104
SCAI | https://ggus.eu/?mode=ticket_info&ticket_id=127821
CYFRONET-CLOUD | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128100
CESGA | https://ggus.eu/?mode=ticket_info&ticket_id=127815
BIFI | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128096
HG-09-Okeanos-Cloud | replacing VMCaster with Cloudkeeper
CETA-GRID | https://ggus.eu/?mode=ticket_info&ticket_id=124224
GoeGrid | https://ggus.eu/?mode=ticket_info&ticket_id=128101
NCG-INGRID-PT | keystone v3 with OpenID Connect (experimental)


Feedback from Helpdesk

Yearly review of the information registered in GOC-DB

2017-04-07

On a yearly basis, the information registered in GOC-DB needs to be verified. NGIs and RCs have been asked to check it. In particular:

  1. NGI managers should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • ROD E-Mail
    • Security E-Mail
NGI managers should also review the status of the "not certified" RCs, in accordance with the RC Status Workflow.
  2. RC administrators should review the people registered and the roles assigned to them, and in particular check the following information:
    • E-Mail
    • telephone numbers
    • CSIRT E-Mail
RC administrators should also review the information related to the registered service endpoints.

The process should be completed by Apr 28th.

To track the process, a series of tickets has been opened.
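A minimal sketch of the per-record check described above, assuming each registration is available as a plain dictionary. The field labels mirror the two lists above, but the record layout and the `missing_fields` helper are hypothetical and do not reflect the GOC-DB programmatic interface.

```python
from typing import List, Mapping, Optional, Sequence

# Fields named in the review request above. The dict keys are hypothetical
# labels for this sketch, not GOC-DB API or XML element names.
NGI_REQUIRED = ("E-Mail", "ROD E-Mail", "Security E-Mail")
RC_REQUIRED = ("E-Mail", "telephone", "CSIRT E-Mail")


def missing_fields(record: Mapping[str, Optional[str]],
                   required: Sequence[str]) -> List[str]:
    """Return the required fields that are absent, None or blank in `record`."""
    return [name for name in required if not (record.get(name) or "").strip()]
```

An NGI record with an empty ROD E-Mail and no Security E-Mail would be reported as missing both, which is the kind of gap the review tickets are meant to catch.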

2017-05-15 UPDATE:

  • no feedback yet by: AfricaArabia, NGI_DE, NGI_FI, NGI_IL, NGI_NL, NGI_UA;
  • still reviewing: NGI_GRNET, NGI_HR, NGI_IBERGRID, NGI_IT, NGI_PL, NGI_RO, ROC_LA.

Failures with the updated CREAM probes

After the release of the updated CREAM probes on May 4th, several sites are failing the JobCancel and/or JobPurge ones (GGUS 128151):

  • the error message is: "Received timeout while fetching results".

The main reason is that those CEs do not have a job slot reserved for the ops tests.

As explained in the CREAM probes wiki:

  • JobCancel: cancel an active job
    • This metric submits a job directly to the selected CREAM CE, waits until the job reaches the IDLE, RUNNING or REALLY-RUNNING state and then tries to cancel it. Finally it checks whether the job has been correctly cancelled.
  • JobPurge: purge a terminated job
    • This metric is analogous to cream_jobCancel.py. It submits a short job (e.g. hostname.jdl), waits for it to terminate (e.g. DONE-OK) and then tries to purge it. Finally, to verify that the purge was successful, the probe checks the job status by executing the glite-ce-job-status command, which in this scenario must fail because the job no longer exists.

Both probes have a timeout of 15 minutes, so if the test job is not executed within that time, the probes return a failure. Please give the ops jobs a higher priority and reserve one job slot for them: they only need a few seconds to execute.

These failures did not occur before May 4th because the first version of the probes returned "UNKNOWN" instead of the more appropriate "CRITICAL".
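The timeout behaviour can be modelled with the simplified Python sketch below. It is not the probe code: the 30-second polling interval, the `wait_for_state` helper and the `FakeClock` used to simulate time are assumptions made for illustration. A job that stays queued because no slot is free never reaches DONE-OK, so the wait ends with CRITICAL.

```python
import time
from typing import Callable, Iterable

TIMEOUT_S = 15 * 60  # JobCancel and JobPurge both time out after 15 minutes


def wait_for_state(get_status: Callable[[], str], wanted: Iterable[str],
                   timeout_s: float = TIMEOUT_S, poll_s: float = 30.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the job status: "OK" if a wanted state is reached, "CRITICAL" on timeout.

    Per the text above, the first probe version reported "UNKNOWN" here instead.
    """
    targets = set(wanted)
    deadline = clock() + timeout_s
    while clock() < deadline:
        if get_status() in targets:
            return "OK"
        sleep(poll_s)
    return "CRITICAL"


class FakeClock:
    """Simulated time, so the example finishes instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

For example, with a fresh `clk = FakeClock()`, `wait_for_state(lambda: "IDLE", {"DONE-OK"}, clock=clk.time, sleep=clk.sleep)` returns "CRITICAL" after 900 simulated seconds, whereas a job that reaches DONE-OK within the window returns "OK".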

List of failing CREAM-CEs from Nagios (not all of them are affected by this problem):

  • 45 CREAM-CEs affected (13% of the total)
  • the sites can ask for a recomputation of the May statistics

Monthly Availability/Reliability

Proposal to modify the declaration of scheduled interventions

Currently (see MAN02 Service intervention management) scheduled interventions (of any duration) MUST be declared at least 24 hours in advance, specifying reason and duration; any intervention declared less than 24 hours in advance will be considered unscheduled.

WLCG proposed the following modification:

  • a scheduled intervention shorter than 5 days must be declared at least 24 hours in advance
  • a scheduled intervention longer than 5 days must be declared at least 1 month in advance
  • any other intervention that does not fulfil the rules above will be considered unscheduled

2017-05-15 UPDATE

At the last OMB, the WLCG proposal was rejected:

  • the declaration rule for downtime longer than 5 days is too strict
  • it is difficult to plan a downtime one month in advance (at least for non-WLCG sites)

It was then proposed to extend the advance notice time from 24 hours to 5 days, but the NGIs were not in favour of this either.

New proposal: Would you be in favour of extending the advance notice time to 3 days for scheduled downtimes of any duration?
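To compare the three rule sets discussed above, here is a small Python sketch that classifies a downtime as scheduled or unscheduled. Two assumptions are not in the text: "1 month" is approximated as 30 days, and a downtime of exactly 5 days is treated under the shorter-than-5-days rule. The function names and policy labels are hypothetical.

```python
from datetime import datetime, timedelta

FIVE_DAYS = timedelta(days=5)
ONE_MONTH = timedelta(days=30)  # assumption: "1 month" taken as 30 days


def required_notice(duration: timedelta, policy: str) -> timedelta:
    """Minimum advance notice for a downtime of `duration` under `policy`."""
    if policy == "current":      # MAN02: 24 hours, any duration
        return timedelta(hours=24)
    if policy == "wlcg":         # rejected WLCG proposal
        return ONE_MONTH if duration > FIVE_DAYS else timedelta(hours=24)
    if policy == "three-days":   # new proposal: 3 days, any duration
        return timedelta(days=3)
    raise ValueError(f"unknown policy: {policy!r}")


def classify(declared: datetime, start: datetime, end: datetime,
             policy: str = "current") -> str:
    """Return "scheduled" if declared early enough, else "unscheduled"."""
    notice = start - declared
    return "scheduled" if notice >= required_notice(end - start, policy) else "unscheduled"
```

A 3-hour intervention declared 48 hours in advance stays scheduled under the current rule but would become unscheduled under the 3-day proposal.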

Decommissioning EMI WMS

As discussed at the February and April/May OMBs, we are making plans for decommissioning the WMS and moving to DIRAC.

NGIs provided WMS usage statistics, and in general the usage is relatively low, mainly for local testing

Moderate usage by few VOs:

  • NGI_CZ: eli-beams.eu
  • NGI_GRNET: see
  • NGI_IT: calet.org, compchem, theophys, virgo
  • NGI_PL: gaussian, vo.plgrid.pl, vo.nedm.cyfronet
  • NGI_UK: mice, t2k.org

EGI contacted these VOs to agree on a smooth migration of their activities to DIRAC; only some of them have replied so far:

  • compchem is already testing DIRAC
  • calet.org: discussing with the users the migration to DIRAC. Interested in a webinar on DIRAC.
  • mice: enabled on the GridPP DIRAC server

We need the VO feedback for better defining technical details and timeline:

  • NGIs with VOs using WMS (not necessarily limited to the VOs above), please contact them to ensure that these VOs have a back-up plan.

WMS servers can be decommissioned as soon as the supported VOs do not need them any more. The proposal is:

  • WMS will be removed from production starting from 1st January 2018.
    • VOs have 8 months to find alternatives or migrate to DIRAC
  • Since this is not an update, the decommissioning can be performed in a few weeks.

IPv6 readiness plans

    • Resource Centres: assess the IPv6 readiness of the site infrastructure (real machines, cloud managers)
      • NGIs/ROCs please start discussing with sites and provide suggestions for the overall plan

Decommissioning of dCache 2.10 and 2.13 (to modify)

  • support for dCache 2.10 ended in December 2016
  • according to EGI policies, dCache 2.10 must be decommissioned https://wiki.egi.eu/wiki/PROC16_Decommissioning_of_unsupported_software
  • broadcast sent on Feb 2 https://operations-portal.egi.eu/broadcast/archive/1631 + email sent on Feb 7 to the noc-managers mailing list
  • sites to upgrade their 2.10 endpoints to a newer "golden release" of dCache
    • 2.13, whose support ends in July 2017, i.e. in about 7 months from now, or
    • 2.16, whose support ends in May 2018: note that the dCache team does not support the upgrade from 2.10 directly to 2.16; only the 2.10->2.13 and 2.13->2.16 transitions are supported.
  • decommissioning campaign will be started by EGI Operations to monitor the upgrade of the dCache 2.10 instances and follow up with the NGIs/sites;
  • deadline is end of April
  • probe will be WARNING for two months until April 17th, when it will switch to CRITICAL
  • in May EGI Operations will open tickets against sites still publishing dCache 2.10 and follow up on the upgrade plans
  • reference: https://www.dcache.org/downloads/1.9/index.shtml
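The supported upgrade transitions listed above can be expressed as a small lookup, sketched here in Python. Only the three golden releases mentioned in this section are modelled; the `upgrade_path` helper is hypothetical and is not a dCache tool.

```python
from typing import Dict, List

# Supported direct upgrades between dCache golden releases, as stated above.
SUPPORTED_STEPS: Dict[str, str] = {"2.10": "2.13", "2.13": "2.16"}


def upgrade_path(current: str, target: str) -> List[str]:
    """Return the chain of supported upgrades from `current` to `target`."""
    path = [current]
    while path[-1] != target:
        nxt = SUPPORTED_STEPS.get(path[-1])
        if nxt is None:
            raise ValueError(f"no supported upgrade from {path[-1]} to {target}")
        path.append(nxt)
    return path
```

A site still on 2.10 that wants to reach 2.16 therefore has to pass through 2.13 first: `upgrade_path("2.10", "2.16")` yields `["2.10", "2.13", "2.16"]`.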



  • support for dCache 2.13 will end in July 2017
  • date of starting the campaign: May 1st (-3m)
  • date of ending the campaign: Aug 31st (+1m)
  • to be announced at OMB and in the April EGI Monthly Broadcast

Testing the new webdav probes

Site | Host | GGUS ticket | Note
CYFRONET-LCG2 | se01.grid.cyfronet.pl | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128325 | SOLVED
GRIF | node12.datagrid.cea.fr | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128329 |
IGI-BOLOGNA | darkstorm.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=127930 | SOLVED
INFN-T1 | storm-fe-lhcb.cr.cnaf.infn.it, storm-fe.cr.cnaf.infn.it, storm-fe-archive.cr.cnaf.infn.it | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128326 |
NCG-INGRID-PT | gftp01.ncg.ingrid.pt | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128327 | SOLVED
UKI-NORTHGRID-LIV-HEP | hepgrid11.ph.liv.ac.uk | https://ggus.eu/index.php?mode=ticket_info&ticket_id=128328 | SOLVED
egee.irb.hr | lorienmaster.irb.hr | |

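Missing steps:

  • in GOC-DB, fill in the webdav URL including the VO ops folder, for example: https://darkstorm.cnaf.infn.it:8443/webdav/ops or https://hepgrid11.ph.liv.ac.uk/dpm/ph.liv.ac.uk/home/ops/
    • this is the value of the GLUE2 attribute GLUE2EndpointURL (containing the port in use, without the VO folder) with the VO folder appended
    • follow HOWTO21 for filling in the information in GOC-DB
  • verify that the webdav URL (for example: https://darkstorm.cnaf.infn.it:8443/webdav ) is properly accessible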

Testing of the storage accounting

As discussed during the January OMB, the APEL team would need one site per NGI for testing the storage accounting. The eligible sites are the ones providing either dCache or DPM storage elements.

More information can be found in the following wiki: https://wiki.egi.eu/wiki/APEL/Storage

List of sites available for test.

2017-05-15 UPDATE:

  • 26 sites are sending storage accounting data (only from dCache and DPM SEs). The data has to be verified before deploying the script in production.
  • After the discussion at the March OMB, we are evaluating the creation of a new service type for monitoring the publication of storage accounting data.
Currently the accounting service types are:
  1. glite-APEL: for authorizing the sending of the messages
  2. APEL: to monitor the accounting data publication


Monitoring of the UNCERTIFIED sites

Information about the proposal for using GOCDB as the only source of topology information for ARGO:

  • Timescale:
    • New GOC-DB release on Dec 7th including a boolean ‘monitored’ flag for the service endpoints: DONE
    • Then creation of a web UI view for uncertified sites in ARGO: DONE
    • Uncertified sites will be asked to fill in the service endpoints information. Follow the How to add URL service endpoint information into GOC-DB DONE
    • As information is added to GOCDB, uncertified sites/services will be picked up by the ARGO Monitoring Engine and will start to be monitored: DONE
    • By Q2 2017: support for multiple service endpoints


Nagios server for the uncertified sites: https://argo-mon-uncert.cro-ngi.hr/nagios/

  • Configuration is regenerated every hour
  • uncertified sites report on the ARGO development instance
  • IMPORTANT: to be monitored correctly, the uncertified sites have to fill in the proper service information in GOC-DB: please follow the HOWTO21

PROC09 modified accordingly.
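Conceptually, the uncertified-site monitoring described above selects the GOC-DB endpoints of uncertified sites that carry the 'monitored' flag and a service URL. The Python sketch below illustrates that selection only; the `Endpoint` record and `uncertified_targets` helper are hypothetical, and requiring a non-empty URL is an assumption based on the IMPORTANT note, not a description of the ARGO implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Endpoint:
    """Hypothetical, simplified view of a GOC-DB service endpoint."""
    site: str
    certification_status: str  # e.g. "Uncertified", "Certified"
    service_type: str
    url: str
    monitored: bool  # the boolean 'monitored' flag from the Dec 7th GOC-DB release


def uncertified_targets(endpoints: Iterable[Endpoint]) -> List[Endpoint]:
    """Keep endpoints of uncertified sites that are flagged as monitored and have a URL."""
    return [
        e for e in endpoints
        if e.certification_status == "Uncertified" and e.monitored and e.url.strip()
    ]
```

In this model an uncertified site that has set the flag but left the URL empty is silently skipped, which is why filling in the service information following HOWTO21 matters.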

VAPOR

  • VAPOR 2.2 released on March 16th
  • important for presenting the amount of computing and storage resources of the infrastructure
  • There are several improvements and new features: the computation of CPU and storage values has been thoroughly reviewed; nevertheless, some values are still not in line with reality.

The next version will focus on these computations to provide better figures.

  • Please have a look at the information displayed and report any inconsistencies you spot.

AOB

Next meeting