
Difference between revisions of "Agenda-19-11-2013"

(23 intermediate revisions by 3 users not shown)
==== 1.1 News from URT ====


* '''ARC''': Major release on November 9th. There are small backward incompatibilities, so the ARC release team is bumping to a new major version, 4.x, to mark that the new release is not backward compatible.
* ARC 4.0.0 expected for November 22nd
* '''INFN''', new release of several products expected at the end of October
* Globus Toolkit 5.2.5 was released on November 5th. EGCF has pushed it to EPEL, where it is currently in EPEL testing.
** CREAM core, SLURM plugin, Torque plugin and BLAH
*** Information system fix and other bug fixes
** StoRM 1.11.3 bug fix release
** VOMS server and client. CAnL support for the API, several bugs and issues fixed for the server.
* '''EGCF''': new release expected, 5.2.5. Mainly bug fixes, fix for the hanging gridftp server processes issue.
* '''QCG''': New release expected on November 15th featuring: SHA-2 support and information system integration with EGI. Several improvements in the clients.


=== 1.2 Staged rollout updates ===
* Ready to be released (this week):
** unicore-hila v. 2.4.0
** cream-torque v. 2.1.1
** px v. 1.3.34
** arc-ce v. 3.0.3
** gridway v. 5.14.1
** dpm v. 1.8.7-3 
** bdii-core v. 1.5.4
** bdii-top v. 1.1.3
** canl v. 2.1.4
** cream-ce v. 1.16.2
** blah v. 1.20.3


* SAM update 22 is ready to be moved into production
* CREAM, v.1.16.2 and BLAH v. 1.20.3:
** contains fixes for several minor and major bugs.
** Blah solves some memory issues of the SGE: sge_local_submit_attributes.sh.


=== 1.3 Issue with the latest release of BDII in UMD-2 ===
* In Staged Rollout
* The UMD-2.7.0 update comes with a newer version of glite-info-provider-ldap, which moves glite-info-provider-ldap to /usr/libexec. Unfortunately, unlike in UMD-3, some RPMs have not been updated to a version that can cope with that change, resulting in an empty BDII. [[https://ggus.eu/ws/ticket_info.php?ticket=98392 GGUS]]
** UI v. 3.0.3, which includes a lot of new clients: DPM / LFC (1.8.7-3), VOMS (3.0.4), CanL (2.2.0), lcg_utils (1.16.0)
** '''Immediate solution:''' Update to UMD-3.
** Wn v. 3.0.1
** voms-server v. 2.0.11: fixes an issue where very slow or malicious clients could cause an endless loop in the VOMS server socket accept procedure.
** wms v. 3.6.1
** LFC v.1.8.7-3 '''Missing EA'''


* An emergency release of BDII in UMD-2 is being prepared because some RPMs are missing from the repository


=== 1.4 Proposal: UMD-2 security support ===
* Soon in Staged Rollout:
 
** CREAM-SLURM 1.0.1 solves the issue with the BDII not publishing CE information.
* UMD-2 would formally end security support in April 2014.
** DPM-YAIM: solves an lcgdm-dav and dpm-xrootd issue that can give intermittent permission-denied errors. Also includes new dmlite plugins.
* SA2 proposes, starting from November 2013, to release '''only security updates for UMD-2'''
** The number of early adopters deploying UMD-2 is reducing.
** Updates from PT are less frequent for UMD-2.


=== 2 Operational issues ===
==== 2.1 Updates from DMSU ====
Nothing to report
==== 2.2 Issues with the gLExec probe on tarball WNs ====
The new gLExec probe provided by EMI is Perl-based and requires the Perl library Time::HiRes. If the library is missing on the WN, the test '''emi.cream.glexec.WN-gLExec''' reports:
CRITICAL: (null)
WNs installed with the metapackage emi-glexec_wn contain the required library and work fine. WNs deployed from the tarball distribution are missing both the library and the glexec binary.


===== Unable to retrieve the output sandbox in a custom dir from a CREAM CE =====
==== 2.3 Proposed changes in the APEL monitoring probes ====
Slides


see details in [https://ggus.eu/ws/ticket_info.php?ticket=98368 GGUS #98368]
==== 2.4 Medium term changes proposed for GFAL utilities ====


The command glite-ce-job-output with the --dir option fails when the specified path contains 'special' characters, such as '=' or '.'.
The GFAL lcg-utils product team is proposing to ''phase out'' GFAL/lcg_utils in favour of GFAL2/gfal-utils.


$ glite-ce-job-output --dir /var/lib/gridprobes/lhcb.Role=production/emi.cream/CREAMCEDJS/ce208.cern.ch/jobOutput https://ce206.cern.ch:8443/CREAM023339075
GFAL2 is already used by FTS3 for file transfers. The main features of GFAL2 are:
*POSIX-like API
2013-10-24 10:10:04,637 FATAL - Failed creation of directory [/var/lib/gridprobes/lhcb.Role=production/emi.cream/CREAMCEDJS/ce208.cern.ch/jobOutput/ce206.cern.ch_8443_CREAM023339075]: boost::filesystem::path: invalid name "lhcb.Role=production" in path: "/var/lib/gridprobes/lhcb.Role=production/emi.cream/CREAMCEDJS/ce208.cern.ch/jobOutput/ce206.cern.ch_8443_CREAM023339075"
*Getting, putting and third party copy all in one call
*Session reuse for SRM, GridFTP and HTTPS
*Python wrappers available
*Interaction with the information system is entirely optional
** Support for the information system will be maintained in the future
* gfal2 and gfal (currently in production) are not API compatible


This bug is present on EMI-2 UI and sl5 EMI-3 UI; the developers are investigating [https://issues.infn.it/jira/browse/CREAM-128 #CREAM-128]
lcg_utils functions that are ''proposed'' to '''be deprecated''':
{| width="90%" cellspacing="0" cellpadding="5" class="wikitable" style="border:1px solid black;"
|- style="background-color:darkgray;"
! Function
! lcg_util
! Status
|-
|Add an alias for a given GUID
|lcg-aa
|Deprecated
|-
|Copy and register a file
|lcg-cr
|Deprecated
|-
|List aliases for a given LFN/GUID
|lcg-la
|Deprecated
|-
|Get the GUID for a given LFN
|lcg-lg
|Deprecated
|-
|Lists the replicas for a given LFN
|lcg-lr
|Deprecated
|-
|Remove an alias
|lcg-ra
|Deprecated
|-
|Copy between SEs using the catalog
|lcg-rep
|Deprecated
|-
|Register a file in a catalog
|lcg-rf
|Deprecated
|-
|Unregister a file
|lcg-uf
|Deprecated
|}


==== 2.2 Glue validator alarms ====
The proposal is to support lcg_utils in terms of security and critical bug patches until '''31st October 2014'''.


ROD teams have been notified about the new probe that will generate alarms in the Operations Dashboard.  
The changes may heavily affect user communities that use LFC and have based their workflows on the lcg-utils commands that combine SRM and LFC actions.


Peter has generated an [https://documents.egi.eu/document/1995 Excel file] containing the Site-BDIIs failing the probe on midmon, mapped into sites and NGIs. The real time status can be checked on [https://midmon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_Site-BDII&style=detail&servicestatustypes=16&hoststatustypes=15&serviceprops=0&hostprops=0 midmon].
More information is available in the [https://svnweb.cern.ch/trac/lcgutil/wiki/MediumTermProposal CERN twiki].


The possible date to turn on alarms is November 22nd. Please contact sites that are failing the test, ask them to debug the problems, and provide feedback if there are problems with the results. More information on glue-validator can be found in the [http://gridinfo.web.cern.ch/glue/glue-validator-guide manual page] maintained by the developer. ''Remember to run it with the -k option to skip the known issues that cannot be solved at site level''.
'''Action''': Discuss with your NGI team and user communities the implications of this proposed change. <br>
P.Solagna will circulate this information to the UCB.


The glue-validator probe is not affecting AR statistics.
==== 2.5 glue2 monitoring ====
Proposal for a soft-start:
* After the end of the month, open tickets against sites (non-alarm tickets, without escalation) asking them to start working on the issue.


=== 3. AOB  ===
==== 3.2 Next meeting ====
Proposed date for the next meeting is '''November 11th 2013, 14:00 CET'''


=== 4. Minutes  ===
[https://indico.egi.eu/indico/materialDisplay.py?materialId=minutes&confId=1918 minutes in indico]

Latest revision as of 15:55, 19 November 2013

Audio conference link: conference system is Adobe Connect, no password required.
Audio conference details: Indico page



1. Middleware releases and staged rollout

1.1 News from URT

  • ARC 4.0.0 expected for November 22nd
  • Globus Toolkit 5.2.5 was released on November 5th. EGCF has pushed it to EPEL, where it is currently in EPEL testing.

1.2 Staged rollout updates

  • Ready to be released (this week):
    • unicore-hila v. 2.4.0
    • cream-torque v. 2.1.1
    • px v. 1.3.34
    • arc-ce v. 3.0.3
    • gridway v. 5.14.1
    • dpm v. 1.8.7-3
    • bdii-core v. 1.5.4
    • bdii-top v. 1.1.3
    • canl v. 2.1.4
    • cream-ce v. 1.16.2
    • blah v. 1.20.3


  • In Staged Rollout
    • UI v. 3.0.3, which includes a lot of new clients: DPM / LFC (1.8.7-3), VOMS (3.0.4), CanL (2.2.0), lcg_utils (1.16.0)
    • Wn v. 3.0.1
    • voms-server v. 2.0.11: fixes an issue where very slow or malicious clients could cause an endless loop in the VOMS server socket accept procedure.
    • wms v. 3.6.1
    • LFC v.1.8.7-3 Missing EA


  • Soon in Staged Rollout:
    • CREAM-SLURM 1.0.1 solves the issue with the BDII not publishing CE information.
    • DPM-YAIM: solves an lcgdm-dav and dpm-xrootd issue that can give intermittent permission-denied errors. Also includes new dmlite plugins.

2 Operational issues

2.1 Updates from DMSU

Nothing to report

2.2 Issues with the gLExec probe on tarball WNs

The new gLExec probe provided by EMI is Perl-based and requires the Perl library Time::HiRes. If the library is missing on the WN, the test emi.cream.glexec.WN-gLExec reports:

CRITICAL: (null)

WNs installed with the metapackage emi-glexec_wn contain the required library and work fine. WNs deployed from the tarball distribution are missing both the library and the glexec binary.
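
One way a site could check whether a WN has the Perl prerequisite is to ask the Perl interpreter to load the module. The sketch below is only an illustration and is not part of the EMI probe: the helper name `perl_module_available` is made up here, and it assumes a `perl` executable can be found on the PATH of the account running the check.

```python
import shutil
import subprocess


def perl_module_available(module, perl="perl"):
    """Check whether the Perl interpreter can load `module`.

    Returns True or False depending on the exit status of
    `perl -M<module> -e 1`, or None when no Perl interpreter
    is found on the PATH (so the check could not be performed).
    """
    perl_path = shutil.which(perl)
    if perl_path is None:
        return None
    result = subprocess.run(
        [perl_path, "-M" + module, "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Time::HiRes is the library named in the agenda item above.
    print("Time::HiRes loadable:", perl_module_available("Time::HiRes"))
```

A result of `False` on a tarball WN would be consistent with the missing-library case described above; the presence of the glexec binary has to be verified separately.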

2.3 Proposed changes in the APEL monitoring probes

Slides

2.4 Medium term changes proposed for GFAL utilities

The GFAL lcg-utils product team is proposing to phase out GFAL/lcg_utils in favour of GFAL2/gfal-utils.

GFAL2 is already used by FTS3 for file transfers. The main features of GFAL2 are:

  • POSIX-like API
  • Getting, putting and third party copy all in one call
  • Session reuse for SRM, GridFTP and HTTPS
  • Python wrappers available
  • Interaction with the information system is entirely optional
    • Support for the information system will be maintained in the future
  • gfal2 and gfal (currently in production) are not API compatible
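
As an illustration of the single-call copy and of the Python wrappers, the following minimal sketch uses the gfal2 Python bindings. It assumes the `gfal2` module is installed on the host. The helper names `make_context` and `copy_file` are hypothetical and are not taken from the proposal.

```python
def make_context():
    """Create a gfal2 context (requires the gfal2 Python bindings)."""
    import gfal2  # imported lazily so copy_file can be exercised without gfal2
    return gfal2.creat_context()


def copy_file(ctx, source, destination, overwrite=False):
    """Copy `source` to `destination` with a single filecopy call.

    The same call is used for downloads, uploads and copies between
    two remote endpoints; gfal2 selects the protocol handling from
    the source and destination URLs.
    """
    params = ctx.transfer_parameters()
    params.overwrite = overwrite
    ctx.filecopy(params, source, destination)
```

For example, `copy_file(make_context(), "file:///tmp/data.txt", "srm://se.example.org/some/path/data.txt")`, where the SRM endpoint is a placeholder rather than a real storage element.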

lcg_utils functions that are proposed to be deprecated:

Function                            lcg_util  Status
Add an alias for a given GUID       lcg-aa    Deprecated
Copy and register a file            lcg-cr    Deprecated
List aliases for a given LFN/GUID   lcg-la    Deprecated
Get the GUID for a given LFN        lcg-lg    Deprecated
Lists the replicas for a given LFN  lcg-lr    Deprecated
Remove an alias                     lcg-ra    Deprecated
Copy between SEs using the catalog  lcg-rep   Deprecated
Register a file in a catalog        lcg-rf    Deprecated
Unregister a file                   lcg-uf    Deprecated

The proposal is to support lcg_utils in terms of security and critical bug patches until 31st October 2014.

The changes may heavily affect user communities that use LFC and have based their workflows on the lcg-utils commands that combine SRM and LFC actions.
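
To help estimate the impact on a community's workflows, existing scripts could be scanned for the commands listed above as proposed for deprecation. The following is a rough sketch, not an official tool: the function name `find_deprecated_commands` and the sample script are invented for illustration, and only the command names come from the table above.

```python
import re

# Commands listed in this agenda as proposed for deprecation.
DEPRECATED_LCG_COMMANDS = {
    "lcg-aa": "Add an alias for a given GUID",
    "lcg-cr": "Copy and register a file",
    "lcg-la": "List aliases for a given LFN/GUID",
    "lcg-lg": "Get the GUID for a given LFN",
    "lcg-lr": "Lists the replicas for a given LFN",
    "lcg-ra": "Remove an alias",
    "lcg-rep": "Copy between SEs using the catalog",
    "lcg-rf": "Register a file in a catalog",
    "lcg-uf": "Unregister a file",
}

# Match a command only as a whole token, so that names which merely
# start with a listed command (or other lcg-* tools) are not reported.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(c) for c in sorted(DEPRECATED_LCG_COMMANDS, key=len, reverse=True))
    + r")(?![\w-])"
)


def find_deprecated_commands(script_text):
    """Return (line_number, command) pairs for each listed command found."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical workflow script used only to demonstrate the scan.
    sample = "lcg-cr file:///tmp/x\nlcg-rep lfn:/grid/vo/x && lcg-lr lfn:/grid/vo/x\n"
    for lineno, command in find_deprecated_commands(sample):
        print(lineno, command, "-", DEPRECATED_LCG_COMMANDS[command])
```

The scan is purely textual, so it also reports commands that appear in comments or strings; that is usually acceptable for a first impact estimate.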

More information is available in the CERN twiki.

Action: Discuss with your NGI team and user communities the implications of this proposed change.
P.Solagna will circulate this information to the UCB.

2.5 glue2 monitoring

Proposal for a soft-start:

  • After the end of the month, open tickets against sites (non-alarm tickets, without escalation) asking them to start working on the issue.

3. AOB

3.2 Next meeting

4. Minutes