
Detailed agenda: Grid Operations Meeting 28 March 2011

1. Middleware releases and staged rollout (Mario)

1.1. Update on the status of EMI-1 release (Cristina)

Slides PDF Slides ODP

1.2. Staged Rollout

  1. gLite 3.1
    1. WMS and FTS upcoming
    2. No resolution yet for DPM 1.8.0: one EA has tested DPM with a fixed (memory-leak) version of voms-api, and everything went OK. Discuss whether to release this to production with a note that the voms-api packages have to be downgraded manually.
  2. gLite 3.2
    1. L&B and FTS upcoming
    2. VOBOX in staged rollout: the two EAs have not responded in one month.
    3. Torque (utils, server and client) should go to production soon.
    4. UI: waiting for feedback from the technology provider; under test by the EA.
    5. StoRM: a new version is under staged rollout (followed through EGI RT); several EAs are doing the tests.

2. Operational Issues

2.1. CREAM deployment model with gLite-CLUSTER

At its Feb 15 meeting, the OMB agreed to ask EMI for support of gLite-CLUSTER. This requirement was subsequently discussed at the TCB, and EMI replied positively. In particular:

  1. gLite-CLUSTER will be part of EMI 1.0
  2. CREAM in EMI will come with gLite-CLUSTER
  3. The CREAM team will support the gLite-CLUSTER components in EMI
  4. CERN will support the gLite-CLUSTER component in gLite

The CREAM product team proposes a single deployment model, with gLite-CLUSTER mandatory for every CREAM installation. The motivation is to have one simpler deployment model (with a single set of yaim variables to publish a given GLUE attribute).
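To illustrate what a single set of yaim variables could look like in practice, here is a hypothetical site-info.def fragment for a CREAM CE with a co-located gLite-CLUSTER node. The variable names and values are illustrative only and are not guaranteed to match any actual yaim release; consult the yaim documentation for the real names.

```shell
# Hypothetical yaim site-info.def fragment (illustrative only): one place
# to configure both the CREAM CE and the GLUE publication done by
# gLite-CLUSTER, instead of maintaining two separate deployment models.
CE_HOST=cream-ce.example.org
CLUSTER_HOST=cream-ce.example.org   # gLite-CLUSTER co-located with CREAM
QUEUES="short long"
# A single variable drives the published GLUE attribute for all queues:
CE_OTHERDESCR="Cores=8,Benchmark=4.0-HEP-SPEC06"
```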

Feedback is needed on this proposal.

2.2 DPM in gLite 3.1 (Mario)

The only EA testing DPM 1.8.0 in gLite 3.1 has successfully deployed the voms-api with the memory leak fixed. Report at:

Proposal: release this version with an advisory to install (downgrade) a posteriori to the fixed voms-api packages.

2.3 Bringing ARC, UNICORE and Globus to this meeting (Mario)

For ARC, some people (sites) are already represented; we would like them to also report about the issues they have and about interoperability in the infrastructure.

For UNICORE and Globus, NGIs should invite their sites to the meeting to make themselves known and to state what they expect to be integrated into the infrastructure, or which sites (NGIs) expect to deploy these middleware stacks.

The next UNICORE integration task force meeting is this Thursday at 10:00 CET.

2.4 Publishing the MW version in the information system (Mario)

There has been a recent discussion within EMI, with participation of EGI TSA1.3, about how and what should be published in the information system with respect to the middleware stack, its version, and each component and component version.

The discussion has concentrated on the GLUE2 schema, but there may also be some resolution for GLUE 1.3.

EMI will decide the technical implementation; on our side, the following proposal will be presented to EMI. The proposal is to use the GLUE2 entity Endpoint.

GLUE2 attributes of the Endpoint entity:

  • InterfaceName: the identification name of the primary protocol supported by the endpoint interface. This needs to be a standard registered value, currently the same as the GLUE 1 ServiceType.
  • InterfaceVersion: the version of the primary interface protocol (for example 2.2 for SRM).
  • Implementor: the name of the main organization implementing this software component (for example gLite or EMI).
  • ImplementationName: the name of the implementation; a product name chosen by the technology provider, e.g. CREAM, dCache, MyProxy, BDII.
  • ImplementationVersion: the version of the implementation (for example in major.minor.patch format).
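For illustration, in the standard LDAP rendering of GLUE2 these attributes would appear on an Endpoint entry roughly as follows. This is a hand-written sketch, not output from a real BDII; the DN and all values are invented.

```
# Sketch of a GLUE2 Endpoint entry as it might appear in a BDII LDAP tree
# (hand-written illustration; DN and values are invented).
dn: GLUE2EndpointID=cream-ce.example.org_endpoint,GLUE2ServiceID=cream-ce.example.org_service,o=glue
objectClass: GLUE2Endpoint
GLUE2EndpointInterfaceName: org.glite.ce.CREAM
GLUE2EndpointInterfaceVersion: 1.6
GLUE2EndpointImplementor: gLite
GLUE2EndpointImplementationName: CREAM
GLUE2EndpointImplementationVersion: 1.13.1
```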

2.5 Workdir

One comment was received from Ibergrid about the job working directory issue, discussed in the Operations Meeting of 14 March 2011.

Reference: Jobs work directory and temporary directory, page updated with the comments from Ibergrid:

NGI_Ibergrid (G. Borges)

In almost all surveys, application people request: please get MPI working properly on the grid! While for sequential jobs it is fine to change the working directory to a scratch area, the situation is less clear in MPI scenarios, since the MPI setup is, in 90% of cases, based on shared homes. So the middleware has to deal with two different use cases:

  • Sequential jobs, which do not want to use shared homes due to performance issues;
  • Parallel jobs, which 90% of the time do want to use shared homes.

AFAIK, customization scripts inside the job wrappers are completely insensitive to the user jobs (i.e. they act in the same way for all users, VOs and applications). Implementing a "change dir" (to a scratch space) in the customization scripts executed by the job wrapper could break MPI execution, unless the job wrapper is capable of recognizing the job type and acting accordingly. If I'm thinking right, I would not vote for this option as it stands, since it sends the message that the EGI infrastructure does not care about parallel processing. The adopted solution should be global, i.e. general enough for any kind of job type or application, without the need to manually adjust the middleware behaviour to enable a specific job type to be executed.

(Tiziana): we should check if the prologue mechanism can be used for this: "The prologue is an executable run within the WMS job wrapper before the user job is started. It can be used for purposes ranging from application-specific checks that the job environment has been correctly set on the WN, to actions like data transfers, database updates or MPI pre-scripts. If the prologue fails, the job wrapper terminates and the job is considered for resubmission."
Using the prologue would not be a good solution. It would require MPI users to invoke the prologue explicitly, so the user, rather than the software, would have to handle the difference between MPI and serial jobs. Secondly, it means that the user has to know _where_ the correct location to put the files is (this becomes an issue with MPI jobs that handle large files). Rather than using the prologue and forcing all these issues onto the end user, the middleware should have a neat way for system managers to specify this; it could be as simple as passing the number of cores used into the customisation point scripts for gLite. Site managers can then ignore it, or use that data to route job directories appropriately. Spurdie 12:00, 15 March 2011 (UTC)
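The alternative sketched above, passing the core count into the site customization point so the wrapper itself can decide, might look like the following. This is a minimal Python sketch written for this page; the function and parameter names are invented and do not correspond to any actual gLite hook.

```python
import tempfile

def choose_workdir(n_cores, shared_home, scratch_root=None):
    """Hypothetical customization hook: pick the job's working directory.

    Parallel jobs (n_cores > 1) keep the shared home so every MPI rank
    sees the same directory; sequential jobs get a private scratch
    directory on local disk for performance.
    """
    if n_cores > 1:
        # MPI case: ranks on different WNs must share the working dir.
        return shared_home
    # Sequential case: default to the system temp dir if the site
    # defines no scratch area.
    scratch_root = scratch_root or tempfile.gettempdir()
    return tempfile.mkdtemp(prefix="job-", dir=scratch_root)
```

With such a hook, site managers could ignore the core count entirely, or use it to route job directories, without end users having to distinguish MPI from serial jobs.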

2.6 Update of WMS to version 3.1.30-0.slc4

The WMS version 3.1.30-0.slc4, released in November 2010, includes a new version of gridsite (which fixes a problem occurring with proxies generated by gLite 3.2 VOMS servers). WMS 3.1.30-0 release notes

This problem with the WMS blocked the update of many VOMS servers to gLite 3.2, in order to keep "compatibility" with the WMS.

The following are the WMSes that may not be up to date (releases older than 3.1.30):


These WMSes do not publish the GlueServiceDataKey=metapackage_glite-WMS information in the BDII, but it is highly probable that the deployed service is an earlier version than 3.1.30-0 (3.1.29 or earlier).

Please check with the site admins.
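When checking with site admins, the comparison that matters is whether the deployed WMS is older than 3.1.30-0. A small sketch of that version comparison, written for this page (not part of any gLite tool):

```python
def parse_version(v):
    """Split a version string like '3.1.30-0' into a comparable tuple."""
    main, _, release = v.partition("-")
    parts = [int(x) for x in main.split(".")]
    if release:
        parts.append(int(release))
    return tuple(parts)

def needs_update(deployed, required="3.1.30-0"):
    """True if the deployed WMS version is older than the required one."""
    return parse_version(deployed) < parse_version(required)
```

Tuple comparison handles the mixed major.minor.patch-release format without string-ordering pitfalls (e.g. "3.1.9" vs "3.1.30").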

TopBDII 3.2.11-1 (BDII-5.1.22)

This update has been rated urgent since it contains the new GOCDB URL; the sites (given in the next link) that have not yet updated are requested to do so as soon as possible:

Sites with Top BDII old version

Please note that the list may not be complete. The change of the GOCDB URL will occur around June.
