
Agenda-12-03-2012


Detailed agenda: Grid Operations Meeting 12 March 2012 14h00 Amsterdam time

EVO direct link Pwd: gridops
EVO details Indico page


1. Middleware releases and staged rollout

1.1 EMI-1 release status

EMI Update.

1.2 Staged Rollout

UMD-1

Preparing UMD 1.6, scheduled for 26 March: UMD_1.6.0_Contents. The contents will be reviewed in the next couple of days.

The following EMI products have already passed the staged rollout and are now in the UMDStore area:

  • Storm 1.8.2
  • UNICORE UUDB 1.3.3
  • UNICORE TSI6 4.2.0

Currently in staged rollout from EMI:

  • BDII Site 1.1.0
  • UNICORE:
    • UVOS 1.5.0
    • Gateway 4.2.0
    • UNICORE/X6 4.2.0

There are several products from IGE corresponding to the Globus 5.2 release, most of them now in staged rollout. Due to the change in Globus library packaging, some EMI components were broken; the new versions of those products, namely DPM 1.8.3 and Storm 1.8.3, should be released in this week's EMI update. The new IGE Globus release will only be done in UMD together with the newly updated EMI products.

One EA has offered to do the Verification and Staged Rollout of LFC 1.8.2 with the Oracle backend; the SW provisioning is currently in progress.

The SW provisioning workflow is being tested with IGE Debian packages in preparation for EMI 2.

2. Operational Issues

2.1 Testbeds for EMI WN

  • Currently only a few clusters have the EMI WN deployed.
  • In general, changing the execution environment may cause problems with the users' code.
    • Changes in the client libraries installed in the WN
    • Example of problem with CMS code.
  • VOs may want to validate their applications on the new WorkerNode.
    • In order to do that they would need some EMI WNs available for the tests.

For the LHC VOs: CERN keeps the pre-production testbed up to date with the newest release of the worker nodes, and it is accessible to the LHC experiments to test their code.

Question: Are there NGIs deploying a testbed with EMI WN?

  • Can this testbed be used by VOs to validate their code?
  • This testing activity can be coordinated by EGI.

The issues related to WNs (and UIs) should be listed, for example in a wiki page, in order to identify the bugs that can block an upgrade from gLite 3.x to EMI, and to track the release of the patches.

2.2 OPS VO Membership

Recently there were a few requests (triggered by the glexec monitoring tickets) to replace some ops VO members because they were no longer part of the NGIs' staff. I would encourage NGIs to replace their ops members as soon as they stop working on this task, if possible:

  1. Currently only 2 members per NGI are allowed in ops, so it is important to have both of them be active members of the NGI's staff. If the main SAM admin is unavailable (e.g. they need to wait a few days for a new certificate), an immediate backup is needed.
  2. It is also important to have in ops only those who strictly need the membership.

2.3 Temporary area for job execution

This topic was already discussed in the operations meeting last year: see here, for example.

  • VOs require a minimum disk space on the worker node. The required disk space is specified in the VO ID card.
  • There is a need to agree on an environment variable that points to the file system to be used for jobs' temporary files.
  • How the information system advertises this information:
    • GLUE 1.3
      • SubCluster.GlueSubClusterWNTmpDir: The path of a temporary directory local to each Worker Node.
    • GLUE 2.0
      • ComputingManager.GLUE2TmpDir: The absolute path of a temporary directory local to an Execution Environment instance.
    • This information needs to be available in the jobs' execution environment in a standard way across the different clusters (see the query sketch after this list).

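As an illustration, the value published in GLUE 1.3 could be retrieved from a site BDII with an LDAP query. The following is a minimal Python sketch using the ldap3 library; the BDII host name is a placeholder, and the standard BDII port 2170 and base "o=grid" are assumptions, not taken from this page:

  from ldap3 import ALL, Connection, Server

  # Hypothetical site BDII; replace with a real endpoint.
  BDII_HOST = "site-bdii.example.org"

  server = Server(BDII_HOST, port=2170, get_info=ALL)
  conn = Connection(server, auto_bind=True)  # anonymous bind

  # Query the GLUE 1.3 SubCluster objects for the advertised WN temporary directory.
  conn.search(
      search_base="o=grid",
      search_filter="(objectClass=GlueSubCluster)",
      attributes=["GlueSubClusterName", "GlueSubClusterWNTmpDir"],
  )

  for entry in conn.entries:
      print(entry.GlueSubClusterName, entry.GlueSubClusterWNTmpDir)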
First draft proposal: the worker node environment should contain the variable $TMPDIR, which points to the file system where jobs are supposed to find the required disk space.
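To illustrate what the proposal would mean on the job side, here is a minimal Python sketch; it only assumes that $TMPDIR is exported in the job environment, and the fallback to the system default when it is unset is an assumption for illustration, not part of the proposal:

  import os
  import tempfile

  # Use the scratch area advertised by the WN via $TMPDIR, if present;
  # otherwise fall back to the system default temporary directory.
  scratch = os.environ.get("TMPDIR", tempfile.gettempdir())

  # Create a per-job working directory inside the scratch area.
  with tempfile.TemporaryDirectory(dir=scratch) as workdir:
      data_file = os.path.join(workdir, "intermediate.dat")
      with open(data_file, "w") as fh:
          fh.write("temporary job data\n")
      # ... run the job payload against the files in workdir ...
  # The working directory and its contents are removed when the block exits.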

3. AOB

3.2 Next meetings