Detailed agenda: Grid Operations Meeting 16 May 2011
Meeting password: gridops
1. Middleware releases and staged rollout
1.1. Update on the status of EMI-1 release (Cristina)
The EMI project is pleased to announce the availability of the EMI 1 (Kebnekaise) release.
This release features for the first time a complete and consolidated set of middleware components from ARC, dCache, gLite and UNICORE. The services, managed in the past by separate providers, and now developed, built and tested in collaboration, follow well established open-source practices and are distributed from a single reference repository. The reference platform for EMI 1 is Scientific Linux 5 64 bit.
Kebnekaise will be supported for 18 months, with 6 additional months of support for security issues.
For more details on the EMI 1 release and the middleware products composing it, please refer to the following links:
EMI 1 Release page 
EMI User Forums 
EMI Software Repository 
EMI Project Home Page 
1.2. Staged Rollout (Mario)
gLite 3.2 components:
- VOBOX has been tested at LIP and is now ready for production.
- L&B 2.1.21 under staged rollout by IFIC, first feedback OK.
- CREAM 1.6.6 and SGE_Utils under staged rollout by LIP. A problem was seen following the latest glibc update.
glibc was updated from glibc-2.5-49 to glibc-2.5-58 before the update of CREAM and sge_utils, and we saw segmentation faults in the BLAH component. This may turn out to be a more general problem. Sites running CREAM should check at least the BUpdater<LRMS>.
Current ongoing points in the integration task forces:
- Very happy to be number one in the GOCDB development queue with a new solution for requirement https://rt.egi.eu/rt/Ticket/Display.html?id=975, which was a showstopper. The promised timeline is now within one month.
- For monitoring we no longer need an XML-based workaround: everything else is finished, and we are just waiting for GOCDB integration.
- Closer collaboration with Belarus. Good news: the NGI_BY UNICORE accounting service that Andrew Lukoshko presented at the EGI UF 2011 got permission to go open source. An English version and translated documentation are being prepared for an initial export to SourceForge. NGI-DE has already received a non-open-source version to run some tests and comparisons.
- A new release of the UNICORE RUS accounting system is available from SourceForge. It is a major milestone, as it completes our move to a JMS-based architecture with full fail-over support. It is also updated to USE and the 6.4.0 release of UNICORE. http://unicore-dev.zam.kfa-juelich.de/documentation/rus-accounting-1.3.1/
- Discussion on the integration mailing list about the next GLUE2-based service type names to be integrated in the next stage.
- First meeting held on Wednesday 4th May. Wiki: Globus_integration_task_force and new mailinglist email@example.com
- The next meeting will be mostly dedicated to accounting and has a closed slot at the first European Globus Community Forum (EGCF) this week in Munich: http://www.ige-project.eu/events/egcf1. It will be possible to phone in: Wednesday 15:35-16:15 CEST.
- Question to all NGIs on desired timelines for Globus integration, to determine whether a second Early Adopter team for Globus is needed at the current stage.
- New service types: Globus Online instead of RFT, IIS instead of MDS
- The first IGE release, announced for the end of April, is still being finalized. The second one, at the end of this year, will contain Argus.
- IGE is taking over the UK-developed Grid-SAFE for accounting; it is currently planned to be installed on the LRZ testbed and then integrated into the IGE releases.
Both UNICORE and Globus are very interested in the outcome of the EMI working group https://twiki.cern.ch/twiki/bin/view/EMI/ComputeAccounting
2. Operational Issues
2.1 Workdir and tmpdir for parallel jobs
New input from NGI_CH: Currently at CSCS the solutions in use are the CREAM JobWrapper and ARC custom submission scripts. Those solutions are obviously middleware-specific.
A custom TMPDIR should also be used to work around local disk throughput limits: with 16 or 32 cores the local disk can be stressed beyond its limit, and high-performance network file systems perform better. Experience from other site administrators on this performance topic is needed in order to consider solutions to this problem.
The customization of a job workdir should consider both the job type and the job's disk performance requirements. Can the second requirement be attached to the job submission? Is it really needed?
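As a starting point for discussion, the selection logic described above could be sketched in a middleware-independent way as follows. This is only an illustration under stated assumptions: the function names, the `high_disk_io` flag (standing in for a disk performance requirement attached to the job submission) and the 16-core threshold are hypothetical and do not reflect the CSCS setup or any CREAM or ARC interface.

```python
import os
import tempfile

# Assumption: from this core count upward the local disk is considered
# saturated, so scratch space moves to the network file system.
SHARED_FS_CORE_THRESHOLD = 16


def choose_scratch_root(cores, high_disk_io, local_root, shared_root):
    """Pick the base directory for a job's TMPDIR.

    cores        -- number of cores requested by the job
    high_disk_io -- hypothetical flag passed at job submission
    local_root   -- scratch area on the worker node's local disk
    shared_root  -- scratch area on a high-performance network file system
    """
    if high_disk_io or cores >= SHARED_FS_CORE_THRESHOLD:
        return shared_root
    return local_root


def make_job_tmpdir(job_id, cores, high_disk_io, local_root, shared_root):
    """Create a private per-job directory under the chosen scratch root."""
    root = choose_scratch_root(cores, high_disk_io, local_root, shared_root)
    return tempfile.mkdtemp(prefix="job_%s_" % job_id, dir=root)


def job_environment(tmpdir):
    """Return a copy of the environment with TMPDIR pointing at the job dir."""
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env
```

A job wrapper could call `make_job_tmpdir` before starting the payload and pass the result of `job_environment` to the child process. Whether the flag should come from the job description or be derived by the site from the job type is exactly the open question above.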
2.2 Batch system survey
As announced in the previous meeting, EGI.eu has prepared a basic but important survey on the distribution of batch systems. The questions will be:
- Which batch system is deployed at your site?
- SGE (OGE)
- Other (Specify)
- Are you planning to deploy a different batch system?
- If YES, which batch system will be deployed?
Should this survey be managed by the NGIs, or should it be sent directly to site administrators through a broadcast and a mail to LCG-ROLLOUT?
2.3 Open Issues
- gLite 3.2 VOMS not published in the information system.
- Please check here the full list of VOMS registered in the GOCDB and the ones that are being published in the information system
- Upcoming survey (will be announced in tomorrow's OMB)
- BDII high availability best practices.
3.2 Next meeting
Proposal: Monday 30 May, 14:00