Detailed agenda: Grid Operations Meeting 21 November 2011 14h00 Amsterdam time
|EVO direct link|| Pwd: gridops|
|EVO details||Indico page|
1. Middleware releases and staged rollout
1.1 EMI-1 release status
External summary wiki page provided by Cristina (EMI): twiki
1.2 Staged Rollout (Mario)
1.2.1 gLite 3.2
- LFC mysql 1.8.2: No EA, same situation as previous weeks.
In staged rollout:
- From EMI:
- EMI MPI 1.1.0: One EA doing the test.
- From IGE:
- Globus RLS 5.0.5: One EA doing the test.
- Globus GSISSH 4.4.3: Notification sent today.
- From EGI JRA1:
- SAM/Nagios update 15: One EA doing the test.
1.2.3 UMD1.4 release schedule
The possibility of releasing UMD1.4 around the middle of December is under discussion and will be proposed at the OMB next week. Candidate products for UMD1.4:
- MPI 1.1.0: presently under staged rollout.
- StoRM 1.8.0 (from update 9): still to be submitted to EGI.
- APEL publisher v.3.2.8 (from update 10): should be released this week.
- CREAM 1.13.3 (from update 10): should contain CEMON 1.13.3
- CREAM (S)GE module 1.0.0 (from update 10).
- GFAL/lcg_util v.1.11.19 (from update 10).
From IGE (these need to be aligned with the versions in the EPEL repository):
- Globus RLS 5.0.5
- Globus GSISSH 4.4.3
- Globus GridFTP 5.0.5: in verification phase
- Globus MyPROXY 5.4.4: in verification phase
2. Operational Issues
2.1 Site configuration in GOCDB
Some sites were not showing in the GLUE2.0 branch of the Top-BDII even though their Site-BDII is GLUE2 enabled. The cause was a site configuration issue in GOCDB; please find more information here
- GGUS tickets have been opened against sites with a GLUE2 Site-BDII and a wrong GOCDB configuration.
- Please disseminate this information across the sites, so that they do not run into the same issue when they install a GLUE2 Site-BDII.
2.2 LDAP dependencies in BDII
As reported in the Operations Meeting on October 24th, the Site-BDII is still installed with openldap-servers v.2.3. This is true for both the EMI and the gLite3.2 versions of the software. Site admins need to manually upgrade the openldap RPMs to openldap2.4-servers.
Note: the dependency on openldap 2.4 will be added in the next release of the Site-BDII, for EMI-2.
2.3 Sites "candidate" or "suspended"
As discussed in the May 2011 OMB, sites that have been in suspended or candidate status for too long should be closed. A large number of sites are still in this situation; the lists of sites in those statuses are available from the Performance page.
As requested in this GGUS ticket, COD will close every site to which both of the following conditions apply on 01 January 2012:
- site entered suspension/candidate status before 01 June 2011
- no ticket was sent to COD to request an extension of the deadline
If there are technical reasons to extend the deadline for specific sites, NGIs are kindly requested to open a GGUS ticket to COD before December 31st 2011, explaining those reasons and providing a timeline for the site reintegration.
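The closure rule above can be sketched as a small check. This is only an illustration of the two conditions, not a tool used by COD: the `Site` record, its field names, and the example site names are hypothetical, and the real status data would come from GOCDB and the GGUS tickets.

```python
from dataclasses import dataclass
from datetime import date

# Cutoff taken from the agenda: sites that entered the status before 01 June 2011.
STATUS_CUTOFF = date(2011, 6, 1)


@dataclass
class Site:
    # Hypothetical record; field names are assumptions made for this sketch.
    name: str
    status: str                 # e.g. "suspended", "candidate", "certified"
    status_since: date          # date the site entered its current status
    extension_requested: bool   # True if a GGUS ticket to COD asked for an extension


def to_be_closed(site: Site) -> bool:
    """Return True if both closure conditions from the agenda apply to the site."""
    return (
        site.status in ("suspended", "candidate")
        and site.status_since < STATUS_CUTOFF
        and not site.extension_requested
    )


# Made-up example sites, for illustration only.
sites = [
    Site("SITE-A", "suspended", date(2011, 3, 15), False),
    Site("SITE-B", "candidate", date(2011, 2, 1), True),
    Site("SITE-C", "suspended", date(2011, 9, 10), False),
    Site("SITE-D", "certified", date(2010, 1, 1), False),
]

closure_list = [s.name for s in sites if to_be_closed(s)]
print(closure_list)
```

In this sketch only SITE-A ends up on the closure list: SITE-B has an extension request, SITE-C entered suspension after the cutoff, and SITE-D is neither suspended nor candidate.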
2.4 SLURM support for BLAH
SLURM has recently been discussed on the LCG_ROLLOUT list as an open source alternative to Torque/Maui, which has been reported to suffer from scalability problems when used in big clusters.
A survey across the Resource Centres in the past months (about 170 answers) reported that:
- 4 sites were currently using SLURM
- 5 sites were considering moving to SLURM as LRMS
Those numbers were reported to EMI, but did not justify a strong requirement for SLURM.
More recently, Ibergrid (PIC) and NGI_UK (David Wallom's reply to this RT ticket) also expressed interest in SLURM because of its scalability.
If we want to push for SLURM support in CREAM/BLAH (and the other related components) for EMI-3, we need to assess in the next month and a half how many sites are considering alternatives to Torque/Maui.
A way to do that would be a very short survey:
- Directed to gLite sites (ARC already supports SLURM)
- Question 1: Would you be interested in using SLURM if it is supported by CREAM in the future?
- Question 2: Why would you change your current batch system, and which LRMS do you currently use? (e.g. scalability issues, reliability, preference for an open source solution)
3.1 Manuals to review
- Failover for MySQL based grid services (From Ibergrid)
- VOMS Replication : Failover procedure based on MySQL replication (From NGI_IT)
3.2 Next meetings
12 December 2011, 14:00