Detailed agenda: Grid Operations Meeting 24 October 2011 14h00 Amsterdam time
|EVO direct link|| Pwd: gridops|
|EVO details||Indico page|
1. Middleware releases and staged rollout
1.1 EMI-1 release status
Wiki page with the EMI release status updates.
1.2 Staged Rollout (Mario)
1.2.1 gLite 3.2
There are the following patches under staged rollout:
- glite Torque (utils, server, client): EAs doing the test
- LFC (oracle and mysql) and DPM 1.8.2-3: despite several calls to EGI SA1 and the LCG-Rollout list, and notification of the EAs, no one has acknowledged the test. We again ask any sites still running glite LFC and/or DPM to come forward.
Presently only gLexec is in staged rollout, and its report is expected today. The candidate list of products for inclusion in UMD 1.3, scheduled for 31 October, is the following:
- gLexec 1.0.1: depends on the staged rollout report, will be decided today.
- BDII site 1.0.1
- BDII top 1.0.1
- ARGUS 1.4.0
- dCache 1.9.12-10
- ARC 1.1.0 products:
- UNICORE 6.4.1 products:
- Globus 5.0.4 products:
2. Operational Issues
2.1.1 Site-BDIIs publishing GLUE1.3
In this GGUS ticket Laurence Field pointed out that many site-BDIIs are not publishing the GLUE2 information to the top-BDII.
GLUE1.3:
$ ldapsearch -LLL -x -h lcg-bdii.cern.ch -p 2170 -b o=grid objectClass=GlueSite GlueSiteName | grep GlueSiteName | sort | cut -d ":" -f 2 | wc
384 384 4954
GLUE2:
$ ldapsearch -LLL -x -h lcg-bdii.cern.ch -p 2170 -b o=glue objectClass=Glue2AdminDomain | grep GLUE2DomainID: | sort | cut -d ":" -f 2 | wc
147 147 1848
Almost 240 sites (384 in GLUE1.3 against 147 in GLUE2) are not publishing GLUE2 information to the Top-BDII. There are two different issues:
- Old Site-BDIIs don't publish GLUE2 data at all
  - The old version for SL4 does not support GLUE2. Upgrade needed.
- GLUE2-enabled Site-BDII instances are publishing the GLUE2 branch in the Site-BDII, but not in the Top-BDII
  - Configuration problem or bug? Under investigation.
- Please follow up this issue with your sites if they are not in the "glue" branch of the Top-BDII, to understand which of the two cases applies to them.
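To follow up with individual sites, it helps to know which names appear in the GLUE1.3 branch but not in the GLUE2 branch. The following is a minimal sketch, not an official tool. The helper names `ldif_values` and `missing_from` are hypothetical, and the comparison assumes that a site's GlueSiteName and its GLUE2DomainID are the same string, which may not hold for every site.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the GLUE1.3 and GLUE2 site lists.

# Print the sorted, de-duplicated values of one attribute from LDIF on stdin.
# $1 = attribute name, e.g. GlueSiteName or GLUE2DomainID
ldif_values() {
    grep -i "^$1: " | cut -d ":" -f 2- | sed 's/^ *//' | LC_ALL=C sort -u
}

# Print the lines of file $1 that do not appear in file $2.
# Both files must be sorted in the C locale, as ldif_values produces them.
missing_from() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Example usage against the Top-BDII queried above (not run automatically):
#   ldapsearch -LLL -x -h lcg-bdii.cern.ch -p 2170 -b o=grid \
#       objectClass=GlueSite GlueSiteName | ldif_values GlueSiteName > glue1.txt
#   ldapsearch -LLL -x -h lcg-bdii.cern.ch -p 2170 -b o=glue \
#       objectClass=Glue2AdminDomain | ldif_values GLUE2DomainID > glue2.txt
#   missing_from glue1.txt glue2.txt
```

The output of `missing_from` is only a candidate list: a site that uses different names in the two schemas would show up here even though it does publish GLUE2 data.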
2.1.2 openldap-servers dependencies
There was a thread on LCG-ROLLOUT in September with the title "Site-BDII stops responding". Symptoms:
- site-BDII stops responding
- slapd process using 100% CPU
If the site-BDII is using a slapd version older than 2.4, the solution is to upgrade the openldap-servers RPM by hand to a 2.4.* version. This should not be needed for the EMI/UMD release of Site-BDII because the third-party repository already contains openldap2.4-servers-2.4.22-1.el5.x86_64.rpm (or the 32-bit equivalent).
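A site admin can check which OpenLDAP server package is installed before deciding whether a manual upgrade is needed. The sketch below is only an illustration: the helper names `is_openldap_24` and `rpm_version` are hypothetical, and it assumes an RPM-based host where the package is called either openldap-servers or openldap2.4-servers, as named above.

```shell
#!/bin/sh
# Hypothetical helpers for checking the installed OpenLDAP server version.

# Return success if the version string in $1 belongs to the 2.4 series.
is_openldap_24() {
    case "$1" in
        2.4|2.4.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the installed version of RPM package $1.
# Prints nothing and returns non-zero if the package is not installed.
rpm_version() {
    v=$(rpm -q --qf '%{VERSION}\n' "$1" 2>/dev/null) && printf '%s\n' "$v"
}

# Example usage on a site-BDII host (not run automatically):
#   for pkg in openldap-servers openldap2.4-servers; do
#       ver=$(rpm_version "$pkg") || continue
#       if is_openldap_24 "$ver"; then
#           echo "$pkg $ver: 2.4 series"
#       else
#           echo "$pkg $ver: older than 2.4, consider upgrading"
#       fi
#   done
```

Having a 2.4 package installed does not by itself prove that the running slapd process is the 2.4 binary, so the result should be read together with the symptoms listed above.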
2.2 EGI Staged Rollout collaboration with OSG/USATLAS/USCMS
A request/proposal has been sent to EGI for collaboration between Early Adopter sites that are in WLCG and OSG/USCMS/USATLAS, in order to have test instances for ATLAS and CMS workflows that are as close to production as possible. The mails are copied below:
From Anthony Tiradani USCMS and OSG:
With my USCMS hat on, I am primarily interested in verifying that CMS workflows continue to work through glideinWMS.
With my OSG hat on, I am interested in ensuring interoperability between OSG and WLCG. I'd like to run various test jobs on sites that are early adopters. Initially, I am interested in testing CE's and worker node clients.
From Rob Gardner USATLAS:
As regards to ATLAS - the ideal thing would be to submit validation jobs via Panda to endpoints connected with new releases, setup in a similar manner as they'd be deployed widely. So for OSG we have a Panda endpoint in the integration testbed ("UC_ITB" in the Panda monitor, http://panda.cern.ch/server/pandamon/query?jobsummary=site&site=UC_ITB ) Ideally we'd do this for a variety of sites in both infrastructures (differing mostly by storage deployment).
Looking at the Early Adopters table I believe some of those must already be Panda endpoints. Question: have discussions started with anyone else in ATLAS ADC (ATLAS Distributed Computing)?
2.3 gLite 3.2 critical bugs and patches
- NGI_UK about CREAM service reliability