Difference between revisions of "Agenda-03-12-2012"
Revision as of 14:44, 3 December 2012
Detailed agenda: Grid Operations Meeting 22 October 2012
|EVO direct link||Pwd: gridops|
|EVO details||Indico page|
1. Middleware releases and staged rollout
1.1. Update on the status of EMI updates
Cristina Aiftimiei (EMI) reports on the EMI updates
1.2. Staged Rollout
2. Operational Issues
2.1 Unsupported middleware update
Middleware services planned to be upgraded by end of November
As of the last check on Dec 1st, 28 sites that declared a plan to upgrade their services by the end of November are still running unsupported middleware, without a downtime declared on those services.
Today EGI Operations will open a new batch of NGI GGUS tickets, asking:
- To open a downtime for the unsupported services by Friday COB
- Sites with late plans (beyond November) should already be in downtime; any of these sites that have not done so must open the downtime as soon as possible, preferably by COB today
- Sites with CLASSIC SE service types registered in GOCDB will be asked to remove those services.
VOMS is a critical service for the VOs, so the status of VOMS tickets will be assessed one by one. Nevertheless, sites deploying unsupported VOMS must provide an upgrade plan or the technical reasons for delaying the upgrade.
DPM, LFC and WN
Middleware services that have been unsupported since the end of November will raise critical alarms on the ROD dashboard by the end of this week. The probes are ready, testing is being finalized, and the Operations Portal team is working on integrating them into the operational dashboard.
2.2 Updates from DMSU
FTS jobs abort with "No site found for host xxx.yyy" error
Details GGUS #87929
From time to time, some FTS transfers fail with the message above. The problem was reported at CNAF, IN2P3, and GRIDKA, and was noticed by the ATLAS, CMS, and LHCb VOs. It appears and disappears at rather short and unpredictable intervals.
The exact reasons are not yet understood; we keep investigating. Reports from sites affected by similar problems would be appreciated.
Update Nov 20: The user reports that both problems have disappeared, probably fixed together.
LCMAPS-plugins-c-pep in glexec fails on RH6-based WNs
Details GGUS #88520
Due to the replacement of OpenSSL with NSS in RH6-based distributions, LCMAPS-plugins-c-pep invoked from glexec fails when talking to Argus PEP via curl.
This is a known issue, mentioned in the EMI glexec release notes; however, the workaround is not described there in a usable way.
Once we are sure that we understand it properly and that the fix works, it will be documented on the UMD pages and passed to the developers to:
- fix the documentation
- try to deploy the workaround automatically when an NSS-poisoned system is detected
UPDATE Nov 19th: the fix is now well explained in the known issues section and it will be included in a future yaim update.
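One possible way to detect an NSS-based system automatically is to inspect the banner printed by `curl --version`, which lists the TLS library that libcurl was built against. The sketch below is only an illustration and is not the workaround referred to above: the function names are hypothetical, and the banner strings shown in the usage note are illustrative rather than taken from an affected WN.

```python
import re


def curl_ssl_backend(version_banner: str) -> str:
    """Return the TLS backend named in a `curl --version` banner, or 'unknown'.

    Hypothetical helper: it only looks at the first banner line, where
    curl lists the libraries it was built against (e.g. "NSS/3.13").
    """
    if not version_banner.strip():
        return "unknown"
    first_line = version_banner.splitlines()[0]
    for backend in ("NSS", "OpenSSL", "GnuTLS"):
        # Match the library name followed by "/" and its version.
        if re.search(r"\b" + backend + r"/", first_line):
            return backend
    return "unknown"


def needs_nss_workaround(version_banner: str) -> bool:
    """Assumption: a curl built against NSS marks an affected system."""
    return curl_ssl_backend(version_banner) == "NSS"
```

On a real node the banner could be obtained with `subprocess.run(["curl", "--version"], capture_output=True, text=True).stdout`. With an illustrative banner such as `curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3`, `needs_nss_workaround` returns `True`.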
WMS does not work with ARC CE 2.0.1
The format of the jobid changed in ARC CE release 12. The new format is not recognised by Condor prior to version 7.8.3, but the current EMI-1 WMS uses Condor 7.8.0. This breaks submission from WMS to ARC CE.
The problem therefore affects CMS SAM tests as well as their production jobs.
Updates to ARC CE 12 should therefore be done with care until the Condor update is available from EMI.
UPDATE Nov 26th: Condor 7.8.6 was installed on a test WMS, and submission to ARC seemed to work fine. Since this WMS is no longer available, more thorough tests should be performed, perhaps using the EMI-TESTBED infrastructure.
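The version constraint above can be expressed as a small check. This is a minimal sketch with hypothetical function names; it assumes the Condor version is available as a plain dotted string such as "7.8.0" and uses the 7.8.3 threshold stated above.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '7.8.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Threshold taken from the text: Condor before 7.8.3 does not
# recognise the new ARC CE jobid format.
MIN_CONDOR_FOR_NEW_ARC_JOBID = parse_version("7.8.3")


def condor_understands_new_arc_jobid(condor_version: str) -> bool:
    """True if this Condor version is 7.8.3 or later."""
    return parse_version(condor_version) >= MIN_CONDOR_FOR_NEW_ARC_JOBID
```

Comparing integer tuples rather than raw strings keeps the ordering correct for versions such as 7.10.0, which a plain string comparison would sort before 7.8.3.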
3. AOB
3.1 Next meeting
Two weeks from now would be Dec 17, the day before the OMB.
- We would need to skip to January 7th
- Intermediate proposal: Friday Dec 14th