Agenda-03-12-2012




Detailed agenda: Grid Operations Meeting 03 December 2012

EVO direct link Pwd: gridops
EVO details Indico page


1. Middleware releases and staged rollout

1.1. Update on the status of EMI updates

Cristina Aiftimiei (EMI) reports on the EMI updates. Twiki page with more information

1.2. Staged Rollout

2. Operational Issues

2.1 Unsupported middleware update

Middleware services planned to be upgraded by end of November

As of the last check on Dec 1st, 28 sites that declared a plan to upgrade their services by the end of November are still running unsupported middleware and have not declared a downtime on those services.
By today EGI Operations will open a new batch of NGI GGUS tickets, asking:

Unsupported VOMS

VOMS is a critical service for the VOs, so the status of VOMS tickets will be assessed one by one. Nevertheless, sites deploying unsupported VOMS must provide an upgrade plan or the technical reasons for delaying the upgrade.

DPM LFC and WN

Middleware services that have been unsupported since the end of November will raise critical alarms on the ROD dashboard by the end of this week. The probes are ready, testing is being finalized, and the Operations portal team is working on integrating them into the operational dashboard.

ROD teams must use the following escalation procedure to follow up on the unsupported middleware alarms. The overall procedure for decommissioning unsupported middleware is PROC16.

ARC unsupported middleware

Old, unsupported versions of ARC are still deployed in the infrastructure. They should be removed as well.

2.2 Updates from DMSU

FTS jobs abort with "No site found for host xxx.yyy" error

Details GGUS #87929

From time to time, some FTS transfers fail with the message above. The problem was reported at CNAF, IN2P3, and GRIDKA, and was noticed by the ATLAS, CMS, and LHCb VOs. It appears and disappears at rather short and unpredictable intervals.

The exact reasons are not yet understood; we keep investigating. Reports from sites affected by a similar problem will be appreciated.

Update Nov 20: The user reports that both problems have disappeared, probably fixed together.

LCMAPS-plugins-c-pep in glexec fails on RH6-based WNs

Details GGUS #88520

Due to the replacement of OpenSSL with NSS in RH6-based distributions, LCMAPS-plugins-c-pep invoked from glexec fails when talking to the Argus PEP via curl.

This is a known issue, as mentioned in the EMI glexec release notes; however, the workaround is not described there in a usable way.

Once we are sure that we understand it properly and that the fix works, it will be documented properly on the UMD pages and passed to the developers to:

  1. fix the documentation
  2. try to deploy the workaround automatically when an NSS-poisoned system is detected
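
The detection step in item 2 could look roughly like the sketch below. It is only an illustration, not the developers' implementation: it assumes that the first line of `curl --version` output names the SSL library as a `Name/version` token, and the function names and the sample string are hypothetical. It does not apply the workaround itself.

```python
# Minimal sketch: decide whether curl on a node is built against NSS.
# Assumption: the first line of `curl --version` lists linked libraries
# as "Name/version" tokens, e.g. "NSS/3.13.1.0" or "OpenSSL/1.0.0".

KNOWN_SSL_BACKENDS = ("NSS", "OpenSSL", "GnuTLS")


def curl_ssl_backend(version_output):
    """Return the SSL library named in `curl --version` output, or None."""
    lines = version_output.splitlines()
    if not lines:
        return None
    for token in lines[0].split():
        name = token.split("/", 1)[0]
        if name in KNOWN_SSL_BACKENDS:
            return name
    return None


def is_nss_based(version_output):
    """True when curl reports NSS as its SSL backend."""
    return curl_ssl_backend(version_output) == "NSS"


# Illustrative sample line, not captured from a real worker node.
SAMPLE = "curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.13.1.0 zlib/1.2.3"
print(is_nss_based(SAMPLE))
```

On a real node the input string would come from running `curl --version`; it is kept as a plain argument here so the check can be tried without curl installed.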

UPDATE Nov 19th: the fix is now well explained in the known issues section and will be included in a future yaim update.

WMS does not work with ARC CE 2.0.1

Details GGUS #88630, further info Condor ticket #3062

The format of the jobid changed in ARC CE release 12. The new format is not recognised by Condor versions prior to 7.8.3, but the current EMI-1 WMS uses Condor 7.8.0. This breaks submission from WMS to ARC CE.

The problem therefore affects CMS SAM tests as well as CMS production jobs.

Until the Condor update is available from EMI, updates to ARC CE 12 should be done with care.
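
The version constraint described above can be expressed as a small check. This is a hedged sketch: the function names are hypothetical, and it assumes Condor versions are plain dotted integers such as `7.8.0`.

```python
# Minimal sketch: does a given Condor version recognise the new ARC CE jobid format?
# From the text: versions prior to 7.8.3 do not recognise it.

FIRST_COMPATIBLE_CONDOR = (7, 8, 3)


def parse_version(text):
    """Turn '7.8.0' into (7, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def condor_recognises_new_jobid(condor_version):
    """True when the Condor version is 7.8.3 or later."""
    return parse_version(condor_version) >= FIRST_COMPATIBLE_CONDOR


print(condor_recognises_new_jobid("7.8.0"))  # version used by the current EMI-1 WMS
print(condor_recognises_new_jobid("7.8.6"))
```

Comparing integer tuples rather than strings avoids mis-ordering versions such as `7.10.0` and `7.8.3`.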

UPDATE Nov 26th: Condor 7.8.6 was installed on a test WMS, and submission to ARC seemed to work fine. Since this WMS is no longer available, further and deeper tests should be performed, perhaps using the EMI-TESTBED infrastructure.

3. AOB

3.1 UMD documentation

(Alessandro Usai)

3.2 Next meeting

Two weeks from now would be Dec 17, the day before the OMB.

4. Minutes

Available on the Indico page: Minutes.
