From EGIWiki
Revision as of 12:27, 5 November 2012 by David

Detailed agenda: Grid Operations Meeting 22 October 2012

EVO direct link Pwd: gridops
EVO details Indico page

1. Middleware releases and staged rollout

1.1. Update on the status of EMI updates

1.2. Staged Rollout

UMD 2.3.0 preparation. Release due 19 November, freeze date 12 November.

  • UMDStore - ready for the release:
    • AMGA 2.3.0
    • CREAM 1.14.1
    • dCache 2.2.4 (Golden release)
    • EMIR 1.2.0
    • IGE SAGA 1.6.1
  • Staged rollout:
    • IGE Globus gsissh 5.3.5
    • LB 3.2.9
    • DPM 1.8.4
    • LFC 1.8.4
    • WN 2.0.1 on SL6: SAM/Nagios tests fail with a segmentation fault. Waiting for the SAM/Nagios team to confirm the next release, which should fix this issue.
    • WMS 3.4.0 on SL6: a high-impact bug may lead to its rejection from production. The SL5 product is probably also affected, although this has not been tested yet.

2. Operational Issues

2.1 Unsupported middleware update

Most sites have replied to the tickets opened by COD about unsupported middleware still deployed at the site, providing a reasonable upgrade plan.

The remaining problematic sites currently fall into two categories:

  • Unresponsive sites (45 sites): sites that have not provided any plan in the ticket, or whose plan has expired and has not been updated. The list of unresponsive sites is available here: Unresponsive sites 5 Nov 2012.
  • Late upgrade plan (7 sites): in general, an upgrade plan cannot extend beyond the end of 2012.
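The two categories above amount to a simple rule on each site's upgrade plan. The sketch below is illustrative only: the `SiteTicket` record, the site names and the idea of reducing a plan to a single end date are assumptions, not the actual structure of COD tickets. Only the end-of-2012 limit comes from these notes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Limit from the notes: an upgrade plan cannot extend beyond the end of 2012.
PLAN_DEADLINE = date(2012, 12, 31)


@dataclass
class SiteTicket:
    """Hypothetical summary of one COD ticket about unsupported middleware."""
    site: str
    plan_end: Optional[date]  # None when no upgrade plan was given in the ticket


def classify(ticket: SiteTicket, today: date) -> str:
    """Return 'unresponsive', 'late plan' or 'ok' for a single site."""
    if ticket.plan_end is None or ticket.plan_end < today:
        # No plan at all, or a plan that expired without being updated.
        return "unresponsive"
    if ticket.plan_end > PLAN_DEADLINE:
        # The plan runs beyond the end of 2012.
        return "late plan"
    return "ok"


if __name__ == "__main__":
    today = date(2012, 11, 5)
    tickets = [
        SiteTicket("SITE-A", None),                # no plan
        SiteTicket("SITE-B", date(2012, 10, 31)),  # expired plan
        SiteTicket("SITE-C", date(2013, 3, 1)),    # late plan
        SiteTicket("SITE-D", date(2012, 12, 15)),  # acceptable plan
    ]
    for t in tickets:
        print(t.site, classify(t, today))
```

Checking expiry before the deadline means a site whose plan lapsed is counted as unresponsive even if that plan would also have been late.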

Next steps:

  • The dCache probe is being rolled out to production today; alarms should appear on the security dashboard within the next 48 hours.
    • If a site with an unsupported dCache instance has no open ticket, COD will open a new one, specifying that dCache triggered the ticket.
  • CSIRT is taking over from COD the tickets related to problematic sites.
    • By next Monday, the NGIs with problematic sites will be asked to contact those sites and ask them to declare a downtime for their unsupported services.
    • If a site does not provide any plan in the coming days, or does not declare the downtime, it remains eligible for suspension.
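The suspension criterion in the last step can be written as a single boolean rule. This is a minimal sketch: the `FollowUp` record and its field names are hypothetical and do not correspond to any GGUS or operations-portal data model.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical state of one problematic site after the NGI has contacted it."""
    plan_provided: bool      # a non-expired upgrade plan is in the ticket
    downtime_declared: bool  # a downtime covers the unsupported services


def eligible_for_suspension(site: FollowUp) -> bool:
    # Missing either the plan or the downtime keeps the site eligible.
    # Eligibility is only a flag: EGI Operations reviews each case before suspending.
    return not (site.plan_provided and site.downtime_declared)
```

Only a site that has both a plan and a declared downtime drops out of the eligible set.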

Note: not all the sites currently listed as problematic will be suspended; EGI Operations will evaluate each case individually before any suspension.

2.2 Updates from DMSU

Problem retrieving the job output after EMI2 update 4 (WMS)

For details, see GGUS 87802.

It seems that users who have never used MyProxy, and are not currently using it, can retrieve the output without any problem. All other users hit the problem (see the details in the ticket).

  • The user proxy is normally stored in the SandboxDir, but if the user is using the MyProxy service, that file is a symlink to the real file stored in the proxy renewal directory (/var/glite/spool/glite-renewd). When the job ends, that proxy is purged, so the user no longer has the permissions needed to retrieve the output.
  • For the moment, a simple workaround is to submit a new job and, before it ends, retrieve the output of any previous jobs.
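The failure mode described above is a symlink whose target disappears. The sketch below only inspects the local filesystem to show the difference between a valid and a dangling link; it is not a WMS tool, and the file names used in the demo (`renewal_proxy`, `user.proxy`) are hypothetical stand-ins for the real files under the SandboxDir and /var/glite/spool/glite-renewd.

```python
import os
import tempfile


def describe_sandbox_proxy(path: str) -> str:
    """Classify the proxy entry found in a job's sandbox directory."""
    if os.path.islink(path):
        target = os.readlink(path)  # where the link points (the renewal copy)
        if os.path.exists(path):    # exists() follows the link to its target
            return f"symlink to {target}"
        return f"dangling symlink to {target}"
    if os.path.isfile(path):
        return "regular file"
    return "missing"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for the renewal copy and the sandbox entry.
        renewal_copy = os.path.join(tmp, "renewal_proxy")
        sandbox_entry = os.path.join(tmp, "user.proxy")
        with open(renewal_copy, "w") as f:
            f.write("dummy proxy")
        os.symlink(renewal_copy, sandbox_entry)
        print(describe_sandbox_proxy(sandbox_entry))  # job running: link is valid
        os.remove(renewal_copy)                       # job ends: renewal copy purged
        print(describe_sandbox_proxy(sandbox_entry))  # link now dangles
```

`os.path.islink` is checked first because `os.path.exists` follows links and would report a purged proxy simply as missing.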

It was also noticed that if the JDL does not contain the MyProxy-related attribute, the JDL stored in the SandboxDir wrongly contains that attribute, set to the default MyProxy hostname configured on the UI. If instead the JDL contains the attribute set to an empty value, as follows:

  MyProxyServer = "";

then the JDL stored on the WMS correctly contains it, the proxy is stored in the SandboxDir (no symlink to the proxy renewal directory), and the user can retrieve the output when the job ends.
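Users who generate JDL files with scripts could apply the empty-attribute workaround automatically. The helper below is a hypothetical sketch, not part of the gLite/EMI UI tools. It uses a regular-expression check rather than a real ClassAd parser, matches the attribute name case-insensitively as a precaution, and leaves any explicit MyProxyServer setting untouched. The notes do not say whether the workaround suits jobs that rely on MyProxy-based proxy renewal.

```python
import re

# Matches an existing MyProxyServer assignment (but not a "==" comparison).
_MYPROXY_ATTR = re.compile(r"\bMyProxyServer\s*=(?!=)", re.IGNORECASE)

_EMPTY_MYPROXY = 'MyProxyServer = "";'


def ensure_empty_myproxy(jdl: str) -> str:
    """Add MyProxyServer = ""; to a JDL that does not set the attribute."""
    if _MYPROXY_ATTR.search(jdl):
        return jdl  # an explicit setting is left as it is
    stripped = jdl.rstrip()
    if stripped.endswith("]"):
        # Bracketed JDL: insert the attribute before the closing bracket.
        body = stripped[:-1].rstrip()
        if body and not body.endswith((";", "[")):
            body += ";"  # separate the previous attribute from the new one
        return body + "\n  " + _EMPTY_MYPROXY + "\n]\n"
    if stripped and not stripped.endswith(";"):
        stripped += ";"
    return (stripped + "\n" if stripped else "") + _EMPTY_MYPROXY + "\n"
```

For example, `ensure_empty_myproxy('Executable = "/bin/hostname";\n')` returns the same JDL with `MyProxyServer = "";` appended as a new last line.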

3. AOB