Detailed agenda: Grid Operations Meeting 22 October 2012
|EVO direct link|| Pwd: gridops|
|EVO details||Indico page|
1. Middleware releases and staged rollout
1.1. Update on the status of EMI updates
Cristina Aiftimiei (EMI) reports on the EMI updates, twiki link
1.2. Staged Rollout
UMD 2.3.0 preparation. Release due 19 November, freeze date 12 November.
- UMDStore - ready for the release:
- AMGA 2.3.0
- CREAM 1.14.1
- dCache 2.2.4 (Golden release)
- EMIR 1.2.0
- IGE SAGA 1.6.1
- Staged rollout:
- IGE Globus gsissh 5.3.5
- LB 3.2.9
- DPM 1.8.4
- LFC 1.8.4
- WN 2.0.1 in SL6 - trouble in SAM/Nagios tests (segmentation fault). Waiting for an answer from the SAM/Nagios team about the next release, which should solve this issue.
- WMS 3.4.0 in SL6 - a high-impact bug may lead to rejection for production. It probably also affects the SL5 product, although we have not tested this yet.
- UNICORE HILA 2.3.0
- FTS 2.2.8 - verifiers reported some problems with the current packages in the UMD repositories. Being sorted out.
- WN 2.0.1 in SL5 - EMI included some 32-bit data libraries that pull in 32-bit Globus library dependencies, which are not in the UMD repositories. Being sorted out.
- Verification not started yet:
- UI 2.0.1
- glite-MPI 1.4.0
- (WMS 3.4.0)
- Large number of products from the IGE 3.0 release
2. Operational Issues
2.1 Unsupported middleware update
Most of the sites have replied to the tickets opened by COD about the unsupported middleware still deployed at the site, providing a reasonable upgrade plan for their middleware.
Currently the problematic sites can be grouped in two categories:
- Unresponsive sites (45 sites): sites without any plan provided in the ticket, or with an expired plan that has not been updated. The list of unresponsive sites is available here: Unresponsive sites 5 Nov 2012.
- Late upgrade plan (7 sites): in general, an upgrade plan cannot extend beyond the end of 2012.
- The dCache probe is being rolled into production today; alarms should appear on the security dashboard within the next 48 hours.
- If sites with an unsupported dCache instance have no ticket open, COD will open a new ticket, specifying that dCache triggered the ticket.
- CSIRT is taking over the evaluation of the tickets from COD.
- By next Monday the NGIs with problematic sites will be asked to contact the sites, asking them to open a downtime for their unsupported services.
- If a site does not provide a plan in the coming days, or does not open the downtime, it remains eligible for suspension.
Note: not all the sites currently listed as problematic will be suspended; EGI Operations will evaluate each case individually before any suspension.
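The two categories above can be expressed as a simple rule. The following is a minimal sketch only, not the procedure actually used by COD or CSIRT: the function name `classify_site`, its arguments and the returned labels are hypothetical, and the only value taken from the agenda is the end-of-2012 limit for upgrade plans.

```python
from datetime import date
from typing import Optional

# Limit stated in the agenda: an upgrade plan cannot extend beyond the end of 2012.
UPGRADE_DEADLINE = date(2012, 12, 31)


def classify_site(plan_end: Optional[date], today: date) -> str:
    """Hypothetical helper: classify a site from the end date of its upgrade plan.

    plan_end is None when the site gave no plan in the ticket.
    """
    # No plan at all, or a plan that has already expired without an update.
    if plan_end is None or plan_end < today:
        return "unresponsive"
    # A plan exists but runs past the end of 2012.
    if plan_end > UPGRADE_DEADLINE:
        return "late"
    return "ok"
```

For example, `classify_site(None, date(2012, 11, 5))` yields `"unresponsive"`, while a plan ending in January 2013 yields `"late"`. As noted above, a site in either category is only a candidate for suspension and is still evaluated case by case.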
2.2 Updates from DMSU
Problem in retrieving the job output after EMI2 update 4 (WMS).
For details see GGUS 87802.
It seems that a user who has never used MyProxy, and is not currently using it, can retrieve the output without any problem. For other users the problem occurs (see the details in that ticket).
- The user proxy is usually stored in the Sandboxdir, but if the user is using the MyProxy service, that file is a symlink to the real file stored in the proxy renewal directory (/var/glite/spool/glite-renewd). When the job ends, that proxy is purged, so the user no longer has permission to retrieve the output.
- For the moment, a simple workaround is to submit a new job and, before it ends, retrieve the output of any previous job.
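To check on a WMS whether a given job is exposed to this problem, one can test whether the proxy file in the sandbox is a symlink into the proxy renewal directory. The sketch below assumes nothing beyond the renewal directory path quoted above; the function name is hypothetical, and the location and name of the proxy file inside the sandbox are not given here, so they must be supplied by the caller.

```python
import os

# Proxy renewal directory as quoted in the text above.
RENEWAL_DIR = "/var/glite/spool/glite-renewd"


def proxy_points_to_renewal_dir(proxy_path: str, renewal_dir: str = RENEWAL_DIR) -> bool:
    """Return True if proxy_path is a symlink whose target lies under renewal_dir.

    The link target is not resolved on disk, so a dangling link (the target
    was already purged) is still reported as pointing to the renewal directory.
    """
    if not os.path.islink(proxy_path):
        return False  # regular file: the proxy is stored in the sandbox itself
    target = os.readlink(proxy_path)
    if not os.path.isabs(target):
        # Relative link targets are interpreted relative to the link's directory.
        target = os.path.join(os.path.dirname(proxy_path), target)
    target = os.path.abspath(target)
    renewal_dir = os.path.abspath(renewal_dir)
    return os.path.commonpath([target, renewal_dir]) == renewal_dir
```

A `True` result for a finished job, combined with a target that no longer exists (`os.path.exists(proxy_path)` is `False`), would match the situation described in the ticket.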
Moreover, it was also noticed that if the JDL does not contain the MyProxy variable, the JDL stored in the Sandboxdir wrongly contains that variable, set to the MyProxy hostname configured by default on the UI. If instead the JDL contains an empty variable like the following:
MyProxyServer = "";
the JDL stored on the WMS will properly contain it, the proxy is stored in the SandBoxDir (no symlink to the proxy renewal dir), and the user is able to retrieve the output when the job ends.
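Users who do not need proxy renewal could apply this to their JDL files before submission. The following is a minimal sketch under stated assumptions: the helper name `ensure_empty_myproxy` is hypothetical, the attribute is matched case-insensitively with a plain regular expression (no real JDL parsing, so a match inside a comment or string would also count), and only the flat and the single-bracket JDL layouts are handled.

```python
import re

# Matches any existing MyProxyServer assignment (simple text match, not a JDL parser).
_MYPROXY_RE = re.compile(r"\bMyProxyServer\s*=", re.IGNORECASE)

EMPTY_MYPROXY = 'MyProxyServer = "";'


def ensure_empty_myproxy(jdl_text: str) -> str:
    """Return jdl_text with an empty MyProxyServer attribute added if none is set."""
    if _MYPROXY_RE.search(jdl_text):
        return jdl_text  # the user already made an explicit choice; leave it alone
    stripped = jdl_text.rstrip()
    if stripped.endswith("]"):
        # Bracketed JDL: insert the attribute before the closing bracket.
        return stripped[:-1].rstrip() + "\n" + EMPTY_MYPROXY + "\n]\n"
    # Flat JDL: append the attribute at the end.
    return stripped + "\n" + EMPTY_MYPROXY + "\n"
```

Calling the function on a JDL that already sets `MyProxyServer`, to any value, returns the text unchanged, so users who do rely on proxy renewal are not affected.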
From NGI_IT: after applying the patch released for the org.sam.WN* issue on SL6, all the CREAMCE tests are in unknown status.
Minutes are available online on the Indico page.