Agenda-07-10-2013
Latest revision as of 14:11, 7 October 2013
Audio conference link: the conference system is Adobe Connect; no password is required.
Audio conference details: Indico page
1. Middleware releases and staged rollout
1.1 News from URT
The following products are expected to be released in the EMI repositories in the next update (today):
- dcap (EMI 3)
- cream, blah, cream lsf module (EMI 3)
- ui/wn (EMI 2/3)
- canl-java (EMI 3)
- bdii (EMI , EMI 3)
- voms (EMI 3)
DPM and lcgutils are being removed from the EMI repositories; EPEL is the authoritative repository for these products.
1.2 Staged rollout updates
Presently under Staged Rollout:
- IGE.security-integration v. 3.0.0
- EMI.cream-torque v. 2.1.1
- EMI.emi-cluster v. 2.0.1
- EMI.px v. 1.3.34
- IGE.gridway v. 5.14.1
- EMI.storm v. 1.11.2
  - HTTPS protocol support is either not working properly or missing; for further information see GGUS #97527.
In addition, URLs with a double "/" between the host and the VO directory, such as srm://test27.egi.cesga.es:8444//ops.vo.ibergrid.eu/test_dir44/test_file02.txt, do not work as they do with GridFTP.
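Where such URLs are produced by client-side tooling, one possible workaround is to collapse repeated slashes in the path before contacting the endpoint. The sketch below uses only the Python standard library and is illustrative, not part of any grid client: `collapse_path_slashes` is a hypothetical helper, and it assumes that the collapsed path names the same file (as in POSIX path resolution) and that the file path sits in the URL path rather than in an `?SFN=` query, which is left untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url: str) -> str:
    """Replace runs of '/' in the URL path with a single '/'.

    The scheme and host:port (netloc) are kept as-is; query and fragment
    are not modified.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    bad = "srm://test27.egi.cesga.es:8444//ops.vo.ibergrid.eu/test_dir44/test_file02.txt"
    # Prints the same URL with a single "/" after the port.
    print(collapse_path_slashes(bad))
```

Normalising on the client side leaves the server behaviour unchanged; the ticket remains the place to track whether the endpoint itself should accept the double slash.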
1.3 Next UMD releases
- UMD-2
  - Globus gsissh 5.3.9
  - DPM 1.8.6
  - CREAM-ge 2.0.2
  - The release will go out as soon as dCache 2.2.17 and the newly released BDII have been tested in Staged Rollout (SR).
- UMD-3
  - High priority: StoRM 1.11.2 (early release)
  - Normal release: end of October or beginning of November
2 Operational issues
2.1 Updates from DMSU
From the last grid ops meeting:
Jobs aborted with the error "CREAM'S database has been scratched and all its jobs have been lost"
See details in GGUS #97354.
Since 13 September (at least with the WMS servers at CNAF), almost all production jobs have been failing, mainly because of two bugs: first, (almost) all jobs in the ICE database are marked with DB_ID=0; second, one particular CE (prod-ce-01.pd.infn.it) was triggering the signal to delete the jobs with DB_ID=0. All WMS servers that contacted that CE are affected by this issue.
It was found that, since a certain date, the CREAM CEs have been sending an empty DB_ID as the result of an interoperability problem (missing SOAP_HEADER) between gSOAP and Axis2, the SOAP frameworks used by ICE and CREAM respectively.
The fix (CREAM-125) has already been committed: with the new version of glite-ce-cream-client-api-c, CREAM once again sends a non-empty DB_ID to ICE in the JobRegister query.
Other tickets opened for this issue:
- by users: GGUS #97360, GGUS #97402, GGUS #97420, GGUS #97453
- by developers, for debugging: GGUS #97507, GGUS #97508, GGUS #97509
UPDATE 07-10-2013: the fix was released today (EMI-3 Update 9). Some weeks ago the "bad" CE was restarted, so it no longer sends the wrong signal.
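To make the failure mode concrete, here is a toy Python model of why an empty DB_ID lets one CE's "database scratched" signal wipe out jobs submitted elsewhere. It is not ICE's actual code: `TrackedJob`, `parse_db_id`, `purge_on_db_reset` and the hosts `ce-a.example.org` / `ce-b.example.org` are hypothetical, and the model assumes (our assumption, not stated in the tickets) that an empty DB_ID is stored as 0 and that the delete signal is matched on DB_ID alone.

```python
from dataclasses import dataclass


@dataclass
class TrackedJob:
    """Hypothetical record of a job tracked by ICE on a WMS."""
    job_id: str
    ce_host: str
    db_id: int  # DB_ID recorded from the JobRegister reply


def parse_db_id(raw: str) -> int:
    # Assumption for this sketch: an empty DB_ID in the reply degrades to 0.
    return int(raw) if raw else 0


def purge_on_db_reset(jobs, scratched_db_id):
    """Split jobs into (kept, dropped) after a "database scratched" signal.

    Assumption: matching is done on db_id only, not on (ce_host, db_id).
    """
    kept = [j for j in jobs if j.db_id != scratched_db_id]
    dropped = [j for j in jobs if j.db_id == scratched_db_id]
    return kept, dropped


if __name__ == "__main__":
    hosts = ["prod-ce-01.pd.infn.it", "ce-a.example.org", "ce-b.example.org"]

    # Before the fix: every CE answers with an empty DB_ID, so every job gets 0.
    jobs = [TrackedJob(f"job{i}", h, parse_db_id("")) for i, h in enumerate(hosts)]
    kept, dropped = purge_on_db_reset(jobs, scratched_db_id=0)
    print(len(dropped), "of", len(jobs), "jobs dropped")  # 3 of 3 jobs dropped

    # After the fix: each CE returns its own non-empty DB_ID.
    jobs = [TrackedJob(f"job{i}", h, parse_db_id(str(100 + i))) for i, h in enumerate(hosts)]
    kept, dropped = purge_on_db_reset(jobs, scratched_db_id=100)
    print(len(dropped), "of", len(jobs), "jobs dropped")  # 1 of 3 jobs dropped
```

Under these assumptions the two fixes attack the problem from both ends: restoring a non-empty DB_ID removes the collision, and restarting the "bad" CE removes the signal that triggered the purge.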
2.2 Connection problems for Argus
As reported in GGUS #96586, Argus clients may return connection errors:
"argus_authZ": failed to authorize XACML request: Failed sending data to the peer
As reported in the ticket, this is due to a known issue in JDK 7 (bug 8021840). The fix is expected in update 60 (u60) of the Oracle JDK.
- OpenJDK 1.7 >= 2.3.10.4 already contains the fix and is not affected (tested at CERN).
- OpenJDK is officially supported by Argus.
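When checking whether a site's OpenJDK already carries the fix, the version must be compared numerically, not as a string. The sketch below assumes the caller already has the purely numeric dotted version that the threshold above refers to (e.g. "2.3.10.4"); which field of the installed package that is, and how to read it from the system, is not covered here. `parse_version` and `openjdk_has_argus_fix` are hypothetical helper names.

```python
def parse_version(version: str) -> tuple:
    """Turn a purely numeric dotted version such as "2.3.10.4" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Threshold quoted in the agenda: OpenJDK 1.7 >= 2.3.10.4 contains the fix.
FIXED_IN = parse_version("2.3.10.4")


def openjdk_has_argus_fix(version: str) -> bool:
    """Return True if `version` is at or above FIXED_IN.

    Tuples compare element by element as integers, so (2, 3, 9) < (2, 3, 10, 4),
    whereas a plain string comparison would wrongly rank "2.3.9" above "2.3.10".
    A shorter tuple that is a prefix of FIXED_IN, e.g. (2, 3, 10), compares lower.
    """
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    for v in ["2.3.9", "2.3.10", "2.3.10.4", "2.4.1"]:
        print(v, openjdk_has_argus_fix(v))
```

Versions with non-numeric parts (distribution suffixes and the like) would raise a `ValueError` here and need to be stripped first.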
2.3 SHA-2 update
The monitoring probe for dCache is not ready yet; it is expected this week.
The next steps are:
- Deploy the probes in midmon
- Test the probe
- Update the ticket template used by ROD
- Broadcast the definitive calendar to sites: update before 1 December
3. AOB
3.2 Next meeting
The next meeting will take place after the OMB (scheduled for October 24).