EGI-InSPIRE:DMSU digests
This page briefly describes and indexes issues solved within DMSU that are likely to have a broader impact on EGI Operations. It is therefore worthwhile to gather digests, outline workarounds, and provide pointers to further details.
Jun 30, 2011
Growing LB database
The cron job responsible for regularly purging the LB database fails in gLite 3.2, so the database grows indefinitely.
The problem results from wrong default settings of the purge program arguments (a long-standing bug), which YAIM does not override in these versions.
A fix is available with lb-client 5.0.5-1, released with EMI-1.
A workaround is the following modification to /opt/glite/sbin/glite-lb-export.sh:
@@ -55,7 +55,7 @@
 # directory with exported data (file per job)
 GLITE_LB_EXPORT_JOBSDIR=${GLITE_LB_EXPORT_JOBSDIR:-$GLITE_LOCATION_VAR/lbexport}
 # purge args (timeouts)
-GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 2w --cancelled 2w --other 60d}
+GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 15d --cancelled 15d --other 60d}
 # Book Keeping Server
 GLITE_LB_SERVER_PORT=${GLITE_LB_SERVER_PORT:-9000}
 GLITE_LB_EXPORT_BKSERVER=${GLITE_LB_EXPORT_BKSERVER:-localhost:$GLITE_LB_SERVER_PORT}
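The patched line relies on the shell's `${VAR:-default}` expansion, so the built-in default applies only when GLITE_LB_EXPORT_PURGE_ARGS is unset or empty. The sketch below illustrates that behaviour in isolation. The `purge_args` function and the overriding values are hypothetical and exist only for this illustration. Whether the variable can conveniently be set in the environment of the cron job on a given installation is an assumption that has to be checked locally.

```shell
#!/bin/sh
# Illustration only: purge_args is a hypothetical helper, not part of glite-lb-export.sh.
# It uses the same ${VAR:-default} pattern as the script, with the corrected default.
purge_args() {
    printf '%s\n' "${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 15d --cancelled 15d --other 60d}"
}

# Variable unset: the built-in default is used.
unset GLITE_LB_EXPORT_PURGE_ARGS
purge_args

# Variable set (example values, chosen arbitrarily): the default is ignored.
GLITE_LB_EXPORT_PURGE_ARGS="--cleared 1d --aborted 7d --cancelled 7d --other 30d"
purge_args
```

If the variable is already defined elsewhere with the old values, editing the default in the script has no effect, which is worth checking when the workaround does not seem to help.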
GGUS ticket #67151
Jun 28, 2011
Insufficient heuristics in reversing DN
Jobs submitted to CREAM through WMS get aborted with a reason like:
Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://grid-lb3.desy.de:9000/1RMsuRv7r8Whlgr41N7enA: reason is Client 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko' is not issuer of proxy 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko,CN=proxy,CN=proxy'
(visible through glite-wms-job-status)
This happens with DNs starting with "O=" and is due to bug #83426 in the trustmanager component.
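To see quickly whether a given user is likely to hit this problem, the DN can be checked for a leading "O=" component. The following is a minimal sketch. The `check_dn` function is hypothetical, and accepting the slash-separated form ("/O=...") in addition to the comma-separated form shown in the error message is an assumption. It does not reproduce the trustmanager heuristic itself.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a DN begins with an O= component,
# which is the condition described for this bug.
check_dn() {
    case "$1" in
        O=*|/O=*) printf '%s\n' "affected" ;;
        *)        printf '%s\n' "not affected" ;;
    esac
}

# DN taken from the error message above.
check_dn 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko'
```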
logrotate not really rotating logs for ARC components
The daemons keep writing to the rotated files (those with the .1 suffix).
This is a known bug in the current ARC release and is going to be fixed in the next one. A workaround is to send SIGHUP to the A-REX service and restart gridftp after rotating the files.
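The workaround could be scripted roughly as follows, for instance to be called after the log files have been rotated. This is only a sketch under stated assumptions: the pid file path, the init script path, the `run` and `arc_post_rotate` helpers and the `DRY_RUN` switch are all hypothetical and must be adapted to the local installation. By default the script only prints what it would do.

```shell
#!/bin/sh
# ASSUMPTIONS: both paths below are placeholders, not taken from the ARC documentation.
AREX_PIDFILE=${AREX_PIDFILE:-/var/run/arched-arex.pid}
GRIDFTPD_INIT=${GRIDFTPD_INIT:-/etc/init.d/gridftpd}
# DRY_RUN=1 (default) prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

# Execute a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Make the daemons reopen their log files after rotation.
arc_post_rotate() {
    if [ -r "$AREX_PIDFILE" ]; then
        # SIGHUP to A-REX, as described in the workaround.
        run kill -HUP "$(cat "$AREX_PIDFILE")"
    else
        echo "A-REX pid file $AREX_PIDFILE not readable, skipping SIGHUP" >&2
    fi
    # gridftp is restarted rather than signalled.
    run "$GRIDFTPD_INIT" restart
}
```

Calling `arc_post_rotate` with `DRY_RUN=0` performs the actions for real. Keeping the dry-run default makes it safer to test the paths before wiring the function into the rotation job.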
GGUS #71901
The problem is caused by a stale jobid file left behind by a failed test, which prevents the Nagios probe from submitting a new job.
The problem seriously affects the site state in Gridview, and hence also the computed site availability metrics.
Currently the responsibility for the probe is unclear; the EMI/ARC team refuses to fix it.
GGUS #70997