The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

EGI-InSPIRE:DMSU digests

From EGIWiki
Revision as of 15:27, 30 June 2011

The purpose of this page is to briefly describe and index issues solved within DMSU that are likely to have a broader impact on EGI Operations. For such issues it is worth gathering digests, outlining workarounds, and providing pointers to further details.

Jun 30, 2011

Growing LB database

The cron job responsible for regularly purging the LB database fails in glite 3.2, so the database grows indefinitely.

The problem is caused by wrong default values for the purge program arguments (a long-standing bug), which YAIM does not override in these versions.
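To illustrate what "not overridden" means: the script patched by the workaround below reads its purge arguments through the shell expansion `${GLITE_LB_EXPORT_PURGE_ARGS:-...}`, so the faulty built-in default applies whenever that variable is unset or empty in the script's environment. The minimal sketch below reproduces only that expansion; `lb_purge_args` is a hypothetical name, not part of gLite.

```shell
#!/bin/sh
# Hypothetical stand-in for the expansion used in glite-lb-export.sh:
# print the purge arguments the script would hand to the purge program.
lb_purge_args() {
    # ':-' falls back to the built-in default when the variable is
    # unset OR set to the empty string.
    printf '%s\n' "${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 2w --cancelled 2w --other 60d}"
}
```

With the variable unset, `lb_purge_args` prints the `2w` defaults; after `export GLITE_LB_EXPORT_PURGE_ARGS='--cleared 2d --aborted 15d --cancelled 15d --other 60d'` it prints that value instead. Exporting such a value in the environment of the export cron job would therefore also bypass the faulty default, assuming you control that environment; this entry itself only documents editing the script.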

A fix is available in lb-client 5.0.5-1, released with EMI-1.

As a workaround, apply the following modification to /opt/glite/sbin/glite-lb-export.sh:

 @@ -55,7 +55,7 @@
  # directory with exported data (file per job)
  GLITE_LB_EXPORT_JOBSDIR=${GLITE_LB_EXPORT_JOBSDIR:-$GLITE_LOCATION_VAR/lbexport}
  # purge args (timeouts)
 -GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 2w --cancelled 2w --other 60d}
 +GLITE_LB_EXPORT_PURGE_ARGS=${GLITE_LB_EXPORT_PURGE_ARGS:---cleared 2d --aborted 15d --cancelled 15d --other 60d}
  # Book Keeping Server
  GLITE_LB_SERVER_PORT=${GLITE_LB_SERVER_PORT:-9000}
  GLITE_LB_EXPORT_BKSERVER=${GLITE_LB_EXPORT_BKSERVER:-localhost:$GLITE_LB_SERVER_PORT}
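A minimal sketch for applying the change above from a script instead of by hand. The function name `apply_lb_purge_workaround` is hypothetical, and it assumes the file still contains the unpatched default line shown in the diff; it refuses to touch the file otherwise.

```shell
#!/bin/sh
# Hypothetical helper: apply the DMSU workaround for the growing LB
# database by replacing the 2w purge timeouts with 15d.
apply_lb_purge_workaround() {
    script="${1:-/opt/glite/sbin/glite-lb-export.sh}"

    # Only proceed if the unpatched defaults are present.
    if ! grep -q -- '--aborted 2w --cancelled 2w' "$script"; then
        echo "pattern not found in $script (already patched?)" >&2
        return 1
    fi

    # Keep a backup copy, preserving mode and timestamps.
    cp -p "$script" "$script.orig" || return 1

    # Rewrite the original file from the backup; redirecting into the
    # existing file keeps its permissions and ownership.
    sed 's/--aborted 2w --cancelled 2w/--aborted 15d --cancelled 15d/' \
        "$script.orig" > "$script"
}
```

Running `apply_lb_purge_workaround` (as root, on the LB server) patches the default path; a second run fails harmlessly because the pattern is gone. Writing through a redirection from the backup, rather than `sed -i`, sidesteps the differing `-i` syntax of GNU and BSD sed.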
 

GGUS ticket #67151 (https://ggus.eu/tech/ticket_show.php?ticket=67151)


Jun 28, 2011

Insufficient heuristics in reversing DN

Jobs submitted to CREAM through the WMS are aborted with a reason like the following:

 Transfer to CREAM failed due to exception: Failed to create a delegation id
 for job https://grid-lb3.desy.de:9000/1RMsuRv7r8Whlgr41N7enA: reason is Client
 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko' is not issuer of proxy
 'O=GermanGrid,OU=DESY,CN=Alexander Fomenko,CN=proxy,CN=proxy'

(visible in the output of glite-wms-job-status)

This happens for DNs starting with "O=" and is caused by bug #83426 in the trustmanager component.
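To check whether a failed job or a given user is likely affected, the two hypothetical shell helpers below test for the error signature quoted above and for a DN whose first component is "O=". They are a sketch: the matched phrase is taken from the example message, and real glite-wms-job-status output may wrap long reasons differently.

```shell
#!/bin/sh
# Hypothetical helpers for the "Insufficient heuristics in reversing DN" issue.

# Succeeds if the text on stdin (e.g. glite-wms-job-status output)
# contains the delegation error signature shown above.
is_delegation_dn_failure() {
    grep -q 'is not issuer of proxy'
}

# Succeeds if the DN's first component is O=, the case described above.
# Accepts the comma form from the error ("O=...,OU=...,CN=...")
# and the slash form printed by many grid tools ("/O=.../OU=.../CN=...").
dn_starts_with_o() {
    case "$1" in
        O=*|/O=*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `glite-wms-job-status "$JOBID" | is_delegation_dn_failure` succeeds for a job aborted with this reason, and `dn_starts_with_o "$(voms-proxy-info -identity)"` checks the DN of the current proxy, assuming the tool prints it in the slash form.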

GGUS tickets #71434, #71436

logrotate not really rotating logs for ARC components

After rotation, the daemons keep writing to the rotated files (those with the .1 suffix).

This is a known bug in the current ARC release and is going to be fixed in the next one. As a workaround, send SIGHUP to the A-REX service and restart gridftp after rotating the files.
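The workaround can be wired into logrotate's postrotate step. Below is a minimal sketch of such a hook; the function name `arc_postrotate` is hypothetical, and the A-REX pid file path and the gridftp restart command in the usage example are assumptions to verify against your ARC installation.

```shell
#!/bin/sh
# Hypothetical postrotate hook for the ARC log rotation workaround.
# Usage: arc_postrotate AREX_PIDFILE RESTART_CMD [ARGS...]
arc_postrotate() {
    if [ "$#" -lt 2 ]; then
        echo "usage: arc_postrotate PIDFILE RESTART_CMD [ARGS...]" >&2
        return 2
    fi
    pidfile="$1"
    shift

    # Step 1 of the workaround: send SIGHUP to A-REX.
    if [ ! -r "$pidfile" ]; then
        echo "A-REX pid file $pidfile not readable" >&2
        return 1
    fi
    kill -HUP "$(cat "$pidfile")" || return 1

    # Step 2 of the workaround: restart gridftp. The remaining arguments
    # form the restart command and are run as-is.
    "$@"
}
```

A logrotate postrotate script could then call, for example, `arc_postrotate /var/run/arched-arex.pid service gridftpd restart` (both the path and the service name are assumptions). Passing the restart command as separate arguments avoids relying on word splitting of a command string.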

GGUS #71901

Failed Nagios probes of ARC CE make it appear unavailable

The problem is caused by a stale jobid file, left behind by a failed test, which prevents the Nagios probe from submitting a new job.

This seriously affects the site state shown in Gridview and hence also the computed site availability metrics.

Currently, responsibility for the probe is unclear; the EMI/ARC team refuses to fix it.
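This entry gives no workaround. Assuming the stale jobid file can be removed safely whenever no probe run is in progress, a periodic cleanup like the sketch below would let the next probe run submit a new job. The function name `remove_stale_jobids`, the `jobid*` name pattern and the 60-minute default are all hypothetical; check which file your probe version actually leaves behind before using it.

```shell
#!/bin/sh
# Hypothetical cleanup of stale jobid files left by failed probe runs.
# The 'jobid*' pattern and the directory passed in are ASSUMPTIONS.
# Uses find -mmin, which GNU and BSD find support (it is not POSIX).
remove_stale_jobids() {
    dir="$1"
    max_age_min="${2:-60}"   # only remove files older than this (minutes)

    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    # Print each matching file before deleting it, so removals are logged.
    find "$dir" -type f -name 'jobid*' -mmin "+$max_age_min" -print -exec rm -f {} +
}
```

Keeping an age threshold, rather than deleting unconditionally, is meant to avoid removing the file of a probe run that is still in progress.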

GGUS #70997