Tools/Manuals/TS82
Revision as of 21:21, 27 September 2011
Back to Troubleshooting Guide
= JS always fails with 'user proxy expired' message =

== Full message ==
<pre>
$ glite-wms-job-logging-info -v 2 https://wms221.cern.ch:9000/iFtw9svc7vBkj3GnvCHwOK
[...]
Event: Done
[...]
- Exit code               = 1
[...]
- Reason                  = Got a job held event, reason: Globus error 131: the user proxy expired (job is still running)
- Source                  = LogMonitor
[...]
- Status code             = FAILED
[...]
</pre>
== Diagnosis ==
When the WMS jobs that a single user submits to a particular CE consistently fail with that error, the problem may be that the "grid_monitor" process for that user is stuck on the CE, for example in a call to qstat if the batch system is Torque/PBS:
<pre>
--------------------------------------------------------------------------------
USER       PID %CPU %MEM  VSZ  RSS TTY STAT START TIME COMMAND
ops006   17956  0.0  1.2 8016 6452 ?   S    Mar13 0:00 perl /tmp/grid_manager_monitor_agent.ops006.17387.1000 --delete-self --maxtime=3540s
ops006   23017  0.0  0.1 2176  960 ?   S    Mar13 0:00 sh -c /usr/bin/qstat -f 2>/dev/null
ops006   23018  0.0  0.2 4372 1112 ?   S    Mar13 0:00 /usr/bin/qstat -f
--------------------------------------------------------------------------------
</pre>
In this example qstat was stuck in a read from a socket connected to the Torque server.
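A quick way to look for such stuck processes on the CE is to scan the process table for the monitor agent and its batch-system queries. This is a minimal sketch; the process names are taken from the listing above, and the exact paths and the query command (qstat here) depend on your batch system:

```shell
# List grid monitor agents and any qstat calls they spawned, including the
# elapsed time, so a long-stuck query stands out.
# Adjust the pattern for other batch systems (e.g. bjobs for LSF).
ps -eo user,pid,etime,args \
  | grep -E 'grid_manager_monitor_agent|qstat' \
  | grep -v grep \
  || echo "no matching processes found"
```

A qstat whose elapsed time keeps growing while the batch server is unresponsive is the typical culprit; attaching strace to its PID would then show it blocked in a read() on the socket to the server, as in the example above.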
On an LCG-CE this problem should not occur just for a single user, since the calls to the batch system are executed by the globus-gma daemon on behalf of all users and killed on timeout.
== Solution ==
Kill the process that is causing the grid_manager_monitor_agent to hang; the agent should then clean up and exit shortly afterwards, and a new instance will be started a bit later. Also check Proxy expired.
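On a Torque CE this amounts to killing the stuck qstat child. The sketch below is hedged: it uses a sleep process as a stand-in for the real /usr/bin/qstat so the whole sequence can be shown end to end; on a real CE you would first locate the PID, e.g. with pgrep -f 'qstat -f'.

```shell
# Start a stand-in for the stuck batch-system query ('/usr/bin/qstat -f'
# in the ps listing above) so the kill sequence can be demonstrated.
sleep 300 &
stuck_pid=$!

# Kill it; the grid_manager_monitor_agent should then clean up and exit,
# and a new agent instance should appear a bit later.
kill "$stuck_pid"
wait "$stuck_pid" 2>/dev/null || true

# Confirm the process is gone.
if kill -0 "$stuck_pid" 2>/dev/null; then echo "still running"; else echo "killed"; fi
```

If killing the query is not enough (for example because the batch server itself is hung), the agent may get stuck again immediately, and the underlying batch-system problem has to be fixed first.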