Alert.png The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @


From EGIWiki
Jump to navigation Jump to search
Main operations services Support Documentation Tools Activities Performance Technology Catch-all Services Resource Allocation Security

Documentation menu: Home Manuals Procedures Training Other Contact For: VO managers Administrators

Back to Troubleshooting Guide

JS always fails with 'user proxy expired' message

Full message

$ glite-wms-job-logging-info -v 2

Event: Done
- Exit code                  =    1
- Reason                     =    Got a job held event, reason:
  Globus error 131: the user proxy expired (job is still running)
- Source                     =    LogMonitor
- Status code                =    FAILED


When for one user the WMS jobs submitted to a particular CE consistently fail with that error, the problem may be due to the "grid_monitor" process for that user being stuck on the CE, for example in a call to qstat if the batch system is Torque/PBS:

ops006   17956  0.0  1.2  8016 6452 ?        S    Mar13   0:00 perl
/tmp/grid_manager_monitor_agent.ops006.17387.1000 --delete-self --maxtime=3540s
ops006   23017  0.0  0.1  2176  960 ?        S    Mar13   0:00 sh -c
/usr/bin/qstat -f 2>/dev/null
ops006   23018  0.0  0.2  4372 1112 ?        S    Mar13   0:00 /usr/bin/qstat -f

In this example qstat was stuck in a read from a socket connected to the Torque server.

On an LCG-CE this problem should not occur just for a single user, since the calls to the batch system are executed by the globus-gma daemon on behalf of all users and killed on timeout.


Kill the process that causes the grid_manager_monitor_agent to hang; the latter should clean up and exit shortly afterwards, to be replaced with a new instance a bit later. Also check Proxy expired.