Tools/Manuals/TS55

From EGIWiki

Jobs sent to my WMS stay in Waiting state forever

Full message

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://gswms01.cern.ch:9000/9AfNLYg09zhvi7i4T0RRQw
Current Status:     Waiting 
Submitted:          Mon Mar 28 18:50:20 2011 CEST
*************************************************************

Diagnosis

When jobs stay in the Waiting state for a long time, the workload_manager (WM) daemon on the WMS is not keeping up with its input queue, which consists of requests for matchmaking, submission or cancellation of jobs. This can have various causes:

  • The WM has a backlog, e.g. due to the WMS being overloaded.
  • The WM sits in an infinite loop, spinning the CPU (check with top, strace, etc.). This has not been seen for a very long time, if ever.
  • The WM sits in a deadlock (check with strace, gdb, etc.). This has not been seen for a long time.
  • The WM keeps crashing (check its log file). A complicated JDL file in the input queue $GLITE_LOCATION_VAR/workload_manager/jobdir/* might trigger a bug; in that case the admin could move any such files out of the way to restore the service.
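
The backlog check and the "move files out of the way" step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (default_jobdir, queue_summary, quarantine) are hypothetical and not part of the WMS, pending requests are assumed to be regular files somewhere below the jobdir directory, and the quarantine directory is whatever location the admin chooses outside the queue.

```python
import os
import shutil
import time
from pathlib import Path


def default_jobdir():
    """Build the input queue path named in the text, or None if the variable is unset."""
    var_dir = os.environ.get("GLITE_LOCATION_VAR")
    if var_dir is None:
        return None
    return Path(var_dir) / "workload_manager" / "jobdir"


def queue_summary(jobdir):
    """Return (number of request files, age in seconds of the oldest one).

    Assumption: every regular file below jobdir is a pending request.
    """
    now = time.time()
    ages = []
    for path in Path(jobdir).rglob("*"):
        try:
            if path.is_file():
                ages.append(now - path.stat().st_mtime)
        except FileNotFoundError:
            # The WM may consume a request while the scan is running.
            continue
    return len(ages), (max(ages) if ages else 0.0)


def quarantine(request_file, quarantine_dir):
    """Move one suspect request file out of the queue, keeping its name."""
    dest_dir = Path(quarantine_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(request_file).name
    if dest.exists():
        # Refuse to overwrite an earlier quarantined file of the same name.
        raise FileExistsError(str(dest))
    shutil.move(str(request_file), str(dest))
    return dest
```

A count that keeps growing between two calls to queue_summary, or an oldest entry that keeps ageing, would be consistent with a backlog or a stuck WM. If the WM log suggests that one particular request makes it crash, quarantine can move that single file aside so it can be inspected later instead of being deleted.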

If only some jobs stay in the Waiting state, while other jobs proceed, see: BrokerHelper: no compatible resources