The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations@egi.eu.

Tools/Manuals/TS55



Jobs sent to my WMS stay in Waiting state forever

Full message

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://gswms01.cern.ch:9000/9AfNLYg09zhvi7i4T0RRQw
Current Status:     Waiting 
Submitted:          Mon Mar 28 18:50:20 2011 CEST
*************************************************************

Diagnosis

When jobs stay in the Waiting state for a long time, the workload_manager (WM) daemon on the WMS is slow to process its input queue, which consists of requests for matchmaking, submission or cancellation of jobs. This can have various causes:

  • The WM has a backlog, e.g. due to the WMS being overloaded.
  • The WM sits in an infinite loop, spinning the CPU (check with top, strace, etc.). This has not been seen for a very long time, if ever.
  • The WM sits in a deadlock (check with strace, gdb, etc.). This has not been seen for a long time.
  • The WM keeps crashing (check its log file). A complicated JDL file in the input queue $GLITE_LOCATION_VAR/workload_manager/jobdir/* might trigger a bug; in that case the admin could move any such files out of the way to restore the service.
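
To tell a backlog apart from a stalled or crashing WM, it can help to look at how many request files sit in the input queue and how old the oldest one is. The following is a minimal Python sketch under stated assumptions, not a gLite tool: the function name queue_backlog is hypothetical, and it only assumes that pending requests appear as regular files somewhere below the jobdir directory mentioned above. The exact layout of that directory is not described here, so the function simply walks it recursively.

```python
from pathlib import Path
import time


def queue_backlog(jobdir, now=None):
    """Return (file_count, oldest_age_seconds) for regular files below jobdir.

    Hypothetical helper: it assumes each pending request is stored as a
    regular file somewhere under jobdir and makes no other assumption
    about the directory layout.
    """
    now = time.time() if now is None else now
    root = Path(jobdir)
    if not root.is_dir():
        return 0, 0.0

    mtimes = []
    for path in root.rglob("*"):
        try:
            if path.is_file():
                mtimes.append(path.stat().st_mtime)
        except OSError:
            # The WM may consume or remove a file while we are scanning.
            continue

    if not mtimes:
        return 0, 0.0
    return len(mtimes), max(0.0, now - min(mtimes))
```

Called with the path $GLITE_LOCATION_VAR/workload_manager/jobdir, a count that keeps growing would point to a backlog, while a small count whose oldest file keeps ageing would rather suggest that the WM is stuck or keeps crashing on one request.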

If only some jobs stay in the Waiting state, while other jobs proceed, see: BrokerHelper: no compatible resources