Difference between revisions of "Tools/Manuals/TS55"
Latest revision as of 13:44, 23 November 2012
Jobs sent to my WMS stay in Waiting state forever
Full message
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://gswms01.cern.ch:9000/9AfNLYg09zhvi7i4T0RRQw
Current Status: Waiting
Submitted: Mon Mar 28 18:50:20 2011 CEST
*************************************************************
Diagnosis
When jobs stay in the Waiting state for a long time, the workload_manager (WM) daemon on the WMS is slow in processing its input queue, which consists of requests for matchmaking, submission or cancellation of jobs. This can have various causes:
- The WM has a backlog, e.g. due to the WMS being overloaded.
- The WM sits in an infinite loop, spinning the CPU (check with top, strace, etc.). This has not been seen for a very long time, possibly never.
- The WM sits in a deadlock (check with strace, gdb, etc.). This has not been seen for a long time.
- The WM keeps crashing (check its log file). A complicated JDL file in the input queue $GLITE_LOCATION_VAR/workload_manager/jobdir/* might trigger a bug; in that case the admin could move any such files out of the way to restore the service.
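To judge whether the input queue is backed up, or whether a few old requests are stuck in it, it can help to list what is sitting under the jobdir. The following is a minimal, read-only sketch in Python. It assumes that every regular file found anywhere below $GLITE_LOCATION_VAR/workload_manager/jobdir is a pending request, which may not match the actual layout on a given WMS. The function names queue_entries and summarize are hypothetical helpers, not part of any gLite tool.

```python
import os
import time
from pathlib import Path


def queue_entries(jobdir):
    """Return (age_seconds, size_bytes, path) for each regular file
    below jobdir, oldest first.

    Assumption: every regular file below jobdir is a pending request.
    """
    now = time.time()
    entries = []
    for path in Path(jobdir).rglob("*"):
        try:
            if path.is_file():
                st = path.stat()
                entries.append((now - st.st_mtime, st.st_size, path))
        except OSError:
            # The WM may consume a file between listing and stat.
            continue
    entries.sort(key=lambda e: e[0], reverse=True)
    return entries


def summarize(jobdir, top=5):
    """Print the number of files found and the `top` oldest ones."""
    entries = queue_entries(jobdir)
    print(f"{len(entries)} pending file(s) under {jobdir}")
    for age, size, path in entries[:top]:
        print(f"{age / 60:8.1f} min  {size:8d} B  {path}")
    return entries


if __name__ == "__main__":
    base = os.environ.get("GLITE_LOCATION_VAR")
    if base:
        summarize(Path(base) / "workload_manager" / "jobdir")
    else:
        print("GLITE_LOCATION_VAR is not set")
```

The script only reads the directory and never moves or deletes anything. A large and growing count would be consistent with a backlog. A small number of old files that never disappear while the WM keeps restarting would be candidates for the manual inspection described above.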
If only some jobs stay in the Waiting state, while other jobs proceed, see: BrokerHelper: no compatible resources