Tools/Manuals/TS83
Revision as of 13:37, 26 May 2011 by Aesch
lcgpbs job manager cancels all jobs
Full message
Example entries in /var/log/globus-gatekeeper.log:
Apr 20 21:07:42 ce05 gridinfo: [11413-12965] Submitted job 1208718397:lcgpbs:internal_2177507569:11193.1208718392 to batch system lcgpbs with ID 4151819.pbs01.pic.es
Apr 20 21:10:39 ce05 gridinfo: [11413-11413] Job 1208718397:lcgpbs:internal_2177507569:11193.1208718392 added to DEQUEUE list
Apr 20 21:10:39 ce05 gridinfo: [11413-19604] Job 1208718397:lcgpbs:internal_2177507569:11193.1208718392 (batch ID 4151819.pbs01.pic.es) REMOVED from batch system ok
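To check how many jobs are affected, the log can be scanned for jobs that were submitted and later removed from the batch system. The following is a minimal Python sketch, not part of the lcgpbs tooling: the function name find_cancelled_jobs and both regular expressions are assumptions inferred only from the three sample entries above, so other gatekeeper versions may need different patterns.

```python
import re

# Patterns inferred from the sample log entries above (an assumption);
# other gatekeeper versions may word these messages differently.
SUBMITTED = re.compile(
    r"Submitted job (?P<job>\S+) to batch system \S+ with ID (?P<batch_id>\S+)"
)
REMOVED = re.compile(
    r"Job (?P<job>\S+) \(batch ID (?P<batch_id>\S+)\) REMOVED from batch system"
)


def find_cancelled_jobs(lines):
    """Return batch IDs that were submitted and later removed, in log order."""
    submitted = {}  # gatekeeper job ID -> batch system ID
    cancelled = []
    for line in lines:
        match = SUBMITTED.search(line)
        if match:
            submitted[match.group("job")] = match.group("batch_id")
            continue
        match = REMOVED.search(line)
        # Only report removals that match a submission seen earlier.
        if match and submitted.get(match.group("job")) == match.group("batch_id"):
            cancelled.append(match.group("batch_id"))
    return cancelled


# The three entries quoted above, used as sample input.
SAMPLE = [
    "Apr 20 21:07:42 ce05 gridinfo: [11413-12965] Submitted job "
    "1208718397:lcgpbs:internal_2177507569:11193.1208718392 to batch system "
    "lcgpbs with ID 4151819.pbs01.pic.es",
    "Apr 20 21:10:39 ce05 gridinfo: [11413-11413] Job "
    "1208718397:lcgpbs:internal_2177507569:11193.1208718392 added to DEQUEUE list",
    "Apr 20 21:10:39 ce05 gridinfo: [11413-19604] Job "
    "1208718397:lcgpbs:internal_2177507569:11193.1208718392 "
    "(batch ID 4151819.pbs01.pic.es) REMOVED from batch system ok",
]

print(find_cancelled_jobs(SAMPLE))
```

To scan a real log, pass an open file instead of SAMPLE, for example find_cancelled_jobs(open("/var/log/globus-gatekeeper.log")). A removal alone does not prove the 'W' state was the cause, so the result is only a list of candidates to check against the batch system.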
Diagnosis
The /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm code will cancel any job that is reported with 'W' status:
if(/Q|W|T/)
{
    if ($status_line eq "W")
    {
        $self->cancel();
        $state = Globus::GRAM::JobState::FAILED;
    }
    else
    {
        $state = Globus::GRAM::JobState::PENDING;
    }
}
This behavior is deliberate: jobs submitted through "lcgpbs" should
never end up in the 'W' state. A job in that state signals a configuration
problem: a worker node (WN) failed to stage in files from the computing element (CE) via "scp".
See ssh problem from WN to CE.