Difference between revisions of "Tools/Manuals/TS83"
imported>Krakow |
Revision as of 13:48, 23 November 2012
lcgpbs job manager cancels all jobs
Full message
Example entries in /var/log/globus-gatekeeper.log:
    Apr 20 21:07:42 ce05 gridinfo: [11413-12965] Submitted job 1208718397:lcgpbs:internal_2177507569:11193.1208718392 to batch system lcgpbs with ID 4151819.pbs01.pic.es
    Apr 20 21:10:39 ce05 gridinfo: [11413-11413] Job 1208718397:lcgpbs:internal_2177507569:11193.1208718392 added to DEQUEUE list
    Apr 20 21:10:39 ce05 gridinfo: [11413-19604] Job 1208718397:lcgpbs:internal_2177507569:11193.1208718392 (batch ID 4151819.pbs01.pic.es) REMOVED from batch system ok
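To see how many jobs are affected, the log can be scanned for jobs that were submitted and later removed. The following is a minimal Python sketch, not part of the job manager: the function name find_cancelled_jobs is hypothetical, and the patterns are derived only from the three example entries above, so other log formats may need different expressions.

```python
import re

# Patterns modelled on the example log entries only (assumption: other
# gatekeeper log variants may be worded differently).
SUBMITTED = re.compile(
    r"Submitted job (?P<job>\S+) to batch system (?P<lrms>\S+) "
    r"with ID (?P<batch_id>\S+)"
)
REMOVED = re.compile(
    r"Job (?P<job>\S+) \(batch ID (?P<batch_id>\S+)\) REMOVED from batch system"
)


def find_cancelled_jobs(lines):
    """Return batch IDs of jobs that were submitted and later removed."""
    submitted = {}  # GRAM job ID -> batch system ID
    cancelled = []
    for line in lines:
        match = SUBMITTED.search(line)
        if match:
            submitted[match.group("job")] = match.group("batch_id")
            continue
        match = REMOVED.search(line)
        if match and match.group("job") in submitted:
            cancelled.append(match.group("batch_id"))
    return cancelled
```

Calling find_cancelled_jobs(open("/var/log/globus-gatekeeper.log")) would list the batch IDs to inspect on the batch server. A removal alone does not prove the 'W' state was the cause, so treat the result as a list of candidates.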
Diagnosis
The /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm code will cancel any job that is reported with 'W' status:
    if(/Q|W|T/)
    {
        if ($status_line eq "W")
        {
            $self->cancel();
            $state = Globus::GRAM::JobState::FAILED;
        }
        else
        {
            $state = Globus::GRAM::JobState::PENDING;
        }
    }
Jobs submitted through "lcgpbs" should never end up in the 'W' state,
so the job manager treats that state as fatal. A job in 'W' signals a
configuration problem: a WN failed to stage in files from the CE via "scp".
See ssh problem from WN to CE.
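
Since stage-in relies on "scp" from the WN to the CE, one quick check is whether a non-interactive ssh login from the WN to the CE succeeds. The sketch below is an illustration under that assumption and does not reproduce the procedure on the linked page; the function names are hypothetical and the CE hostname is a placeholder to replace with your own. It uses only the standard OpenSSH options BatchMode and ConnectTimeout.

```python
import subprocess


def build_ssh_check(ce_host, timeout_s=10):
    """Build an ssh command that fails rather than prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt interactively
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts
        ce_host,
        "true",                               # trivial remote command
    ]


def can_reach_ce(ce_host, timeout_s=10):
    """Return True if the non-interactive ssh login to ce_host succeeds."""
    result = subprocess.run(
        build_ssh_check(ce_host, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Run on a WN, as the account that executes the jobs, can_reach_ce("ce05.example.org") returning False would point to the ssh setup between WN and CE (host keys or host-based authentication) as the place to look first.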