Tools/Manuals/TS83






lcgpbs job manager cancels all jobs

Full message

Example entries in /var/log/globus-gatekeeper.log:

Apr 20 21:07:42 ce05 gridinfo: [11413-12965] Submitted job
 1208718397:lcgpbs:internal_2177507569:11193.1208718392
 to batch system lcgpbs with ID 4151819.pbs01.pic.es
Apr 20 21:10:39 ce05 gridinfo: [11413-11413] Job
 1208718397:lcgpbs:internal_2177507569:11193.1208718392
 added to DEQUEUE list
Apr 20 21:10:39 ce05 gridinfo: [11413-19604] Job
 1208718397:lcgpbs:internal_2177507569:11193.1208718392
 (batch ID 4151819.pbs01.pic.es) REMOVED from batch system ok
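To find every job affected by this pattern, the log can be scanned for jobs that were submitted and then removed. The following is a minimal Python sketch, not part of the lcgpbs tooling. It assumes that all entries have the same shape as the sample above and that wrapped continuation lines start with whitespace. The names `join_wrapped`, `parse_ts` and `cancelled_jobs` are hypothetical.

```python
import re
from datetime import datetime

# Entry shapes assumed from the sample entries above; adjust if your log differs.
SUBMITTED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ gridinfo: \[[\d-]+\] Submitted job "
    r"(?P<job>\S+) to batch system lcgpbs with ID (?P<batch>\S+)"
)
REMOVED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ gridinfo: \[[\d-]+\] Job "
    r"(?P<job>\S+) \(batch ID (?P<batch>\S+)\) REMOVED from batch system"
)


def join_wrapped(lines):
    """Merge continuation lines (leading whitespace) into the entry they belong to."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if line[:1].isspace() and entries:
            entries[-1] += " " + line.strip()
        elif line.strip():
            entries.append(line.strip())
    return entries


def parse_ts(ts):
    """Parse a syslog timestamp; it has no year, so a placeholder year is used."""
    return datetime.strptime("2000 " + ts, "%Y %b %d %H:%M:%S")


def cancelled_jobs(lines):
    """Return (batch ID, seconds from submission to removal) for each removed job."""
    submitted = {}
    result = []
    for entry in join_wrapped(lines):
        m = SUBMITTED.match(entry)
        if m:
            submitted[m.group("batch")] = parse_ts(m.group("ts"))
            continue
        m = REMOVED.match(entry)
        if m and m.group("batch") in submitted:
            delta = parse_ts(m.group("ts")) - submitted[m.group("batch")]
            result.append((m.group("batch"), int(delta.total_seconds())))
    return result
```

Typical use would be `cancelled_jobs(open("/var/log/globus-gatekeeper.log"))`. A long list of jobs removed within a few minutes of submission is consistent with the problem described on this page, although other causes of removal are not excluded by this check.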

Diagnosis

The job manager code in /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm cancels any job that the batch system reports in the 'W' (waiting) state and marks it as FAILED:

           if(/Q|W|T/)
           {
               if ($status_line eq "W")
               {
                   $self->cancel();
                   $state = Globus::GRAM::JobState::FAILED;
               }
               else
               {
                   $state = Globus::GRAM::JobState::PENDING;
               }
           }


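The reason for this behavior is that jobs submitted by "lcgpbs" should never end up in the 'W' state. A job in that state signals a configuration problem: a worker node (WN) failed to stage in files from the computing element (CE) via "scp". If all jobs are cancelled this way, check that "scp" from the WNs to the CE works without a password prompt. See ssh problem from WN to CE.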
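A trial copy run on a WN can confirm whether non-interactive "scp" from the CE works. The following Python sketch is only an illustration under stated assumptions: the helper names `build_scp_command` and `can_stage_in` are hypothetical, the remote file is one you choose yourself (it is not the path lcgpbs actually stages), and the check should be run as the same user the job would run as. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
import subprocess
import tempfile


def build_scp_command(ce_host, remote_file, dest_dir, timeout_s=10):
    """Build a non-interactive scp command that copies one file from the CE."""
    return [
        "scp",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",
        f"{ce_host}:{remote_file}",
        dest_dir,
    ]


def can_stage_in(ce_host, remote_file):
    """Return (ok, stderr) for a trial copy from the CE into a temporary directory."""
    with tempfile.TemporaryDirectory() as dest_dir:
        proc = subprocess.run(
            build_scp_command(ce_host, remote_file, dest_dir),
            capture_output=True,
            text=True,
        )
    return proc.returncode == 0, proc.stderr.strip()
```

For example, `can_stage_in("ce05.example.org", "/etc/hostname")` (both values are placeholders) returns `False` together with the "scp" error text when the WN cannot authenticate to the CE without a password, which is the condition that leaves jobs in the 'W' state.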
