
Difference between revisions of "Tools/Manuals/TS83"


Latest revision as of 13:48, 23 November 2012






lcgpbs job manager cancels all jobs

Full message

Example entries in /var/log/globus-gatekeeper.log:

Apr 20 21:07:42 ce05 gridinfo: [11413-12965] Submitted job
 1208718397:lcgpbs:internal_2177507569:11193.1208718392
 to batch system lcgpbs with ID 4151819.pbs01.pic.es
Apr 20 21:10:39 ce05 gridinfo: [11413-11413] Job
 1208718397:lcgpbs:internal_2177507569:11193.1208718392
 added to DEQUEUE list
Apr 20 21:10:39 ce05 gridinfo: [11413-19604] Job
 1208718397:lcgpbs:internal_2177507569:11193.1208718392
 (batch ID 4151819.pbs01.pic.es) REMOVED from batch system ok
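To see how many jobs on a CE show this pattern, the three entries above can be correlated per job identifier. The following is a minimal sketch in Python, not part of the job manager itself. It assumes the log lines look like the example entries (wrapped or unwrapped), and the names `join_wrapped` and `cancelled_jobs` are hypothetical helpers introduced here for illustration. A match only shows that a job was submitted, dequeued and removed; by itself it does not prove that the 'W' state was the cause.

```python
import re
from collections import defaultdict

# Job identifier as it appears in the example entries, e.g.
# 1208718397:lcgpbs:internal_2177507569:11193.1208718392
JOB_ID = re.compile(r"\d+:lcgpbs:internal_\d+:\d+\.\d+")


def join_wrapped(lines):
    """Merge continuation lines (starting with whitespace) into the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and entries:
            entries[-1] += " " + line.strip()
        else:
            entries.append(line.strip())
    return entries


def cancelled_jobs(lines):
    """Return {job_id: batch_id} for jobs that were submitted, dequeued and removed."""
    events = defaultdict(set)
    batch_ids = {}
    for entry in join_wrapped(lines):
        match = JOB_ID.search(entry)
        if not match:
            continue
        job = match.group(0)
        if "Submitted job" in entry:
            events[job].add("submitted")
            batch = re.search(r"with ID (\S+)", entry)
            if batch:
                batch_ids[job] = batch.group(1)
        elif "added to DEQUEUE list" in entry:
            events[job].add("dequeued")
        elif "REMOVED from batch system" in entry:
            events[job].add("removed")
    wanted = {"submitted", "dequeued", "removed"}
    return {job: batch_ids.get(job, "?")
            for job, seen in events.items() if seen == wanted}
```

On a CE this could be called with an open file handle, for example `cancelled_jobs(open("/var/log/globus-gatekeeper.log"))`, and the returned batch IDs compared with the batch system's own records.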

Diagnosis

The job manager code in /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm cancels any job that the batch system reports with status 'W':

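           # (comments added for this guide; they are not in lcgpbs.pm)
           # The job has not started running yet: status is Q, W or T.
           if(/Q|W|T/)
           {
               if ($status_line eq "W")
               {
                   # 'W' is unexpected for lcgpbs jobs: cancel the job and mark it failed.
                   $self->cancel();
                   $state = Globus::GRAM::JobState::FAILED;
               }
               else
               {
                   # Any other of these statuses is reported as pending.
                   $state = Globus::GRAM::JobState::PENDING;
               }
           }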


Jobs submitted by "lcgpbs" should never end up in the 'W' state. When one does, it signals a configuration problem: a worker node (WN) failed to stage in files from the CE via "scp", so the job manager cancels the job instead of leaving it pending. See ssh problem from WN to CE.