Difference between revisions of "Tools/Manuals/TS62"
Revision as of 14:43, 25 May 2011
Globus error 3
Full message
*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://my-WMS:9000/SI6d3IRgrYQ65uNVhW3TDQ
Current Status:     Done (Failed)
Exit code:          0
Status Reason:      Got a job held event, reason: Globus error 3: an I/O operation failed
Destination:        some-CE:2119/jobmanager-lcgpbs-long
reached on:         Mon Jun 8 08:23:28 2009
*************************************************************
also GRAM errors
NORESOURCES
error 3
Diagnosis
Usually caused by a lack of memory on the CE to which the job was sent. For example, in /opt/globus/lib/perl/Globus/GRAM/Helper.pm (part of the "lcg" job managers) the queue_submit function calls l_check_memory, which checks the available memory using the values reported in /proc/meminfo:
my $freefrac = ($memfree+$swapfree)/($memtot+$swaptot);
return 1 if $freefrac < $MIN_MEM_FREE;
The job submission will fail if that ratio falls below $MIN_MEM_FREE, which is 0.2 (i.e. 20%) by default.
As pointed out by Rod Walker, Linux usually keeps $memfree small, so the numerator is typically dominated by $swapfree. To keep the ratio from accidentally falling below the threshold, the swap space should be at least as large as the physical memory: the ratio then stays above 0.5 until the swap space starts being used, which is unlikely when the physical memory is very large.
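The effect of the swap size on this check can be illustrated with a small Python sketch. It is not the job manager's code: it only mirrors the ratio and the 0.2 default quoted above, and the function names and the two sample hosts (64 GB of RAM with 1 GB free, once with 64 GB and once with 4 GB of unused swap) are hypothetical.

```python
MIN_MEM_FREE = 0.2  # default threshold quoted above


def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo-style text as a dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" layout
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def free_fraction(info):
    """Same ratio as the Perl snippet: (memfree+swapfree)/(memtot+swaptot)."""
    total = info["MemTotal"] + info["SwapTotal"]
    if total == 0:
        raise ValueError("MemTotal + SwapTotal is zero")
    return (info["MemFree"] + info["SwapFree"]) / total


def would_reject(info, threshold=MIN_MEM_FREE):
    """True if a check of this form would refuse the submission."""
    return free_fraction(info) < threshold


def sample(mem_total, mem_free, swap_total, swap_free):
    """Build meminfo-style text for a hypothetical host (all values in kB)."""
    return (
        f"MemTotal: {mem_total} kB\nMemFree: {mem_free} kB\n"
        f"SwapTotal: {swap_total} kB\nSwapFree: {swap_free} kB\n"
    )


# Hypothetical host: swap equal to RAM and unused -> ratio 65/128, about 0.51
big_swap = parse_meminfo(sample(64_000_000, 1_000_000, 64_000_000, 64_000_000))
# Same host with only 4 GB of swap -> ratio 5/68, about 0.07, below the threshold
small_swap = parse_meminfo(sample(64_000_000, 1_000_000, 4_000_000, 4_000_000))

if __name__ == "__main__":
    for name, info in (("big swap", big_swap), ("small swap", small_swap)):
        print(name, round(free_fraction(info), 3), "reject" if would_reject(info) else "ok")
```

On a Linux CE the same functions can be applied to the live values with parse_meminfo(open("/proc/meminfo").read()).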
Error 3 can also occur due to the following causes:
- lack of disk space or quota
- a permission problem with the grid account home directory
- a hardware I/O error
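The first two of these causes can be checked quickly on the CE with a sketch like the following, run as the affected grid account. It is only an illustration: the function name and the 100 MB default are assumptions, quota limits are not examined (only the free space of the file system), and hardware I/O errors have to be looked for elsewhere, for example in the system logs.

```python
import os
import shutil


def check_home(path, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found for the given home directory.

    min_free_bytes is an arbitrary illustrative threshold, not a value
    used by the job manager.
    """
    if not os.path.isdir(path):
        return [f"{path}: not a directory or does not exist"]
    problems = []
    # The account must be able to enter the directory and create files in it.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path}: not writable/searchable by the current user")
    # Free space on the file system holding the directory (quotas not considered).
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path}: only {free} bytes free, wanted {min_free_bytes}")
    return problems


if __name__ == "__main__":
    for problem in check_home(os.path.expanduser("~")):
        print(problem)
```

An empty result means neither problem was detected by this simple test; it does not rule out a quota limit.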