Tools/Manuals/TS62




Globus error 3

Full message

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://my-WMS:9000/SI6d3IRgrYQ65uNVhW3TDQ
Current Status:     Done (Failed)
Exit code:          0
Status Reason:      Got a job held event, reason: Globus error 3: an I/O operation failed
Destination:        some-CE:2119/jobmanager-lcgpbs-long
reached on:         Mon Jun  8 08:23:28 2009
*************************************************************

also GRAM errors

NORESOURCES
error 3

Diagnosis

This error is usually caused by a lack of memory on the CE to which the job was sent. For example, in /opt/globus/lib/perl/Globus/GRAM/Helper.pm (part of the "lcg" job managers), the queue_submit function calls l_check_memory, which checks the available memory using the values reported in /proc/meminfo:

   my $freefrac = ($memfree+$swapfree)/($memtot+$swaptot);

   return 1 if $freefrac < $MIN_MEM_FREE;

The job submission will fail if that ratio falls below $MIN_MEM_FREE, which is 0.2 (i.e. 20%) by default.

As pointed out by Rod Walker, Linux usually keeps $memfree small, so the numerator is typically determined by $swapfree. To keep the ratio from accidentally falling below the threshold, the swap space should be at least as large as the physical memory. In that case $swaptot is at least half of $memtot+$swaptot, so the ratio stays at or above 0.5 until the swap space starts being used, which is unlikely when the physical memory is very large.
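
To see how close a CE is to the threshold, the same ratio can be recomputed by hand from /proc/meminfo. The following is a minimal Python 3 sketch for diagnosis only. It is not the job manager's code, and the function names (parse_meminfo, free_fraction, would_reject) are hypothetical. It assumes the standard MemTotal, MemFree, SwapTotal and SwapFree fields of /proc/meminfo and the default threshold of 0.2 quoted above.

```python
import re

# Default threshold quoted above; a site may have changed it in Helper.pm.
MIN_MEM_FREE = 0.2


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to values in kB."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def free_fraction(fields):
    """Free memory plus free swap, divided by total memory plus total swap."""
    total = fields["MemTotal"] + fields["SwapTotal"]
    return float(fields["MemFree"] + fields["SwapFree"]) / total


def would_reject(fields, threshold=MIN_MEM_FREE):
    """True if the ratio is below the threshold, i.e. submission would fail."""
    return free_fraction(fields) < threshold
```

On the CE itself, passing the contents of /proc/meminfo to parse_meminfo and the result to would_reject indicates whether a submission at that moment would be expected to fail the check. Running it with sample values in which the swap space equals the physical memory shows the ratio staying near 0.5 even when MemFree is small.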

Error 3 can also occur due to the following causes:
