Globus error 3

Full message


Status info for the Job : https://my-WMS:9000/SI6d3IRgrYQ65uNVhW3TDQ
Current Status:     Done (Failed)
Exit code:          0
Status Reason:      Got a job held event, reason: Globus error 3: an I/O operation failed
Destination:        some-CE:2119/jobmanager-lcgpbs-long
reached on:         Mon Jun  8 08:23:28 2009

See also: GRAM errors

error 3


Usually caused by a lack of memory on the CE to which the job was sent. For example, in /opt/globus/lib/perl/Globus/GRAM/ (part of the "lcg" job managers), the queue_submit function calls l_check_memory, which checks the available memory using the values reported in /proc/meminfo:

   my $freefrac = ($memfree+$swapfree)/($memtot+$swaptot);

   return 1 if $freefrac < $MIN_MEM_FREE;

The job submission will fail if that ratio falls below $MIN_MEM_FREE, which is 0.2 (i.e. 20%) by default.

As pointed out by Rod Walker, Linux usually keeps $memfree small, so the numerator is typically dominated by $swapfree. To keep the ratio from accidentally falling below the threshold, the swap space should be at least as large as the physical memory. Unused swap alone then accounts for at least half of the denominator, so the ratio will exceed 0.5 until the swap space starts being used (not likely when the physical memory is very large).
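The effect of swap size on the ratio can be illustrated with a small sketch. This is a Python re-implementation for illustration only, not the job manager's Perl code. The function names (parse_meminfo, free_fraction, would_reject, sample) and the sample memory figures are made up for this example. Only the formula and the 0.2 default come from the text above.

```python
import re

MIN_MEM_FREE = 0.2  # default threshold quoted in the text above


def parse_meminfo(text):
    """Map field names to integer values (kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def free_fraction(info):
    """Same ratio as $freefrac: (MemFree + SwapFree) / (MemTotal + SwapTotal)."""
    free = info["MemFree"] + info["SwapFree"]
    total = info["MemTotal"] + info["SwapTotal"]
    return free / total


def would_reject(info, threshold=MIN_MEM_FREE):
    """True when the free fraction is below the threshold."""
    return free_fraction(info) < threshold


def sample(mem_total, mem_free, swap_total, swap_free):
    """Build /proc/meminfo-style text for illustration (values in kB)."""
    return (
        f"MemTotal:  {mem_total} kB\n"
        f"MemFree:   {mem_free} kB\n"
        f"SwapTotal: {swap_total} kB\n"
        f"SwapFree:  {swap_free} kB\n"
    )


# Hypothetical node: 16 GB RAM, almost all in use, only 2 GB of unused swap
small_swap = parse_meminfo(sample(16000000, 200000, 2000000, 2000000))
# Same node with unused swap equal to the physical memory
large_swap = parse_meminfo(sample(16000000, 200000, 16000000, 16000000))

print(f"{free_fraction(small_swap):.2f}", would_reject(small_swap))  # 0.12 True
print(f"{free_fraction(large_swap):.2f}", would_reject(large_swap))  # 0.51 False
```

To try this on a real CE, pass the contents of /proc/meminfo to parse_meminfo instead of the sample text.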

Error 3 can also occur due to the following causes:

  • lack of disk space or quota
  • a permission problem with the grid account home directory
  • a hardware I/O error
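
The first two causes in the list can be checked quickly on the CE. The following is only a sketch: check_directory and the 100 MB threshold are hypothetical choices for this example. It does not inspect quotas or detect hardware errors, which need site-specific tools. It should be run as the grid account in question, since the permission test applies to the user running it.

```python
import os
import shutil


def check_directory(path, min_free_bytes=100 * 1024 * 1024):
    """Collect basic disk-space and permission facts for one directory."""
    report = {"path": path, "exists": os.path.isdir(path)}
    if not report["exists"]:
        return report
    usage = shutil.disk_usage(path)
    report["free_bytes"] = usage.free
    # The threshold is an arbitrary illustration value, not a middleware setting
    report["low_space"] = usage.free < min_free_bytes
    # W_OK | X_OK: the current user can create files in and traverse the directory
    report["writable"] = os.access(path, os.W_OK | os.X_OK)
    return report


if __name__ == "__main__":
    # Checks the home directory of whichever account runs the script
    print(check_directory(os.path.expanduser("~")))
```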