The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @ egi.eu.

Tools/Manuals/TS62



Globus error 3

Full message

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://my-WMS:9000/SI6d3IRgrYQ65uNVhW3TDQ
Current Status:     Done (Failed)
Exit code:          0
Status Reason:      Got a job held event, reason: Globus error 3: an I/O operation failed
Destination:        some-CE:2119/jobmanager-lcgpbs-long
reached on:         Mon Jun  8 08:23:28 2009
*************************************************************

also GRAM errors

NORESOURCES
error 3

Diagnosis

This error is usually caused by a lack of memory on the CE to which the job was sent. For example, in /opt/globus/lib/perl/Globus/GRAM/Helper.pm (part of the "lcg" job managers), the queue_submit function calls l_check_memory to check the available memory using the values reported in /proc/meminfo:

   my $freefrac = ($memfree+$swapfree)/($memtot+$swaptot);

   return 1 if $freefrac < $MIN_MEM_FREE;

The job submission will fail if that ratio falls below $MIN_MEM_FREE, which is 0.2 (i.e. 20%) by default.
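
To see how close a CE is to the threshold, the same ratio can be computed by hand. The following Python sketch is not the Helper.pm code: it assumes the usual /proc/meminfo field names (MemTotal, MemFree, SwapTotal, SwapFree), and the function names are hypothetical.

```python
import re

MIN_MEM_FREE = 0.2  # default threshold quoted in the text


def parse_meminfo(text):
    """Return the numeric /proc/meminfo fields as a dict (values in kB)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def free_fraction(fields):
    """Compute (memfree + swapfree) / (memtot + swaptot)."""
    free = fields["MemFree"] + fields["SwapFree"]
    total = fields["MemTotal"] + fields["SwapTotal"]
    return free / total


def submission_would_fail(fields, threshold=MIN_MEM_FREE):
    """Mirror the check: fail when the free fraction is below the threshold."""
    return free_fraction(fields) < threshold
```

On the CE itself, the text passed to parse_meminfo would be the content of /proc/meminfo, for example `parse_meminfo(open("/proc/meminfo").read())`.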

As pointed out by Rod Walker, Linux usually keeps $memfree small, so the numerator is typically determined by $swapfree. To avoid the ratio accidentally falling below the threshold, the swap space should be at least as large as the physical memory: in that case the ratio stays at or above 0.5 until the swap space starts getting used (which is unlikely when the physical memory is very large).
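
The effect of the swap size on the ratio can be illustrated with a small worked example. The numbers below are hypothetical (a CE with 16 GB of RAM, of which only 0.5 GB is reported as free, and swap that is still unused); only the formula comes from the text above.

```python
def free_fraction(memtot, memfree, swaptot, swapfree):
    """Same ratio as in the check: (memfree + swapfree) / (memtot + swaptot)."""
    return (memfree + swapfree) / (memtot + swaptot)


GB = 1024 * 1024  # kB per GB, since /proc/meminfo reports kB

# 2 GB of unused swap: 2.5 / 18, about 0.14, which is below 0.2
small_swap = free_fraction(16 * GB, GB // 2, 2 * GB, 2 * GB)

# 16 GB of unused swap: 16.5 / 32, about 0.52, which is above 0.5
equal_swap = free_fraction(16 * GB, GB // 2, 16 * GB, 16 * GB)

print("swap =  2 GB: %.2f" % small_swap)
print("swap = 16 GB: %.2f" % equal_swap)
```

With the small swap, submission would be refused even though the machine is not really short of memory; with swap equal to RAM, the ratio stays above the threshold as long as the swap is unused.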

Error 3 can also occur due to the following causes:

  • lack of disk space or quota
  • a permission problem with the grid account home directory
  • a hardware I/O error
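
The first two causes in the list above can be checked quickly for a given grid account home directory. This is a minimal sketch under stated assumptions: the function name and the 100 MB default are hypothetical, the check must be run as the grid account in question, and it does not look at quotas or hardware errors.

```python
import os
import shutil


def check_home(path, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found for the directory `path`."""
    if not os.path.isdir(path):
        return ["%s is not a directory" % path]
    problems = []
    # The account needs to read, write and enter its own home directory.
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append("current user lacks rwx access to %s" % path)
    # Free space on the filesystem holding the directory (quotas not checked).
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append("only %d bytes free on the filesystem of %s" % (free, path))
    return problems
```

An empty list means neither problem was detected; for example, `check_home(os.path.expanduser("~"))` checks the home directory of the user running the script.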