The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Tools/Manuals/TS63






submit-helper script ... gave error: cache export dir ...

Full message

A command like

globus-job-run my-CE/jobmanager-lcgpbs -q ops /bin/hostname

returns an error like:

submit-helper script running on host lxb1761 gave error: cache_export_dir
(/home/dteam002/.lcgjm/globus-cache-export.Of5sOd) on gatekeeper did not
contain a cache_export_dir.tar archive

Diagnosis

The worker node (WN) cannot do a globus-url-copy back to its CE, which the "lcg" jobmanagers (such as jobmanager-lcgpbs) require. There can be various causes; possible solutions are listed below. To test globus-url-copy from the WN, the site admin can follow this example:

  • On the UI, run voms-proxy-init.
  • Copy the proxy to the WN:
      scp /tmp/x509up_u`id -u` root@my-WN:/tmp/test_proxy
  • On the WN, as root, give the proxy to a pool account (dteam050 in this example) and switch to it:
      chown dteam050 /tmp/test_proxy
      su - dteam050
  • On the WN, as dteam050:
      export X509_USER_PROXY=/tmp/test_proxy
      globus-url-copy file:/etc/group gsiftp://my-CE/tmp/test.$$

If the copy fails, rerun globus-url-copy with the -dbg option for more details.
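The WN-side steps above can be wrapped in a small function so the test is easy to repeat after each fix. This is a sketch only: it assumes bash, that it runs as the pool account (dteam050 in the example) with the proxy already copied to the WN, and the function name wn_gridftp_test is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: repeat the globus-url-copy test from the WN to the CE.
# Assumes it runs as the pool account (e.g. dteam050) with the proxy from the
# UI already copied to the WN; "my-CE" in the example is a placeholder.

wn_gridftp_test() {
    local ce="$1"                         # CE host name
    local proxy="${2:-/tmp/test_proxy}"   # proxy copied over from the UI
    local remote="/tmp/test.$$"           # same remote file name as the manual test

    # Without a readable proxy, globus-url-copy cannot authenticate at all.
    if [ ! -r "$proxy" ]; then
        echo "proxy file $proxy is missing or unreadable" >&2
        return 2
    fi

    # Set X509_USER_PROXY only for these commands, not for the calling shell.
    if X509_USER_PROXY="$proxy" globus-url-copy file:/etc/group "gsiftp://$ce$remote"; then
        echo "OK: copied /etc/group to gsiftp://$ce$remote"
    else
        echo "FAILED: rerunning with -dbg for details" >&2
        X509_USER_PROXY="$proxy" globus-url-copy -dbg file:/etc/group "gsiftp://$ce$remote"
        return 1
    fi
}
```

For example, `wn_gridftp_test my-CE /tmp/test_proxy` prints OK on success, or repeats the copy with -dbg and returns 1 on failure.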

Solution

The error message from globus-url-copy will usually explain the problem.

Possible causes include:

  • Some CRLs on the WN or CE are out of date. Run the CRL update cron job manually and check its output for errors.
  • Other CA files in $X509_CERT_DIR (by default /etc/grid-security/certificates) on the WN or CE are missing or have expired. Check that the latest versions of all CA rpms are installed.
  • The gridftp daemon is not running on the CE.
  • The CE and WN clocks are not synchronized. Even an offset of less than one minute can cause problems.
  • The gatekeeper and the gridftpd on the CE do not map the DN to the same local account (this should never happen on an LCG-CE). To check, run the following from the UI and compare the account printed by id with the owner of the copied file:
      globus-job-run my-CE /usr/bin/id
      globus-url-copy file:/etc/group gsiftp://my-CE/tmp/test.$$
      globus-job-run my-CE /bin/ls -l /tmp/test.$$
  • On the WN, a script in /etc/profile.d (or a similar login script) unconditionally sets X509_USER_PROXY. It must only be set (to /tmp/x509up_u`id -u`) if it is not already defined: a job keeps its proxy in a temporary file elsewhere, and overriding the variable makes it look in the wrong place.
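For the first two causes, stale CRLs and expired CA certificates can be spotted by inspecting $X509_CERT_DIR with openssl. A minimal sketch, assuming the usual layout where CA certificates are named <hash>.0 and CRLs <hash>.r0, GNU date, and a hypothetical function name check_cert_dir. It complements, rather than replaces, running the CRL cron job (fetch-crl on many sites, which is an assumption about your setup).

```bash
#!/bin/bash
# Hypothetical helper: report expired CA certificates and stale CRLs.
# Assumes CA certificates are stored as <hash>.0 and CRLs as <hash>.r0,
# and that openssl and GNU date are available. Run it on the WN and the CE.

check_cert_dir() {
    local dir="${1:-${X509_CERT_DIR:-/etc/grid-security/certificates}}"
    local problems=0 now f next next_s
    now=$(date -u +%s)

    # CA certificates: "-checkend 0" fails if the certificate has already expired.
    for f in "$dir"/*.0; do
        [ -e "$f" ] || continue           # glob matched nothing
        if ! openssl x509 -in "$f" -noout -checkend 0 >/dev/null 2>&1; then
            echo "EXPIRED or unreadable CA certificate: $f"
            problems=$((problems + 1))
        fi
    done

    # CRLs: compare the nextUpdate time with the current time.
    for f in "$dir"/*.r0; do
        [ -e "$f" ] || continue
        next=$(openssl crl -in "$f" -noout -nextupdate 2>/dev/null | cut -d= -f2)
        if [ -z "$next" ] || ! next_s=$(date -u -d "$next" +%s 2>/dev/null); then
            echo "cannot read nextUpdate of CRL: $f"
            problems=$((problems + 1))
        elif [ "$next_s" -lt "$now" ]; then
            echo "STALE CRL (nextUpdate $next): $f"
            problems=$((problems + 1))
        fi
    done

    echo "$problems problem(s) found in $dir"
    [ "$problems" -eq 0 ]
}
```

Called without arguments, `check_cert_dir` inspects $X509_CERT_DIR (or /etc/grid-security/certificates); a non-zero exit status means at least one problem was reported.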
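To see whether the gridftp daemon on the CE is reachable at all, one option is a plain TCP probe of the GridFTP control port from the WN. A sketch assuming bash's /dev/tcp support, coreutils timeout, and the standard GridFTP port 2811; the function name gridftp_port_open is hypothetical. A successful connection only shows that something is listening; it does not test authentication, which the globus-url-copy test covers.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
# Assumes bash (for /dev/tcp) and coreutils timeout; 2811 is the standard
# GridFTP control port, override it if the CE uses a different one.

gridftp_port_open() {
    local host="$1" port="${2:-2811}"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout stops the attempt after 5 seconds if packets are dropped.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, run from the WN:
#   gridftp_port_open my-CE && echo "gridftpd is listening" || echo "nothing listening on my-CE:2811"
```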
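For the time-synchronization cause, a rough comparison of the two clocks can be made by reading the epoch time on both hosts. A sketch assuming password-less ssh as root to the other host (as in the scp step of the diagnosis), with hypothetical function names clock_offset and offset_ok and an illustrative 2-second threshold. The ssh round trip adds some error, so this only detects gross offsets; keeping both hosts synchronized with NTP is the actual fix.

```bash
#!/bin/bash
# Hypothetical helpers: estimate the clock offset between this host and another.
# Assumes password-less ssh as root to the other host; the result includes
# the ssh connection time, so treat it as a coarse estimate only.

clock_offset() {
    # Print (remote time - local time) in whole seconds.
    local host="$1" local_t remote_t
    local_t=$(date -u +%s)
    remote_t=$(ssh -o BatchMode=yes "root@$host" date -u +%s) || return 1
    echo $((remote_t - local_t))
}

offset_ok() {
    # Succeed if |offset| <= threshold (seconds; the default of 2 is illustrative).
    local offset="$1" threshold="${2:-2}"
    [ "${offset#-}" -le "$threshold" ]
}

# Example, run on the CE:
#   off=$(clock_offset my-WN) && { offset_ok "$off" || echo "clock offset to my-WN: ${off}s"; }
```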
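The DN-mapping check can be automated by comparing the account reported by id with the owner of the file written through gridftpd. A sketch assuming it runs on the UI with a valid proxy and that the owner is the third column of the ls -l output; the function names same_account and check_dn_mapping are hypothetical and my-CE is a placeholder.

```bash
#!/bin/bash
# Hypothetical helpers: verify that the gatekeeper and gridftpd on the CE map
# the proxy's DN to the same local account. Run on the UI with a valid proxy.

same_account() {
    # Succeed only if both account names are non-empty and identical.
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "OK: gatekeeper and gridftpd both map to $1"
    else
        echo "MISMATCH: gatekeeper -> '$1', gridftpd -> '$2'" >&2
        return 1
    fi
}

check_dn_mapping() {
    local ce="$1" remote="/tmp/test.$$" gk_user ftp_user
    # Account used by the gatekeeper (jobs run under it).
    gk_user=$(globus-job-run "$ce" /usr/bin/id -un) || return 2
    # Write a file through gridftpd, then look up its owner via the gatekeeper.
    globus-url-copy file:/etc/group "gsiftp://$ce$remote" || return 2
    ftp_user=$(globus-job-run "$ce" /bin/ls -l "$remote" | awk '{print $3}')
    same_account "$gk_user" "$ftp_user"
}

# Example: check_dn_mapping my-CE
```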
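A guarded version of such a profile.d snippet might look like the following. It is a sketch for Bourne-style shells; the file name in the comment is hypothetical, and it treats an empty X509_USER_PROXY the same as an unset one. A csh-style counterpart, if the site ships one, needs the same guard.

```bash
# /etc/profile.d/grid-env.sh (hypothetical file name), sourced by login shells.
# Default X509_USER_PROXY to the per-user proxy path only when it is unset or
# empty, so a job's proxy (stored elsewhere) is never overridden.
if [ -z "${X509_USER_PROXY:-}" ]; then
    X509_USER_PROXY="/tmp/x509up_u$(id -u)"
    export X509_USER_PROXY
fi
```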