The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Tools/Manuals/TS86

General notes on Globus job submission error messages

Introduction

This page lists possible causes for Globus job submission errors that typically do not have dedicated entries in the EGI Troubleshooting Guide.

Globus error 12: the connection to the server failed

This error usually indicates there is a communication problem between the WMS/Condor-G (or a UI) and the LCG-CE or OSG-CE:

  • The CE may be down.
  • The gatekeeper service on the CE may be down.
  • The gatekeeper service (usually port 2119) may be unreachable due to a firewall.
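To narrow down which of these causes applies, you can test from the client machine whether the gatekeeper port accepts TCP connections at all. This is a minimal sketch that assumes bash (it relies on bash's `/dev/tcp` redirection) and the coreutils `timeout` command. The function name `check_gatekeeper_port` is hypothetical, and 2119 is only the usual default port.

```bash
#!/usr/bin/env bash
# Sketch: check whether a TCP port on a CE accepts connections.
# Requires bash (for the /dev/tcp redirection) and coreutils `timeout`.
check_gatekeeper_port() {
    local host="$1"
    local port="${2:-2119}"   # usual gatekeeper port
    local secs="${3:-5}"      # give up after this many seconds
    # Open a TCP connection on fd 3 in a child bash; host and port are
    # passed as positional parameters to avoid quoting problems.
    if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "OK: ${host}:${port} accepts connections"
        return 0
    fi
    echo "FAIL: cannot connect to ${host}:${port} (CE down, gatekeeper down, or firewall?)"
    return 1
}
```

Example: `check_gatekeeper_port <CE_hostname>`. A failure only shows that the connection cannot be established. It cannot tell a stopped CE from a stopped gatekeeper or a blocking firewall.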

It could also be caused by a reverse DNS lookup problem. Check as follows on the affected client machine:

$ host <CE_hostname>
...
$ host <CE_IP>

An error message like the following would indicate a DNS problem:

Host 89.67.45.123.in-addr.arpa. not found: 3(NXDOMAIN)
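The two `host` lookups can be combined into one check. This is a minimal sketch. It assumes that `host` prints forward results as `<name> has address <ip>` and reports a missing reverse record with `NXDOMAIN`, as in the example above. The function names are hypothetical, only the first IPv4 address is checked, and other failures (e.g. SERVFAIL or timeouts) are not detected.

```bash
#!/usr/bin/env bash
# Sketch: forward- then reverse-resolve a CE with `host` and flag a
# reverse lookup that ends in NXDOMAIN. Only the first IPv4 address is tried.

# Succeeds if the `host` output read on stdin reports NXDOMAIN.
reports_nxdomain() {
    grep -q 'NXDOMAIN'
}

check_ce_dns() {
    local ce="$1" ip
    # Forward lookup: lines look like "<name> has address <ip>".
    ip=$(host "$ce" 2>/dev/null | awk '/has address/ { print $4; exit }')
    if [[ -z "$ip" ]]; then
        echo "FAIL: forward lookup of $ce returned no IPv4 address"
        return 1
    fi
    # Reverse lookup of that address.
    if host "$ip" 2>/dev/null | reports_nxdomain; then
        echo "FAIL: reverse lookup of $ip failed (DNS problem)"
        return 1
    fi
    echo "OK: $ce -> $ip, reverse lookup did not report NXDOMAIN"
}
```

Example: `check_ce_dns <CE_hostname>` on the affected client machine.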

Globus error 37: the provided RSL 'queue' parameter is invalid

This error means that the WMS/Condor-G (or the UI) tried to submit a job to a queue that does not exist on the LCG-CE or OSG-CE. Possible causes:

  • Typo in the job submission command.
  • Configuration problem on the CE.

Note: on the CE, the file $GLOBUS_LOCATION/share/globus_gram_job_manager/<jobmanager>.rvf must list every queue (where <jobmanager> is, e.g., lcgpbs).
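One way to verify this is to look for the queue in the `.rvf` file. This sketch assumes the `Attribute: queue` record carries a `Values:` line with space-separated (optionally quoted) queue names, and that records are separated by blank lines. Compare this against the actual file on your CE before relying on it. The function name `queue_in_rvf` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: succeed if QUEUE is listed for the "queue" attribute of an .rvf file.
# ASSUMPTION: the queue record looks roughly like
#   Attribute: queue
#   Values: short long infinite
# (space-separated names, optionally quoted) and records are separated
# by blank lines.
queue_in_rvf() {
    local rvf="$1" queue="$2"
    awk -v q="$queue" '
        /^Attribute:/    { in_queue = ($2 == "queue") }   # entering a record
        /^[[:space:]]*$/ { in_queue = 0 }                 # blank line ends it
        in_queue && /^Values:/ {
            gsub(/"/, "")                                  # drop any quotes
            for (i = 2; i <= NF; i++) if ($i == q) found = 1
        }
        END { exit !found }
    ' "$rvf"
}
```

Example: `queue_in_rvf "$GLOBUS_LOCATION/share/globus_gram_job_manager/lcgpbs.rvf" <queue> && echo listed`.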

Globus error 47: The gatekeeper failed to run the job manager

This error is caused by a configuration or run-time problem of the gatekeeper. Check whether $GLOBUS_LOCATION/etc/grid-services contains an entry for the requested jobmanager, and check whether a critical file system has filled up.
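Both checks can be scripted on the CE. This sketch assumes that `$GLOBUS_LOCATION/etc/grid-services` holds one file per service, named after the service (e.g. `jobmanager-lcgpbs`). The function names and the 95% default threshold are illustrative choices, not values taken from this guide.

```bash
#!/usr/bin/env bash
# Sketch for Globus error 47.
# ASSUMPTION: the grid-services directory holds one file per service,
# named after the service (e.g. jobmanager-lcgpbs).
has_jobmanager_entry() {
    local services_dir="$1" service="$2"
    if [[ -f "$services_dir/$service" ]]; then
        echo "OK: found $services_dir/$service"
        return 0
    fi
    echo "FAIL: no entry for $service in $services_dir"
    return 1
}

# Print "<mount point> <use%>" for every file system at or above the
# given usage threshold in percent (default 95). With `df -P`, field 5
# is the capacity (e.g. "97%") and field 6 the mount point.
full_filesystems() {
    local threshold="${1:-95}"
    df -P 2>/dev/null | awk -v t="$threshold" '
        NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6, $5 }'
}
```

Example: `has_jobmanager_entry "$GLOBUS_LOCATION/etc/grid-services" jobmanager-lcgpbs; full_filesystems 95`. A file system can also be exhausted by running out of inodes, which `df -i` shows.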

Globus error 76: cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space

This error can occur when gass_cache files are mistakenly removed, e.g. by a careless cleanup job. Other possible causes are those named in the error message: wrong permissions, an exceeded quota, or lack of disk space.
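Most of the causes named in the error message can be checked in one go. This is a minimal sketch, to be run as the affected (mapped) user on the machine reporting the error. The function name `check_gass_cache` is hypothetical, and quotas are not checked; the `quota` command, where installed, reports them.

```bash
#!/usr/bin/env bash
# Sketch for Globus error 76: run as the affected (mapped) user.
# Quotas are not checked here.
check_gass_cache() {
    local cache="${1:-$HOME/.globus/.gass_cache}"
    local avail
    if [[ ! -d "$cache" ]]; then
        echo "FAIL: $cache does not exist (removed by a cleanup job?)"
        return 1
    fi
    # The cache directory must be readable, writable and searchable.
    if [[ ! -r "$cache" || ! -w "$cache" || ! -x "$cache" ]]; then
        echo "FAIL: insufficient permissions on $cache"
        return 1
    fi
    # Free space in 1024-byte blocks on the file system holding the cache
    # (field 4 of the second line of `df -Pk` output).
    avail=$(df -Pk "$cache" 2>/dev/null | awk 'NR == 2 { print $4 }')
    if [[ "${avail:-0}" -eq 0 ]]; then
        echo "FAIL: no free space reported for the file system holding $cache"
        return 1
    fi
    echo "OK: $cache is usable (${avail} KiB free)"
}
```

Called without arguments, it checks `~/.globus/.gass_cache` of the current user.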

Globus error 93: gatekeeper failed to find the requested service

This error means that the WMS/Condor-G (or the UI) tried to submit a job to a jobmanager service that does not exist on the LCG-CE or OSG-CE. Possible causes:

  • Typo in the job submission command.
  • Configuration problem on the CE.

Globus error 121: the job state file doesn't exist

This error indicates that the job's state file under $GLOBUS_LOCATION/tmp/gram_job_state on the LCG-CE or OSG-CE no longer exists while it is still needed. A possible cause is premature cleanup of that directory, e.g. by a cron job.
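To find a cleanup job that might be removing state files, you can search the cron configuration for the state directory. This sketch assumes the usual system cron locations (`/etc/crontab`, `/etc/cron.d`, `/etc/cron.daily`, `/etc/cron.hourly`, `/var/spool/cron`), which may differ on your distribution. The function name `find_cleanup_jobs` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch for Globus error 121: list cron lines that mention a pattern
# (by default the GRAM job state directory name).
# ASSUMPTION: cron configuration lives in the usual locations below;
# extra paths may be given after the pattern to search those instead.
find_cleanup_jobs() {
    local pattern="${1:-gram_job_state}"
    local paths=("${@:2}")
    if [[ ${#paths[@]} -eq 0 ]]; then
        paths=(/etc/crontab /etc/cron.d /etc/cron.daily /etc/cron.hourly /var/spool/cron)
    fi
    # -r recurse, -n show line numbers, -s silence missing/unreadable paths.
    grep -rns -- "$pattern" "${paths[@]}"
}
```

Example: `find_cleanup_jobs gram_job_state`. A cleanup job may also target the parent directory, so a broader search such as `find_cleanup_jobs globus` can be worth running too.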