Tools/Manuals/TS86

General notes on Globus job submission error messages

Introduction

This page lists possible causes for Globus job submission errors that typically do not have dedicated entries in the EGI Troubleshooting Guide.

Globus error 12: the connection to the server failed

This error usually indicates a communication problem between the WMS/Condor-G (or a UI) and the LCG-CE or OSG-CE.

It can also be caused by a reverse DNS lookup problem. Check as follows on the affected client machine:

$ host <CE_hostname>
...
$ host <CE_IP>

An error message like the following would indicate a DNS problem:

Host 89.67.45.123.in-addr.arpa. not found: 3(NXDOMAIN)
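The two lookups above can be combined into a single check. The following is a minimal sketch, assuming the host utility is installed on the client machine and the CE has an IPv4 address. The function names ptr_name and check_reverse_lookup are hypothetical helpers introduced here for illustration; they are not part of any Globus or EGI tool.

```shell
#!/bin/sh
# Build the PTR query name for an IPv4 address,
# e.g. 123.45.67.89 -> 89.67.45.123.in-addr.arpa.
ptr_name() {
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa.\n", $4, $3, $2, $1 }'
}

# Resolve the CE host name, then try the reverse lookup of the first address.
# Hypothetical helper: returns non-zero if either lookup fails.
check_reverse_lookup() {
    ce_host=$1
    ce_ip=$(host -t A "$ce_host" | awk '/has address/ { print $4; exit }')
    if [ -z "$ce_ip" ]; then
        echo "forward lookup failed for $ce_host"
        return 1
    fi
    if host "$ce_ip" >/dev/null 2>&1; then
        echo "reverse lookup OK for $ce_ip"
    else
        echo "reverse lookup failed for $(ptr_name "$ce_ip")"
        return 1
    fi
}

# Usage (replace <CE_hostname> with the affected CE):
#   check_reverse_lookup <CE_hostname>
```

A failing reverse lookup reported here corresponds to the NXDOMAIN message shown above.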

Globus error 37: the provided RSL 'queue' parameter is invalid

This error means that the WMS/Condor-G (or the UI) tried to submit a job to a queue that does not exist on the LCG-CE or OSG-CE. Possible causes:

Note: on the CE, the file $GLOBUS_LOCATION/share/globus_gram_job_manager/<jobmanager>.rvf must list every queue, where <jobmanager> is the job manager name, e.g. lcgpbs.
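A quick way to test whether a queue name appears in that file is sketched below. This is a rough check only: it does not parse the file format, it merely looks for the queue name as a whole word, so a match elsewhere in the file would also count. The function name queue_in_rvf is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if the queue name $2 occurs as a whole word in the RVF file $1.
# Rough check only: the file is searched as plain text, not parsed.
queue_in_rvf() {
    rvf_file=$1
    queue=$2
    grep -qwF -- "$queue" "$rvf_file"
}

# Usage on the CE (lcgpbs is the example job manager named in the note above):
#   rvf="$GLOBUS_LOCATION/share/globus_gram_job_manager/lcgpbs.rvf"
#   queue_in_rvf "$rvf" <queue_name> || echo "queue not listed in $rvf"
```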

Globus error 47: The gatekeeper failed to run the job manager

This error is caused by a configuration or run-time problem of the gatekeeper. Check whether $GLOBUS_LOCATION/etc/grid-services contains an entry for the requested jobmanager, and whether a critical file system has filled up.
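Both checks can be scripted. The sketch below assumes that each service has its own entry (file or link) under the grid-services directory. The entry name jobmanager-lcgpbs in the usage comment and the 95 percent threshold are assumptions chosen for illustration, and both function names are hypothetical helpers.

```shell
#!/bin/sh
# Succeeds if an entry named $2 exists in the grid-services directory $1.
has_service_entry() {
    [ -e "$1/$2" ]
}

# Reads "df -P" output on stdin and prints the mount point of every
# file system whose usage is at or above $1 percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6
    }'
}

# Usage on the CE (entry name and threshold are assumptions):
#   has_service_entry "$GLOBUS_LOCATION/etc/grid-services" jobmanager-lcgpbs \
#       || echo "no grid-services entry for the requested jobmanager"
#   df -P | full_filesystems 95
```

Reading the df output from standard input keeps the second function easy to try against saved output from another machine.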

Globus error 76: cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space

This error can occur when "gass_cache" files are mistakenly removed, e.g. by an overly aggressive cleanup job. The other possible causes are those named in the error message itself: permissions, quota, and disk space.
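The permission and disk space causes can be checked with a short script such as the sketch below. It is meant to be run as the account that owns the cache, which is an assumption about how the check would be used. It does not inspect quotas, because the quota tooling differs between sites. The function name check_gass_cache is a hypothetical helper.

```shell
#!/bin/sh
# Check that the GASS cache directory exists, is writable by the current
# user, and report the free space on its file system.
# $1 is optional and defaults to the path named in the error message.
check_gass_cache() {
    cache_dir=${1:-$HOME/.globus/.gass_cache}
    if [ ! -d "$cache_dir" ]; then
        echo "missing: $cache_dir"
        return 1
    fi
    if [ ! -w "$cache_dir" ] || [ ! -x "$cache_dir" ]; then
        echo "not writable by $(id -un): $cache_dir"
        return 1
    fi
    # Available space on the file system holding the cache, in KB.
    avail=$(df -Pk "$cache_dir" | awk 'NR == 2 { print $4 }')
    echo "ok: $cache_dir (${avail} KB available)"
}

# Usage:
#   check_gass_cache
```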

Globus error 93: gatekeeper failed to find the requested service

This error means that the WMS/Condor-G (or the UI) tried to submit a job to a jobmanager service that does not exist on the LCG-CE or OSG-CE. Possible causes:

Globus error 121: the job state file doesn't exist

This error indicates that the job's state file under $GLOBUS_LOCATION/tmp/gram_job_state on the LCG-CE or OSG-CE no longer exists although it is still needed. A possible cause is a premature cleanup of that directory, e.g. by a cron job.
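One way to look for such a cleanup job is to search the cron configuration for references to the state directory. The sketch below assumes a grep that supports recursive search (-r). The cron locations in the usage comment are common Linux defaults and may differ on a given CE. The function name find_state_dir_references is a hypothetical helper.

```shell
#!/bin/sh
# Print the names of all files under the given paths that mention the
# GRAM job state directory. Unreadable or missing paths are ignored.
find_state_dir_references() {
    grep -rl -- 'gram_job_state' "$@" 2>/dev/null
}

# Usage on the CE (paths are assumptions, adjust to the local setup):
#   find_state_dir_references /etc/crontab /etc/cron.d /etc/cron.daily /var/spool/cron
```

Any file listed is only a candidate: its schedule and age threshold still have to be compared by hand with the lifetime of the jobs.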
