Revision as of 13:48, 23 November 2012
General notes on Globus job submission error messages
Introduction
This page lists possible causes for Globus job submission errors that typically do not have dedicated entries in the EGI Troubleshooting Guide.
Globus error 12: the connection to the server failed
This error usually indicates there is a communication problem between the WMS/Condor-G (or a UI) and the LCG-CE or OSG-CE:
- The CE may be down.
- The gatekeeper service on the CE may be down.
- The gatekeeper service (usually port 2119) may be unreachable due to a firewall.
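A quick TCP connection test from the affected client shows whether the gatekeeper port can be reached at all. The following is a minimal sketch, not an official EGI tool: check_gatekeeper is a hypothetical helper name, and it assumes that bash (built with /dev/tcp support) and the GNU timeout command are available on the client.

```shell
# check_gatekeeper (hypothetical helper): try to open a TCP connection to
# the gatekeeper port of a CE. The port defaults to 2119.
check_gatekeeper() {
    ce_host=$1
    port=${2:-2119}
    if [ -z "$ce_host" ]; then
        echo "usage: check_gatekeeper <CE_hostname> [port]" >&2
        return 2
    fi
    # bash opens a TCP connection when a redirection targets
    # /dev/tcp/<host>/<port>; timeout gives up after 5 seconds so that a
    # firewall silently dropping packets does not hang the check.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$ce_host" "$port" 2>/dev/null; then
        echo "$ce_host:$port is reachable"
    else
        echo "$ce_host:$port is NOT reachable (CE down, gatekeeper down, or firewall)"
        return 1
    fi
}

# Example:
# check_gatekeeper <CE_hostname>
```

A failed check does not tell the three causes apart; it only confirms that the problem lies in reaching the gatekeeper rather than in the job description.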
The error could also be due to a reverse DNS lookup problem. Check as follows on the affected client machine:
$ host <CE_hostname>
...
$ host <CE_IP>
An error message like the following would indicate a DNS problem:
Host 89.67.45.123.in-addr.arpa. not found: 3(NXDOMAIN)
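The forward and reverse lookups can be combined into one check. This is a sketch under the assumption that the host command prints lines of the form "<name> has address <IP>"; reverse_name and check_reverse_dns are hypothetical helper names, and only IPv4 addresses are handled.

```shell
# reverse_name (hypothetical): build the in-addr.arpa name that a reverse
# (PTR) lookup of an IPv4 address queries.
reverse_name() {
    echo "$1" | awk -F. 'NF == 4 { printf "%s.%s.%s.%s.in-addr.arpa.\n", $4, $3, $2, $1 }'
}

# check_reverse_dns (hypothetical): resolve the CE name to its IPv4
# addresses, then report every address that has no PTR record.
check_reverse_dns() {
    ce_host=$1
    for ip in $(host "$ce_host" | awk '/ has address / { print $NF }'); do
        if host "$ip" >/dev/null 2>&1; then
            echo "$ip: reverse lookup OK"
        else
            echo "$ip: no PTR record for $(reverse_name "$ip")"
        fi
    done
}

# Example:
# check_reverse_dns <CE_hostname>
```

For the address 123.45.67.89, reverse_name yields 89.67.45.123.in-addr.arpa., which is the name shown in the error message above.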
Globus error 37: the provided RSL 'queue' parameter is invalid
This error means that the WMS/Condor-G (or the UI) tried to submit a job to a queue that does not exist on the LCG-CE or OSG-CE. Possible causes:
- Typo in the job submission command.
- Configuration problem on the CE.
Note: on the CE, the file $GLOBUS_LOCATION/share/globus_gram_job_manager/<jobmanager>.rvf must list every queue, where <jobmanager> is, for example, lcgpbs.
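Whether a queue name appears in that file can be checked with grep. The sketch below deliberately does not parse the RVF syntax: queue_in_rvf is a hypothetical helper that only looks for the queue name as a whole word, so a match inside a comment would also count.

```shell
# queue_in_rvf (hypothetical): succeed if the queue name occurs as a whole
# word anywhere in the given .rvf file.
queue_in_rvf() {
    queue=$1
    rvf_file=$2
    grep -F -q -w -- "$queue" "$rvf_file"
}

# Example on the CE ("short" and lcgpbs are placeholders):
# queue_in_rvf short "$GLOBUS_LOCATION/share/globus_gram_job_manager/lcgpbs.rvf" \
#     || echo "queue 'short' is not listed"
```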
Globus error 47: The gatekeeper failed to run the job manager
This error is due to a configuration or run-time problem of the gatekeeper. Check whether $GLOBUS_LOCATION/etc/grid-services contains an entry for the requested jobmanager, and whether a critical file system has filled up.
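For the file-system check, the output of df can be filtered for nearly full file systems. A minimal sketch: full_filesystems is a hypothetical helper, the 95% default is an arbitrary threshold, and the filter assumes the POSIX output format of df -P, with the usage percentage in column 5 and the mount point in column 6.

```shell
# full_filesystems (hypothetical): read "df -P" output on stdin and print
# the mount point of every file system at or above the given usage
# percentage (default 95).
full_filesystems() {
    awk -v limit="${1:-95}" 'NR > 1 && $5 + 0 >= limit { print $6 }'
}

# Example on the CE:
# df -P | full_filesystems 95
```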
Globus error 76: cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space
This error can occur when "gass_cache" files are mistakenly removed, e.g. by an overly aggressive cleanup job. Other possible causes are those suggested in the error message.
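The causes named in the error message can be tested together by writing a small probe file into the cache directory. This is a sketch only: check_gass_cache is a hypothetical helper, and it is meant to be run as the account that owns the cache, since root would bypass permission and quota limits.

```shell
# check_gass_cache (hypothetical): verify that the cache directory exists
# and that a file can be written there. A failed write points to
# permissions, quota, or disk space.
check_gass_cache() {
    cache_dir=${1:-$HOME/.globus/.gass_cache}
    if [ ! -d "$cache_dir" ]; then
        echo "$cache_dir: missing"
        return 1
    fi
    probe="$cache_dir/.probe.$$"
    # The braces keep the shell's own redirection error message quiet.
    if { echo probe > "$probe"; } 2>/dev/null; then
        rm -f "$probe"
        echo "$cache_dir: OK"
    else
        rm -f "$probe"
        echo "$cache_dir: write failed (permissions, quota, or disk space)"
        return 1
    fi
}

# Example, run as the affected user:
# check_gass_cache
```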
Globus error 93: gatekeeper failed to find the requested service
This error means that the WMS/Condor-G (or the UI) tried to submit a job to a jobmanager service that does not exist on the LCG-CE or OSG-CE. Possible causes:
- Typo in the job submission command.
- Configuration problem on the CE.
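To tell a typo from a configuration problem, the requested service name can be compared with what is configured on the CE. The sketch below assumes that each service is a file or symlink in the $GLOBUS_LOCATION/etc/grid-services directory referred to under error 47, named after the service (for example jobmanager-lcgpbs); check_service is a hypothetical helper.

```shell
# check_service (hypothetical): succeed if <dir>/<service> exists;
# otherwise list the entries that are present so a typo is easy to spot.
check_service() {
    services_dir=$1
    service=$2
    if [ -e "$services_dir/$service" ]; then
        echo "$service: configured"
    else
        echo "$service: not found; available entries:"
        ls "$services_dir"
        return 1
    fi
}

# Example on the CE (jobmanager-lcgpbs is a placeholder):
# check_service "$GLOBUS_LOCATION/etc/grid-services" jobmanager-lcgpbs
```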
Globus error 121: the job state file doesn't exist
This error indicates that the job's state file under $GLOBUS_LOCATION/tmp/gram_job_state on the LCG-CE or OSG-CE no longer exists while it is still needed. A possible cause is a premature cleanup of that directory, e.g. by a cron job.
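One way to look for such a cleanup job is to search the cron configuration for references to the directory. A minimal sketch: find_cleanup_jobs is a hypothetical helper, and the cron locations in the example are typical ones that may differ on a given CE.

```shell
# find_cleanup_jobs (hypothetical): print the files under the given paths
# that mention the pattern. Returns non-zero when nothing matches.
find_cleanup_jobs() {
    pattern=$1
    shift
    grep -r -l -F -- "$pattern" "$@" 2>/dev/null
}

# Example on the CE:
# find_cleanup_jobs gram_job_state /etc/crontab /etc/cron.d /etc/cron.daily /var/spool/cron
```

The same search with gass_cache as the pattern may help with error 76.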