Troubleshooting Guide for Operational Errors on EGI Sites

Admins may also want to check the Administration FAQ.


Problems with authentication or authorization:

  1. TS01: 7 authentication failed
  2. TS02: 530 530 LCMAPS credential mapping NOT successful
  3. TS03: 530 530 No local mapping for Globus ID
  4. TS04: 530-Login incorrect
  5. TS05: Proxy expired
  6. TS06: 501 501-FTPD GSSAPI error: GSS Major Status: General failure
  7. TS07: 535 535-FTPD GSSAPI error: GSS Major Status: General failure
  8. TS08: Invalid CRL: The available CRL has expired
  9. TS09: Certificate proxy not yet valid
  10. TS10: sslv3 alert bad certificate
  11. TS11: GRAM Authentication test failure
  12. TS12: No valid credential found ... Bad magic number
  13. TS13: Generic verification error for VOMS (failure)!
  14. TS14: Host certificate update
  15. TS15: failed unwrapping ENC message
  16. TS16: failed unwrapping MIC message
  17. TS17: gss_unwrap: internal problem with SSL BIO
  18. TS18: no passphrase authentication failed

Information System

Problems with the Information System. Generic documentation:

  1. Information System home page
  2. Top-BDII High availability configuration
  3. Information System Troubleshooting Guide
  4. Information System FAQ
  5. BDII reference card
  6. Old Information System home page

Specific items:

  1. TS59: 444444 waiting jobs
  2. TS101: Service absent in site BDII
  3. TS102: Site absent in top-level BDII
  4. TS103: Some objects missing in site or top-level BDII
  5. TS104: Missing SubCluster entries in a top BDII
  6. TS105: Unreliable gathering of CE Information
  7. TS106: Value of an attribute looks like MjAwNTAzMjIxNzAwMzRaIA
  8. TS107: How to drain a CE
  9. TS108: Software installation tags not published
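For TS106, the sample value has the shape of a base64 string, which is how LDIF output represents attribute values that cannot be written as plain text (for example, values ending in a space). The following is a minimal sketch, assuming the value really is base64-encoded; the helper name `decode_ldif_value` is hypothetical and not part of any BDII tool.

```python
import base64


def decode_ldif_value(encoded: str) -> str:
    """Decode a base64 string, restoring any stripped '=' padding.

    Hypothetical helper for illustration only.
    """
    # base64 input length must be a multiple of 4; add the missing padding.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.b64decode(padded).decode("utf-8")


# The sample value from the TS106 title.
value = decode_ldif_value("MjAwNTAzMjIxNzAwMzRaIA")
print(repr(value))  # repr() makes trailing whitespace visible
```

Under that assumption the sample decodes to a timestamp-like string followed by a trailing space, which would explain why it was not shown as plain text.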

Workload Management

Explanations and recipes for dealing with problems observed for jobs submitted to CEs of various types via the gLite/EMI WMS or Condor-G, or directly to CREAM CEs.

Generic documentation:

Specific errors:

  1. TS50: 10 data transfer to the server failed
  2. TS51: Cannot read JobWrapper output...
  3. TS52: Cannot download .BrokerInfo
  4. TS53: BrokerHelper: no compatible resources
  5. TS54: request expired
  6. TS55: Jobs sent to my WMS stay in Waiting state forever
  7. TS56: Jobs sent to some CE stay in Ready state forever
  8. TS57: Jobs sent to some CE stay in Scheduled state forever
  9. TS58: Jobs sent to some CE stay in Running state forever
  10. TS59: 444444 waiting jobs
  11. TS60: ssh problem from WN to CE
  12. TS05: Proxy expired
  13. TS62: Globus error 3
  14. TS63: submit-helper script ... gave error: cache export dir ...
  15. TS64: 8 the user cancelled the job
  16. TS65: 43 the job manager failed to stage the executable
  17. TS66: Globus error 17: the job failed when the job manager attempted to run it
  18. TS67: Globus error 21: the job manager failed to locate an internal script argument file
  19. TS68: Globus error 22: the job manager failed to create an internal script argument file
  20. TS69: Globus error 24: the job manager detected an invalid script response
  21. TS70: Globus error 25: the job manager detected an invalid script status
  22. TS71: Globus error 79: connecting to the job manager failed.
  23. TS72: Globus error 94: the jobmanager does not accept any new requests (shutting down)
  24. TS73: Globus error 155: the job manager could not stage out a file
  25. TS74: Globus error 158: the job manager could not lock the state lock file
  26. TS75: Unspecified gridmanager error
  27. TS76: Job got an error while in the CondorG queue
  28. TS77: GRAM Job submission failed because the job manager failed to open stderr (error code 74)
  29. TS78: MPI. ssh: connect to host <hname> port 22: No route to host
  30. TS13: Generic verification error for VOMS (failure)!
  31. TS80: globus-job-run returns nothing
  32. TS81: Cannot take token!
  33. TS82: JS always fails with 'user proxy expired' message
  34. TS83: lcgpbs job manager cancels all jobs
  35. TS84: Lots of <defunct> processes from globus-gma
  36. TS85: Tracing a WMS job ID to the batch system job ID
  37. TS86: Other Globus job submission error messages
  38. TS88: WMS does not consider close SE

Data Management

A data management command failed.

  1. TS21: lcg cr: Invalid argument
  2. TS22: 425 425 Can't open data connection. timed out() failed.
  3. TS23: gridftp works only once within a minute or so
  4. TS24: LFC and DPM troubleshooting page
  5. TS12: No valid credential found ... Bad magic number
  6. TS26: No valid credential found ... System error
  7. TS27: Could not establish context
  8. TS28: Transport endpoint is not connected
  9. TS29: Unknown error ... Communication error on send
  10. TS30: mkdir error: Permission denied (error 13 on ...)