The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.


Troubleshooting Guide for Operational Errors on EGI Sites

Admins may also want to check the Administration FAQ.

Authentication

Problems with authentication or authorization.

  1. TS01: 7 authentication failed
  2. TS02: 530 530 LCMAPS credential mapping NOT successful
  3. TS03: 530 530 No local mapping for Globus ID
  4. TS04: 530-Login incorrect
  5. TS05: Proxy expired
  6. TS06: 501 501-FTPD GSSAPI error: GSS Major Status: General failure
  7. TS07: 535 535-FTPD GSSAPI error: GSS Major Status: General failure
  8. TS08: Invalid CRL: The available CRL has expired
  9. TS09: Certificate proxy not yet valid
  10. TS10: sslv3 alert bad certificate
  11. TS11: GRAM Authentication test failure
  12. TS12: No valid credential found ... Bad magic number
  13. TS13: Generic verification error for VOMS (failure)!
  14. TS14: Host certificate update
  15. TS15: failed unwrapping ENC message
  16. TS16: failed unwrapping MIC message
  17. TS17: gss_unwrap: internal problem with SSL BIO
  18. TS18: no passphrase authentication failed

Information System

Problems in the Information System. Generic documentation:

  1. Information System home page
  2. Top-BDII High availability configuration
  3. Information System Troubleshooting Guide
  4. Information System FAQ
  5. BDII reference card
  6. Old Information System home page

Specific items:

  1. TS59: 444444 waiting jobs
  2. TS101: Service absent in site BDII
  3. TS102: Site absent in top-level BDII
  4. TS103: Some objects missing in site or top-level BDII
  5. TS104: Missing SubCluster entries in a top BDII
  6. TS105: Unreliable gathering of CE Information
  7. TS106: Value of an attribute looks like MjAwNTAzMjIxNzAwMzRaIA
  8. TS107: How to drain a CE
  9. TS108: Software installation tags not published
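
The attribute value quoted in the TS106 title looks like Base64 text. LDIF output encodes a value in Base64 when, for example, it ends with a space. The following is a minimal sketch, not taken from the TS106 page itself, that decodes the quoted string to check this. The helper name decode_ldif_value is hypothetical, and the assumption that the value is unpadded Base64 is ours.

```python
import base64


def decode_ldif_value(value: str) -> str:
    """Decode a Base64 attribute value, restoring any stripped '=' padding.

    Hypothetical helper for illustration; assumes the value is Base64 text
    that decodes to UTF-8, as the string quoted in TS106 appears to be.
    """
    padded = value + "=" * (-len(value) % 4)
    return base64.b64decode(padded).decode("utf-8")


# The value quoted in the TS106 title.
decoded = decode_ldif_value("MjAwNTAzMjIxNzAwMzRaIA")
print(repr(decoded))  # repr() makes any trailing whitespace visible
```

Under these assumptions the string decodes to a timestamp followed by a single trailing space, which would explain why it was Base64-encoded in the first place.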

Workload Management

Explanations and recipes for dealing with problems observed for jobs submitted to CEs of various types via the gLite/EMI WMS or Condor-G, or directly to CREAM CEs.

Specific errors:

  1. TS50: 10 data transfer to the server failed
  2. TS51: Cannot read JobWrapper output...
  3. TS52: Cannot download .BrokerInfo
  4. TS53: BrokerHelper: no compatible resources
  5. TS54: request expired
  6. TS55: Jobs sent to my WMS stay in Waiting state forever
  7. TS56: Jobs sent to some CE stay in Ready state forever
  8. TS57: Jobs sent to some CE stay in Scheduled state forever
  9. TS58: Jobs sent to some CE stay in Running state forever
  10. TS59: 444444 waiting jobs
  11. TS60: ssh problem from WN to CE
  12. TS05: Proxy expired
  13. TS62: Globus error 3
  14. TS63: submit-helper script ... gave error: cache export dir ...
  15. TS64: 8 the user cancelled the job
  16. TS65: 43 the job manager failed to stage the executable
  17. TS66: Globus error 17: the job failed when the job manager attempted to run it
  18. TS67: Globus error 21: the job manager failed to locate an internal script argument file
  19. TS68: Globus error 22: the job manager failed to create an internal script argument file
  20. TS69: Globus error 24: the job manager detected an invalid script response
  21. TS70: Globus error 25: the job manager detected an invalid script status
  22. TS71: Globus error 79: connecting to the job manager failed.
  23. TS72: Globus error 94: the jobmanager does not accept any new requests (shutting down)
  24. TS73: Globus error 155: the job manager could not stage out a file
  25. TS74: Globus error 158: the job manager could not lock the state lock file
  26. TS75: Unspecified gridmanager error
  27. TS76: Job got an error while in the CondorG queue
  28. TS77: GRAM Job submission failed because the job manager failed to open stderr (error code 74)
  29. TS78: MPI. ssh: connect to host <hname> port 22: No route to host
  30. TS13: Generic verification error for VOMS (failure)!
  31. TS80: globus-job-run returns nothing
  32. TS81: Cannot take token!
  33. TS82: JS always fails with 'user proxy expired' message
  34. TS83: lcgpbs job manager cancels all jobs
  35. TS84: Lots of <defunct> processes from globus-gma
  36. TS85: Tracing a WMS job ID to the batch system job ID
  37. TS86: Other Globus job submission error messages
  38. TS88: WMS does not consider close SE

Data Management

A data management command failed.

  1. TS21: lcg cr: Invalid argument
  2. TS22: 425 425 Can't open data connection. timed out() failed.
  3. TS23: gridftp works only once within a minute or so
  4. TS24: LFC and DPM troubleshooting page
  5. TS12: No valid credential found ... Bad magic number
  6. TS26: No valid credential found ... System error
  7. TS27: Could not establish context
  8. TS28: Transport endpoint is not connected
  9. TS29: Unknown error ... Communication error on send
  10. TS30: mkdir error: Permission denied (error 13 on ...)