Difference between revisions of "Tools/Manuals/SiteProblemsFollowUp"
Latest revision as of 14:50, 3 July 2018
Troubleshooting Guide for Operational Errors on EGI Sites
Admins may also want to check the Administration FAQ.
Authentication
Problems with authentication or authorization.
- TS01: 7 authentication failed
- TS02: 530 530 LCMAPS credential mapping NOT successful
- TS03: 530 530 No local mapping for Globus ID
- TS04: 530-Login incorrect
- TS05: Proxy expired
- TS06: 501 501-FTPD GSSAPI error: GSS Major Status: General failure
- TS07: 535 535-FTPD GSSAPI error: GSS Major Status: General failure
- TS08: Invalid CRL: The available CRL has expired
- TS09: Certificate proxy not yet valid
- TS10: sslv3 alert bad certificate
- TS11: GRAM Authentication test failure
- TS12: No valid credential found ... Bad magic number
- TS13: Generic verification error for VOMS (failure)!
- TS14: Host certificate update
- TS15: failed unwrapping ENC message
- TS16: failed unwrapping MIC message
- TS17: gss_unwrap: internal problem with SSL BIO
- TS18: no passphrase authentication failed
Information System
Problem in the Information System. Generic documentation:
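- Information System home page (https://tomtools.cern.ch/confluence/display/IS/Home)
- Top-BDII High availability configuration (MAN05)
- Information System Troubleshooting Guide (https://twiki.cern.ch/twiki/bin/view/EGEE/InfoTrouble)
- Information System FAQ (https://tomtools.cern.ch/confluence/display/IS/FAQ)
- BDII reference card (https://twiki.cern.ch/twiki/bin/view/EGEE/Glite-BDII)
- Old Information System home page (https://twiki.cern.ch/twiki/bin/view/EGEE/InformationSystem)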
Specific items:
- TS59: 444444 waiting jobs
- TS101: Service absent in site BDII
- TS102: Site absent in top-level BDII
- TS103: Some objects missing in site or top-level BDII
- TS104: Missing SubCluster entries in a top BDII
- TS105: Unreliable gathering of CE Information
- TS106: Value of an attribute looks like MjAwNTAzMjIxNzAwMzRaIA
- TS107: How to drain a CE
- TS108: Software installation tags not published
Workload Management
Explanations and recipes for dealing with problems observed for jobs submitted to CEs of various types via the gLite/EMI WMS or Condor-G, or directly to CREAM CEs.
Generic documentation:
- gLite job submission diagram
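- gLite CREAM Troubleshooting (http://grid.pd.infn.it/cream/field.php?n=Main.CREAMTroubleshooting)
- EMI CREAM Troubleshooting (http://wiki.italiangrid.org/twiki/bin/view/CREAM/TroubleshootingGuide)
- LCG-CE configuration options and diagram (https://twiki.cern.ch/twiki/bin/view/EGEE/LcgCE)
- Dialog between WMS and LCG-CE (TS87)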
Specific errors:
- TS50: 10 data transfer to the server failed
- TS51: Cannot read JobWrapper output...
- TS52: Cannot download .BrokerInfo
- TS53: BrokerHelper: no compatible resources
- TS54: request expired
- TS55: Jobs sent to my WMS stay in Waiting state forever
- TS56: Jobs sent to some CE stay in Ready state forever
- TS57: Jobs sent to some CE stay in Scheduled state forever
- TS58: Jobs sent to some CE stay in Running state forever
- TS59: 444444 waiting jobs
- TS60: ssh problem from WN to CE
- TS05: Proxy expired
- TS62: Globus error 3
- TS63: submit-helper script ... gave error: cache export dir ...
- TS64: 8 the user cancelled the job
- TS65: 43 the job manager failed to stage the executable
- TS66: Globus error 17: the job failed when the job manager attempted to run it
- TS67: Globus error 21: the job manager failed to locate an internal script argument file
- TS68: Globus error 22: the job manager failed to create an internal script argument file
- TS69: Globus error 24: the job manager detected an invalid script response
- TS70: Globus error 25: the job manager detected an invalid script status
- TS71: Globus error 79: connecting to the job manager failed.
- TS72: Globus error 94: the jobmanager does not accept any new requests (shutting down)
- TS73: Globus error 155: the job manager could not stage out a file
- TS74: Globus error 158: the job manager could not lock the state lock file
- TS75: Unspecified gridmanager error
- TS76: Job got an error while in the CondorG queue
- TS77: GRAM Job submission failed because the job manager failed to open stderr (error code 74)
- TS78: MPI. ssh: connect to host <hname> port 22: No route to host
- TS13: Generic verification error for VOMS (failure)!
- TS80: globus-job-run returns nothing
- TS81: Cannot take token!
- TS82: JS always fails with 'user proxy expired' message
- TS83: lcgpbs job manager cancels all jobs
- TS84: Lots of <defunct> processes from globus-gma
- TS85: Tracing a WMS job ID to the batch system job ID
- TS86: Other Globus job submission error messages
- TS88: WMS does not consider close SE
Data Management
A data management command failed.
- TS21: lcg cr: Invalid argument
- TS22: 425 425 Can't open data connection. timed out() failed.
- TS23: gridftp works only once within a minute or so
- TS24: LFC and DPM troubleshooting page
- TS12: No valid credential found ... Bad magic number
- TS26: No valid credential found ... System error
- TS27: Could not establish context
- TS28: Transport endpoint is not connected
- TS29: Unknown error ... Communication error on send
- TS30: mkdir error: Permission denied (error 13 on ...)