Tools/Manuals/TS74

From EGIWiki


Globus error 158: the job manager could not lock the state lock file

Full message

$ glite-wms-job-logging-info -v 2 https://wms221.cern.ch:9000/eFtw9svc7vBkj3GnvCHwOG

[...]
Event: Done
[...]
- Exit code                  =    1
[...]
- Reason                     =    Got a job held event, reason:
  Globus error 158: the job manager could not lock the state lock file
- Source                     =    LogMonitor
[...]
- Status code                =    FAILED
[...]
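
When many failed jobs need to be triaged, the check for this particular reason can be scripted. The following is a minimal sketch, not part of the gLite tooling: `has_globus_error_158` is a hypothetical helper name, and it simply searches the logging-info output supplied on standard input for the error string shown above.

```shell
# Hypothetical helper: reads glite-wms-job-logging-info output on stdin
# and succeeds (exit status 0) if it mentions Globus error 158.
has_globus_error_158() {
    grep -q 'Globus error 158:'
}

# Possible usage, with JOBID set to the job identifier of interest:
#   glite-wms-job-logging-info -v 2 "$JOBID" | has_globus_error_158 \
#       && echo "job hit the state lock file error"
```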

Diagnosis

This error can occur when the user proxy's mapping on the CE changed between the time the job was submitted and the time it was monitored or cleaned up. Possible causes:

  • The CE was reconfigured: account names or UIDs changed, or the gridmapdir that remembers pool account mappings was cleaned up too aggressively.
  • When the CE relies on a classic grid-mapfile, the DN mapping may change when the DN is added to or removed from a particular group or role in the VOMS server. See the next item.
  • On an LCG-CE a VOMS mapping is tried first, but it can fail, e.g. when there is no free pool account corresponding to the proxy. It can also fail when the VOMS configuration of the CE is wrong (e.g. the copy of the VOMS server certificate for some VO has expired, or the corresponding *.lsc file is absent or wrong). Normally that would lead to a fatal LCAS error before the LCMAPS module even attempts a mapping, but if it does not, a mapping based on the proxy's DN is tried next. VOMS and DN mappings usually involve different accounts. The VOMS mapping may thus have succeeded when the job was submitted, while a DN mapping was used later when the job was monitored or cleaned up, or vice versa.
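
The VOMS configuration problems mentioned in the last item can be checked roughly as follows. This is a sketch under assumptions: it presumes the conventional layout in which /etc/grid-security/vomsdir holds one subdirectory per VO containing *.lsc files, and possibly *.pem copies of VOMS server certificates; the function names are hypothetical. It only detects absent *.lsc files and expired certificates, not *.lsc files with wrong contents.

```shell
# Print the names of VO subdirectories that contain no *.lsc file.
# $1: vomsdir path (assumed default: /etc/grid-security/vomsdir)
vo_dirs_without_lsc() {
    vomsdir=${1:-/etc/grid-security/vomsdir}
    for dir in "$vomsdir"/*/; do
        [ -d "$dir" ] || continue
        # ls fails when the glob matches nothing
        if ! ls "$dir"*.lsc >/dev/null 2>&1; then
            basename "$dir"
        fi
    done
}

# Print the paths of *.pem files that openssl reports as expired
# (files that cannot be parsed as certificates are printed as well).
# $1: vomsdir path (assumed default: /etc/grid-security/vomsdir)
expired_voms_certs() {
    vomsdir=${1:-/etc/grid-security/vomsdir}
    find "$vomsdir" -type f -name '*.pem' | while IFS= read -r cert; do
        # -checkend 0 exits non-zero if the certificate has already expired
        openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1 \
            || echo "$cert"
    done
}
```

Running `vo_dirs_without_lsc` and `expired_voms_certs` without arguments on the CE inspects the assumed default directory; any output points at a VO whose VOMS mapping may be failing and falling back to a DN mapping.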

Note that the problem might also occur if $GLOBUS_LOCATION/tmp on the CE resides on NFS and an NFS problem occurred, particularly if the mount options are risky (e.g. "soft" must not be used).
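
To see whether any NFS file system on the CE is mounted with the "soft" option, the mount table can be inspected. The sketch below assumes a Linux host where /proc/mounts lists the device, mount point, file system type and options in that order; `soft_nfs_mounts` is a hypothetical helper name.

```shell
# Print the mount points of NFS file systems mounted with the "soft" option.
# $1: mount table to read (assumed default: /proc/mounts on Linux)
soft_nfs_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '$3 ~ /^nfs/ {
        # Field 4 is the comma-separated option list
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft") { print $2; break }
    }' "$mounts_file"
}
```

If the output includes the file system that holds $GLOBUS_LOCATION/tmp, the mount options are a likely contributor to the locking failures.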