
Tools/Manuals/TS105

Revision as of 14:55, 15 September 2011 by Aesch


Unreliable gathering of CE Information

Error

  • GStat graphs show an erratic number of CPUs for some CEs
  • The number of waiting jobs for some CEs is intermittently reported as 444444

Diagnosis

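Such problems may be caused by the glite-info-dynamic-ce or glite-info-dynamic-scheduler-wrapper info provider timing out. When a provider does not answer in time, the dynamic values for the CE are not refreshed, and placeholder values such as 444444 can be published instead.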
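
To check whether a provider is actually slow, it can help to run it by hand and measure how long it takes. The following is a minimal sketch in Python, not part of the middleware: the function name time_command and the timeout values are assumptions made for illustration, and the command to test (for example the full path of the info provider as installed at your site) has to be supplied by you.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout_s):
    """Run cmd (a list of arguments) and return (status, elapsed_seconds).

    status is "ok" (exit code 0), "failed" (non-zero exit code)
    or "timeout" (the command was killed after timeout_s seconds).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # only the timing matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start


# Demonstration with a harmless command (the Python interpreter itself).
# Replace the list with the provider command you want to test.
print(time_command([sys.executable, "-c", "pass"], timeout_s=10))
```

Running the same check several times, at busy and quiet moments, should show whether the elapsed time gets close to the timeout that applies to info providers at your site.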

Solution

For PBS/Torque/Maui systems:

  • Many stale files left over from old jobs in /var/spool/pbs/server_priv/jobs or /var/torque/server_priv/jobs can slow down qstat. In that case, delete those files and restart pbs_server.
  • With older versions of the middleware and/or batch system, it helped to replace qstat and related commands with versions that cached their results for a while. That should no longer be needed (see the next items), but the utilities that NIKHEF provided for this purpose may still be worth a look.
  • Consider upgrading Torque and/or Maui to more recent versions, but beware of potential compatibility issues e.g. with gLite. You may want to ask for advice e.g. on the LCG-Rollout list.
  • Look at the Torque/Maui documentation for large clusters.
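
For the first item above, candidate stale files can be listed before anything is deleted. This is a minimal sketch in Python under stated assumptions: the function name find_old_files and the 7-day threshold are hypothetical, and modification time is only a heuristic for "stale". Before deleting anything, compare the result with the jobs the batch system still knows about.

```python
import os
import tempfile
import time


def find_old_files(directory, max_age_days, now=None):
    """Return the sorted paths of regular files in directory that have
    not been modified for more than max_age_days days.

    Nothing is deleted; the caller decides what to do with the list.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and symlinks
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                old.append(entry.path)
    return sorted(old)


# Demonstration on a throwaway directory, not on a real spool area.
with tempfile.TemporaryDirectory() as demo_dir:
    old_path = os.path.join(demo_dir, "old.example")
    new_path = os.path.join(demo_dir, "new.example")
    for path in (old_path, new_path):
        open(path, "w").close()
    # Backdate one file by 30 days so that it counts as old.
    month_ago = time.time() - 30 * 86400
    os.utime(old_path, (month_ago, month_ago))
    print(find_old_files(demo_dir, max_age_days=7))
```

On a real server, the directory argument would be one of the jobs directories named in the first item, and the script would need enough privileges to read it.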