Tools/Manuals/TS59






444444 waiting jobs

Full message

$ lcg-infosites --vo ops ce -f SOME-SITE
#   CPU    Free Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------------
    456       3          0            0  444444 ce.site.domain:8443/cream-pbs-ops
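
A minimal sketch of how rows like the one above could be screened automatically. It assumes the six-column layout shown in this output (five numeric columns followed by the CE identifier); the function name `find_default_waiting` and the `SAMPLE` text are illustrative and not part of any EGI tool.

```python
from typing import List

# Sentinel taken from the output above.
DEFAULT_WAITING = 444444


def find_default_waiting(output: str, sentinel: int = DEFAULT_WAITING) -> List[str]:
    """Return the CE identifiers whose Waiting column equals the sentinel."""
    suspects = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: five numeric columns, then the CE identifier.
        # The header and separator lines do not match this shape and are skipped.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[:5]):
            continue
        if int(fields[4]) == sentinel:
            suspects.append(fields[5])
    return suspects


# Sample text copied from the output shown above.
SAMPLE = """\
#   CPU    Free Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------------
    456       3          0            0  444444 ce.site.domain:8443/cream-pbs-ops
"""

print(find_default_waiting(SAMPLE))
```

Matching on the exact sentinel keeps the check simple; a CE that really has 444444 queued jobs would also be flagged, so a hit is a prompt to look at the info provider rather than proof of failure.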

Diagnosis

The CE information system provider has safe, easily recognizable default values for various attributes that are published in the BDII. Normally those defaults are overridden by the actual values obtained from the batch system by a particular provider or plugin script. When a default value such as the 444444 waiting jobs shown above does appear in the BDII, it means the provider failed and its output, if any, was discarded. An info provider can fail for at least the following reasons:

[root@server ~]# grep ADMIN3 /var/spool/maui/maui.cfg
ADMIN3                  edginfo rgma edguser ldap
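
The same check as the grep above can be sketched in Python. This assumes the intent is to confirm that the account running the info provider is listed on the ADMIN3 line; which account that is depends on the site, and the names passed in below are simply taken from the grep output. The helper names `admin3_users` and `missing_admin3` are hypothetical.

```python
from typing import Iterable, List


def admin3_users(maui_cfg_text: str) -> List[str]:
    """Return the user names found on ADMIN3 line(s) of a maui.cfg text."""
    users = []
    for line in maui_cfg_text.splitlines():
        fields = line.split()
        # A commented-out line such as "#ADMIN3 ..." does not match here.
        if fields and fields[0] == "ADMIN3":
            users.extend(fields[1:])
    return users


def missing_admin3(maui_cfg_text: str, required: Iterable[str]) -> List[str]:
    """Return the required users that are absent from ADMIN3."""
    present = set(admin3_users(maui_cfg_text))
    return [user for user in required if user not in present]


# Sample line copied from the grep output above.
SAMPLE_CFG = "ADMIN3                  edginfo rgma edguser ldap\n"

# The list of required accounts is an assumption for illustration.
print(missing_admin3(SAMPLE_CFG, ["edguser", "ldap"]))
```

On a real CE the text would be read from /var/spool/maui/maui.cfg; an empty result means every account in the list is present.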
  1. remove the WN from /var/spool/pbs/server_priv/nodes
  2. remove the corresponding jobs from /var/spool/pbs/server_priv/jobs
  3. restart the PBS/Torque daemons
# diff lcg-info-dynamic-scheduler.bak lcg-info-dynamic-scheduler
12a13
> from types import NoneType
435a437,438
>         if type(qwt) is NoneType:
>            qwt = 260000
485c488,491
<         wrt = waitingJobs[0].get('maxwalltime')  * nwait
---
>         qwt = waitingJobs[0].get('maxwalltime')
>         if type(qwt) is NoneType:
>            qwt = 260000
>         wrt = qwt * nwait

https://wiki.italiangrid.it/twiki/bin/view/CREAM/KnownIssues#Error_from_TORQUE_infoprovider_E
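
The guard introduced by the patch above can be read in isolation as follows. This is a sketch, not the code of lcg-info-dynamic-scheduler: plain dictionaries stand in for the script's job objects, `nwait` is assumed to be the number of waiting jobs, and the early return for an empty queue is an addition made here so the function is safe to call on its own. It uses `is None`, which is equivalent to the patch's `type(qwt) is NoneType` test.

```python
from typing import Dict, List, Optional

# Fallback value used in the patch above.
FALLBACK_MAXWALLTIME = 260000


def worst_response_time(
    waiting_jobs: List[Dict[str, Optional[int]]],
    fallback: int = FALLBACK_MAXWALLTIME,
) -> int:
    """Multiply the first waiting job's maxwalltime by the number of waiting jobs,
    substituting the fallback when maxwalltime is missing."""
    nwait = len(waiting_jobs)  # assumption: nwait counts the waiting jobs
    if nwait == 0:
        return 0  # not in the patch; avoids indexing an empty list
    qwt = waiting_jobs[0].get("maxwalltime")
    if qwt is None:
        # Without this guard, None * nwait raises a TypeError.
        qwt = fallback
    return qwt * nwait


print(worst_response_time([{"maxwalltime": 3600}, {}]))
print(worst_response_time([{}, {}, {}]))
```

The first call uses the published limit; the second falls back to 260000 because the first job has no `maxwalltime`.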

Further information

GLUE2ComputingShareRunningJobs
GLUE2ComputingShareWaitingJobs
GLUE2ComputingShareTotalJobs
GLUE2ComputingShareEstimatedAverageWaitingTime
GLUE2ComputingShareEstimatedWorstWaitingTime
GLUE2ComputingShareMaxCPUTime
GLUE2ComputingShareMaxWallTime
GLUE2ComputingShareMaxRunningJobs
...
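
To inspect the attributes listed above in one pass, their values can be collected from LDIF text, for example the output of an LDAP query against the BDII. The sketch below assumes plain `name: value` lines; the `dn` in the sample is made up, and `collect_share_values` is a hypothetical helper name.

```python
from typing import Dict, List, Sequence

# Attribute names copied from the list above.
WATCHED = (
    "GLUE2ComputingShareRunningJobs",
    "GLUE2ComputingShareWaitingJobs",
    "GLUE2ComputingShareTotalJobs",
    "GLUE2ComputingShareEstimatedAverageWaitingTime",
    "GLUE2ComputingShareEstimatedWorstWaitingTime",
    "GLUE2ComputingShareMaxCPUTime",
    "GLUE2ComputingShareMaxWallTime",
    "GLUE2ComputingShareMaxRunningJobs",
)


def collect_share_values(
    ldif_text: str, watched: Sequence[str] = WATCHED
) -> Dict[str, List[str]]:
    """Map each watched attribute to the values found for it in the LDIF text."""
    found: Dict[str, List[str]] = {name: [] for name in watched}
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name in found:
            found[name].append(value.strip())
    return found


# Hypothetical LDIF fragment; only the attribute names come from this page.
SAMPLE_LDIF = """\
dn: GLUE2ShareID=example,o=glue
GLUE2ComputingShareWaitingJobs: 444444
GLUE2ComputingShareRunningJobs: 0
"""

values = collect_share_values(SAMPLE_LDIF)
print(values["GLUE2ComputingShareWaitingJobs"])
```

An attribute that maps to an empty list was not present in the text at all, which is a different symptom from a published default value.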
/opt/glite/yaim/bin/yaim -c -s <site-info.def> -n creamCE -n TORQUE_server -n TORQUE_utils
Make sure this section exists in the configuration file.
https://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI2#1_6_Batch_system_integration