From EGIWiki

How does GStat count CPUs?

GStat counts the CPUs for each GlueCEUniqueID (i.e. queue) found in a site's site-BDII, and avoids double counting for queues served by the same CE host. However, if multiple CEs share the same batch system subclusters and each CE publishes them independently, the CPUs will be double counted. By default, YAIM lets each CE publish "its" subcluster with an object ID based on the CE host name, so a single subcluster of the batch system may appear under several names and is then counted as several independent subclusters.
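The counting rule and its failure mode can be modelled roughly as follows. This is a minimal Python sketch, not GStat's actual code: the `QueueRecord` type, the `count_site_cpus` function, the host names and the simplification of keying the count on a subcluster ID are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueRecord:
    """One queue as it might be read from a site-BDII (hypothetical, simplified model)."""

    ce_unique_id: str   # GlueCEUniqueID of the queue
    subcluster_id: str  # ID of the subcluster the queue's CE publishes
    logical_cpus: int   # GlueSubClusterLogicalCPUs of that subcluster


def count_site_cpus(queues: list[QueueRecord]) -> int:
    """Count CPUs once per distinct subcluster ID.

    Queues served by the same CE share one subcluster ID, so they are counted once.
    Subclusters published under different IDs are assumed to be different hardware.
    """
    cpus_by_subcluster: dict[str, int] = {}
    for queue in queues:
        cpus_by_subcluster[queue.subcluster_id] = queue.logical_cpus
    return sum(cpus_by_subcluster.values())


if __name__ == "__main__":
    # Two CEs front the same 100-CPU batch system, but by default each CE
    # publishes the subcluster under an ID derived from its own host name.
    queues = [
        QueueRecord("ce1.example.org/long", "ce1.example.org", 100),
        QueueRecord("ce1.example.org/short", "ce1.example.org", 100),
        QueueRecord("ce2.example.org/long", "ce2.example.org", 100),
    ]
    print(count_site_cpus(queues))  # 200: the shared batch system is counted twice
```

The two queues on ce1 collapse into one subcluster, but ce2's copy of the same hardware survives under a different ID.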

GStat used to apply an ad-hoc rule to ignore such spurious subcluster definitions: it treated subclusters with the same number of CPUs as identical and so counted their CPUs only once. This rule was abandoned because it proved fragile:

  1. nowadays the number of CPUs is determined dynamically by querying the batch system
  2. the resource BDII info providers on different CEs are not synchronized
  3. when the CPU numbers change, some CEs pick up the changes earlier than others
  4. for a short period GStat may therefore conclude that the site has multiple subclusters
  5. as a result, the site's CPU count plots show erratic behavior
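The fragility of the old rule is easy to reproduce with a minimal Python sketch. The function name, host names and CPU figures are hypothetical; the sketch only shows that a rule keyed on equal CPU counts breaks as soon as two CEs publish slightly different snapshots of the same batch system.

```python
def count_cpus_old_heuristic(cpus_by_subcluster: dict[str, int]) -> int:
    """Abandoned rule: subclusters reporting the same CPU count are treated as one.

    Only the distinct CPU values are summed, whatever the subcluster IDs are.
    """
    return sum(set(cpus_by_subcluster.values()))


if __name__ == "__main__":
    # Both CEs publish the same snapshot of the batch system: counted once.
    in_sync = {"ce1.example.org": 100, "ce2.example.org": 100}
    # ce1 has already picked up a change to 104 CPUs; ce2 has not refreshed yet.
    out_of_sync = {"ce1.example.org": 104, "ce2.example.org": 100}
    print(count_cpus_old_heuristic(in_sync))      # 100
    print(count_cpus_old_heuristic(out_of_sync))  # 204
```

Until ce2 refreshes, the site appears to have roughly twice its real capacity, which is the erratic behavior seen in the plots.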

There are currently two ways to prevent double counting:

  1. Let one of the CEs publish the real values for GlueSubClusterLogicalCPUs and GlueSubClusterPhysicalCPUs, while the other CEs publish zero values.
  2. Deploy a glite-CLUSTER node to publish those numbers once and configure your CEs to refer to that node.
    • More information here
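Both remedies can be checked against the same simplified counting model (CPUs summed once per distinct subcluster ID). This Python sketch is illustrative only: the host names, including `cluster.example.org` for the glite-CLUSTER node, and the 100-CPU batch system are hypothetical, and only GlueSubClusterLogicalCPUs is modelled, although the same reasoning applies to GlueSubClusterPhysicalCPUs.

```python
def count_site_cpus(subclusters: list[tuple[str, int]]) -> int:
    """Sum logical CPUs once per distinct subcluster ID (simplified counting model)."""
    cpus_by_id: dict[str, int] = {}
    for subcluster_id, logical_cpus in subclusters:
        cpus_by_id[subcluster_id] = logical_cpus
    return sum(cpus_by_id.values())


BATCH_CPUS = 100  # hypothetical size of the batch system shared by both CEs

# Default YAIM behaviour: each CE publishes the shared subcluster under its own host name.
default = [("ce1.example.org", BATCH_CPUS), ("ce2.example.org", BATCH_CPUS)]

# Remedy 1: only ce1 publishes the real values; ce2 publishes zero.
zeroed = [("ce1.example.org", BATCH_CPUS), ("ce2.example.org", 0)]

# Remedy 2: the queues of both CEs refer to one subcluster published by a glite-CLUSTER node.
shared = [("cluster.example.org", BATCH_CPUS), ("cluster.example.org", BATCH_CPUS)]

if __name__ == "__main__":
    print(count_site_cpus(default))  # 200
    print(count_site_cpus(zeroed))   # 100
    print(count_site_cpus(shared))   # 100
```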