The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Tools/Manuals/TS192



How does GStat count CPUs?

GStat counts the CPUs for each GlueCEUniqueID (i.e. queue) found in the site-BDII of a site, and avoids double counting for queues served by the same CE host. However, if multiple CEs share the same batch system subclusters and each CE publishes them independently, the CPU numbers are double counted. By default, YAIM lets a CE publish "its" subcluster with an object ID based on the CE host name, so a single subcluster of the batch system may appear under multiple names and thus as multiple independent subclusters.
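The double counting described above can be illustrated with a small Python sketch. This is a simplified model, not GStat's actual code: the host names, queue IDs, CPU numbers, and the `count_site_cpus` helper are hypothetical, and each CE host is assumed to publish one subcluster keyed by its own host name.

```python
# Simplified model (assumption, not GStat's real implementation).
# Each queue is a (GlueCEUniqueID, CE host) pair; all names are hypothetical.
queues = [
    ("ce01.example.org:8443/cream-pbs-short", "ce01.example.org"),
    ("ce01.example.org:8443/cream-pbs-long", "ce01.example.org"),
    ("ce02.example.org:8443/cream-pbs-short", "ce02.example.org"),
]

# Both CEs front the same 400-CPU batch system, but each publishes
# "its" subcluster under an ID derived from its own host name.
subcluster_cpus = {
    "ce01.example.org": 400,
    "ce02.example.org": 400,
}


def count_site_cpus(queues, subcluster_cpus):
    """Sum CPUs once per CE host, so queues on the same host are not double counted."""
    hosts = {host for _queue_id, host in queues}
    return sum(subcluster_cpus.get(host, 0) for host in hosts)


total = count_site_cpus(queues, subcluster_cpus)
print(total)
```

In this model the two queues on `ce01.example.org` are counted once, but the second CE contributes the same 400 CPUs again, so the site total comes out as 800 rather than 400.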

GStat used to apply an ad-hoc rule to ignore the spurious subcluster definitions: it treated subclusters with the same number of CPUs as identical and thus avoided double counting their CPUs. However, this rule was abandoned because it turned out to be fragile, as the following chain of events shows:

  1. the number of CPUs nowadays is determined dynamically by querying the batch system
  2. the resource BDII info providers on different CEs are not synchronized
  3. when the CPU numbers change, some CEs will pick up the changes earlier than others
  4. during a short period GStat may deduce the site has multiple subclusters
  5. the site's CPU count plots show erratic behavior
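The fragility of the abandoned rule can be sketched in a few lines of Python. This is only an illustration under assumptions: `count_with_same_size_rule` is a hypothetical stand-in for the old heuristic, and the CPU numbers are invented.

```python
def count_with_same_size_rule(cpus_per_subcluster):
    """Hypothetical stand-in for the old rule: subclusters reporting the
    same CPU count are treated as one and counted only once."""
    return sum(set(cpus_per_subcluster))


# Both CEs report the same value: the duplicate is ignored.
in_sync = count_with_same_size_rule([400, 400])

# The batch system grew to 416 CPUs, but only one CE has refreshed its
# info provider: the two values differ, so both are counted.
out_of_sync = count_with_same_size_rule([416, 400])

# Once the second CE catches up, the total drops back again.
caught_up = count_with_same_size_rule([416, 416])

print(in_sync, out_of_sync, caught_up)
```

The temporary jump while the CEs disagree is the kind of spike that would show up as erratic behavior in a site's CPU count plots.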

Currently there are two ways to prevent double counting:

  1. Let one of the CEs publish the real values for GlueSubClusterLogicalCPUs and GlueSubClusterPhysicalCPUs, while the other CEs publish zero values.
  2. Deploy a glite-CLUSTER node to publish those numbers once and configure your CEs to refer to that node.
    • More information here
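As a rough illustration of why both approaches work, the following Python sketch sums the published logical CPU values per subcluster ID. It is a simplified model under assumptions, not GStat's code: the host names, the CPU numbers, and the `total_logical_cpus` helper are hypothetical.

```python
def total_logical_cpus(published):
    """Sum the GlueSubClusterLogicalCPUs values published per subcluster ID."""
    return sum(published.values())


# Default situation: every CE publishes the full batch system size.
default_setup = {"ce01.example.org": 400, "ce02.example.org": 400}

# Option 1: one CE publishes the real value, the others publish zero.
option_1 = {"ce01.example.org": 400, "ce02.example.org": 0}

# Option 2: a single cluster node publishes the value once and the CEs
# refer to it instead of publishing subclusters of their own.
option_2 = {"cluster.example.org": 400}

print(total_logical_cpus(default_setup))
print(total_logical_cpus(option_1))
print(total_logical_cpus(option_2))
```

Either way, the real CPU count enters the sum exactly once, regardless of how many CEs front the batch system.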