GPGPU-WG KnowledgeBase Batch Schedulers SchedulerScenarios
Single GPGPU per node
A simple batch setup that assumes a physical node and its component GPGPU card expose a single job slot would simplify Resource Centre setup. Each GPGPU node could be partitioned from the non-GPGPU nodes using an access-control list. However, most modern physical nodes contain and expose multiple CPU cores to the batch system. If the physical system supports virtualisation, one CPU core could be allocated to the GPGPU on the physical node, and a single virtual machine could expose the remaining job slots. For example, assume the physical host (wn1) has 8 cores: we can configure the node to declare (in Torque) "np=1" to the batch system, and if we create a VM with "np=7", then all cores are allocated to the batch system.
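As a rough sketch, the Torque server nodes file for this layout might look like the following. The hostname wn1-vm and the gpgpu node property are assumptions for illustration; the property could be used to steer GPGPU work onto these nodes, for example via a queue restricted to nodes that carry it.

 # /var/spool/torque/server_priv/nodes
 # Physical host: one job slot, paired with the GPGPU card
 wn1     np=1 gpgpu
 # Virtual machine on wn1: exposes the remaining 7 cores as ordinary slots
 wn1-vm  np=7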
Multiple GPGPUs per Physical Node
Similar to the virtualisation example above, a physical node with N GPGPU cards could be configured with:
np=#NUM_OF_GPGPUS
A virtual machine could expose the remaining cores as job slots:
np=#NUM_OF_CORES-#NUM_OF_GPGPUS
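For example, an 8-core node hosting 2 GPGPU cards (hypothetically wn2, with a companion VM wn2-vm) would extend the earlier sketch as follows:

 # /var/spool/torque/server_priv/nodes
 # Physical host: np = NUM_OF_GPGPUS = 2
 wn2     np=2 gpgpu
 # Virtual machine: np = NUM_OF_CORES - NUM_OF_GPGPUS = 8 - 2 = 6
 wn2-vm  np=6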
The drawback of this approach is that we need a mechanism to ensure that jobs from distinct users cannot interfere with each other. User code could try to enumerate and use all the GPGPUs on the node, which would have (potentially job-catastrophic) unintended side effects, so we need some way to ensure that a user job does not consume more than its allocation.
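One candidate mechanism is a sketch only, assuming Torque is built with NVIDIA GPU support so that $PBS_GPUFILE lists the devices assigned to a job: a job wrapper that exports CUDA_VISIBLE_DEVICES, so that CUDA code in the job only enumerates its allocated card(s).

 #!/bin/bash
 # Hypothetical job-wrapper sketch: restrict the job to its allocated GPGPU(s).
 # Assumes $PBS_GPUFILE contains one line per assigned device, of the form
 # "<hostname>-gpu<N>".
 if [ -n "$PBS_GPUFILE" ] && [ -r "$PBS_GPUFILE" ]; then
     # Extract the device indices and hide all other GPGPUs from CUDA code.
     DEVICES=$(sed -e 's/.*-gpu//' "$PBS_GPUFILE" | paste -sd, -)
     export CUDA_VISIBLE_DEVICES="$DEVICES"
 fi
 # Run the user's payload with the restricted device list.
 exec "$@"

Note that CUDA_VISIBLE_DEVICES is cooperative: it stops well-behaved CUDA applications from accidentally enumerating every card, but a determined user could unset it, so hard enforcement would need something stronger such as cgroup device restrictions.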