== Single GPGPU per node ==
A simple batch setup that assumes a physical node and its component GPGPU card expose a single '''Job Slot''' would simplify Resource Centre setup. Each GPGPU node could be partitioned from the non-GPGPU nodes using an access-control list. However, most modern physical nodes contain and expose multiple CPU cores to the batch system. If the physical system supports '''Virtualisation''', one CPU core can be allocated to the GPGPU on the physical node, and a single virtual machine can expose the remaining cores as job slots. For example, if the physical host (wn1) has 8 cores, we can configure the node to declare (in Torque) "np=1" to the batch system; creating a VM that declares "np=7" then makes all 8 cores available to the batch system.
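As an illustration, the Torque server's node definitions for such a host might look as follows (a sketch only: the hostnames and the ''gpgpu'' node property are assumptions, not part of any prescribed setup):

<pre>
# Hypothetical server_priv/nodes entries for an 8-core host:
# the physical host exposes one slot (paired with its GPGPU) and is
# tagged with a "gpgpu" node property; a VM on wn1 exposes the other 7.
wn1      np=1 gpgpu
wn1-vm1  np=7
</pre>
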
=== Options for RC ===
* Define a queue with a name tagged '''gpgpu''' (a qmgr sketch follows this list)
* Apply the usual VO restrictions on the queue
* Define an ACL that partitions these nodes from the non-GPGPU nodes
* Publish a basic SoftwareEnvironment (e.g. CUDA, CUDA-5, CUDA-5.5)
* Ensure the WN environment is configured with all relevant GPGPU software development kits installed.
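
A minimal Torque ''qmgr'' sketch of such a queue is given below; the queue name, the VO group and the node property are illustrative assumptions:

<pre>
# Hypothetical qmgr commands: a dedicated "gpgpu" execution queue,
# restricted to one VO group and bound to nodes carrying the "gpgpu"
# node property (matching the ACL/partitioning points above).
create queue gpgpu
set queue gpgpu queue_type = Execution
set queue gpgpu acl_group_enable = True
set queue gpgpu acl_groups = somevo
set queue gpgpu resources_default.neednodes = gpgpu
set queue gpgpu enabled = True
set queue gpgpu started = True
</pre>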


=== JDL Example 1 ===
This example simply requests one job slot from any queue whose name matches '''".*gpgpu$"'''. This is equivalent to the PBS '''-l nodes=1'''
 
<pre>
[
Type="Job";
 
JobType="Normal";
Executable = "myScript.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox =  {"myScript.sh"};
Requirements =  ( RegExp(".*gpgpu$", other.GlueCEUniqueID) && Member("CUDA", other.GlueHostApplicationSoftwareRunTimeEnvironment) );
]
</pre>
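For comparison, a direct Torque/PBS submission of the same request (the queue name ''gpgpu'' is an assumption) would be:

<pre>
# One job slot on any node of the (hypothetical) gpgpu queue
qsub -l nodes=1 -q gpgpu myScript.sh
</pre>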
 
=== JDL Example 2 ===
 
This example simply requests 2 job slots from any queue whose name matches '''".*gpgpu$"'''. The allocated cores may be on distinct hosts, so we cannot assume that the GPGPU application will see or can enumerate both GPGPU cards. This is equivalent to the PBS '''-l nodes=2'''
<pre>
[
Type="Job";
JobType="Normal";
CPUnumber=2;
# myScript.sh must take responsibility for executing the GPGPU application on both allocated core/GPGPU pairs (see the wrapper sketch after this example)!
Executable = "myScript.sh"; 
StdOutput = "std.out";
StdError = "std.err";
InputSandbox =  {"myScript.sh"};
Requirements =  ( RegExp(".*gpgpu$", other.GlueCEUniqueID) && Member("CUDA", other.GlueHostApplicationSoftwareRunTimeEnvironment) );
]
</pre>
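Since the two slots may land on different hosts, myScript.sh has to fan the work out itself. A minimal sketch, assuming Torque's ''pbsdsh'' utility is available on the worker nodes and ''gpgpu_app'' stands in for the real application:

<pre>
#!/bin/sh
# Hypothetical wrapper: pbsdsh starts one task per allocated slot
# (as listed in $PBS_NODEFILE), so each core/GPGPU pair runs a copy
# of the application. An absolute path is used because pbsdsh tasks
# do not inherit the submission working directory.
pbsdsh $PBS_O_WORKDIR/gpgpu_app
</pre>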
 
=== JDL Example 3 ===
 
This example ensures that both job slots, and hence both GPGPUs, are on the same host. Again, if there are more than 2 GPGPUs on the node, we need to ensure that jobs from other users do not interfere with each other. This is equivalent to the PBS '''-l nodes=1:ppn=2'''
<pre>
[
Type="Job";
JobType="Normal";
CPUnumber=2;
SMPgranularity=2;
# We assume that the GPGPU application will take responsibility for enumerating both GPGPU devices
Executable = "myScript.sh"; 
StdOutput = "std.out";
StdError = "std.err";
InputSandbox =  {"myScript.sh"};
Requirements =  ( RegExp(".*gpgpu$", other.GlueCEUniqueID) && Member("CUDA", other.GlueHostApplicationSoftwareRunTimeEnvironment) );
]
</pre>
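The equivalent direct submission (again assuming a queue named ''gpgpu'') would be:

<pre>
# Two slots on a single node of the (hypothetical) gpgpu queue
qsub -l nodes=1:ppn=2 -q gpgpu myScript.sh
</pre>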
 
== Multiple GPGPUs per Physical Node ==
Similar to the Virtualisation example above, a physical node with '''N'''-gpgpu cards could be configured with:  
 
''np=#NUM_OF_GPGPUS''
 
A virtual machine could then expose the remaining cores as job slots:
 
''np=#NUM_OF_CORES-#NUM_OF_GPGPUS''
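For instance (hostnames hypothetical), an 8-core host carrying 2 GPGPU cards would be split as follows:

<pre>
# Hypothetical nodes file entries: 8 cores and 2 GPGPUs, so np=2 on
# the physical host and np = 8 - 2 = 6 on the VM exposing the rest.
wn2      np=2 gpgpu
wn2-vm1  np=6
</pre>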
 
 
The drawback of this setup is that we need a mechanism to ensure that jobs from distinct users cannot interfere with each other. User code could try to enumerate and use all the GPGPUs on the node, with (potentially job-catastrophic) unintended side-effects, so we need some way to ensure that a user job does not consume more than its allocation; one possible approach is sketched below.
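One possible mechanism (an illustration only, not prescribed here) is for a wrapper or batch prologue to restrict device visibility with the CUDA runtime's CUDA_VISIBLE_DEVICES environment variable before the user application starts:

<pre>
#!/bin/sh
# Hypothetical wrapper: the CUDA runtime only enumerates the devices
# listed here, so the job cannot touch GPGPUs belonging to other jobs
# on the same node. The device IDs would come from the scheduler's
# allocation for this job.
export CUDA_VISIBLE_DEVICES=0
./gpgpu_app
</pre>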
 
 
=== JDL multiple-CPU, multiple GPU ===
 
Under the single-core-per-GPGPU scenario, we need to investigate whether a job can request multiple CPU cores and multiple GPGPUs independently. This is not yet possible, as it requires batch system and Grid support for Generic Resources or Generic Consumables.
