GPGPU WG KnowledgeBase - Batch Schedulers - SLURM
<< GPGPU Working Group main page
SLURM has increased in popularity as the LRMS (Local Resource Management System) of choice at many large resource centres. This document does not
GPGPU resources are handled as a Generic Resource (GRES) under SLURM.
To add support for GPGPU generic resources, the following values must be declared in slurm.conf:
<pre>
GresTypes=gpu

# cons_res seems to be necessary to stop all cores being allocated on the workernode for the job
SelectType=select/cons_res
</pre>
To declare that a workernode supports GPGPUs, the resource must be listed in that node's NodeName/NodeAddr statement.
<pre>
# Declare a range of nodes wn0 -> wn9
# Each of these nodes has 2 GPGPUs and 8 job slots
NodeName=wn[0-9].example.com Gres=gpu:2 CPUs=8
</pre>
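Besides the slurm.conf entries above, SLURM normally also needs a gres.conf file on each GPGPU workernode that maps the declared gpu count onto device files. The following is a minimal sketch rather than a tested configuration; the /etc/slurm path and the /dev/nvidia* device names are assumptions for a two-GPU NVIDIA node and are not taken from the original page:
<pre>
# /etc/slurm/gres.conf on each GPGPU workernode (path assumed)
# One line per physical GPU; device files assumed to be /dev/nvidia0 and /dev/nvidia1
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
</pre>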
A Partition defines a set of related nodes, and is somewhat equivalent to a PBS queue.
<pre>
PartitionName=gpgpu Nodes=wn[0-9].example.com Default=YES MaxTime=INFINITE State=UP Shared=YES
</pre>
To view which nodes support GPU resources (for example with scontrol show node):
<pre>
NodeName=wn0 Arch=x86_64 CoresPerSocket=1 CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.00 Features=(null) Gres=gpu:2
   NodeAddr=wn0.example.com NodeHostName=wn0.example.com OS=Linux RealMemory=1 AllocMem=0 Sockets=8 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1
   BootTime=2014-01-27T22:30:02 SlurmdStartTime=2014-01-30T15:58:43
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0 ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
</pre>
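A per-partition summary can also be obtained with sinfo; the format string below is a sketch and may need adjusting for the local SLURM version (%G prints the generic resources configured on the nodes):
<pre>
# List partition, node list and generic resources (format codes: %P partition, %N nodes, %G GRES)
sinfo -p gpgpu -o "%P %N %G"
</pre>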
The above configuration will prevent more than two jobs that each require a single GPU from running simultaneously on any given workernode:
<pre>
srun -p gpgpu --gres=gpu:1 sleep 20 &
srun -p gpgpu --gres=gpu:1 sleep 20 &
srun -p gpgpu --gres=gpu:1 sleep 20 &   # This job should wait until a GPGPU is available
srun -p gpgpu sleep 20                  # This job will run immediately if a job slot is available
</pre>
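The same resource requests can also be made from a batch script; the sketch below is illustrative only and simply repeats the interactive tests above inside a job submitted with sbatch:
<pre>
#!/bin/bash
#SBATCH --partition=gpgpu
#SBATCH --gres=gpu:1
# The job only starts once a GPGPU and a job slot are free on a node in the gpgpu partition
srun env | grep -i cuda    # show which GPU(s) SLURM assigned to this job step
srun sleep 20
</pre>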
SLURM can also handle CUDA_VISIBLE_DEVICES correctly:
<pre>
# Acquire 2 GPUs on the same node
srun -p gpgpu --gres=gpu:2 env | grep -i cuda
CUDA_VISIBLE_DEVICES=0,1

# Acquire 1 GPU on a node
srun -p gpgpu --gres=gpu:1 env | grep -i cuda
CUDA_VISIBLE_DEVICES=0
</pre>
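Note that the --gres count is applied per node rather than per job; as a rough sketch (not taken from the original page), requesting one GPU on each of two workernodes would look like:
<pre>
# Sketch: one GPU on each of two nodes; each task should report its own CUDA_VISIBLE_DEVICES
srun -p gpgpu -N2 --gres=gpu:1 env | grep -i cuda
</pre>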