GPGPU-WG KnowledgeBase - Batch Schedulers - Torque MAUI


MAUI does not officially support GPGPU scheduling and is very unlikely to support it in the future. Even if a Resource Centre adds "gpus=X" to the nodes file, MAUI will silently drop the GPGPU directive from a batch request such as:

 
qsub -l nodes=1:ppn=1:gpus=1

Experimental MAUI patch

A potential workaround for this problem (at the batch scheduling level) is to use a patched version of MAUI 3.3.1, created by Jonathan Michalon at the University of Strasbourg. This patch implements a Generic Resource (GRES) capability in MAUI.

After the patch is applied, maui.cfg should be updated to include a GRES declaration for each appropriate node:

NODECFG[wn001.example.com] GRES=gpu:2 # Node with two generic resources marked with tag 'gpu'
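
On a site with many GPU worker nodes, the NODECFG lines can be generated rather than typed by hand. The following is a minimal sketch only: the helper name gen_nodecfg and the second hostname are hypothetical, and it assumes every listed node has the same number of GPUs.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAUI): print one NODECFG line per node.
# Usage: gen_nodecfg GPUS_PER_NODE HOST [HOST...]
gen_nodecfg() {
    gpus="$1"
    shift
    for host in "$@"; do
        # Same format as the maui.cfg declaration shown above
        printf 'NODECFG[%s] GRES=gpu:%s\n' "$host" "$gpus"
    done
}

# Example: two worker nodes (hostnames are placeholders) with two GPUs each
gen_nodecfg 2 wn001.example.com wn002.example.com
```

The output can be reviewed and then appended to maui.cfg by hand.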

Example batch usage

qsub -W "x=GRES:gpu@1" < test-sl6-gpu.qsub

The following examples are provided by way of http://www.sdsc.edu/~hocks/FG/TSCC.torque.html, with thanks to Mariusz Mamonski ( mamonski at man.poznan.pl )

1. one CPU core, one GPU:

qsub -W x='GRES:gpu@1' #works

2. one CPU core, both GPUs on one machine:

qsub -lnodes=1:ppn=1 -W x='GRES:gpu@2' #works

3. two GPUs on two hosts

qsub -lnodes=2:ppn=1 -W x='GRES:gpu@2' #works

4. all GPUs and all CPU cores on two hosts:

qsub -lnodes=2:ppn=8 -W x='GRES:gpu@1' #does not work

This fails because the job requests 16 GPUs across the two hosts. If you request exclusive access to the machines, you do not need to specify GRES at all.
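
The figure of 16 GPUs in example 4 is consistent with the GRES value being applied to every task, that is, nodes x ppn x GRES. This is an assumption inferred from the examples above, not a documented statement of the patch's behaviour. Under that assumption, the total GPU request can be checked before submitting. The function name total_gpus is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate the total GPUs a job requests,
# ASSUMING the GRES count is applied per task (nodes * ppn tasks).
# Usage: total_gpus NODES PPN GRES_PER_TASK
total_gpus() {
    echo $(( $1 * $2 * $3 ))
}

total_gpus 1 1 2   # example 2: nodes=1:ppn=1, gpu@2
total_gpus 2 1 2   # example 3: nodes=2:ppn=1, gpu@2
total_gpus 2 8 1   # example 4: nodes=2:ppn=8, gpu@1
```

If the result exceeds the number of GPUs declared in maui.cfg for the requested hosts, the job is unlikely to be schedulable.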