GPGPU-CREAM
Revision as of 17:51, 3 September 2015
Goal
- To develop a solution enabling GPU support in CREAM-CE:
- For the most popular LRMSes already supported by CREAM-CE
- Based on GLUE 2.1 schema
Work plan
- Identifying the relevant GPGPU-related parameters supported by the different LRMSes, and abstracting them into meaningful JDL attributes
- GPGPU accounting is expected to be provided by LRMS log files, as is done for CPU accounting, and then to follow the same APEL flow
- Implementing the needed changes in CREAM-core and BLAH components
- Writing the infoproviders according to GLUE 2.1
- Testing and certification of the prototype
- Releasing a CREAM-CE update with full GPGPU support
Testbed
- 3 nodes (2x Intel Xeon E5-2620v2) with 2 NVIDIA Tesla K20m GPUs per node available at CIRMMP
- MoBrain applications installed: AMBER and GROMACS with CUDA 5.5
- Batch system/Scheduler: Torque 4.2.10 (source compiled with NVML libs)/ Maui 3.3.1
- EMI3 CREAM-CE
Progress
- May 2015:
- tested local AMBER job submission with the different Torque/NVIDIA GPGPU support options, e.g.:
qsub -l nodes=1:gpus=2:default
qsub -l nodes=1:gpus=2:exclusive_process
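The mapping these tests exercise can be sketched as a small helper that builds the Torque resource string from a GPU count and mode. The function name gpu_resource_string is hypothetical and only illustrates the mapping; it is not part of Torque, CREAM, or BLAH:

```shell
# Hypothetical helper: build the Torque -l resource request for a GPU job,
# mirroring the qsub examples above.
gpu_resource_string() {
    number="$1"   # number of GPUs requested (cf. the JDL GPUNumber attribute)
    mode="$2"     # GPU compute mode (cf. the JDL GPUMode attribute; may be empty)
    res="nodes=1:gpus=${number}"
    # Append the GPU mode only when one was requested.
    if [ -n "$mode" ]; then
        res="${res}:${mode}"
    fi
    printf '%s\n' "$res"
}

# The two submissions tested above correspond to:
gpu_resource_string 2 default            # → nodes=1:gpus=2:default
gpu_resource_string 2 exclusive_process  # → nodes=1:gpus=2:exclusive_process
```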
- June 2015:
- attributes "GPUNumber" and "GPUMode" added to the BLAH_JOB_SUBMIT command, e.g.:
BLAH_JOB_SUBMIT 2 [Cmd="/tmp/test.sh";GridType="pbs";Queue="batch";In="/dev/null";Out="~\/StdOutput";Err="~\/StdError";GPUNumber=1;GPUMode="default"]
BLAH_JOB_SUBMIT 2 [Cmd="test_gpu_blah.sh";GridType="pbs";Queue="batch";In="/dev/null";Out="StdOutput";Err="StdError";GPUNumber=1;GPUMode="exclusive_process"]
- this required modifications to blah_common_submit_functions.sh and server.c
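The new attributes travel inside the classad passed to BLAH_JOB_SUBMIT. The actual parsing happens in server.c, but the extraction can be sketched in shell; extract_attr is an illustrative name, not a real BLAH function:

```shell
# Illustrative sketch only: pull GPUNumber/GPUMode out of a BLAH-style
# classad string. The real parsing lives in server.c, not in shell.
extract_attr() {
    ad="$1"; name="$2"
    # Capture name=value up to the next ';' or ']' and strip any quotes.
    printf '%s\n' "$ad" | sed -n "s/.*${name}=\([^];]*\).*/\1/p" | tr -d '"'
}

ad='[Cmd="/tmp/test.sh";GridType="pbs";GPUNumber=1;GPUMode="default"]'
extract_attr "$ad" GPUNumber  # → 1
extract_attr "$ad" GPUMode    # → default
```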
- first implementation of the two new attributes for PBS/Torque:
GPUMode can have the following values for PBS/Torque:
- default - shared mode, available to multiple processes
- exclusive_thread - only one COMPUTE thread is allowed to run on the GPU (v260 exclusive)
- prohibited - no COMPUTE contexts are allowed to run on the GPU
- exclusive_process - only one COMPUTE process is allowed to run on the GPU
- this required modifications to pbs_submit.sh
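The change to pbs_submit.sh presumably amounts to emitting an extra #PBS directive into the generated submit file when the GPU attributes are set. A sketch under that assumption; emit_gpu_directive is a hypothetical name, and the mode check simply enforces the four values listed above:

```shell
# Sketch of the kind of change pbs_submit.sh needs: when GPU attributes
# are present, add a matching #PBS -l line to the generated submit file.
# emit_gpu_directive is a hypothetical name, not the real BLAH code.
emit_gpu_directive() {
    gpu_number="$1"
    gpu_mode="$2"
    # Accept only the GPU modes Torque understands (or none at all).
    case "$gpu_mode" in
        ''|default|exclusive_thread|prohibited|exclusive_process) : ;;
        *) echo "unsupported GPUMode: $gpu_mode" >&2; return 1 ;;
    esac
    line="#PBS -l nodes=1:gpus=${gpu_number}"
    [ -n "$gpu_mode" ] && line="${line}:${gpu_mode}"
    printf '%s\n' "$line"
}

emit_gpu_directive 2 exclusive_process
# → #PBS -l nodes=1:gpus=2:exclusive_process
```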
- July-August 2015:
- implemented the parser in the CREAM-CE core for the new JDL attributes GPUNumber and GPUMode
- tested AMBER remote job submission through the glite-ce-job-submit client:
$ glite-ce-job-submit -o jobid.txt -d -a -r cegpu.cerm.unifi.it:8443/cream-pbs-batch test.jdl

$ cat test.jdl
[
  executable = "test_gpu.sh";
  inputSandbox = { "test_gpu.sh" };
  stdoutput = "out.out";
  outputsandboxbasedesturi = "gsiftp://localhost";
  stderror = "err.err";
  outputsandbox = { "out.out","err.err","min.out","heat.out" };
  GPUNumber=2;
  GPUMode="exclusive_process";
]
Next steps
- installing SLURM and/or HTCondor on cegpu.cerm.unifi.it and writing the corresponding BLAH support
- writing the information providers according to the GLUE 2.1 schema