Difference between revisions of "GPGPU-FedCloud"
Revision as of 15:39, 11 May 2015
Status of accelerated computing in Clouds
Additional development and support effort is needed at every level:
- Chipset: hardware virtualization support (without it, some limitations apply)
- OS level: correct kernel configuration for the accelerators
- Hypervisor: configuration of pass-through and vGPU
- CMFs (cloud management frameworks): VM start, scheduler
- FedCloud facilities: accounting, information discovery
- Application: VM images with correct drivers for specific chipsets
Accelerators
GPGPU (General-Purpose computing on Graphics Processing Units)
NVIDIA GPU/Tesla/GRID, AMD Radeon/FirePro, Intel HD Graphics, ...
Virtualization using VGA pass-through, vGPU (GPU partitioning) - NVIDIA GRID accelerators
- Shared Virtual GPUs (vGPU) http://www.nvidia.com/object/virtual-gpus.html
Intel Many Integrated Core Architecture
Xeon Phi Coprocessor
Virtualization using PCI pass-through
Specialized PCIe cards with accelerators
DSP (Digital Signal Processors)
FPGA (Field Programmable Gate Array)
Not commonly used in cloud environments
Hypervisors
QEMU/KVM
Supports only the pass-through virtualization model
vGPU support is under development
Instructions for configuring passthrough in KVM (link ???)
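As a rough illustration of what pass-through configuration involves on QEMU/KVM when the VM is managed through libvirt, the sketch below builds a libvirt `<hostdev>` element for one PCI device. It assumes libvirt is in use (the text above does not say so), and the function name `hostdev_xml` and the PCI address in the example are hypothetical. Host-side prerequisites such as enabling the IOMMU are not covered.

```python
import re
import xml.etree.ElementTree as ET

# PCI address in the usual "domain:bus:slot.function" form, e.g. "0000:03:00.0".
PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def hostdev_xml(pci_address: str) -> str:
    """Return a libvirt <hostdev> XML fragment for one PCI device.

    Hypothetical helper for illustration; it only formats XML and does
    not check that the device exists or that the host IOMMU is enabled.
    """
    match = PCI_ADDR.match(pci_address)
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    domain, bus, slot, function = match.groups()

    # managed='yes' asks libvirt to detach the device from the host
    # driver when the VM starts and reattach it when the VM stops.
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "pci", "managed": "yes"}
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        {
            "domain": "0x" + domain,
            "bus": "0x" + bus,
            "slot": "0x" + slot,
            "function": "0x" + function,
        },
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Example address only; the real one would come from lspci on the host.
    print(hostdev_xml("0000:03:00.0"))
```

The resulting fragment would go inside the `<devices>` section of the domain definition of the VM.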
Citrix XenServer 6, VMware ESXi 5.1
Support both pass-through and vGPU virtualization models
Limitations:
- vGPU support requires certified server hardware
- Live VM migration is not supported
- VM snapshot with memory is not supported
Cloud Management Frameworks
Some initiatives exist, but none has been completed:
- OpenStack PCI passthrough (https://wiki.openstack.org/wiki/Pci_passthrough) meetings (https://wiki.openstack.org/wiki/Meetings/Passthrough)
- HeterogeneousGpuAcceleratorSupport in OpenStack (https://wiki.openstack.org/wiki/HeterogeneousGpuAcceleratorSupport) (abandoned?)
- Design document for GPU and vGPU support for CloudStack Guest VMs (https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs)
- Request to support vGPU in OpenNebula (http://dev.opennebula.org/issues/3028)
Work to be done:
- Define VM types/flavors with attributes for GPGPU
- Modify VM start to allow passthrough or allocate vGPU
- Modify scheduler to allocate VMs with GPGPU correctly
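The three work items above can be sketched together in a few lines. This is a minimal, framework-independent sketch and not the API of OpenStack, CloudStack, or OpenNebula: the `Flavor` and `Host` classes, their fields, and the placement policy are all assumptions made for illustration. A flavor carries GPU attributes, a filter rejects hosts that cannot satisfy them, and the scheduler reserves the GPUs on the chosen host.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flavor:
    """Hypothetical VM type with GPGPU attributes."""
    name: str
    vcpus: int
    ram_mb: int
    gpu_count: int = 0
    gpu_model: Optional[str] = None  # None means any model is acceptable


@dataclass
class Host:
    """Hypothetical view of the free resources on a compute node."""
    name: str
    free_vcpus: int
    free_ram_mb: int
    gpu_model: Optional[str] = None
    free_gpus: int = 0


def host_fits(host: Host, flavor: Flavor) -> bool:
    """Return True if the host can run one VM of the given flavor."""
    if host.free_vcpus < flavor.vcpus or host.free_ram_mb < flavor.ram_mb:
        return False
    if flavor.gpu_count == 0:
        return True
    if flavor.gpu_model is not None and host.gpu_model != flavor.gpu_model:
        return False
    return host.free_gpus >= flavor.gpu_count


def schedule(hosts: List[Host], flavor: Flavor) -> Optional[Host]:
    """Pick a host for the flavor and reserve its resources.

    Assumed policy: among fitting hosts, prefer the one with the fewest
    free GPUs, so that CPU-only VMs do not occupy GPU nodes needlessly.
    """
    candidates = [h for h in hosts if host_fits(h, flavor)]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda h: (h.free_gpus, h.name))
    chosen.free_vcpus -= flavor.vcpus
    chosen.free_ram_mb -= flavor.ram_mb
    chosen.free_gpus -= flavor.gpu_count
    return chosen
```

In a real CMF the reservation step is where pass-through or vGPU allocation would be triggered at VM start; here it is reduced to decrementing a counter.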
VM images
VM images should contain proper drivers and libraries for specific accelerators
- Not transferable from site to site
A more suitable approach is to use vanilla images, with GPU support provided by the cloud provider
- Using VM contextualization like cloud-init for installing applications
Or using VM snapshots
- May require support from site admins
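To make the contextualization option concrete, the sketch below renders a small cloud-init user-data document that installs packages and runs commands at first boot. The `#cloud-config` header and the `package_update`, `packages`, and `runcmd` keys are standard cloud-init; the function name `render_cloud_config` and the package and command names in the example are placeholders, since the actual driver packages depend on the site and the accelerator.

```python
import json
from typing import Iterable


def render_cloud_config(packages: Iterable[str], commands: Iterable[str]) -> str:
    """Render a minimal cloud-init user-data document.

    Strings are emitted with json.dumps, which yields double-quoted
    scalars that are also valid YAML, so no YAML library is needed.
    """
    packages = list(packages)
    commands = list(commands)

    lines = ["#cloud-config"]
    if packages:
        # Refresh the package index before installing anything.
        lines.append("package_update: true")
        lines.append("packages:")
        lines += [f"  - {json.dumps(p)}" for p in packages]
    if commands:
        # Each runcmd entry given as a string is run through the shell.
        lines.append("runcmd:")
        lines += [f"  - {json.dumps(c)}" for c in commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder names; a site would substitute its own driver packages.
    print(
        render_cloud_config(
            packages=["site-gpu-driver", "site-gpu-toolkit"],
            commands=["echo 'GPU contextualization finished'"],
        )
    )
```

The rendered text would be passed as user data when the VM is started from a vanilla image.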
FedCloud facilities
AppDB
- VM images are rather site-specific: does it make sense to use AppDB?
Information discovery
- Should use a GLUE2 schema similar to the one used by grid sites with GPGPU
Accounting
- How should GPU usage be accounted? (Again, to be coordinated with the grid)
Brokering, monitoring, VM management
Possible configuration
Dedicated cloud site with GPGPU
Homogeneous: identical worker nodes
Single VM type, single VM per node
- Simple configuration, no conflicting resources, no need to modify scheduler
Cloud site with OS-level hypervisor
VMs can have direct access to hardware resources and share them
Limited to the same OS/kernel
Plan
Review available technologies for supporting accelerated computing in the clouds
- Identify what additional work is required and evaluate it
Create cloud site with GPGPU support for proof-of-concept
- First a dedicated cloud site, then generalize
Integrate the cloud site into FedCloud
- Need cooperation with FedCloud facilities
Related work
- GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications. http://www.isi.edu/sites/default/files/users/jwalters/papers/Cloud_2014.pdf