GPGPU-FedCloud
Status of accelerated computing in Clouds
Additional development/support is needed at all levels
- Chipset: HW virtualization support (limitations apply without it)
- OS level: correct kernel configuration for the accelerators
- Hypervisor: configuration of pass-through, vGPU
- CMFs (Cloud Management Frameworks): VM start, scheduler
- FedCloud facilities: accounting, information discovery
- Application: VM images with correct drivers for specific chipsets
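The chipset and OS-level prerequisites above can be checked on a candidate host before any hypervisor work starts. The following is a minimal sketch, assuming a Linux host with an Intel CPU; the function names are hypothetical. It inspects text in the format of /proc/cpuinfo for the `vmx` (Intel VT-x) or `svm` (AMD-V) flags and a kernel command line for `intel_iommu=on`. These flags only indicate CPU virtualization extensions; a working IOMMU for pass-through also depends on firmware settings that this sketch does not verify.

```python
from pathlib import Path


def cpu_virt_flags(cpuinfo_text):
    """Return the CPU virtualization flags (vmx, svm) found in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # The flags line looks like "flags : fpu vme ... vmx ..."
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found |= {"vmx", "svm"} & set(flags)
    return found


def intel_iommu_requested(cmdline):
    """True if the kernel command line explicitly enables the Intel IOMMU."""
    return "intel_iommu=on" in cmdline.split()


if __name__ == "__main__":
    # On a real host the inputs come from procfs (Linux only).
    print(cpu_virt_flags(Path("/proc/cpuinfo").read_text()))
    print(intel_iommu_requested(Path("/proc/cmdline").read_text()))
```

Keeping the parsing separate from the file access makes the checks easy to run against captured output from remote sites.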
Accelerators
GPGPU (General-Purpose computing on Graphics Processing Units)
NVIDIA GPU/Tesla/GRID, AMD Radeon/FirePro, Intel HD Graphics, ...
Virtualization using VGA pass-through, vGPU (GPU partitioning) - NVIDIA GRID accelerators
Intel Many Integrated Core Architecture
Xeon Phi Coprocessor
Virtualization using PCI pass-through
Specialized PCIe cards with accelerators
DSP (Digital Signal Processor)
FPGA (Field-Programmable Gate Array)
Not commonly used in cloud environments
Hypervisors
QEMU/KVM
Supports only pass-through virtualization model
vGPU support is under development
Citrix XenServer 6, VMware ESXi 5.1
Support both pass-through and vGPU virtualization models
Limitations:
- vGPU support requires certified server HW
- Live VM migration is not supported
- VM snapshot with memory is not supported
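For the QEMU/KVM pass-through model, a host PCI device is usually handed to a guest through a `<hostdev>` element in the libvirt domain definition. The sketch below builds such a fragment, assuming the hypervisor is managed through libvirt; the helper name `pci_hostdev_xml` and the example PCI address are illustrative only and do not come from any particular site.

```python
import xml.etree.ElementTree as ET


def pci_hostdev_xml(domain, bus, slot, function):
    """Build a libvirt <hostdev> fragment for PCI pass-through of one device."""
    # managed="yes" asks libvirt to detach the device from the host driver
    # when the guest starts and to reattach it when the guest stops.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain:04x}",
        bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}",
        function=f"0x{function:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Example address 0000:03:00.0 is an assumption; use the address that
    # lspci reports for the accelerator on the actual host.
    print(pci_hostdev_xml(0, 3, 0, 0))
```

The resulting fragment can be placed in the `<devices>` section of a domain definition or attached with `virsh attach-device`. Because the device is bound to one guest, this matches the limitations listed above: the VM cannot be live-migrated while the device is attached.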
Cloud Management Frameworks
Some initiatives exist, but none is complete
- HeterogeneousGpuAcceleratorSupport in OpenStack (https://wiki.openstack.org/wiki/HeterogeneousGpuAcceleratorSupport)
- GPU and vGPU support for CloudStack Guest VMs (https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs)
Work to be done:
- Define VM types/flavors with attributes for GPGPU
- Modify VM start to allow pass-through or to allocate a vGPU
- Modify scheduler to allocate VMs with GPGPU correctly
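The work items above can be sketched as a flavor carrying GPGPU attributes plus a scheduler filter that respects them. This is a framework-neutral illustration, not the OpenStack or CloudStack API: the classes `Flavor` and `Host`, their fields, and the placement policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flavor:
    name: str
    vcpus: int
    gpus: int = 0                    # number of pass-through GPUs requested
    gpu_model: Optional[str] = None  # None means any model is acceptable


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_gpus: int
    gpu_model: Optional[str] = None


def host_fits(host: Host, flavor: Flavor) -> bool:
    """Filter step: can this host satisfy the flavor's CPU and GPU request?"""
    if host.free_vcpus < flavor.vcpus:
        return False
    if flavor.gpus == 0:
        return True
    if host.free_gpus < flavor.gpus:
        return False
    return flavor.gpu_model is None or flavor.gpu_model == host.gpu_model


def schedule(flavor: Flavor, hosts: List[Host]) -> Optional[Host]:
    """Pick a host and reserve its resources, or return None if none fits."""
    candidates = [h for h in hosts if host_fits(h, flavor)]
    if not candidates:
        return None
    # Non-GPU flavors prefer hosts with the fewest free GPUs, so that
    # accelerator nodes are not filled up with CPU-only VMs.
    candidates.sort(key=lambda h: (h.free_gpus if flavor.gpus == 0 else 0, h.name))
    chosen = candidates[0]
    chosen.free_vcpus -= flavor.vcpus
    chosen.free_gpus -= flavor.gpus
    return chosen
```

The sort key reflects one possible policy choice: without it, CPU-only VMs could occupy GPGPU nodes and block later GPU requests, which is the kind of incorrect allocation the scheduler change is meant to avoid.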
VM images
VM images should contain proper drivers and libraries for specific accelerators
- Not transferable from site to site
A more suitable approach is to use vanilla images with GPU support provided by the cloud provider
- Using VM contextualization like cloud-init for installing applications
Or using VM snapshots
- May require support from site admins
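The contextualization approach can be illustrated with a small generator for cloud-init user data. This is a sketch under stated assumptions: it emits a `#cloud-config` document using the standard `package_update`, `packages` and `runcmd` keys, while the package name and command in the usage example are placeholders, not real site-specific values.

```python
import json


def gpu_user_data(packages, commands):
    """Render cloud-config user data that installs packages and runs commands."""
    lines = ["#cloud-config", "package_update: true", "packages:"]
    lines += [f"  - {pkg}" for pkg in packages]
    lines.append("runcmd:")
    # json.dumps yields a double-quoted string, which is also a valid YAML
    # scalar, so shell commands containing colons or quotes stay intact.
    lines += [f"  - {json.dumps(cmd)}" for cmd in commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder package and command; a real site would substitute the
    # application stack matching the drivers it provides.
    print(gpu_user_data(["example-gpu-toolkit"], ["/usr/local/bin/run-my-app"]))
```

With this split, the site supplies an image whose drivers match its accelerators, and the user data stays portable across sites because it only installs the application layer.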
FedCloud facilities
AppDB
- VM images are rather site-specific: does it make sense to use AppDB?
Information discovery
- Should use a GLUE2 schema similar to that of grid sites with GPGPU
Accounting
- How to account for GPU usage? (again, to be coordinated with grid)
Brokering, monitoring, VM management
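On the accounting question, one possible metric for pass-through GPUs is GPU-hours, computed as the number of attached GPUs multiplied by the wall-clock lifetime of the VM, since a passed-through device is reserved for the VM whether or not it is busy. The record below is a hypothetical sketch of that idea; the class `VmUsage` and its fields are assumptions and do not represent an agreed FedCloud or grid accounting format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VmUsage:
    vm_id: str
    start: datetime
    end: datetime
    cpu_count: int
    gpu_count: int = 0

    @property
    def wall_hours(self) -> float:
        """Wall-clock lifetime of the VM in hours."""
        return (self.end - self.start).total_seconds() / 3600

    @property
    def gpu_hours(self) -> float:
        """GPUs are reserved for the whole lifetime, so charge count x hours."""
        return self.wall_hours * self.gpu_count
```

A vGPU model would probably need a finer-grained unit, such as the share of the physical GPU assigned to the VM, which this sketch does not cover.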
Possible configuration
Dedicated cloud site with GPGPU
Homogeneous: identical worker nodes
Single VM type, single VM per node
- Simple configuration, no conflicting resources, no need to modify scheduler
Example: Amazon EC2
Cloud site with OS-level hypervisor
VMs can have direct access to hardware resources and share them
Limited to the same OS/kernel
Plan
Review available technologies for supporting accelerated computing in the clouds
- Identify what additional work is required and evaluate it
Create cloud site with GPGPU support for proof-of-concept
- First a dedicated cloud site, then generalize
Integrate the cloud site into FedCloud
- Need cooperation with FedCloud facilities