GPGPU-FedCloud

From EGIWiki
Revision as of 13:56, 30 April 2015 by Viet (Status of accelerated computing in Clouds)

Status of accelerated computing in Clouds

Additional development and support are needed at all levels:

  • Chipset: hardware virtualization support (IOMMU, e.g. Intel VT-d or AMD-Vi); without it, some limitations apply
  • OS level: correct kernel configuration for the accelerators
  • Hypervisor: configuration of pass-through and vGPU
  • CMFs (Cloud Management Frameworks): VM start-up, scheduler
  • FedCloud facilities: accounting, information discovery
  • Application: VM images with the correct drivers for specific chipsets
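The chipset and OS-level prerequisites can be checked on a Linux host. When an IOMMU (Intel VT-d or AMD-Vi) is active, the kernel lists its IOMMU groups under `/sys/kernel/iommu_groups`. On Intel hosts this typically requires booting with the `intel_iommu=on` kernel parameter. The following is a minimal sketch that reads that directory. The helper name `iommu_groups` is hypothetical, and its `sysfs_root` parameter exists only so the function can be tried against a fake directory tree.

```python
from pathlib import Path
from typing import Dict, List


def iommu_groups(sysfs_root: str = "/sys") -> Dict[str, List[str]]:
    """Map each IOMMU group number to the PCI addresses it contains.

    Hypothetical helper. An empty result means the kernel exposes no IOMMU
    groups, i.e. the IOMMU is absent or disabled and PCI pass-through
    cannot be used.
    """
    groups_dir = Path(sysfs_root) / "kernel" / "iommu_groups"
    if not groups_dir.is_dir():
        return {}
    groups: Dict[str, List[str]] = {}
    for group in groups_dir.iterdir():
        devices = group / "devices"
        if devices.is_dir():
            # Each entry is named after a PCI address such as 0000:03:00.0.
            groups[group.name] = sorted(d.name for d in devices.iterdir())
    return groups


if __name__ == "__main__":
    found = iommu_groups()
    if not found:
        print("no IOMMU groups: check chipset support and kernel parameters")
    # Group names are numbers, so sort them numerically for readability.
    for number, devices in sorted(found.items(), key=lambda kv: int(kv[0])):
        print(number, " ".join(devices))
```

A GPU is normally assigned to a guest together with the other devices in its IOMMU group. This is why a graphics card and its HDMI audio function often appear side by side in the output.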

Accelerators

GPGPU (General-Purpose computing on Graphics Processing Units)

NVIDIA GPU/Tesla/GRID, AMD Radeon/FirePro, Intel HD Graphics, ...

Virtualization using VGA pass-through, or vGPU (GPU partitioning) on NVIDIA GRID accelerators

Intel Many Integrated Core (MIC) Architecture

Xeon Phi Coprocessor

Virtualization using PCI pass-through

Specialized PCIe cards with accelerators

DSP (Digital Signal Processors)

FPGA (Field Programmable Gate Array)

Not commonly used in cloud environments

Hypervisors

QEMU/KVM

Supports only the pass-through virtualization model

vGPU support is under development
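With QEMU/KVM, pass-through hands a whole PCI device to a single guest. If the KVM host is managed through libvirt, which is an assumption here, the device is declared as a `<hostdev>` element in the guest's domain XML. The following minimal sketch builds that element from a PCI address in the form printed by `lspci -D`. The function name `hostdev_xml` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# PCI address in domain:bus:slot.function form, e.g. "0000:03:00.0".
PCI_ADDR = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


def hostdev_xml(pci_address: str) -> str:
    """Return a libvirt <hostdev> element passing one PCI device to a guest."""
    m = PCI_ADDR.match(pci_address)
    if m is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    # managed="yes" asks libvirt to detach the device from the host before
    # the guest starts and to reattach it to the host afterwards.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain="0x" + m["domain"].lower(),
        bus="0x" + m["bus"].lower(),
        slot="0x" + m["slot"].lower(),
        function="0x" + m["function"],
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0000:03:00.0"))
```

The generated fragment belongs inside the `<devices>` section of the guest's domain definition, for example via `virsh edit`. A GPU passed through this way is exclusive to one VM, which is the main difference from the vGPU model.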

Citrix XenServer 6, VMware ESXi 5.1

Support both pass-through and vGPU virtualization models

Limitations:

  • vGPU support requires certified server hardware
  • Live VM migration is not supported
  • VM snapshots including memory state are not supported

Cloud Management Frameworks

Some initiatives exist, but none is complete

Work to be done:

  • Define VM types/flavors with attributes for GPGPU
  • Modify VM start-up to allow pass-through or to allocate a vGPU
  • Modify the scheduler so that GPGPU VMs are placed only on hosts with free accelerators
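The flavor and scheduler changes can be illustrated with a toy placement routine. In this routine, a flavor carries a GPU count as an extra attribute, and the scheduler considers only hosts with enough free accelerators. The `Flavor` and `Host` classes and the best-fit policy are hypothetical. Real CMFs express this differently; OpenStack Nova, for instance, uses flavor extra specs together with a PCI pass-through scheduler filter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flavor:
    """Hypothetical VM type; 'gpus' is the extra GPGPU attribute."""
    name: str
    vcpus: int
    ram_mb: int
    gpus: int = 0  # pass-through GPUs (or vGPUs) requested


@dataclass
class Host:
    """Hypothetical view of a compute node's remaining capacity."""
    name: str
    free_vcpus: int
    free_ram_mb: int
    free_gpus: int


def fits(host: Host, flavor: Flavor) -> bool:
    return (host.free_vcpus >= flavor.vcpus
            and host.free_ram_mb >= flavor.ram_mb
            and host.free_gpus >= flavor.gpus)


def schedule(flavor: Flavor, hosts: List[Host]) -> Optional[Host]:
    """Pick a host for one VM and reserve its resources; None if none fits."""
    candidates = [h for h in hosts if fits(h, flavor)]
    if not candidates:
        return None
    # Best fit on GPUs: prefer the host that leaves the fewest GPUs idle, so
    # CPU-only VMs land on CPU-only hosts and GPU hosts stay available.
    best = min(candidates, key=lambda h: (h.free_gpus - flavor.gpus, h.name))
    best.free_vcpus -= flavor.vcpus
    best.free_ram_mb -= flavor.ram_mb
    best.free_gpus -= flavor.gpus
    return best


if __name__ == "__main__":
    hosts = [Host("cpu-01", 16, 65536, 0), Host("gpu-01", 16, 65536, 2)]
    print(schedule(Flavor("m1.large", 4, 8192), hosts).name)
    print(schedule(Flavor("g1.large", 4, 8192, gpus=1), hosts).name)
```

Without the GPU-aware key, a CPU-only VM could occupy a GPU node and block a later GPGPU request. In a homogeneous site with one VM type per node, this logic reduces to picking any free node.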

VM images

VM images should contain proper drivers and libraries for specific accelerators

  • Not transferable from site to site

A more suitable approach is to use vanilla images with GPU support provided by the cloud provider

  • Use VM contextualization such as cloud-init to install applications

Alternatively, use VM snapshots

  • May require support from site admins
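With the vanilla-image approach, the provider's image is assumed to already contain the accelerator driver, so the user supplies only contextualization data. The following minimal sketch generates cloud-init `#cloud-config` user-data using the standard `packages` and `runcmd` keys. The helper name `cloud_config` and the package and command names in the example are hypothetical.

```python
import json
from typing import List


def cloud_config(packages: List[str], commands: List[str]) -> str:
    """Build #cloud-config user-data that installs software at first boot."""
    lines = ["#cloud-config"]
    if packages:
        lines.append("packages:")
        # json.dumps yields a double-quoted string, which YAML also accepts,
        # so names containing ':' or '#' stay a single scalar.
        lines += [f"  - {json.dumps(p)}" for p in packages]
    if commands:
        lines.append("runcmd:")
        lines += [f"  - {json.dumps(c)}" for c in commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical application installed on top of the GPU-enabled image.
    print(cloud_config(["python3-pip"], ["pip3 install my-gpu-app"]), end="")
```

Because the driver stays in the site's image and only this small text file travels with the user, the same user-data can be reused at any site that offers a GPU-enabled vanilla image.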

FedCloud facilities

AppDB

  • VM images are rather site-specific: does it make sense to use AppDB?

Information discovery

  • Should use a GLUE2 schema similar to that of grid sites with GPGPU

Accounting

  • How should GPU usage be accounted? (again, to be coordinated with the grid)

Brokering, monitoring, VM management

Possible configurations

Dedicated cloud site with GPGPU

Homogeneous: identical worker nodes

Single VM type, single VM per node

  • Simple configuration, no conflicting resources, no need to modify scheduler

Example: Amazon EC2

Cloud site with OS-level hypervisor (container-based virtualization)

VMs can access hardware resources directly and share them

Limitation: all VMs must share the same OS/kernel

Plan

Review available technologies for supporting accelerated computing in clouds

  • Identify what additional work is required and evaluate it

Create a cloud site with GPGPU support as a proof of concept

  • First a dedicated cloud site, then generalize

Integrate the cloud site into FedCloud

  • Requires cooperation with FedCloud facilities