GPGPU-FedCloud

From EGIWiki
Revision as of 14:04, 4 September 2015

Status of accelerated computing in Clouds

Additional development and support effort is needed at all levels:

  • Chipset: hardware virtualization support (otherwise with some limitations)
  • OS level: correct kernel configuration for the accelerators
  • Hypervisor: configuration of pass-through and vGPU
  • CMFs (Cloud Management Frameworks): VM start, scheduler
  • FedCloud facilities: accounting, information discovery
  • Application: VM images with correct drivers for specific chipsets

Accelerators

GPGPU (General-Purpose computing on Graphics Processing Units)

NVIDIA GPU/Tesla/GRID, AMD Radeon/FirePro, Intel HD Graphics,...

Virtualization using VGA pass-through, or vGPU (GPU partitioning) on NVIDIA GRID accelerators

Intel Many Integrated Core Architecture

Xeon Phi Coprocessor

Virtualization using PCI pass-through

Specialized PCIe cards with accelerators

DSP (Digital Signal Processors)

FPGA (Field Programmable Gate Array)

Not commonly used in cloud environments

Hypervisors

QEMU/KVM

Supports only the pass-through virtualization model

vGPU support is under development

Instructions for configuring passthrough in KVM (link ???)
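For KVM/QEMU managed through libvirt, PCI pass-through of a GPU is normally expressed as a `<hostdev>` element in the domain XML. The following is a minimal sketch that builds that element, assuming the GPU's PCI address is already known (for example from `lspci -D`) and that the IOMMU is enabled on the host. The address in the usage line is a hypothetical placeholder, not the address of the cards described on this page.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(pci_address: str) -> str:
    """Build a libvirt <hostdev> element for PCI pass-through.

    pci_address uses the "DDDD:BB:SS.F" form printed by `lspci -D`,
    e.g. "0000:04:00.0" (domain:bus:slot.function).
    """
    domain, bus, slot_func = pci_address.split(":")
    slot, function = slot_func.split(".")

    # managed="yes": libvirt detaches the device from its host driver
    # before the VM starts and reattaches it when the VM stops.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical address; replace with the GPU's address from `lspci -D`.
    print(hostdev_xml("0000:04:00.0"))
```

The resulting element goes inside the `<devices>` section of the VM's domain XML, one element per GPU.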

Citrix XenServer 6, VMware ESXi 5.1

Support both pass-through and vGPU virtualization models

Limitations:

  • vGPU support requires certified server hardware
  • Live VM migration is not supported
  • VM snapshot with memory is not supported

Security issues

Cloud Management Frameworks

Some work has been done on PCI passthrough

vGPU support is at a very early stage

Work to be done:

  • Define VM types/flavors with attributes for GPGPU
  • Modify VM start to allow passthrough or allocate vGPU
  • Modify scheduler to allocate VMs with GPGPU correctly
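For PCI pass-through, Kilo-era OpenStack Nova already provides counterparts to some of these items. Compute nodes whitelist the GPU with `pci_passthrough_whitelist`, and `pci_alias` gives it a name. Flavors request devices through the `pci_passthrough:alias` extra spec, and the `PciPassthroughFilter` scheduler filter places such VMs on hosts that have a free matching device. The minimal sketch below only assembles these values. It assumes NVIDIA's vendor ID `10de`, while the product ID and the alias name `K20` are hypothetical placeholders (the real product ID can be read with `lspci -nn`). Option names and config sections differ in later Nova releases.

```python
import json


def nova_pci_settings(vendor_id: str, product_id: str, alias: str, count: int) -> dict:
    """Assemble nova.conf values and a flavor extra spec for GPU pass-through.

    Only builds the values; writing them to nova.conf and creating the
    flavor (e.g. with `nova flavor-key`) is left to the operator.
    """
    device = {"vendor_id": vendor_id, "product_id": product_id}
    return {
        # nova.conf on compute nodes: host PCI devices that may be given to VMs.
        "pci_passthrough_whitelist": json.dumps(device),
        # nova.conf: a named alias that flavors can refer to.
        "pci_alias": json.dumps({**device, "name": alias}),
        # Filter to add to scheduler_default_filters (hypothetical dict key).
        "scheduler_filter_to_enable": "PciPassthroughFilter",
        # Flavor extra spec: request `count` devices matching the alias.
        "flavor_extra_specs": {"pci_passthrough:alias": f"{alias}:{count}"},
    }


if __name__ == "__main__":
    # "XXXX" is a hypothetical placeholder product ID; 10de is NVIDIA's vendor ID.
    for key, value in nova_pci_settings("10de", "XXXX", "K20", 1).items():
        print(f"{key} = {value}")
```

A dedicated GPGPU flavor with one device per VM matches the single-VM-per-node configuration discussed below.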

VM images

VM images should contain proper drivers and libraries for specific accelerators

  • Not transferable from site to site

A more suitable approach is to use vanilla images with GPU support provided by the cloud provider

  • Using VM contextualization such as cloud-init to install applications

Or using VM snapshots

  • May require support from site admins
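The following is a minimal sketch of contextualization with cloud-init. It assumes a vanilla image whose GPU driver is already provided by the site. The user data is a `#cloud-config` document: the `packages` key installs distribution packages, and `runcmd` runs commands once, as root, at the end of the first boot. The package names and the install command in the usage lines are hypothetical placeholders.

```python
import json


def cloud_config(packages: list[str], commands: list[str]) -> str:
    """Render minimal cloud-init user data (#cloud-config YAML).

    `packages` are installed with the distribution's package manager;
    `commands` run once, as root, late in the first boot.
    """
    lines = ["#cloud-config", "packages:"]
    lines += [f"  - {name}" for name in packages]
    lines.append("runcmd:")
    # A JSON string is also a valid YAML double-quoted scalar, so json.dumps
    # quotes each command safely; cloud-init runs string commands with sh.
    lines += [f"  - {json.dumps(cmd)}" for cmd in commands]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical package and installer, for illustration only.
    print(cloud_config(["build-essential"], ["/opt/myapp/install.sh"]))
```

The rendered text is passed as user data when the VM is started, so the same vanilla image can serve different applications.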

FedCloud facilities

AppDB

  • VM images are rather site-specific: does it make sense to use AppDB?

Information discovery

  • Should use a GLUE2 schema similar to that used by grid sites with GPGPU

Accounting

  • How should GPU usage be accounted? (Again, to be coordinated with grid)
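One possible answer, offered only as an assumption, follows the way CPU wall time is often accounted. A VM's GPU usage is the number of GPUs attached to it multiplied by its wall-clock lifetime, because a passed-through GPU is unavailable to other users for that whole period. The minimal sketch below uses a hypothetical usage record with start/end times and a GPU count. It is not the record format of any EGI accounting system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VmUsageRecord:
    """Hypothetical per-VM usage record (not an EGI accounting format)."""
    vm_id: str
    start: datetime
    end: datetime
    gpu_count: int  # GPUs passed through to this VM


def gpu_hours(record: VmUsageRecord) -> float:
    """GPU wall-clock hours: attached GPUs times VM lifetime in hours.

    A passed-through GPU is reserved for the VM whether or not it is busy,
    so wall-clock time is charged rather than measured GPU utilisation.
    """
    if record.end < record.start:
        raise ValueError("end must not precede start")
    hours = (record.end - record.start).total_seconds() / 3600
    return record.gpu_count * hours
```

Under this metric, a VM holding two GPUs for 3.5 hours would be charged 7 GPU-hours.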

Brokering, monitoring, VM management

Possible configuration

Dedicated cloud site with GPGPU

Homogeneous: identical worker nodes

Single VM type, single VM per node

  • Simple configuration, no conflicting resources, no need to modify scheduler

Cloud site with OS-level hypervisor

VMs can have direct access to hardware resources and share them

Limited to the same OS/kernel as the host

Related work

Progress

  • May 2015
    • Review of available technologies
    • GPGPU virtualisation in KVM/QEMU
    • Performance testing of passthrough
HW configuration: 
IBM dx360 M4 server with two NVIDIA Tesla K20 accelerators.
Ubuntu 14.04.2 LTS with KVM/QEMU, PCI passthrough virtualization of GPU cards.
Tested application:
NAMD molecular dynamics simulation (CUDA version), STMV test example (http://www.ks.uiuc.edu/Research/namd/).
Performance results:
The tested application runs 2-3% slower in a virtual machine than when run directly on the tested server.
If hyperthreading is enabled on the compute server, vCPUs have to be pinned to physical cores so that
whole cores are dedicated to one VM. To avoid potential performance problems, hyperthreading
should be switched off.

  • June 2015
    • Creating cloud site with GPGPU support
Configuration: master node, 2 worker nodes (IBM dx360 M4 servers, see above)
Base OS: Ubuntu 14.04.2 LTS
Hypervisor: KVM
Middleware: OpenStack Kilo
  • July+August 2015
    • Creating cloud site with GPGPU support
Cloud site created at keystone3.ui.savba.sk, master + two worker nodes, configuration reported above
Creating VM images for GPGPU (based on Ubuntu 14.04, GPU driver and libraries)
    • Testing cloud site with GPGPU support
Performance testing and tuning with GPGPU in OpenStack
 - performance comparison of cloud-based VMs with non-cloud virtualization and physical machines
 - setting CPU flavor in OpenStack Nova (performance optimization)
 - adjusting the OpenStack scheduler
Started the process of integrating the site into EGI FedCloud
 - Keystone VOMS support being integrated
 - OCCI support planned for September
  • Next steps
Continue integration into EGI FedCloud
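The May 2015 results state that, with hyperthreading enabled, vCPUs must be pinned so that whole physical cores are dedicated to one VM. The following is a minimal sketch of that assignment. It assumes the host topology comes from the Linux sysfs files `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, passed in here as strings so the example is self-contained. It also assumes the result is applied as libvirt `<vcpupin>` elements inside `<cputune>`. The function names and the CPU numbering in the usage lines are hypothetical.

```python
def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as "0,16" or "0-1" into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def physical_cores(siblings: dict[int, str]) -> list[tuple[int, ...]]:
    """Group logical CPUs into physical cores using thread_siblings_list."""
    cores = {tuple(sorted(parse_cpu_list(text))) for text in siblings.values()}
    return sorted(cores)


def pin_whole_cores(siblings: dict[int, str], n_cores: int) -> str:
    """Give one VM n_cores whole cores and emit a libvirt <cputune> block.

    Every hardware thread of each chosen core is pinned to its own vCPU, so
    no other VM shares those cores. This sketch simply takes the first cores;
    a real tool would skip cores reserved for the host or used by other VMs.
    """
    cores = physical_cores(siblings)
    if n_cores > len(cores):
        raise ValueError("not enough physical cores")
    threads = [cpu for core in cores[:n_cores] for cpu in core]
    pins = [f'  <vcpupin vcpu="{v}" cpuset="{cpu}"/>' for v, cpu in enumerate(threads)]
    return "\n".join(["<cputune>", *pins, "</cputune>"])


if __name__ == "__main__":
    # Hypothetical 4-core host with hyperthreading: core k has threads k and k+4.
    host = {cpu: f"{cpu % 4},{cpu % 4 + 4}" for cpu in range(8)}
    print(pin_whole_cores(host, 2))
```

With hyperthreading switched off, each core lists only itself as a sibling, and the same code pins one vCPU per core.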