Difference between revisions of "GPGPU-FedCloud"

From EGIWiki
 
{{Template:EGI-Engage menubar}} {{TOC_right}}  
 
= Status of accelerated computing in clouds =
 
Additional development and support effort is needed at all levels:
 
*Chipset: hardware virtualization support (otherwise some limitations apply)
*OS level: correct kernel configuration for the accelerators
 
*Hypervisor: pass-through and vGPU configuration
 
*CMFs (cloud management frameworks): VM start, scheduler
 
*FedCloud facilities: accounting, information discovery
 
*Application: VM images with correct drivers for specific chipsets
 
  
= Accelerators  =
  
== GPGPU (General-Purpose computing on Graphics Processing Units) ==
 
  
NVIDIA GPU/Tesla/GRID, AMD Radeon/FirePro, Intel HD Graphics,...
  
Virtualization using VGA pass-through, or vGPU (GPU partitioning) with NVIDIA GRID accelerators
*Shared Virtual GPUs (vGPU) http://www.nvidia.com/object/virtual-gpus.html
  
== Intel Many Integrated Core Architecture  ==
  
Xeon Phi Coprocessor
  
Virtualization using PCI pass-through
  
== Specialized PCIe cards with accelerators ==
  
DSP (Digital Signal Processors)
  
FPGA (Field Programmable Gate Array)
 
 
Not commonly used in cloud environments
 
 
= Hypervisors  =
 
 
== QEMU/KVM  ==
 
 
Supports only pass-through virtualization model
 
 
vGPU support is under development
 
 
Instructions for configuring passthrough in KVM (link ???)
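Until those instructions are linked, a small pre-flight check may be useful. Before KVM can pass a GPU through to a VM, the host IOMMU must be enabled so that the card appears in an IOMMU group. The minimal sketch below assumes a Linux host with the standard sysfs layout. It lists NVIDIA PCI functions (vendor ID 0x10de) together with their IOMMU groups. The function name `find_gpu_iommu_groups` is hypothetical and is not part of any KVM or OpenStack tooling.

```python
import os
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID of NVIDIA Corporation


def find_gpu_iommu_groups(sysfs_root="/sys"):
    """Map the PCI address of every NVIDIA function to its IOMMU group.

    The value is None when the device has no IOMMU group. This typically
    means the IOMMU is disabled (firmware setting or kernel command line),
    so the device cannot yet be passed through to a VM.
    """
    devices_dir = Path(sysfs_root) / "bus" / "pci" / "devices"
    if not devices_dir.is_dir():
        return {}  # not a Linux host with PCI sysfs
    result = {}
    for dev in sorted(devices_dir.iterdir()):
        vendor_file = dev / "vendor"
        if not vendor_file.is_file():
            continue
        if vendor_file.read_text().strip() != NVIDIA_VENDOR_ID:
            continue
        group_link = dev / "iommu_group"
        # In sysfs, iommu_group is a symlink whose target ends in .../iommu_groups/<N>
        if group_link.is_symlink():
            result[dev.name] = os.path.basename(os.readlink(group_link))
        else:
            result[dev.name] = None
    return result
```

On a compute node, `find_gpu_iommu_groups()` should report a group number for each GPU, and also for the audio function that many NVIDIA cards expose. Devices that share an IOMMU group generally cannot be split between the host and a VM.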
 
 
== Citrix XenServer 6, VMware ESXi 5.1  ==
 
 
Support both pass-through and vGPU virtualization models
 
 
Limitations:
 
 
*vGPU support requires certified server hardware
 
*Live VM migration is not supported
 
*VM snapshot with memory is not supported
 
 
Security issues
 
* Non-standard PCI device functionality may render pass-through insecure (http://xenbits.xen.org/xsa/advisory-124.html)
 
 
= Cloud Management Frameworks =
 
 
Some work has been done on PCI passthrough:
 
*OpenStack PCI passthrough (https://wiki.openstack.org/wiki/Pci_passthrough) and the related meetings (https://wiki.openstack.org/wiki/Meetings/Passthrough)
 
 
vGPU support is at a very early stage:
 
*Design document for GPU and vGPU support for CloudStack Guest VMs (https://cwiki.apache.org/confluence/display/CLOUDSTACK/GPU+and+vGPU+support+for+CloudStack+Guest+VMs)
 
*Request to support vGPU in OpenNebula (http://dev.opennebula.org/issues/3028)
 
 
Work to be done:
 
 
*Define VM types/flavors with attributes for GPGPU
 
*Modify VM start to allow passthrough or allocate vGPU
 
*Modify scheduler to allocate VMs with GPGPU correctly
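The three work items above can be illustrated with a toy model:

*a flavor that carries a GPU count,
*a scheduler filter that checks free GPUs alongside vCPUs and RAM,
*a VM start step that claims specific GPU devices for passthrough.

This is a minimal sketch, not OpenStack code. The `gpus` attribute and all class and function names are hypothetical. In OpenStack itself, PCI passthrough requests are expressed with the `pci_passthrough:alias` flavor extra spec and checked by the `PciPassthroughFilter` scheduler filter.

```python
from dataclasses import dataclass, field


@dataclass
class Flavor:
    name: str
    vcpus: int
    ram_mb: int
    gpus: int = 0  # hypothetical GPGPU attribute: whole GPUs via passthrough


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int
    # PCI addresses of GPUs not yet assigned to any VM
    free_gpus: list = field(default_factory=list)


def filter_hosts(hosts, flavor):
    """Scheduler step: keep only hosts that can satisfy CPU, RAM and GPU demand."""
    return [
        h for h in hosts
        if h.free_vcpus >= flavor.vcpus
        and h.free_ram_mb >= flavor.ram_mb
        and len(h.free_gpus) >= flavor.gpus
    ]


def start_vm(host, flavor):
    """VM start step: claim resources and return the GPUs to pass through."""
    if not filter_hosts([host], flavor):
        raise ValueError(f"host {host.name} cannot fit flavor {flavor.name}")
    host.free_vcpus -= flavor.vcpus
    host.free_ram_mb -= flavor.ram_mb
    claimed = host.free_gpus[:flavor.gpus]
    host.free_gpus = host.free_gpus[flavor.gpus:]
    return claimed
```

GPUs are claimed whole, so a VM that needs no GPU could still land on a GPU node and use up its CPUs. A real scheduler would probably also need a weighing step that keeps such VMs away from GPU hosts.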
 
 
= VM images =
 
 
VM images should contain the proper drivers and libraries for the specific accelerators
 
*Such images are not transferable from site to site
 
 
A more suitable approach is to use vanilla images with GPU support provided by the cloud provider
 
*Applications are then installed via VM contextualization, e.g. cloud-init
 
 
Alternatively, VM snapshots can be used
 
* May require support from site admins
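With the contextualization approach, the GPU driver comes with the provider's image, and the user data only has to install the application. The minimal sketch below assumes the image runs cloud-init, whose `#cloud-config` format supports the `package_update`, `packages` and `runcmd` keys. The package names, the repository URL and the helper name `make_cloud_config` are placeholders for illustration only.

```python
import json


def make_cloud_config(packages, commands):
    """Return cloud-init user data that installs packages and runs commands.

    Every item is written as a JSON string. A JSON string is also a valid
    YAML double-quoted scalar, so special characters are escaped safely.
    """
    lines = ["#cloud-config", "package_update: true", "packages:"]
    lines += [f"  - {json.dumps(p)}" for p in packages]
    lines.append("runcmd:")
    # runcmd entries run once per instance, normally at first boot
    lines += [f"  - {json.dumps(c)}" for c in commands]
    return "\n".join(lines) + "\n"


user_data = make_cloud_config(
    packages=["git", "build-essential"],
    commands=["git clone https://example.org/my-gpu-app.git /opt/app",
              "make -C /opt/app"],
)
print(user_data)
```

The resulting text would be passed as user data when the VM is started through the site's OpenStack or OCCI interface.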
 
 
= FedCloud facilities =
 
 
AppDB
 
* VM images are rather site-specific: does it make sense to use AppDB?
 
 
Information discovery
 
* Should use a GLUE2 schema similar to that used by grid sites with GPGPU
 
 
Accounting
 
* How should GPU usage be accounted? (again, to be coordinated with grid)
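One possible answer assumes GPUs are assigned whole via passthrough, as on the IISAS site. GPU usage could then be accounted like CPU time, as GPU wall-clock hours: the number of GPUs multiplied by the VM's wall-clock time within each accounting period. The `VmRecord` structure and `gpu_hours` function below are hypothetical. They are not the EGI cloud usage-record format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VmRecord:
    vm_id: str
    start: datetime
    end: datetime
    gpus: int  # whole GPUs passed through to the VM


def gpu_hours(records, period_start, period_end):
    """GPU wall-clock hours per VM, counting only time inside the period."""
    totals = {}
    for r in records:
        # Clip the VM lifetime to the accounting period
        overlap = min(r.end, period_end) - max(r.start, period_start)
        if overlap > timedelta(0):
            hours = overlap.total_seconds() / 3600
            totals[r.vm_id] = totals.get(r.vm_id, 0.0) + r.gpus * hours
    return totals
```

A VM with two GPUs running from 22:00 on 30 September to 02:00 on 1 October 2015 would be charged 4 GPU-hours in September and 4 in October.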
 
 
Brokering, monitoring, VM management
 
 
= Possible configurations =
 
 
== Dedicated cloud site with GPGPU ==
 
Homogeneous: identical worker nodes
 
 
Single VM type, single VM per node
 
*Simple configuration, no conflicting resources, no need to modify scheduler
 
 
== Cloud site with OS-level hypervisor ==
 
 
VMs can have direct access to hardware resources and share them
 
 
Limited to the same OS/kernel as the host
 
 
= Related work =
 
 
*GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications. http://www.isi.edu/sites/default/files/users/jwalters/papers/Cloud_2014.pdf
 
  
 
= Progress =
 

Revision as of 14:31, 8 October 2015




Objective

To provide support for accelerated computing in EGI-Engage federated cloud.


Participants

Viet Tran (IISAS) Jan Astalos (IISAS) Miroslav Dobrucky (IISAS)

Current status

A working site with GPGPU in EGI federated cloud

HW configuration:

IBM dx360 M4 server with two NVIDIA Tesla K20 accelerators.
Ubuntu 14.04.2 LTS with KVM/QEMU, PCI passthrough virtualization of GPU cards.

SW configuration:

Base OS: Ubuntu 14.04.2 LTS
Hypervisor: KVM
Middleware: OpenStack Kilo

GOCDB: IISAS-GPUCloud, https://goc.egi.eu/portal/index.php?Page_Type=Site&id=1485
OpenStack endpoint: https://keystone3.ui.savba.sk:5000/v2.0
OCCI endpoint: https://nova3.ui.savba.sk:8787
Supported VOs: fedcloud.egi.eu, ops, dteam


Progress

  • May 2015
    • Review of available technologies
    • GPGPU virtualization in KVM/QEMU
    • Performance testing of passthrough
HW configuration: 
IBM dx360 M4 server with two NVIDIA Tesla K20 accelerators.
Ubuntu 14.04.2 LTS with KVM/QEMU, PCI passthrough virtualization of GPU cards.
Tested application:
NAMD molecular dynamics simulation (CUDA version), STMV test example (http://www.ks.uiuc.edu/Research/namd/).
Performance results:
The tested application ran 2-3% slower in a virtual machine than when run directly on the same server.
If hyperthreading is enabled on the compute server, vCPUs have to be pinned to physical cores so that
whole cores are dedicated to a single VM. To avoid potential performance problems, hyperthreading
should be switched off.

  • June 2015
    • Creating cloud site with GPGPU support
Configuration: master node, 2 worker nodes (IBM dx360 M4 servers, see above)
Base OS: Ubuntu 14.04.2 LTS
Hypervisor: KVM
Middleware: OpenStack Kilo
  • July 2015
    • Creating cloud site with GPGPU support
Cloud site created at keystone3.ui.savba.sk with a master node and two worker nodes (configuration as reported above)
Creating VM images for GPGPU (based on Ubuntu 14.04, with GPU driver and libraries)
  • August 2015
    • Testing cloud site with GPGPU support
Performance testing and tuning with GPGPU in OpenStack
 - comparing the performance of a cloud-based VM with non-cloud virtualization and a physical machine, identifying discrepancies and tuning the configuration to remove them
 - setting the CPU flavor in OpenStack Nova (performance optimization)
 - adjusting the OpenStack scheduler
Started integrating the site into EGI FedCloud
 - Keystone VOMS support being integrated
 - OCCI in preparation, installation planned for September
  • September 2015
 Continuing integration into EGI FedCloud
  • Next steps
 Full integration, certification and production support
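The May 2015 finding concerns hosts with hyperthreading enabled: vCPUs must be pinned so that whole physical cores belong to one VM. A small sketch can illustrate this. On a Linux host, the sibling threads of each logical CPU are listed in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, for example `0,4` or `0-1`. The helpers below are hypothetical and not part of OpenStack. They group those lists into physical cores and hand out complete cores only. The sketch assumes every core has the same number of threads.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def whole_core_pinning(sibling_lists, n_vcpus):
    """Map vCPU index -> host CPU so that the VM receives only whole cores.

    sibling_lists holds the thread_siblings_list contents, one string per
    logical CPU; logical CPUs on the same core produce the same tuple.
    """
    cores = sorted({tuple(parse_cpu_list(s)) for s in sibling_lists})
    threads_per_core = len(cores[0])
    if n_vcpus % threads_per_core:
        raise ValueError("vCPU count must be a multiple of threads per core")
    needed = n_vcpus // threads_per_core
    if needed > len(cores):
        raise ValueError("not enough physical cores on this host")
    host_cpus = [cpu for core in cores[:needed] for cpu in core]
    return dict(enumerate(host_cpus))
```

Each pair in the resulting map corresponds to a libvirt `<vcpupin vcpu="…" cpuset="…"/>` element inside `<cputune>`. With hyperthreading switched off, as recommended above, every core has a single thread and the mapping becomes trivial.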

Back to Accelerated Computing task