GPGPU-FedCloud

From EGIWiki
Revision as of 13:20, 6 November 2015




Objective

To provide support for accelerated computing in the EGI-Engage federated cloud.


Participants

Viet Tran (IISAS)

Jan Astalos (IISAS)

Miroslav Dobrucky (IISAS)

Current status

The IISAS-GPUCloud site with GPGPU support has been established and integrated into the EGI federated cloud.

HW configuration:

IBM dx360 M4 server with two NVIDIA Tesla K20 accelerators.
Ubuntu 14.04.2 LTS with KVM/QEMU, PCI passthrough virtualization of GPU cards.

SW configuration:

Base OS: Ubuntu 14.04.2 LTS
Hypervisor: KVM
Middleware: Openstack Kilo
GPU-enabled flavors: gpu1cpu6 (1 GPU + 6 CPU cores), gpu2cpu12 (2 GPUs + 12 CPU cores)

EGI federated cloud configuration:

GOCDB: IISAS-GPUCloud, https://goc.egi.eu/portal/index.php?Page_Type=Site&id=1485
Monitoring: https://cloudmon.egi.eu/nagios/cgi-bin/status.cgi?host=nova3.ui.savba.sk
Openstack endpoint: https://keystone3.ui.savba.sk:5000/v2.0
OCCI endpoint: https://nova3.ui.savba.sk:8787
Supported VOs: fedcloud.egi.eu, ops, dteam, moldyngrid, enmr.eu, vo.lifewatch.eu

Applications being tested/running on IISAS-GPUCloud

MolDynGrid http://moldyngrid.org/
WeNMR https://www.wenmr.eu/
Lifewatch-CC https://wiki.egi.eu/wiki/CC-LifeWatch

How to use GPGPU on IISAS-GPUCloud

For EGI users:

Join EGI federated cloud https://wiki.egi.eu/wiki/Federated_Cloud_user_support#Quick_Start

Install the rOCCI client if you don't have it already (on Linux, a single command: "curl -L http://go.egi.eu/fedcloud.ui | sudo /bin/bash -")

Get a VOMS proxy certificate from fedcloud.egi.eu or any other supported VO, with the -rfc option (on the rOCCI client machine: "voms-proxy-init --voms fedcloud.egi.eu -rfc")
 
Choose a suitable flavor with GPU (e.g. gpu1cpu6, OCCI users: resource_tpl#f0cd78ab-10a0-4350-a6cb-5f3fdd6e6294)

Choose a suitable image (e.g. Ubuntu-14.04-UEFI, OCCI users: os_tpl#4aaf1abc-4c21-4192-ac52-8896757978be)

Create a keypair for logging in to your server and store the contextualization data in tmpfedcloud.login
          (see https://wiki.egi.eu/wiki/Fedcloud-tf:CLI_Environment#How_to_create_a_key_pair_to_access_the_VMs_via_SSH)
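The keypair step can be sketched as follows (the file names tmpfedcloud / tmpfedcloud.login follow this page's examples; the cloud-config format assumes the image runs cloud-init, as the EGI images do):

```shell
# Generate an SSH keypair without a passphrase; the private key
# (tmpfedcloud) is used later with "ssh -i tmpfedcloud"
ssh-keygen -t rsa -b 2048 -f tmpfedcloud -N ''

# Build a cloud-init user_data file that injects the public key
# into the default user of the VM (e.g. "ubuntu" on Ubuntu images)
cat > tmpfedcloud.login <<EOF
#cloud-config
ssh_authorized_keys:
  - $(cat tmpfedcloud.pub)
EOF
```

tmpfedcloud.login is the file passed to the occi command via --context user_data.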

Create a VM with the selected image, flavor and keypair (OCCI users: copy the following long OCCI command
          occi  --endpoint  https://nova3.ui.savba.sk:8787/ \
          --auth x509 --user-cred $X509_USER_PROXY --voms --action create --resource compute \
          --mixin os_tpl#4aaf1abc-4c21-4192-ac52-8896757978be --mixin resource_tpl#f0cd78ab-10a0-4350-a6cb-5f3fdd6e6294 \
          --attribute occi.core.title="Testing GPU" \
          --context user_data="file://$PWD/tmpfedcloud.login")

Assign a public (floating) IP to your VM, using the VM ID returned by the previous command and the /network/nova resource (OCCI users:
          occi --endpoint  https://nova3.ui.savba.sk:8787/  \
          --auth x509 --user-cred $X509_USER_PROXY --voms --action link \
          --resource https://nova3.ui.savba.sk:8787/compute/$YOUR_VM_ID_HERE -j /network/nova)

Log in to the VM with your private key and use it as your own GPU server (ssh -i tmpfedcloud ubuntu@$VM_PUBLIC_IP)

Please remember to terminate your servers when your jobs finish, to release the resources for other users.


For access to IISAS-GPUCloud via the web portal:

Get a token issued by Keystone using your VOMS proxy certificate. You can use the tool from https://github.com/tdviet/Keystone-VOMS-client

Log in to the Openstack Horizon dashboard with the token via https://horizon.ui.savba.sk/horizon/auth/token/

How to create your own GPGPU server in cloud

This is a short set of instructions for creating a GPGPU server in the cloud from a vanilla Ubuntu image.

Create a VM from a vanilla image with UEFI support (e.g. Ubuntu-14.04-UEFI; make sure to choose a flavor with GPU support)

Install gcc, make and the extra kernel modules: "apt-get update; apt-get install gcc make linux-image-extra-virtual"

Choose and download the correct driver from http://www.nvidia.com/Download/index.aspx, and upload it to the VM

Install the NVIDIA driver: "./NVIDIA-Linux-x86_64-346.96.run"

Download CUDA toolkit from https://developer.nvidia.com/cuda-downloads (choose deb format for smaller download)

Install the CUDA toolkit: "dpkg -i cuda-repo-ubuntu*_amd64.deb; apt-get update; apt-get install cuda" (a very large installation; it takes a long time)

Your server is ready for your application. You can install additional software (NAMD, GROMACS, ...) and your own application now

For your convenience, a script for installing the NVIDIA driver + CUDA automatically is available at https://github.com/tdviet/NVIDIA_CUDA_installer
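In essence, the manual steps above boil down to a short script (a sketch written around this page's example versions, not the installer linked above; the driver file name must match the one you downloaded, and the NVIDIA installer's --silent option is assumed for an unattended run):

```shell
# Collect the installation steps from this guide into a helper script;
# run the resulting script as root on the VM
cat > install_gpu.sh <<'EOF'
#!/bin/sh
set -e
# Build prerequisites and the extra kernel modules
apt-get update
apt-get install -y gcc make linux-image-extra-virtual
# NVIDIA driver (the .run file downloaded from nvidia.com beforehand)
sh ./NVIDIA-Linux-x86_64-346.96.run --silent
# CUDA toolkit from the .deb repository package
dpkg -i cuda-repo-ubuntu*_amd64.deb
apt-get update
apt-get install -y cuda
EOF
chmod +x install_gpu.sh
```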

Be sure to make a snapshot of your server for later use. You may need to suspend your server before creating the snapshot (due to KVM passthrough).
Do not terminate your server before creating a snapshot; the whole server will be deleted when terminated.

How to enable GPGPU passthrough in OpenStack

For admins of cloud providers

On the computing node, get the vendor/product ID of your hardware: "lspci | grep NVIDIA" to find the PCI slot of the GPU, then "virsh nodedev-dumpxml pci_xxxx_xx_xx_x"
On the computing node, unbind the device from the host kernel driver
On the computing node, add "pci_passthrough_whitelist = {"vendor_id":"xxxx","product_id":"xxxx"}" to nova.conf
On the controller node, add "pci_alias = {"vendor_id":"xxxx","product_id":"xxxx", "name":"GPU"}" to nova.conf
On the controller node, enable PciPassthroughFilter in the scheduler
Create new flavors with "pci_passthrough:alias" (or add the key to an existing flavor), e.g. nova flavor-key m1.large set "pci_passthrough:alias"="GPU:2"
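Put together, the nova.conf fragments from the steps above look like this (a sketch for Openstack Kilo; 10de is the NVIDIA PCI vendor ID, and the product ID placeholder must be replaced with the value reported by virsh nodedev-dumpxml):

```ini
# /etc/nova/nova.conf on the computing node
[DEFAULT]
pci_passthrough_whitelist = {"vendor_id":"10de","product_id":"xxxx"}

# /etc/nova/nova.conf on the controller node
[DEFAULT]
pci_alias = {"vendor_id":"10de","product_id":"xxxx", "name":"GPU"}
# append PciPassthroughFilter to the scheduler filters already in use
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter
```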

Progress

  • May 2015
    • Review of available technologies
    • GPGPU virtualisation in KVM/QEMU
    • Performance testing of passthrough
HW configuration: 
IBM dx360 M4 server with two NVIDIA Tesla K20 accelerators.
Ubuntu 14.04.2 LTS with KVM/QEMU, PCI passthrough virtualization of GPU cards.
Tested application:
NAMD molecular dynamics simulation (CUDA version), STMV test example (http://www.ks.uiuc.edu/Research/namd/).
Performance results:
The tested application runs 2-3% slower in a virtual machine compared to a direct run on the tested server.
If hyperthreading is enabled on the compute server, vCPUs have to be pinned to real cores so that
whole cores are dedicated to one VM. To avoid potential performance problems, hyperthreading
should be switched off.
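With plain KVM/QEMU, the pinning described above can be expressed in the libvirt domain XML, for example (a sketch for a guest with 6 vCPUs pinned to host cores 0-5; the core numbers depend on the host topology):

```xml
<!-- fragment of a libvirt domain definition: each vCPU is pinned
     to its own physical core so whole cores are dedicated to the VM -->
<vcpu placement='static'>6</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='3'/>
  <vcpupin vcpu='4' cpuset='4'/>
  <vcpupin vcpu='5' cpuset='5'/>
</cputune>
```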

  • June 2015
    • Creating cloud site with GPGPU support
Configuration: master node, 2 worker nodes (IBM dx360 M4 servers, see above)
Base OS: Ubuntu 14.04.2 LTS
Hypervisor: KVM
Middleware: Openstack Kilo
  • July 2015
    • Creating cloud site with GPGPU support
Cloud site created at keystone3.ui.savba.sk, master + two worker nodes, configuration reported above
Creating VM images for GPGPU (based on Ubuntu 14.04, GPU driver and libraries)
  • August 2015
    • Testing cloud site with GPGPU support
Performance testing and tuning with GPGPU in Openstack 
 - comparing the performance of a cloud-based VM with non-cloud virtualization and a physical machine, finding discrepancies and tuning them out
 - setting the CPU flavor in Openstack nova (performance optimization)
 - adjusting the Openstack scheduler
Starting process of integration of the site to EGI FedCloud
 - Keystone VOMS support being integrated
 - OCCI in preparation, installation planned in September
  • September 2015
 Continued integration into EGI FedCloud
  • October 2015
 Full integration into EGI FedCloud; certification process in progress
 Support for the moldyngrid, enmr.eu and vo.lifewatch.eu VOs
  • Next steps
Production, application support
Cooperation with APEL team on accounting of GPUs

Back to Accelerated Computing task