The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

Federated Cloud Virtual Machine Image Preparation

From EGIWiki

Latest revision as of 12:00, 24 July 2017

Overview

Packaging your application in a custom VM image is suggested in any of the following cases:

  • your particular OS flavor is not available in the existing image catalogue
  • installing your application is too complex or time-consuming to be performed during contextualization
  • you want to reduce the number of moving parts in your software stack and follow an immutable infrastructure approach for deploying your application.

Custom VM images can be crafted in different ways. The two main possibilities are:

  • start from scratch, creating a virtual machine, installing an OS and the software on top of it, then taking the virtual machine OS disk as custom image, or
  • dump an existing disk from a VM or physical server and modify it, if needed, to run on a virtualization platform.

In this guide we focus on the first option, because it tends to produce cleaner images and reduces the risk of hardware conflicts. Snapshotting may also be restricted by the cloud providers or by security policies.

Advantages

  • Possibility to build the virtual disk directly from a legacy machine, dumping the contents of the disk.
  • Possibility to speed up deployment for applications with large, complex installation packages. This is because the application does not need to be installed at startup: it is already included in the image.

Disadvantages

  • Building a virtual disk directly from a legacy machine raises compatibility issues with hardware drivers, which usually differ between physical and virtual environments, and even between different virtual environments.
  • You need to keep your image up to date. Outdated VM images may take a long time to start up because the latest OS updates must be downloaded and installed.
  • If you use special drivers or do not package the disk correctly, your custom VM image may not run (or may run slowly) on cloud providers that use different virtualization technologies.
  • VM images on public clouds are sometimes public, so be careful about installing proprietary software in custom images, since other users may be able to run or download them.
  • In general, the effort to implement this solution is higher than the basic contextualization.

Image size and layout

The bigger the VM image, the longer it takes to distribute it to the cloud providers and the longer it takes to start on the infrastructure. As a general rule, always try to make images as small as possible by following these guidelines:

  • DO NOT include (big) data in your image. There are other mechanisms for accessing data from your VM: block and object storage, or solutions like CVMFS.
  • DO NOT include (big) empty space or swap in your image. Extra space for your computation or swap can be added with block storage once the VM is booted, or by using VM flavors that have extra disk allocated for your VM.
  • DO NOT install unneeded software. Tools like a GUI are of no use in most cases, since you will have no access to the graphical console of the VM.
  • DO reduce the size of the images as much as possible. As stated above, empty space can easily be allocated at runtime.
  • DO use compressed image formats, like qcow2 or vmdk (used in OVA), to minimize the size of the image.
  • DO fill the empty disk space of your image with zeros so that it shrinks significantly when compressed, e.g. using dd if=/dev/zero of=/bigemptyfile bs=4096k; rm -rf /bigemptyfile. See Compacting a vmdk virtual machine disk format image (http://splatoperator.com/2012/07/compacting-a-vmdk-virtual-machine-disk-format-image/) for more info.
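
The effect of zero-filling can be illustrated with a minimal Python sketch. It is not part of the original guide: zlib stands in for whatever compression the image format applies, and random bytes stand in for the remnants of deleted files that typically occupy free disk blocks. Both are assumptions made for illustration only.

```python
import os
import zlib

BLOCK = 1024 * 1024  # 1 MiB of simulated free disk space


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression."""
    return len(zlib.compress(data, 6))


# Free space that still holds leftovers of deleted files (simulated with random bytes).
stale_free_space = os.urandom(BLOCK)
# The same free space after being overwritten with zeros, as dd if=/dev/zero does.
zeroed_free_space = bytes(BLOCK)

stale = compressed_size(stale_free_space)
zeroed = compressed_size(zeroed_free_space)
print(f"stale free space:  {stale} bytes after compression")
print(f"zeroed free space: {zeroed} bytes after compression")
```

The zeroed block compresses to a tiny fraction of its original size, while the stale block barely compresses at all, which is why zeroing free space before exporting to a compressed format pays off.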

For the disk layout, it is recommended to use a single partition (no /boot, no swap) and to avoid LVM. This allows the cloud provider to easily resize your partition when the image is instantiated and to modify files in it if needed.

Contextualization and credentials

Your images should never include any credentials. Instead, you should use contextualization. cloud-init is a tool that simplifies the contextualization process. It is widely available as a package in major OS distributions and is supported by all the providers of the EGI Federated Cloud and by most commercial providers.

The cloud-init documentation contains detailed examples on how to create users, run scripts, install packages and perform several other actions supported by the tool.
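
As a rough illustration of keeping credentials out of the image, the following Python sketch renders a cloud-init user-data document that is passed to the VM at instantiation time. The function name render_user_data, the user name, the key string and the package list are all hypothetical. The body is emitted as JSON on the assumption that cloud-init's YAML parser accepts it, since JSON of this simple shape is also valid YAML. The keys users, ssh_authorized_keys, lock_passwd and packages are standard cloud-config keys, but check the cloud-init documentation for the version you target.

```python
import json
from typing import List


def render_user_data(username: str, ssh_public_key: str, packages: List[str]) -> str:
    """Build a #cloud-config document that creates a key-only user and installs packages."""
    config = {
        "users": [
            {
                "name": username,
                # Only a public key is injected; no password is ever stored in the image.
                "ssh_authorized_keys": [ssh_public_key],
                "lock_passwd": True,
            }
        ],
        "packages": packages,
    }
    # cloud-init recognises user-data of this type by the "#cloud-config" first line.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


# Hypothetical values, for illustration only.
user_data = render_user_data(
    "alice", "ssh-ed25519 AAAAEXAMPLEKEY alice@example.org", ["nginx"]
)
print(user_data)
```

The resulting text would be supplied as user-data when the VM is started, so the same image can be reused by different users with different keys.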

For complex setups, especially when applications involve multiple VMs, it is recommended to use cloud-init to bootstrap a Configuration Management Software tool that will manage the configuration of the VMs during runtime.

Security

  • Always remove all default passwords and accounts from your VM.
  • Disable all services unless necessary for the intended tasks
  • Make sure the firewall config (iptables for Linux, also on IPv6) is minimally open
  • Put no shared credentials (passwords) in any image

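You should also follow the best practice guides for each service that is exposed to the outside world. See for example the guides for:

  • SSH (http://wiki.centos.org/HowTos/Network/SecuringSSH)
  • Tomcat (https://www.owasp.org/index.php/Securing_tomcat)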

The hardening guidelines contain some extra tests that may be useful to run when preparing an image.
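
One check of this kind can be sketched in Python. This is an illustrative helper, not one of the tests from the hardening guidelines: the function name accounts_with_passwords is hypothetical, and it assumes the standard Linux shadow file layout, where the second colon-separated field holds the password hash and a leading "!" or "*" means password login is disabled.

```python
from typing import List


def accounts_with_passwords(shadow_text: str) -> List[str]:
    """Return the account names in shadow-format text that can log in with a password."""
    risky = []
    for line in shadow_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        name, password = fields[0], fields[1]
        # A leading "!" or "*" disables password login; an empty field means
        # no password is required at all, which is also reported.
        locked = password.startswith("!") or password.startswith("*")
        if not locked:
            risky.append(name)
    return risky


# Made-up sample content; the hash below is a placeholder, not a real one.
sample = """root:!:19000:0:99999:7:::
daemon:*:19000:0:99999:7:::
ubuntu:$6$examplesalt$examplehash:19000:0:99999:7:::
guest::19000:0:99999:7:::
"""
print(accounts_with_passwords(sample))
```

In practice the text would come from the etc/shadow file of the image mounted read-only; any name reported is an account whose password should be removed or locked before the image is published.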

Tools

Whenever possible, automate the process of creating your images. This will allow you to:

  • get reproducible results
  • avoid tedious manual installation steps
  • quickly produce updated versions of your images.

EGI uses packer as a tool for automating the creation of its base images. This tool can use VirtualBox as the hypervisor for creating the images and is intended to produce identical results across different platforms and providers.

Check the VMI-endorsement GitHub repo (https://github.com/EGI-FCTF/VMI-endorsement), which contains all the packer recipes used to build our images, and re-use them as needed for your own images.