
VT-CloudCaps:BestPractices

From EGIWiki
Revision as of 14:07, 30 August 2013 by Bjoernh (talk | contribs)

This page details the dos and don'ts of image creation when it comes to leveraging various cloud capabilities and achieving multi-platform compatibility.

Basics

A number of recipes for basic image creation can be found on the internet. Each of these may be biased towards the particular environment, e.g. cloud stack, at which its images are targeted. The goal of this document is to distinguish platform-specific rules from generic ones. We will link to these recipes and, wherever applicable, give our own views on image creation.

Tools

A tool to create basic images, primarily targeted at the OpenNebula platform, is available from CERIT-SC[1]. It provides scripts to create Debian images of versions 6 (squeeze) and 7 (wheezy).

Generic documentation on how to create an image and turn it into an AMI image is available from the OpenStack documentation[2]. An image built this way is not self-contained; it requires additional Ramdisk and Kernel images to run. This latter part of the approach will, in most situations, not be transferable among cloud sites. Therefore, we recommend following the installation described in the initial part of that documentation without extracting the EXT4 partition. The result is a self-contained image that is very likely to run at all providers.

Compressed images can be transferred faster and thus may speed up the startup time of newly distributed images. A good way to reduce the size of a compressed image is to set the unused blocks in the image's file system to a fixed value, typically 0. This can be done with the zerofree tool [3], which overwrites only those free blocks of the file system that are not already zero. On a newly installed Debian wheezy, we found that a compressed image treated with zerofree was only about half the size of the same compressed image without it. For images of several GB in size, this can make a difference.
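The effect described above can be reproduced on a small scale without touching a real image. The following is a minimal sketch, not a recipe for image creation: random bytes stand in for stale data left in unused blocks (an assumption made purely for illustration), and the helper name compressed_size is hypothetical.

```shell
#!/bin/sh
# Sketch: compare how well "stale" free-space blocks and zeroed blocks compress.
# Random bytes stand in for leftover data in unused blocks (illustrative
# assumption; real images hold a mix of data).

# compressed_size FILE: print the size in bytes of FILE after gzip compression
compressed_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

workdir=$(mktemp -d)

# 1 MiB of random data: free space that still holds old file contents
dd if=/dev/urandom of="$workdir/stale.bin" bs=1024 count=1024 2>/dev/null
# 1 MiB of zeros: the same free space after it has been zeroed
dd if=/dev/zero of="$workdir/zeroed.bin" bs=1024 count=1024 2>/dev/null

stale_size=$(compressed_size "$workdir/stale.bin")
zeroed_size=$(compressed_size "$workdir/zeroed.bin")
echo "stale free space, compressed:  $stale_size bytes"
echo "zeroed free space, compressed: $zeroed_size bytes"

rm -rf "$workdir"

# On a real image, zerofree is run against an unmounted (or read-only
# mounted) ext2/3/4 file system, for example:
#   zerofree -v /dev/sdXN    # /dev/sdXN is a placeholder device name
```

The zeroed file compresses to a tiny fraction of its size while the random file barely shrinks at all, which is why zeroing free space before compressing an image pays off.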

Image Layout

There are several ways of creating images that will all run in one environment or another. With OpenStack, one would usually craft a single file system image. The virtual instance is then run with a combination of Kernel, Ramdisk, and file system images. Another way is to provide a full disk image with a bootable Kernel for full virtualization. This seems to be the most interoperable way.

There are some aspects to consider when using a full disk image. For example, LVM may cause problems in some environments. OpenStack, for instance, tries to inject the user's SSH key into the file system. However, when LVM is used for the root file system, OpenStack will not be able to do that. The key would then have to be pulled from the metadata service, which in turn means altering the image to cater for that.

The safe choice is not to use LVM for the root file system; this makes it easier for the cloud management platform to make minor alterations to the image as needed.

Local Adaptations and Optimizations

Syslog

At least in OpenStack, IP addresses are leased for very short terms, i.e. 45-60 seconds. You may want to filter the messages about IP address renewal with a configuration file like this:

/etc/rsyslog.d/25_dhclient.conf

:programname, contains, "dhclient" /var/log/dhclient.log
& ~

This will write dhclient messages to /var/log/dhclient.log and avoid writing them to /var/log/{syslog,messages}. Following this, you will want to add this file to log rotation, which can be as simple as adding it to the list of files in /etc/logrotate.d/rsyslog.
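As an alternative to editing /etc/logrotate.d/rsyslog, the log file can get its own logrotate stanza. The following is a minimal sketch under stated assumptions: the function name write_dhclient_logrotate, the file name dhclient and the rotation settings are illustrative choices, and the postrotate command is assumed to match the one used in your distribution's own rsyslog stanza (copy that one if it differs).

```shell
#!/bin/sh
# write_dhclient_logrotate DIR: write a logrotate stanza for
# /var/log/dhclient.log into DIR (normally /etc/logrotate.d).
# The rotation settings below are illustrative, not a recommendation.
write_dhclient_logrotate() {
    cat > "$1/dhclient" <<'EOF'
/var/log/dhclient.log
{
	rotate 4
	weekly
	missingok
	notifempty
	compress
	delaycompress
	postrotate
		# assumed to match the rsyslog stanza shipped by the distribution
		invoke-rc.d rsyslog rotate > /dev/null
	endscript
}
EOF
}
```

Run as root, write_dhclient_logrotate /etc/logrotate.d installs the stanza; logrotate -d /etc/logrotate.d/dhclient then performs a dry run that shows what would be rotated without changing anything.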

cloud-init

In Debian, cloud-init tries to use all supported data sources by default. This may lead to long timeouts when it queries metadata services that are not available. You should reduce the number of enabled data sources by reconfiguring the cloud-init package:

# dpkg-reconfigure cloud-init
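When images are built unattended, the same restriction can be expressed as a cloud-init configuration fragment that sets the datasource_list key. The sketch below is an assumption-laden illustration: the function name write_datasource_list and the file name 91_datasources.cfg are hypothetical, and the data source names in the usage example (ConfigDrive, Ec2) are only examples that must be replaced by whatever your target sites actually provide.

```shell
#!/bin/sh
# write_datasource_list DIR NAME...: write a cloud-init configuration
# fragment into DIR (normally /etc/cloud/cloud.cfg.d) that restricts
# cloud-init to the named data sources.
write_datasource_list() {
    dir=$1
    shift
    # join the remaining arguments with ", "
    list=$(printf '%s, ' "$@")
    list=${list%, }
    # hypothetical file name, chosen to sort late among the fragments
    printf 'datasource_list: [ %s ]\n' "$list" > "$dir/91_datasources.cfg"
}

# Example (data source names are placeholders for your environment):
#   write_datasource_list /etc/cloud/cloud.cfg.d ConfigDrive Ec2
```

Fragments in that directory are read in lexical order, so the file name was picked to come after the fragment written by the package configuration; verify on your system that no later fragment overrides the list again.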

Also in Debian, cloud-init by itself will not be sufficient if the image is supposed to be resized at instantiation time. In order to do that, you'll need to additionally install

  • cloud-utils
  • cloud-initramfs-

Like cloud-init, these are available from the backports repositories.

Virtual Consoles

You will usually log into your VM via an SSH connection. Therefore, the TTYs that are usually created at system boot are not needed. You can deactivate them depending on the distribution you're running:

  • For Debian, remove or comment them in /etc/inittab
  • For Ubuntu, remove the corresponding job files /etc/init/tty[2-6].conf

You may want to leave the first of them running in order to access your VM via VNC.
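For the Debian case, the edit to /etc/inittab can be scripted. The following is a minimal sketch, assuming the getty entries follow the usual pattern of an id between 2 and 6, a run level list, respawn, and /sbin/getty with a tty name at the end of the line; the function name disable_extra_gettys is hypothetical. The entry for tty1 is left untouched so that VNC access keeps working.

```shell
#!/bin/sh
# disable_extra_gettys FILE: comment out the getty entries for tty2..tty6
# in an inittab-style FILE, leaving the tty1 entry as it is.
disable_extra_gettys() {
    sed 's|^\([2-6]:[0-9]*:respawn:/sbin/getty .*tty[2-6]\)$|#\1|' "$1" \
        > "$1.new" && mv "$1.new" "$1"
}

# Example (as root; keep a backup of the original file first):
#   cp /etc/inittab /etc/inittab.orig
#   disable_extra_gettys /etc/inittab
#   telinit q    # ask init to re-read /etc/inittab
```

Commenting the lines out rather than deleting them keeps the change easy to revert.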

Platform-specific practices

OpenNebula

If you want to leverage some platform-specific features provided by OpenNebula and use workflows endorsed by its community, you should follow a few recommendations regarding image creation. This is especially important for users coming to OpenNebula from other cloud platforms, such as OpenStack. In order to understand these recommendations, you first have to understand how OpenNebula works with block storage, images and virtual machines in general.

Images

When you upload a file containing the file system of a virtual machine (image of a single partition or the whole disk, it doesn't matter at this point) to OpenNebula and register it as an IMAGE, it becomes attachable but not resizeable. This means you will be able to create a virtual machine using this IMAGE but you won't be able to change its size during instantiation or expand its file system to fill a bigger image.

The only way to add more block storage to a virtual machine is to create a special empty IMAGE using type DATABLOCK and attach it as a new block device. This means that storage space can be added only to a new mount point inside the virtual machine.
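An empty DATABLOCK is described by a short image template that is then registered with the oneimage tool. The sketch below shows one way this might look; the function name write_datablock_template, the image name, the size and the datastore name default are all placeholders, and attribute names should be checked against the documentation of the OpenNebula release in use.

```shell
#!/bin/sh
# write_datablock_template FILE NAME SIZE_MB: write an OpenNebula image
# template describing an empty DATABLOCK of SIZE_MB megabytes, formatted
# with ext4.
write_datablock_template() {
    cat > "$1" <<EOF
NAME   = "$2"
TYPE   = DATABLOCK
SIZE   = $3
FSTYPE = ext4
EOF
}

# Example (names and sizes are placeholders):
#   write_datablock_template scratch.tmpl scratch 10240
#   oneimage create scratch.tmpl -d default
```

Once registered, the DATABLOCK can be attached to a virtual machine and shows up there as an additional block device that still has to be mounted.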

Virtual machine templates

A virtual machine TEMPLATE in OpenNebula is the connecting point for all virtualized resources used by a virtual machine instance launched from it. It is the place where you specify the number of CPU cores, the amount of RAM, the images that should be connected to the virtual machine, and much more. Here lies the difference between OpenNebula and other cloud platforms, such as OpenStack: there is no way to specify a SIZE for existing images. They are simply copied from the image repository as they are and attached to the virtual machine instance before the boot process begins. Hence the amount of block storage cannot be specified upon instantiation; it is tied to the size of the existing images in the image repository that will be connected to the virtual machine instance.

Since this would be a massive drawback for the usability of OpenNebula, there is a way to create block storage "on-the-fly" upon instantiation of the virtual machine (or, in fact, during its runtime). You can simply specify that, in addition to all the existing images, you want to also attach empty block storage devices of the chosen size and file system type. These block devices will appear in the virtual machine as regular block devices and will have to be mounted to specific mount points before use. See VT-CloudCaps:BestPractices#Examples for some examples of this functionality.
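A template that combines an existing image with block storage created on the fly might look like the sketch below. It is an illustration under assumptions, not a tested recipe: the function name write_vm_template, the VM name, IMAGE_ID = 1 and all sizes are placeholders, and the exact attribute names for such disks should be verified against the template reference of the OpenNebula release in use.

```shell
#!/bin/sh
# write_vm_template FILE: write an example OpenNebula VM template that
# attaches one existing image plus two disks created at instantiation.
write_vm_template() {
    cat > "$1" <<'EOF'
NAME   = "example-vm"
CPU    = 1
MEMORY = 1024

# existing root image from the image repository (copied as it is)
DISK = [ IMAGE_ID = 1 ]

# disks created on the fly at instantiation; SIZE is given in MB
DISK = [ TYPE = swap, SIZE = 1024 ]
DISK = [ TYPE = fs, SIZE = 4096, FORMAT = ext4 ]
EOF
}

# Example:
#   write_vm_template example-vm.tmpl
#   onetemplate create example-vm.tmpl
```

The two extra disks appear inside the virtual machine as plain block devices; which device names they receive depends on the hypervisor and on the other disks in the template.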

Examples

  • Example #1
I have a raw disk image containing an MBR and a single partition
with the root file system of an arbitrary Linux distribution.

This image has the following attributes:
 -- 4 GB of space in total
 -- 2 GB taken by the OS installation
 -- no fs encryption, no LVM, no swap
 -- credentials for root access are already included
 -- everything is stored in a single file my_cloud_vm.img

Once I upload and register this file as an IMAGE in OpenNebula, it
becomes available as IMAGE_ID = 1 and I will use it to instantiate
a virtual machine.

After a successful login into the running virtual machine, I realize
that there is _NOT_ enough space to install the software or copy the
data I need to run my computing job.

The only way to solve this problem /without expanding, uploading
and registering a new (and bigger) IMAGE/ is to attach a DATABLOCK
and mount it inside the virtual machine as a new mount point.

There is no way to expand the existing root file system or IMAGE
itself once it is registered in OpenNebula.

So I create an empty DATABLOCK, attach it to the virtual machine
instance, mount the resulting block device to /data and install
my software/copy my data to this location without taking up space
in the root file system.
  • Example #2
I have a qcow2 disk image containing an MBR and a single partition
with the root file system of an arbitrary Linux distribution.

This image has the following attributes:
 -- 20 GB of space in total
 -- 2 GB taken by the OS installation
 -- no fs encryption, no LVM, no swap
 -- credentials for root access are already included
 -- everything is stored in a single file my_cloud_vm.qcow2 taking
    up only 2GB of disk space

Once I upload and register this file as a persistent IMAGE in OpenNebula, it
becomes available as IMAGE_ID = 2 and I will use it to instantiate
a virtual machine.

After a successful login into the running virtual machine, I realize
that there is enough space to install the software and copy the
data I need to run my computing job.

Since this image is persistent, it will be automatically overwritten in the
image repository once I shut down the virtual machine.

The next virtual machine instance using this image I launch will take forever
to deploy, since the image has already expanded to its full physical capacity
of 20 GB.

This is not the right way to use cloud capabilities in OpenNebula!
  • Example #3
Let's say that I want to make Example #1 more dynamic. How can I mount the
new block device inside the virtual machine automatically?

This is when contextualization comes in handy. This example will use CloudInit
(or cloud-init, if you like this notation more) installed inside the virtual
machine and OpenNebula's native contextualization feature. CloudInit can be
installed from Ubuntu repositories or from EPEL for RHEL-based distributions.

In addition to CloudInit, I have to register the following init-script inside the virtual machine:

#!/bin/sh -e

### BEGIN INIT INFO
# Provides:        opennebula
# Required-Start:  $network $mountall
# Required-Stop:   $network 
# Default-Start:   2 3 4 5
# Default-Stop:    0 1 6
# Short-Description: Start opennebula contextualisation script
### END INIT INFO

PATH=/sbin:/bin:/usr/sbin:/usr/bin

. /lib/lsb/init-functions

case $1 in
	start)
		# generate new host SSH keys
		if [ -x /usr/sbin/dpkg-reconfigure ] && \
			! [ -f /etc/ssh/ssh_host_dsa_key ] ;
		then
			dpkg-reconfigure openssh-server
		fi

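		# look for the contextualization CD-ROM on the usual device names
		mounted=no
		for i in xvdb sdb hdb vdb cdrom; do
			if mount -t iso9660 "/dev/$i" /mnt; then
				mounted=yes
				break
			fi
		done

		# context.sh marks a valid context image; init.sh is the script to run
		if [ -f /mnt/context.sh ] && [ -f /mnt/init.sh ]; then
			. /mnt/init.sh
		fi

		# unmount only if one of the devices was actually mounted
		if [ "$mounted" = yes ]; then
			umount /mnt
		fi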
 
		;;
	stop)
		;;
	*) 
		echo "Usage: $0 {start|stop}"
		;;
esac

And add the following to the virtual machine template in OpenNebula:

CONTEXT=[
  USER_DATA="#cloud-config
# see https://help.ubuntu.com/community/CloudInit

# always leave the [xvd|vd|hd]b device unused
mounts:
- [xvdc,none,swap,sw,0,0]
- [xvdd,/data,ext4,'noatime,nodiratime,nosuid,data=writeback,barrier=0,errors=remount-ro']
"
]

Now, when I start the virtual machine with additional block devices xvdc and xvdd, they will be automatically mounted as swap and /data inside the virtual machine.

Summary

  • DO make images as small as possible!
  • DO use compressed and copy-on-write image formats!
  • DO make use of available contextualization options!
  • DO make use of available object storage services for your data!
  • DO install additional software to mounted DATABLOCK block devices!
  • DON'T use images to store big data!
  • DON'T use install-all-in-one images!
  • DON'T upload small copy-on-write images just to expand them later!
  • DON'T include swap partitions in images!

OpenStack

Checklist

References