HOWTO10 How to port application into EGI Federated Cloud

From EGIWiki
Revision as of 11:47, 26 March 2014 by Mhaggel (talk | contribs)

How To Port your application into the EGI Federated Cloud IaaS

The scope of this page is to provide a brief guide on how to integrate your application or web service into the EGI Federated Cloud environment. It is aimed at developers and administrators of applications and web services who have little or no knowledge of cloud technologies.

This guide does not aim to cover exhaustively the overall problem of porting existing applications and web services to a cloud environment, but only to give a general overview. For more information, you can ask for direct support on the EGI Users Support Mailing List.

Although this guide aims to be as generic as possible, the specific technology solutions it cites refer to the EGI Federated Cloud environment and may differ in other environments.

General concept

Porting your application to a Cloud Infrastructure-as-a-Service (IaaS) environment is a task that can be performed at different levels and with different strategies, depending on the type of your application/web service, its resource consumption, the effort you want to spend in adapting the application logic to the cloud environment, and the advantages that the advanced cloud features could bring to your application.

At the most basic level, Cloud IaaS lets you manually start (usually in a matter of minutes) a virtual server with an assigned IP address and a specific OS. This machine behaves like a normal server and you have administrator access to it. You can log in to it with your favorite remote shell client (usually based on SSH), install your own software on the machine and configure the OS as you would with a normal server. When you no longer need the machine, you can simply delete it.

Going a step further, since the Cloud is based on virtualization technologies, you can use custom virtual disks, which you can prepare offline. For example, instead of logging in to the machine via SSH after startup to install your web service, you may build a custom OS disk with the web service already installed, upload it to the Cloud and start it directly, so that your machine comes up already configured.

Your own custom OS image, or a basic OS image, may also be personalized at start-up by running a custom configuration script. The most fundamental personalization is the setup of the credentials used to administer the machine remotely (usually temporary SSH keys generated by the user). This is very important if you use basic OS images, which are shared between users and need to be secured as soon as they start. This process is called contextualization. In principle, there is no limit on what can be done during contextualization: you may not only inject your own access credentials, but also install, update or configure your software.

When moving from a single-server application/web service to a more complex application, with multiple servers hosting the different components (e.g. database, web interface, load balancer), the different servers can be deployed manually (setting up each service component one by one) or automatically, using the deployment tools provided by an Infrastructure Broker.

Another main feature of Cloud IaaS is the possibility to access all its features programmatically via a set of standard APIs. These functions may be exploited directly by your application, which could dynamically start and stop additional servers only when they are needed (e.g. under heavy workload). Such a scenario usually implies modifying the application to support the different cloud APIs and adding logic to decide when to start and stop new machines. This task may be eased by using an application broker, as described in the "Application broker" sections below. With an application broker, you only need to integrate the application into the broker, which then takes care of dynamically deploying the servers on the cloud.
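
As a rough illustration of the start/stop logic mentioned above, the following shell sketch computes how many servers a given workload would need. The function name desired_servers, the job-queue inputs and the limits are all hypothetical; a real application would feed in its own workload metric and then call the cloud API to start or delete servers accordingly.

```shell
#!/bin/sh
# Hypothetical scaling rule: not part of any FedCloud tool.
# Usage: desired_servers QUEUED_JOBS JOBS_PER_SERVER MIN_SERVERS MAX_SERVERS
# The body runs in a subshell "( ... )" so its variables stay local.
desired_servers() (
    queued_jobs=$1
    jobs_per_server=$2
    min_servers=$3
    max_servers=$4

    # Ceiling division: servers needed to cover all queued jobs.
    wanted=$(( (queued_jobs + jobs_per_server - 1) / jobs_per_server ))

    # Clamp the result to the allowed range.
    if [ "$wanted" -lt "$min_servers" ]; then wanted=$min_servers; fi
    if [ "$wanted" -gt "$max_servers" ]; then wanted=$max_servers; fi

    echo "$wanted"
)
```

For example, desired_servers 25 10 1 5 prints 3. Comparing that figure with the number of servers currently running tells the application whether to start new machines or delete idle ones.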

Integration Strategy

There are many different integration strategies you can use to port your application/web service into a cloud environment. Which one to use depends on your particular application. This chapter gives a brief introduction to the most common strategies, to help you find the best one for your needs.

The details on how to implement the strategies and perform the integration in the FedCloud are reported in the chapter "Porting your application to the EGI Federated Cloud".

1. Manual server setup

This is the most basic way to port your application, and the one that makes the least use of the advanced Cloud features. Using the cloud client you can start a virtual server with a pre-defined OS. You can select the OS you need from a list of generic images (e.g. Ubuntu 11.04, CentOS 6.4). Once you have started the server, you just log in to it (via SSH) and install your own software. The server is then ready to be used: you can test it and give access to the final users of the web service/application.

In this scenario, contextualization is used only to set up your user credentials on the machine (pushing your SSH public key), and may even be skipped completely if the cloud infrastructure itself stores your access credentials and injects them into the machine.

If your application is composed of more than one server, you can start them one by one, set them up and test connectivity between them as you would do for multiple physical servers.

In general, this solution is recommended for testing and development, for very simple self-packaged applications with minimal effort for installation and configuration and for "disposable" applications (applications/web services which are started, used for a limited time, then destroyed and never started again).

2. Basic OS image with contextualization

This is a small step up from the manual server setup. The addition is the automatic installation and configuration of your application at startup, performed by a custom contextualization script. This script may be nothing more than a wrapper around a Bash or Python script, which is automatically executed at startup to install and configure the application. The required application packages and data are usually retrieved by the contextualization script from a web repository.

The advantage of this solution over the manual setup is faster server configuration and higher portability, which makes it easy to migrate your application to a different cloud environment. You can update your deployment more easily by just modifying the contextualization script, and you can rely on externally maintained basic OS images, without the need to keep your own custom OS image up to date. Of course, some effort is needed to build the contextualization script, and this effort can be significant, especially for old and poorly documented applications.

In general, this solution is recommended for most web service applications, which usually need to stay up 24/7, with relatively infrequent application updates and downtimes (e.g. monthly).

3. Custom OS image

If you cannot find your particular OS flavor among the basic OS images, or if it is too complex to build a contextualization script or even to install your application manually, you can package your application in a custom OS image, which is uploaded to the cloud and started on demand. This is a virtual OS disk, which can be built and tested on your own computer. You can put everything you want into this OS image (application, dependencies and data), but be aware that the bigger the image is, the slower it will start on the Cloud. A general recommendation is therefore to reduce the size of the image as much as possible, removing data and packages that are not needed or that can easily be retrieved on demand from remote web repositories.

One advantage of this solution is the possibility to build the virtual disk directly from a legacy machine, by dumping the contents of its disk. However, this is generally not recommended, since it raises compatibility issues with hardware drivers, which usually differ between physical and virtual environments and even between different virtual environments. Another potential advantage is faster deployment for applications with large and complex installation packages, because the application does not need to be installed at startup: it is already included in the machine. This advantage holds only as long as you keep your image up to date. Outdated OS disk images may take a long time to start up because of the need to download and install the latest OS updates.

One disadvantage is that, if you use special drivers or do not package the OS disk correctly, your custom OS image may not run (or may run slowly) on cloud providers based on different virtualization technologies.

An important point to remember is that OS disk images on public clouds are sometimes public. Be careful about installing proprietary software on custom OS images, since other users may be able to run the image or download it.

If your application components are split across multiple servers, you can prepare different OS images (one for each application component) or you can use contextualization to set up each server for its specific purpose. For example, a custom OS disk image equipped with Apache may act as a load balancer or as a back-end server, depending on the contextualization script used to start the server.
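
A minimal sketch of this idea, assuming the contextualization script passes a role name to a helper that is already installed in the image. The function name, the role names and the configuration file names are hypothetical; only the choice between load balancer and back-end comes from the example above.

```shell
#!/bin/sh
# Hypothetical helper baked into the custom image. The contextualization
# script calls it with the role this server should take.
# It prints the name of the Apache configuration file the role would enable.
apache_config_for_role() {
    case "$1" in
        loadbalancer) echo "loadbalancer.conf" ;;  # forward requests to the back-ends
        backend)      echo "backend.conf" ;;       # serve the application itself
        *)            echo "unknown role: $1" >&2
                      return 1 ;;
    esac
}
```

Keeping the role decision in one small function means the same image can be reused unchanged, and only the one-line call in the contextualization script differs between servers.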

In general, the effort to implement this solution is higher than for basic contextualization. It is recommended only if you need to start servers very frequently and application and system updates are infrequent, if you need special OS flavors, or if your application installation (including dependencies) is very complex (so that downloading and installing the application at each startup is very demanding).

4. Infrastructure broker

When you have complex applications with many components hosted on different servers, whether located on a single cloud site or split across multiple cloud sites, an infrastructure broker may help you deploy them. An infrastructure broker can run, usually via a web interface or custom APIs, a full deployment of multiple servers (in the order of thousands or more), using contextualization to set up the dependencies between components.

The configuration of the deployment depends on the particular broker solution but, in general, each server may be started from a basic OS image or a custom OS image. Contextualization scripts are needed in either case to orchestrate the deployment.

In general, infrastructure broker usage is recommended for complex applications which are started manually relatively frequently (e.g. processing clusters for absorbing peaks in processing, web service instances for training or demonstration purposes).

5. Application broker

An application broker abstracts the Cloud APIs and frees you from the need to control the deployment of the application. Once you have integrated your application into the broker, it takes care of starting the virtual servers according to the workload, configuring them and dispatching the user jobs. The effort to integrate your application into the application broker depends on the application and on the broker itself. The most common approach, however, is to use basic OS images with contextualization, or a custom OS image, to which you add the application broker worker node routines.

When your application performs parallel processing, using an application broker may speed up its execution, since the broker takes on the task of submitting the processing to different servers, using a larger number of resources.

In general, the use of application brokers is recommended for asynchronous processing applications, where the workload is not constant but comes in bursts and the infrastructure utilization needs to adapt dynamically to the application needs.

Summary

The following table summarizes the solutions described in the sections above. For each integration strategy it gives a description and the cases for which it is recommended.

Integration strategy: Manual server setup
Description:
  • Startup of a virtual server from a basic OS image, with manual configuration
  • Contextualization only for setting up user credentials
  • Multiple application components hosted on different servers are started and configured manually one by one
Recommended for: Tests, self-contained applications, "disposable" applications

Integration strategy: Basic OS image with contextualization
Description:
  • Start of a basic generic OS image, with the application installed on startup via a contextualization script
  • No special OS flavor can be used
  • Startup can be slow for applications with a complex installation
Recommended for: Web service applications, which usually stay up 24/7, with relatively infrequent application updates (e.g. monthly)

Integration strategy: Custom OS image
Description:
  • Start of a custom virtual disk, with the application already installed on it and pre-configured
  • Additional post-configuration can be done via contextualization
  • The virtual disk image should be prepared carefully to minimize disk size and avoid driver conflicts
Recommended for: Applications that need special OS flavors or complex installation procedures, and applications that are started and stopped very frequently (e.g. virtual servers started hourly or daily)

Integration strategy: Infrastructure broker
Description:
  • Automates multi-server deployment on multiple cloud sites
  • Uses contextualization scripts to orchestrate the deployment of the different components
  • May rely on custom OS images or basic OS images for the individual application components
Recommended for: Multi-server applications which are started manually relatively frequently (e.g. processing clusters for absorbing peaks in processing, web service instances for training or demonstration purposes)

Integration strategy: Application broker
Description:
  • Automates server start/stop, dynamically adapting to the workload
  • Automates process splitting for parallel applications
  • May rely on custom OS images or basic OS images for the individual application components
Recommended for: Asynchronous processing applications, where the workload is not constant but comes in bursts and the infrastructure utilization needs to adapt dynamically to the application needs

Porting your application to the EGI Federated Cloud

This chapter provides a step-by-step guide to porting your application/web service to the cloud. The instructions below are organized according to the porting strategy you have identified for your application. More information on the application integration strategies is reported in the previous chapter.

Note that the instructions in this chapter are specific to the technology solutions used in the EGI Federated Cloud.

Because of the nature of the EGI Cloud Federation, before porting your application to the cloud you need to perform a set of preliminary steps:

  1. Get the credentials to access the FedCloud
  2. Create your own Virtual Organization or join an existing one
  3. Select a FedCloud site to support your resources

A brief guide on how to perform these preliminary steps is reported in the Federated Cloud CLI environment configuration page.

1. Manual server setup

Step 1. Set up the command-line (CLI) environment

The easiest way to manually start/stop your servers (and access the other FedCloud services) is to use the FedCloud CLI tools. To do so, you also need to request an authorized account and support from a Virtual Organization. More details on how to set up the FedCloud CLI environment are reported here.

Step 2. Browse AppDB and find a basic image

AppDB is the FedCloud marketplace. You can browse it to find a basic OS image which suits your application (e.g. Ubuntu 12.1, CentOS 5.2). Once you have chosen your OS, you need to get the ID of that OS image on the FedCloud site you want to use. To do so, you can follow this guide or just ask the FedCloud site directly.

NOTE: AppDB support for the VM image marketplace is still in development. To find a basic OS image, please refer to the FedCloud Stratuslab Marketplace.

Step 3. Generate a set of SSH keys

To log in to the server, you need a pair of SSH keys. To generate a pair of authentication keys on a Linux machine, you can run

ssh-keygen -t rsa -b 2048 -f tmpfedcloud

Step 4. Create a contextualization script to inject the key

A basic contextualization script is needed to configure your access credentials into the server. You can use the following commands to create the script

cat > tmpfedcloud.login << EOF
#cloud-config
users:
  - name: cloudadm
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock-passwd: true
    ssh-import-id: cloudadm
    ssh-authorized-keys:
      - `cat tmpfedcloud.pub`
EOF
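
Before using the generated file, it can be worth checking that the here-document expanded as intended. The sketch below is a simple sanity check, not a FedCloud tool, and the function name check_context is hypothetical. It relies on two facts: cloud-init only treats the file as cloud-config when its first line is #cloud-config, and a public key written by ssh-keygen -t rsa starts with ssh-rsa.

```shell
#!/bin/sh
# Usage: check_context FILE
# Prints "ok" if FILE looks like a usable cloud-config with an RSA key,
# otherwise prints a reason on stderr and returns a non-zero status.
check_context() (
    file=$1

    # The first line must be exactly "#cloud-config".
    if [ "$(head -n 1 "$file")" != "#cloud-config" ]; then
        echo "missing #cloud-config header" >&2
        exit 1
    fi

    # A leftover backtick means the key was not substituted into the file.
    if grep -q '`' "$file"; then
        echo "unexpanded command substitution" >&2
        exit 1
    fi

    # At least one list entry must carry an RSA public key.
    if ! grep -q '^ *- ssh-rsa ' "$file"; then
        echo "no ssh-rsa key found" >&2
        exit 1
    fi

    echo "ok"
)
```

Running check_context tmpfedcloud.login after the commands above should print ok. A missing key usually means the here-document was run from a directory that does not contain tmpfedcloud.pub.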

NOTE: For most Virtual Organizations, your login SSH key can be configured in the Virtual Organization VOMS, which avoids the need for a contextualization script. However, if you do not know how to inject your key into your VOMS server, or you are using basic OS disks not supported by your VO, it is highly recommended to use this method.

Step 5. Start the virtual server

Using the CLI environment, generate a temporary proxy and start the server with the contextualization script created in the previous step, as shown in the sample commands below (for the fedcloud.egi.eu VO and the CESNET site). For more information on how to start a server using contextualization, you can refer to the FedCloud FAQ page.

[myuser@mymachine ~]$ voms-proxy-init -voms fedcloud.egi.eu -out myproxy.out -rfc -dont-verify-ac 
Enter GRID pass phrase:
Your identity: /O=dutchgrid/O=users/O=egi/CN=Name Surname
Creating temporary proxy ............................... Done
Contacting  voms2.grid.cesnet.cz:15002 [/DC=org/DC=terena/DC=tcs/C=CZ/O=CESNET/CN=voms2.grid.cesnet.cz] "fedcloud.egi.eu" Done
Creating proxy ............................................. Done

Your proxy is valid until Thu Jan  9 22:22:32 2014
[myuser@mymachine ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --action create -r compute -M resource#small -M os#sl6 --auth x509 --voms --user-cred myproxy.out -t occi.core.title="test" --context user_data="file://$PWD/tmpfedcloud.login"
https://carach5.ics.muni.cz:11443/compute/957d4dfe-ac80-48da-87cc-95430122d174

Step 6. Log in and set up

Now you can check whether your server has started (active state), get its IP address, connect to it via SSH using the generated temporary key and install your own application. For more information, you can refer to the FedCloud FAQ page.

[myuser@mymachine ~]$ occi --endpoint https://carach5.ics.muni.cz:11443/ --action describe --resource /compute/957d4dfe-ac80-48da-87cc-95430122d174 --auth x509 --voms --user-cred myproxy.out                                        
COMPUTE:
  ID:       957d4dfe-ac80-48da-87cc-95430122d174
  TITLE:    test
  STATE:    active
  MEMORY:   2.0 GB
  CORES:    1
  LINKS:

    LINK "http://schemas.ogf.org/occi/infrastructure#networkinterface":
      ID:          /network/interface/1b3eab9b-d50d-4a09-ae07-bf20eb4b2957
      TITLE:       admin
      TARGET:      /network/admin

      IP ADDRESS:  190.12.142.30
      MAC ADDRESS: fg:16:3e:4d:7d:73

[myuser@mymachine ~]$ ssh -i tmpfedcloud cloudadm@190.12.142.30
Last login: Sat Dec  7 03:13:02 2013 from 65.148.66.1
[cloudadm@test ~]$ sudo su -
[root@test ~]$ 
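
If you want to script this step, the state and IP address can be extracted from the describe output with standard text tools. The sketch below assumes the plain-text layout shown in the session above; the function names vm_state and vm_ip are hypothetical, and the parsing would need adjusting if the client prints a different format.

```shell
#!/bin/sh
# Both helpers read the text printed by "occi --action describe" on stdin.

# Print the value of the STATE: field (for example "active").
vm_state() {
    awk '$1 == "STATE:" { print $2; exit }'
}

# Print the address from the first "IP ADDRESS:" line.
vm_ip() {
    awk '$1 == "IP" && $2 == "ADDRESS:" { print $3; exit }'
}
```

A typical use would be to save the describe output to a file, for example describe.txt, and then run vm_state < describe.txt and vm_ip < describe.txt to decide whether the server is ready for the SSH connection.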

2. Basic OS image with contextualization

Step 1. Set up the command-line (CLI) environment

The easiest way to manually start/stop your servers (and access the other FedCloud services) is to use the FedCloud CLI tools. To do so, you also need to request an authorized account and support from a Virtual Organization. More details on how to set up the FedCloud CLI environment are reported here.

Step 2. Browse AppDB and find a basic image

AppDB is the FedCloud marketplace. You can browse it to find a basic OS image which suits your application (e.g. Ubuntu 12.1, CentOS 5.2). Once you have chosen your OS, you need to get the ID of that OS image on the FedCloud site you want to use. To do so, you can follow this guide or just ask the FedCloud site directly.

NOTE: AppDB support for the VM image marketplace is still in development. To find a basic OS image, please refer to the FedCloud Stratuslab Marketplace.

Step 3. Start the basic image

To build your contextualization script, you need to start a test server. You can do this in two different ways:

  1. Manually start a test server on the cloud, using the instructions in section "1. Manual server setup"
  2. Download the basic VM disk image from AppDB and run it on your local machine (you can use any virtualization server you want; VirtualBox is the recommended solution)

The second option gives you more control over the test server, including the possibility to take snapshots, but requires some knowledge of virtualization technologies, so it is not recommended for users without that background.

Step 4. Build a deployment script

This step is optional, but recommended for the portability of your application. A deployment script is an automated script which, on a clean OS, installs all the dependencies of your web service/application, the web service/application itself and the data required for the application to run, then configures it and tests it.

The deployment script usually downloads the application packages, dependencies and data from a remote repository.

A deployment script is usually written in a scripting language, such as Bash or Python, and runs as the root user on the machine. The easiest way to build this script is to copy all the commands you run from the shell to install the software. A sample deployment script is here.
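
To give an idea of the shape such a script can take, here is a minimal skeleton. It is only a sketch under stated assumptions: it targets a yum-based image such as Scientific Linux 6, and the package name, APP_URL and the paths are placeholders rather than real FedCloud values. The DRY_RUN switch is an addition for safe reviewing and is not required by the FedCloud.

```shell
#!/bin/sh
# Sketch of a deployment script for a yum-based image.
# APP_URL, the package name and the paths are placeholders.
APP_URL="${APP_URL:-http://repo.example.org/myapp.tar.gz}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# The body runs in a subshell so that "set -e" does not leak into the caller.
# When deploy is invoked as a plain command, it stops at the first failure.
deploy() (
    set -e
    run yum -y install httpd                         # dependencies
    run curl -fsSL -o /tmp/myapp.tar.gz "$APP_URL"   # application package
    run tar -xzf /tmp/myapp.tar.gz -C /var/www/html  # install
    run service httpd start                          # start the service
    run curl -fsS -o /dev/null http://localhost/     # smoke test
)
```

With the default DRY_RUN=1, calling deploy only prints the five commands, which is a safe way to review the sequence. To perform the installation, add a final deploy line to the script and run it as root with DRY_RUN=0.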

You can test the deployment script from the VM you have started. It is recommended to reset the VM (reverting back to a clean OS snapshot or recreating the server) each time you run the deployment script.

To use the deployment script from your VM, it is recommended to upload it into a remote public repository. If you do not have one, you can use the FedCloud repository or services like pastebin.

Step 5. Build a contextualization script

The EGI Federated Cloud uses CloudInit as its contextualization system. CloudInit offers a large set of features to customize your machine at startup. Detailed documentation is provided here.

As an example, if you have a deployment script stored on a remote repository, you can use the following CloudInit script to run it on the VM

Content-Type: multipart/mixed; boundary="===============4393449873403893838=="
MIME-Version: 1.0

--===============4393449873403893838==
Content-Type: text/x-include-url; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="include.txt"

#include
http://appliance-repo.egi.eu/images/contextualization/Test-0.1/Test-deployment.sh

--===============4393449873403893838==
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#cloud-config
users:
 - name: cloudadm
   sudo: ALL=(ALL) NOPASSWD:ALL
   lock-passwd: true
   ssh-import-id: cloudadm
   ssh-authorized-keys:
    - <your SSH key>

--===============4393449873403893838==--

You can, of course, add more deployment scripts to the include section to set up the different components of your application.
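
Writing the multipart file by hand is error-prone, so you may prefer to generate it. The following sketch prints a document with the same two parts as the example above, trimmed to the fields needed for key-based login. The function name build_userdata and the boundary string are arbitrary choices made for this illustration.

```shell
#!/bin/sh
# Usage: build_userdata PUBKEY_FILE URL [URL...]
# Prints a multipart user-data document on stdout.
build_userdata() (
    pubkey_file=$1
    shift
    boundary="===============fedcloud=="   # arbitrary marker between parts

    cat <<EOF
Content-Type: multipart/mixed; boundary="$boundary"
MIME-Version: 1.0

--$boundary
Content-Type: text/x-include-url; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="include.txt"

#include
EOF
    # One deployment script URL per line.
    printf '%s\n' "$@"
    cat <<EOF

--$boundary
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#cloud-config
users:
 - name: cloudadm
   sudo: ALL=(ALL) NOPASSWD:ALL
   lock-passwd: true
   ssh-authorized-keys:
    - $(cat "$pubkey_file")

--$boundary--
EOF
)
```

For instance, build_userdata tmpfedcloud.pub followed by one or more script URLs, with the output redirected to a file, produces a file that can be passed to the server in the same way as the contextualization script from the manual setup.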

Step 6. Start the server

To test the successful deployment on the cloud, you can start your server using the contextualization script created in the previous step. To do so, you can refer to the FedCloud FAQ page.

3. Custom OS image

Custom OS images can be obtained in different ways. The two main possibilities are to start from scratch, creating a virtual machine, installing an OS and the software on top of it, then taking the virtual machine OS disk as custom image, or to dump an existing disk from a physical server and modify it, if needed, to run on a virtualization platform. In this guide we will focus on the first option, because it tends to produce cleaner images and reduces the risks of hardware conflicts.

NOTE: Basic knowledge of virtualization technologies is required for this part of the guide

Step 1. Create a virtual machine on your local PC

As a prerequisite to this step, you need to install a virtualization server (hypervisor) on your machine. Many different hypervisor technologies exist; here we recommend Oracle VirtualBox.

After you have installed and configured your hypervisor, you need to create a new machine. Select a thin-provisioned disk (space for the disk is not allocated until it is used) for the OS, with a limited size (e.g. 10 GB). Then run the VM and install your OS on it. For Linux, it is good practice not to use LVM or swap partitions during the installation. To keep the size of the VM small, it is highly recommended to install only a minimal version of the OS, and to add the features required by your application later.

NOTE: Instead of installing a new OS in the virtual machine, you can download an existing basic OS image from the marketplace and run it on your local PC, as explained in the previous chapter.

Step 2. Configure the network and contextualization on the VM

If you installed the OS from scratch, you will probably need to set up the OS to configure the network dynamically. To do so, you need to enable DHCP and, for Linux, disable udev rule generation (so that changes in the virtual network hardware are ignored). Check your OS administration guide for how to perform these tasks.
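
As an illustration only, the sketch below shows what these two tasks can look like on a RHEL 6 style system (such as Scientific Linux 6 or CentOS 6). The file locations are an assumption tied to that family and differ on other distributions, and the function name prepare_network is hypothetical. The ROOT argument lets you apply the changes either inside the VM (/) or to a mounted disk image.

```shell
#!/bin/sh
# Usage: prepare_network ROOT
# ROOT is the root of the guest filesystem: "/" inside the VM, or the
# mount point of the disk image. Assumes a RHEL 6 style layout.
prepare_network() (
    root=${1%/}
    ifcfg="$root/etc/sysconfig/network-scripts/ifcfg-eth0"
    rules="$root/etc/udev/rules.d"

    mkdir -p "$(dirname "$ifcfg")" "$rules"

    # Let eth0 obtain its address via DHCP at boot. No HWADDR line is
    # written, so the file is not tied to one virtual network card.
    cat > "$ifcfg" <<EOF
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
EOF

    # Drop the rules recorded for the current MAC address ...
    rm -f "$rules/70-persistent-net.rules"
    # ... and override the generator with an empty file of the same name,
    # so that the rules are not recreated on the next boot.
    : > "$rules/75-persistent-net-generator.rules"
)
```

Run as root inside the VM, the call would be prepare_network /. Trying it first on a scratch directory is a harmless way to inspect the files it writes.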

If you are going to use contextualization with your VM, you need to set up a contextualization system in it. We recommend the use of CloudInit, available for many OS distributions here.

Step 3. Install your software in the machine

In the newly created VM you can install your application/web service and test it. When everything is installed, it is recommended to optimize the machine, by removing all the unnecessary services, packages, etc...

Step 4. Package the VM

Prior to extracting the VM disk, it is recommended to zero the unused disk space. You can do that with tools like SDelete for Windows or BleachBit for Linux.

After cleaning the disk, shut down your VM and export it in OVF format. If the contents of your VM are private (e.g. proprietary software is installed on the VM), you can encrypt the image with GPG, using a fixed random passphrase.

Step 5. Upload the image to the FedCloud repository

After you have a VM, you need to upload it to a remote repository. You can use your own repository or the EGI Federated Cloud repository (as indicated here).

Step 6. Register the virtual appliance and its associated image(s) in AppDB

For this step, you can follow this guide.

Step 7. Start the server

After your image is correctly uploaded and registered, the FedCloud site supporting your VO will register it in its system and assign it an OS disk image ID. You can then start the server as you would for a basic image. For more information, you can refer to the FedCloud FAQ page.

4. Infrastructure broker

Integration of your application into an infrastructure broker depends on the broker solution itself. In general, if possible, before selecting an infrastructure broker, it is always recommended to first test the single components separately, for example manually starting a server into the FedCloud.

For information about the infrastructure broker solutions that support the FedCloud (and other cloud technologies), you can refer to the FedCloud Brokering page.

5. Application broker

Integration of your application into an application broker depends on the broker solution itself. Application brokers may offer a sort of PaaS environment for the applications (e.g. Grid clusters, Hadoop clusters) or integrate applications via wrappers written in Java or other programming languages. For more details about the application broker solutions that support the FedCloud (and other cloud technologies), you can refer to the FedCloud Brokering page.