The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

Galaxy workflows with EC3

From EGIWiki

Introduction

This guide is intended for researchers who want to use Galaxy, an open web-based platform for data-intensive research, on the cloud-based resources provided by the EGI Long Tail of Science Platform.

Objectives

In this guide we will show how to:

  • Deploy an elastic cluster of VMs on the RECAS-BARI resource of the EGI Federation
  • Install Galaxy, the web-based platform for data-intensive research, with a Torque server as backend
  • Install CLUES as elasticity manager
  • Install a custom Galaxy tool (Bowtie2)
  • Download the Escherichia coli genome and register it in Galaxy

Virtual Elastic Cluster: components

The Elastic Cloud Computing Cluster (EC3) is a framework for creating elastic virtual clusters on top of Infrastructure as a Service (IaaS) providers. It is composed of the following components:

  • Infrastructure Manager (IM) is a tool that eases access to IaaS clouds and improves their usability by automating VMI (Virtual Machine Image) selection, deployment, configuration, software installation, monitoring and update of Virtual Appliances. It supports the APIs of a large number of virtual platforms, making user applications cloud-agnostic. In addition, it integrates a contextualization system that installs and configures all the applications required by the user, providing a fully functional infrastructure.
  • Resource and Application Description Language (RADL) is a language used to specify the requirements of the resources where the scientific applications will be executed. It addresses not only hardware requirements (number of CPUs, CPU architecture, RAM size, etc.) but also software requirements (applications, libraries, database systems, etc.), and it should include all the configuration details needed to obtain a fully functional and configured VM (a Virtual Appliance or VA). It combines the kind of definitions found in specifications such as OVF, expressed with a declarative scheme, with contextualization languages such as Ansible. It also allows the required underlying network capabilities to be described.
  • CLUES is an energy management system for High Performance Computing (HPC) clusters and cloud infrastructures. Its main function is to power off internal cluster nodes when they are not being used and, conversely, to power them on when they are needed. CLUES integrates with the cluster management middleware, such as a batch-queuing system or a cloud infrastructure management system, by means of different connectors.
  • EC3 as a Service (EC3aaS) is a web service offered to the community to make EC3 easier to use for non-expert users. Anyone can access the website and try the tool, using a user-friendly wizard to configure and deploy Virtual Elastic Clusters on multiple clouds.
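The power-on/power-off behaviour described for CLUES can be illustrated with a minimal Python sketch. It is not CLUES code and not its actual policy: the functions `plan_power_actions` and `simulate` are hypothetical, and the model assumes that each pending job needs one whole working node and that jobs finish within one monitoring cycle. The limit on how many nodes can be powered on stands in for the maximum cluster size.

```python
def plan_power_actions(pending_jobs: int, idle_on: int, powered_off: int) -> tuple[int, int]:
    """Return (nodes_to_power_on, nodes_to_power_off) for one monitoring cycle.

    Hypothetical policy: assumes each pending job needs one whole working node.
    """
    if pending_jobs > idle_on:
        # Not enough idle nodes: power on more, but never more than
        # the nodes that are still off (the cluster's maximum size).
        return min(pending_jobs - idle_on, powered_off), 0
    # Idle nodes that no pending job will use are powered off to save energy.
    return 0, idle_on - pending_jobs


def simulate(max_nodes: int, pending_per_cycle: list[int]) -> list[int]:
    """Return the number of powered-on working nodes after each cycle.

    Toy model: jobs finish within one cycle, so every powered-on node
    is idle again at the start of the next cycle.
    """
    powered_on = 0  # the cluster starts with only the front-end running
    history = []
    for pending in pending_per_cycle:
        to_on, to_off = plan_power_actions(
            pending, idle_on=powered_on, powered_off=max_nodes - powered_on
        )
        powered_on += to_on - to_off
        history.append(powered_on)
    return history


if __name__ == "__main__":
    # With at most 4 working nodes, demand of 6 jobs is capped at 4 nodes.
    print(simulate(4, [0, 3, 6, 2, 0]))  # [0, 3, 4, 2, 0]
```

Powering nodes off as soon as they become idle is deliberately naive; a real energy manager would typically wait for an idle period before switching a node off, to avoid rebooting nodes between closely spaced jobs.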

Create Galaxy workflows on the LToS platform

Galaxy, a web-based platform for data-intensive research, is one of the applications available in the EGI LToS platform. The platform is accessible through this portal and offers grid, cloud and application services from across the EGI community to individual researchers and small research teams.

Configuration and Deployment of a Cluster

To configure and deploy a Virtual Elastic Cluster using EC3aaS, the user opens the front page and clicks the "Deploy your cluster!" link, as shown in the figure:

[Figure: EC3aaS front page]

A wizard guides the user through the cluster configuration process, allowing details such as the operating system, the characteristics of the nodes, the maximum number of nodes in the cluster and the pre-installed software packages to be configured.

Specifically, the general wizard steps are:

  1. Provider Account: the user selects the OCCI endpoint of the provider where the elastic cluster will be deployed. OCCI endpoints are fetched dynamically from the EGI Application DataBase.
  2. Operating System: the user chooses the OS of the cluster, either from a select box listing the most common operating systems or by entering a valid AMI/VMI identifier for the selected cloud.
  3. Instance details: the user specifies the instance details, i.e. the characteristics of the cluster nodes.
  4. LRMS Selection: the user chooses the preferred Local Resource Management System (LRMS), which EC3 installs and configures automatically. Currently, SLURM, Torque, Sun Grid Engine and Mesos are supported.
  5. Software Packages: a set of common software packages, such as Docker Engine, Spark, Galaxy, OpenVPN, BLCR, GNUPlot, Tomcat or Octave, is available for installation in the cluster. EC3 installs and configures them automatically during the contextualization process. If other software is needed in the cluster, a new Ansible recipe can be developed and added to EC3 using the CLI.
  6. Cluster size: the maximum number of nodes in the cluster, not including the front-end. This value is the maximum number of working nodes the cluster can scale up to. Remember that the cluster is initially created with only the front-end, and the working nodes are powered on on demand.
  7. Resume and Launch: the last step of the wizard shows the user a summary of the chosen cluster configuration; the deployment process starts when the user clicks the Submit button.
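The wizard steps above can be summarised as a single configuration record. The following Python sketch is purely illustrative: `ClusterConfig`, its field names and the `summary()` format are hypothetical and do not correspond to any EC3 data structure or API, and the endpoint URL in the example is a placeholder. It only checks the two constraints stated in the steps: the LRMS must be one of the supported ones, and the cluster size counts working nodes only.

```python
from dataclasses import dataclass, field

# LRMS options listed in the wizard (step 4).
SUPPORTED_LRMS = {"SLURM", "Torque", "Sun Grid Engine", "Mesos"}


@dataclass
class ClusterConfig:
    """Hypothetical record of the choices made in the EC3aaS wizard.

    Field names are illustrative; this is not an EC3 data structure or API.
    """
    occi_endpoint: str   # step 1: provider account (OCCI endpoint)
    os_image: str        # step 2: OS name or AMI/VMI identifier
    instance_type: str   # step 3: instance details
    lrms: str            # step 4: LRMS selection
    max_nodes: int       # step 6: working nodes, front-end excluded
    packages: list[str] = field(default_factory=list)  # step 5: software packages

    def validate(self) -> None:
        """Raise ValueError if a choice violates the constraints in the steps."""
        if self.lrms not in SUPPORTED_LRMS:
            raise ValueError(f"unsupported LRMS: {self.lrms!r}")
        if self.max_nodes < 1:
            raise ValueError("max_nodes must be at least 1")

    def summary(self) -> str:
        """Step 7: build the summary shown before clicking Submit."""
        self.validate()
        pkgs = ", ".join(self.packages) or "none"
        return (
            f"Provider: {self.occi_endpoint}\n"
            f"OS: {self.os_image}\n"
            f"Instance: {self.instance_type}\n"
            f"LRMS: {self.lrms}\n"
            f"Packages: {pkgs}\n"
            f"Max working nodes: {self.max_nodes} (front-end not included)"
        )


if __name__ == "__main__":
    cfg = ClusterConfig(
        occi_endpoint="https://occi.example.org",  # placeholder endpoint
        os_image="Ubuntu",
        instance_type="medium",
        lrms="Torque",
        max_nodes=4,
        packages=["Galaxy"],
    )
    print(cfg.summary())
```

Running the example prints a Torque cluster with Galaxy as the only extra package and up to four working nodes; an unsupported LRMS name would raise `ValueError` before the summary is built.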