FedCloudTRUFA







General Information

  • Status: Assessed
  • Start Date: 12/09/2014
  • End Date: -
  • EGI.eu contact: Diego Scardaci / diego.scardaci@egi.eu, Nuno Ferreira / nuno.ferreira@egi.eu
  • External contact: Jesus Marco de Lucas / marco@ifca.unican.es, Etienne Kornobise / ekornobis@gmail.com

Short Description

TRUFA (Transcriptomes User-Friendly Analysis) is a free web server designed to help genomics researchers perform de novo RNA-seq analysis. TRUFA aims to be a public service targeting a broad audience, including comparative biology, biomedicine, bioinformatics and other fields. Sequencing methods are becoming cheaper, so it is foreseeable that researchers studying model as well as non-model organisms could make use of this service. Having such a pipeline ready for use will allow any biologist to perform RNA-seq analysis in a user-friendly way.

Use Case

Exploiting FedCloud resources from the TRUFA Portal.

The TRUFA pipeline is currently under verification, which should be completed soon. The portal is already developed and is currently connected to an HPC infrastructure at IFCA. The goal is to use both local resources and resources from EGI Federated Cloud providers, in order to exploit the flexibility offered by cloud technology. The portal will instantiate a VM whose HW characteristics satisfy the user request, which should improve resource utilization. In the FedCloud this can be done by mapping each user request to a set of HW characteristics, which then serve as the input for the rOCCI client that instantiates the VM. The VM will be terminated once the computation has completed. The mapping between user requests and VM HW characteristics has to be done by the portal.
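The request-to-HW mapping described above could be sketched as follows. This is only an illustration under stated assumptions: the flavour names and the two smaller sizes are hypothetical and not taken from any FedCloud site; only the largest entry reflects the maximum estimated in the requirements (48 cores, 128 GB of RAM). The name of the selected flavour is what the portal would then hand to the rOCCI client as the resource template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flavour:
    name: str     # hypothetical resource template name
    cores: int
    ram_gb: int


# Hypothetical catalogue, ordered from smallest to largest.
# Only the last entry mirrors the maximum size stated in the requirements.
FLAVOURS = [
    Flavour("small", 4, 16),
    Flavour("medium", 16, 64),
    Flavour("large", 48, 128),
]


def pick_flavour(cores_needed: int, ram_gb_needed: int) -> Optional[Flavour]:
    """Return the smallest flavour that satisfies the request, or None."""
    for flavour in FLAVOURS:
        if flavour.cores >= cores_needed and flavour.ram_gb >= ram_gb_needed:
            return flavour
    return None  # request exceeds every available flavour


if __name__ == "__main__":
    print(pick_flavour(8, 32))
```

Picking the smallest flavour that fits is what yields the better resource utilization mentioned above; a request that no flavour can satisfy returns None, so the portal can reject it or fall back to the HPC infrastructure.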

The first tests will be executed at the IFCA FedCloud site once the HPC resources have been moved to it. In a second phase, the portal should be able to instantiate VMs on other FedCloud sites. During this second phase, we foresee a technical issue related to data transfer. The user will upload data to the portal, and the data will be stored at the IFCA site (assuming that the public portal stays at IFCA). The data will then be easily accessible to VMs running at the IFCA site. If the VM is hosted elsewhere in the FedCloud, however, the data has to be moved from IFCA to the selected FedCloud site. A solution for uploading the input data directly to the selected FedCloud site should be investigated.
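To get a feel for the data transfer issue, the staging time between IFCA and a remote site can be estimated with a back-of-the-envelope calculation. The sketch below assumes decimal units (1 GB = 8 Gbit) and an ideal, fully used link; the bandwidth is a hypothetical parameter, not a measured value for any FedCloud site.

```python
def transfer_time_hours(size_gb: float, bandwidth_gbit_s: float) -> float:
    """Estimate the time needed to move size_gb over a link of the given speed.

    Assumes an ideal link with no protocol overhead or contention, so the
    result is a lower bound on the real transfer time.
    """
    if bandwidth_gbit_s <= 0:
        raise ValueError("bandwidth must be positive")
    size_gbit = size_gb * 8          # gigabytes to gigabits
    seconds = size_gbit / bandwidth_gbit_s
    return seconds / 3600


if __name__ == "__main__":
    # Example: 300 GB of input over a hypothetical 1 Gbit/s link.
    print(f"{transfer_time_hours(300, 1.0):.2f} h")
```

Even under these optimistic assumptions, moving a few hundred GB takes a noticeable fraction of an hour, which is why uploading directly to the selected site is worth considering.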

Requirements:

  • 1 VM for the portal: 1 GB of memory and 1 CPU core, plus disk to host the users' input (hundreds of GB).
  • VMs for the pipeline execution: sized according to the user request. Maximum estimated size: 48 cores, 128 GB of RAM and a few hundred GB of storage.
  • Input data are usually in the order of hundreds of GB, and the output is roughly 3 times the size of the input. The actual amount of input data will vary from case to case.
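The storage figures above can be turned into a rough sizing rule. The sketch below assumes that input and output must both fit on the VM's storage at the same time, and it takes the factor of 3 from the estimate above; the function name is hypothetical and intermediate files are not accounted for.

```python
def estimate_storage_gb(input_gb: float, output_ratio: float = 3.0) -> float:
    """Rough storage needed for one run: the input plus its expected output.

    output_ratio is the approximate output-to-input size ratio
    (about 3 according to the estimate above).
    """
    if input_gb < 0 or output_ratio < 0:
        raise ValueError("sizes and ratios must be non-negative")
    return input_gb * (1 + output_ratio)


if __name__ == "__main__":
    # Example: a 200 GB input needs about 200 + 3 * 200 GB in total.
    print(estimate_storage_gb(200))
```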

Additional Files