- Start Date:
- End Date: 12/09/2014
- EGI.eu contact:
- External contact:
TRUFA (Transcriptomes User-Friendly Analysis) is a free web server designed to help genomics researchers perform RNA-seq analysis. TRUFA aims to be a public service for a broad audience, including comparative biology, biomedicine and bioinformatics. As sequencing becomes cheaper, researchers studying a wide range of organisms are likely to make use of this service. A complete, ready-to-use pipeline will give both experts in the field and non-experts a robust set of transcriptomics tools.
Exploiting FedCloud resources from the TRUFA Portal
The TRUFA pipeline is currently under verification, which should be completed by the end of September 2014. The portal is already developed and is currently connected to an HPC infrastructure at IFCA. The goal is to use both local resources and resources from EGI Federated Cloud providers, in order to exploit the flexibility offered by cloud technology. For each user request, the portal will instantiate a VM whose hardware characteristics match that request, which improves resource utilization. In the FedCloud this is done by mapping each user request to a set of hardware characteristics, which are then passed to the rOCCI client to instantiate the VM. The VM is terminated once the computation completes. The mapping between user requests and VM hardware characteristics has to be implemented by the portal.
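A minimal sketch of how the portal-side mapping could look, assuming Python on the portal. The request fields (`input_gb`, `de_novo_assembly`), the sizing heuristic and the resource template names are hypothetical illustrations, not TRUFA's actual rules; only the upper bound of 48 cores and 128 GB of RAM comes from the requirements of this use case. The `occi` flags follow rOCCI-cli usage as documented for EGI FedCloud (`--endpoint`, `--auth x509 --user-cred ... --voms`, `--action create|delete`, `--resource`, `--mixin`, `--attribute`); they should be checked against the installed client version. The endpoint, image ID and proxy path are placeholders.

```python
from dataclasses import dataclass

# Hypothetical resource templates (name, cores, RAM in GB); real names and
# sizes differ from one FedCloud site to another.
RESOURCE_TEMPLATES = [
    ("small", 4, 8),
    ("medium", 8, 32),
    ("large", 16, 64),
    ("xlarge", 48, 128),  # largest VM foreseen for this use case
]
MAX_CORES, MAX_RAM_GB = 48, 128


@dataclass
class UserRequest:
    input_gb: float         # size of the uploaded reads (hypothetical field)
    de_novo_assembly: bool  # assumed to be the memory-hungry pipeline step


def required_hw(req: UserRequest) -> tuple[int, int]:
    """Assumed heuristic: RAM grows with input size, more so for assembly."""
    ram_per_gb = 0.5 if req.de_novo_assembly else 0.1
    ram_gb = max(8, round(req.input_gb * ram_per_gb))
    cores = max(4, ram_gb // 4)
    # Never ask for more than the largest VM foreseen for TRUFA.
    return min(cores, MAX_CORES), min(ram_gb, MAX_RAM_GB)


def pick_template(cores: int, ram_gb: int) -> str:
    """Return the smallest template that satisfies both cores and RAM."""
    for name, t_cores, t_ram in RESOURCE_TEMPLATES:
        if t_cores >= cores and t_ram >= ram_gb:
            return name
    raise ValueError("request exceeds the largest available template")


def _auth(proxy: str) -> list[str]:
    # X.509 proxy with VOMS extensions, as used on EGI FedCloud.
    return ["--auth", "x509", "--user-cred", proxy, "--voms"]


def occi_create_cmd(endpoint: str, os_tpl: str, resource_tpl: str,
                    title: str, proxy: str) -> list[str]:
    """Build (but do not run) an rOCCI-cli command creating a compute VM."""
    return ["occi", "--endpoint", endpoint, *_auth(proxy),
            "--action", "create", "--resource", "compute",
            "--mixin", f"os_tpl#{os_tpl}",
            "--mixin", f"resource_tpl#{resource_tpl}",
            "--attribute", f"occi.core.title={title}"]


def occi_delete_cmd(endpoint: str, vm_location: str, proxy: str) -> list[str]:
    """Build an rOCCI-cli command terminating the VM once the job is done."""
    return ["occi", "--endpoint", endpoint, *_auth(proxy),
            "--action", "delete", "--resource", vm_location]


if __name__ == "__main__":
    req = UserRequest(input_gb=200, de_novo_assembly=True)
    cores, ram = required_hw(req)
    tpl = pick_template(cores, ram)
    print(cores, ram, tpl)
    print(" ".join(occi_create_cmd("https://occi.example.org:11443",
                                   "trufa-image-id", tpl, "trufa-job",
                                   "/tmp/x509up_u1000")))
```

The command is returned as an argument list rather than executed, so the portal can pass it to `subprocess.run` and parse the returned VM location, which is what the delete command needs at the end of the computation.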
The first tests will be run at the IFCA FedCloud site once the HPC resources have been moved to it. In a second phase, the portal should be able to instantiate VMs at other FedCloud sites. For this phase we foresee a technical issue related to data transfer. Users will upload their data to the portal, and the data will be stored at the IFCA site (assuming the public portal stays at IFCA). VMs running at IFCA can access this data easily; if the VM is hosted elsewhere in the FedCloud, however, the data must first be moved from IFCA to the selected site. A solution to investigate is uploading the input data directly to the selected FedCloud site.
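The cost of the extra hop can be estimated with a simple model, sketched here under loud assumptions: the bandwidth figures are hypothetical, 1 GB is taken as 8 Gbit, protocol overhead is ignored, and a direct upload to a remote site is assumed to run at the same speed as an upload to IFCA.

```python
def ingest_time_s(input_gb: float, upload_gbit_s: float,
                  inter_site_gbit_s: float, vm_site: str,
                  direct_upload: bool = False,
                  portal_site: str = "IFCA") -> float:
    """Estimated seconds until the input data is available at the VM's site."""
    if upload_gbit_s <= 0 or inter_site_gbit_s <= 0:
        raise ValueError("bandwidths must be positive")
    # User -> storage site (IFCA, or the VM's site if uploading directly).
    upload = input_gb * 8 / upload_gbit_s
    if vm_site == portal_site or direct_upload:
        return upload
    # Extra hop: copy the data from IFCA to the selected FedCloud site.
    return upload + input_gb * 8 / inter_site_gbit_s
```

For example, with 300 GB of input and an assumed 1 Gbit/s user link, the upload alone takes about 2400 s (40 minutes); staging from IFCA to another site over an assumed 10 Gbit/s link adds about 240 s, which a direct upload to the selected site would avoid.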
- 1 VM for the portal: 1 GB of memory and 1 CPU core, plus disk space to host the users' input data (hundreds of GB).
- VMs for the pipeline execution: sized according to the user request. Estimated maximum size: 48 cores, 128 GB of RAM and several hundred GB of storage.
- Input data are usually in the order of hundreds of GB, and the output is roughly 3 times the size of the input, although the amount of input data varies from case to case.
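The storage needed per job follows from the estimate that the output is roughly three times the input. Taking 200 GB of input as an illustrative value (not a measured TRUFA figure), and assuming input and output are kept on the same volume:

```latex
S_{\text{total}} = S_{\text{in}} + S_{\text{out}} \approx S_{\text{in}} + 3\,S_{\text{in}} = 4\,S_{\text{in}},
\qquad S_{\text{in}} = 200\ \text{GB} \;\Rightarrow\; S_{\text{total}} \approx 800\ \text{GB}.
```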