VT Life Science Data Integration
General Project Information
- Project title: Integrating ELIXIR reference datasets within the European Grid Infrastructure
- Proposers: Fotis E. Psomopoulos, Giacinto Donvito
- Coordinator: Fotis E. Psomopoulos
- Start date:
- End date:
Motivation
Significant work has already been done in EGI to support the deployment and discovery of services, where “services” can be either computationally oriented (such as batch queues) or application oriented (such as web services, or ready-to-use applications embedded in portal gateways or encapsulated in Virtual Machine Images). However, in bioinformatics many analysis services rely on public reference datasets. These reference datasets are growing in size, and users struggle to discover, download and compute with them, so there is increasing demand to perform computation where the reference datasets are located. EGI members already host some biological reference datasets across the infrastructure; however, EGI currently neither provides discovery capabilities for the available datasets nor offers guidelines for those who wish to use these datasets or to replicate additional datasets onto EGI sites. The project will facilitate the discovery of existing reference datasets in EGI, and will develop and deploy services that allow data providers, resource providers and researchers to replicate life science reference datasets, and life science researchers to use these datasets in analysis applications.
Tasks
- Identify existing life science datasets in EGI
- Identify reference datasets for replication
- Extension of the EGI AppDB to act as a dataset registry
- Tools for data replication
- Analysis tools to work with data replicas
- Integration with ELIXIR Registry