
VT Life Science Data Integration

Revision as of 10:53, 28 November 2014


General Project Information

  • Project title : Integrating ELIXIR reference datasets within the European Grid Infrastructure
  • Proposers : Fotis E. Psomopoulos, Giacinto Donvito
  • Coordinator : Fotis E. Psomopoulos
  • Mailing list : elixir-pilot .at.
  • Start Date :
  • End date :


Significant work has been done in EGI to help the deployment and discovery of services, where “services” can be either computationally oriented (such as batch queues) or application oriented (such as web services, ready-to-use applications embedded in portal gateways, or applications encapsulated in Virtual Machine Images). In bioinformatics, however, many services used for analysis rely on public reference datasets. These reference datasets are growing large, and users struggle to discover, download and compute with them. There is increasing demand to run computations where the reference datasets are located. EGI members already host some biological reference datasets across the infrastructure, but EGI currently provides neither discovery capabilities for the available datasets nor guidelines for those who wish to use them or to replicate additional datasets onto EGI sites. The project will facilitate the discovery of existing reference datasets in EGI, and will develop and deploy services that allow data providers, resource providers and researchers to replicate life science reference datasets, and that allow life science researchers to use these datasets in analysis applications.


  1. Identify existing life science datasets in EGI
  2. Identify reference datasets for replication
  3. EGI AppDB extension to a dataset registry
  4. Tools for data replication
  5. Analysis tools to work with data replicas
  6. Integration with ELIXIR Registry


EGI and ELIXIR will contribute equally to cover the cost of this pilot. Contributions will initially be covered by projects that are already running (such as EGI-InSPIRE), but opportunities for additional funding will be explored during the work. The partners may organise a joint workshop during the project to help it achieve certain goals.


The project will benefit ELIXIR by establishing:

  1. A set of tools and recommendations that would help ELIXIR members and partners
    • Achieve more balanced load on storage resources across their sites
    • Offload user analysis jobs from large centres to partner sites (with data replicas)
    • Perform data processing at national or home institute resources
  2. A pilot infrastructure that includes
    • Key datasets for life science analysis workflows
    • Information about applications and tools that researchers can choose from to work with reference datasets
    • A registry that provides information for users about the reference datasets and about the tools that are available to interact with these data
  3. A group of experts who can
    • Guide the setup of production infrastructures based on the pilot infrastructure
    • Themselves become providers in production systems