Integrating Reference Datasets Workplan

Introduction

This wiki page describes the workplan devised by an EGI / ELIXIR task force, listing both the tasks and their outcomes. The comprehensive workplan is available here.

Tasks

Overview

| ID | Task | Leader | Outcome | Length |
|----|------|--------|---------|--------|
| T1 | Identify existing life science datasets in EGI | Gergely Sipos | Milestone 1 | 2 months |
| T2 | Identify reference datasets for replication | Fotis E. Psomopoulos | Milestone 2 | 3 months |
| T3 | EGI AppDB extension to a dataset registry | William Karageorgos | Deliverable 1 | 7 months |
| T4 | Tools for data replication | Giacinto Donvito | Deliverable 2, Deliverable 3, Milestone 3 | 6 months |
| T5 | Analysis tools to work with data replicas | Afonso Duarte | Deliverable 4 | 3 months |
| T6 | Integration with ELIXIR Registry | Marios Chatziangelou | Deliverable 5 | 2 months |

Description

The project between EGI and ELIXIR consists of the following tasks:

Task 1: Identify existing life science datasets in EGI

Identify existing biological reference dataset replicas within the EGI infrastructure together with their key characteristics that make them usable for analysis (such as dataset version, source, access mode, related analysis tools, size, update frequency, tools used for replication, etc.). The task will survey resource providers and life science users of EGI and will look for datasets in the EGI information system and/or other EGI registries. The expected output of the task is an informative table about the datasets that are available on EGI and their key characteristics for users and resource providers → Milestone 1


Task 2: Identify reference datasets for replication

Identify key biological reference datasets from life sciences that would benefit from replication to EGI sites, for example to increase their availability or the scalability of access. The task will identify, engage with and survey life science data providers and data users, including developers of the ELIXIR tools registry. The expected output of this task is an informative table about life science reference datasets that should be made available on EGI, together with the key characteristics that resource providers and users need in order to replicate and use them (i.e. metadata describing, for example, the size, update frequency, preferred access mode, related tools, etc.) → Milestone 2


Task 3: EGI AppDB extension to a dataset registry

Extend the EGI Applications Database (AppDB) with new capabilities to expose information about biological reference datasets and their replicas across EGI. Key characteristics of these datasets should be made available by AppDB in the form of metadata for life science users. The initial dataset metadata schema should consist of basic attributes such as name, locations, size, and type; when input from tasks 1 & 2 becomes available, the schema should be revisited in order to identify any additional characteristics that may need to be included. A new access group should also be created, in order to allow particular individuals to input the actual initial metadata, once tasks 1 & 2 are complete. → Deliverable 1
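As an illustration only, the initial schema described above could be sketched as a small Python record type. This is not the AppDB data model: the class name `DatasetRecord`, the field names, and the helper `missing_basic_attributes` are hypothetical. Only the basic attributes (name, locations, size, type) and the optional characteristics (version, source, access mode, update frequency, related tools) are taken from the task descriptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for one reference dataset."""

    # Basic attributes named in Task 3.
    name: str
    locations: List[str]   # where replicas are hosted
    size_bytes: int        # unit is an assumption; the workplan only says "size"
    dataset_type: str

    # Candidate extra characteristics, listed in Tasks 1 and 2.
    version: Optional[str] = None
    source: Optional[str] = None
    access_mode: Optional[str] = None
    update_frequency: Optional[str] = None
    related_tools: List[str] = field(default_factory=list)

    def missing_basic_attributes(self) -> List[str]:
        """Return the names of basic attributes that are empty or invalid."""
        missing = []
        if not self.name:
            missing.append("name")
        if not self.locations:
            missing.append("locations")
        if self.size_bytes < 0:
            missing.append("size_bytes")
        if not self.dataset_type:
            missing.append("dataset_type")
        return missing
```

Keeping the extra characteristics optional reflects the plan to start with the basic attributes and revisit the schema once input from tasks 1 & 2 is available.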


Task 4: Tools for data replication

Identify and propose suitable software tools, software configurations, operational practices and documentation for those who want to replicate key biological reference datasets to the EGI infrastructure. The tools can be relevant for resource providers to replicate complete datasets for groups of users, and for life science researchers to replicate parts of reference datasets for custom analysis. The task will also set up a distributed testbed where the proposed tools and configurations can be tested and validated with real reference datasets and applications by life science communities. The expected outputs of the task are:

  1. Recommended services to replicate reference biological datasets to EGI (software, software configurations, operational practices, documentation) → Deliverable 2
  2. A distributed testbed where the recommended service portfolio for replication is deployed and where reference life science datasets are replicated → Deliverable 3
  3. An evaluation of the recommended services on the testbed by resource providers and by life science users (e.g. an online survey or a face-to-face workshop) → Milestone 3


Task 5: Analysis tools to work with data replicas

Identify and provide guidance for the use of key life science software applications and tools that can be used to work with reference datasets on EGI. These tools can be used by life science researchers to define and execute custom analyses that work on reference datasets hosted on EGI. The task will review the identified tools on the distributed testbed of Task 4, and will provide information for users about these tools at a central location, ideally as software profiles in EGI AppDB. → Deliverable 4


Task 6: Integration with ELIXIR Registry

The developers of the EGI AppDB and the ELIXIR service registry will collaborate to federate information about ‘biological reference datasets’ from AppDB to the ELIXIR registry. The task will make content from the EGI AppDB visible to the broader life sciences community. The output of this task is a technical integration between the ELIXIR Registry and the EGI AppDB, so that content about reference datasets hosted on EGI can be federated from the EGI AppDB into the ELIXIR registry. → Deliverable 5


Milestones / Deliverables

Overview

Description

Milestone 1

Task 1 will provide an informative table about the datasets that are available on EGI together with their key characteristics for users and resource providers (i.e. metadata describing the datasets such as size, update frequency, preferred access mode, etc.). This milestone will be used by Task 3 and Task 6 to implement metadata structures in AppDB and the ELIXIR registry to provide useful information about datasets. The milestone will also be used by Task 5 to identify analysis tools that can work with the existing reference dataset replicas.


Milestone 2

Task 2 will provide an informative table about life science reference datasets that should be made available on EGI, together with the key characteristics that resource providers and users need in order to replicate and use them (i.e. metadata describing, for example, the size, update frequency, preferred access mode, related tools, etc.). This milestone will be used by Task 3 and Task 6 to implement the data structure that AppDB and the ELIXIR registry should use to provide information about datasets, and to populate these registries with content.


Milestone 3

Task 4 will provide an evaluation of the recommended services for dataset replication (D2). The evaluation will be performed by resource providers and life science users on the distributed testbed (D3) in the most suitable way, e.g. an online survey or a face-to-face workshop.


Deliverable 1

Task 3 will deliver an extended version of the EGI Applications Database that exposes information about biological reference datasets and their replicas across EGI.


Deliverable 2

Task 4 will deliver recommended services to those who want to replicate key biological reference datasets to the EGI infrastructure. The tools can be relevant for resource providers to replicate complete datasets for groups of users, and for life science researchers to replicate parts of reference datasets for custom analysis. The services are expected to comprise software, software configurations, operational practices and documentation.


Deliverable 3

Task 4 will deliver a distributed testbed where the recommended services (D2) are deployed and where reference life science datasets are replicated.


Deliverable 4

Task 5 will provide information about key life science software applications and tools that can be used by life science researchers to define and execute custom analyses that work on reference datasets hosted on EGI. The information will be published at a central location, ideally as software profiles in EGI AppDB.


Deliverable 5

Task 6 will deliver technical integration between the ELIXIR Registry and the EGI AppDB, so content about reference datasets hosted on EGI can be federated from the EGI AppDB into the ELIXIR registry.
