
Difference between revisions of "VT Scalable Access to Federated Data"


Revision as of 16:20, 13 April 2015



General Project Information

(VT under construction)

Motivation

Several solutions exist for federated storage management in High Throughput Computing, whether grid-style or cloud-based, but none is yet widely available in EGI as a validated platform capable of meeting the performance requirements of Research Infrastructures. The problem to be addressed is the processing and visualization of large datasets whose volume makes transfer infeasible and therefore requires moving the computation to the data.

For example, "large amounts of image stacks or volumetric data are produced daily at brain research sites around the world. This includes human brain imaging data in clinics, connectome data in research studies, whole brain imaging with light-sheet microscopy and tissue clearing methods or micro-optical sectioning techniques, two-photon imaging, array tomography, and electron beam microscopy." Similar requirements are emerging from other areas like structural biology and life sciences.

A key challenge in making such data available is to make it accessible without moving large amounts of data. Typical dataset sizes can reach the terabyte range, while a researcher may want to view or access only a small subset of the entire dataset.


Objectives

  • collect use cases and respective requirements for an "active repository platform" offering the capability to easily deploy an active repository that combines large data storage with a set of computational services (high throughput computing and cloud compute IaaS) for accessing and viewing large volume datasets
  • collect use cases for accessing and depositing data with persistent identifiers (PIDs)
  • implement a distributed infrastructure offering different test environments for testing scalability of big data access in the EGI federated cloud/Grid infrastructure

Tasks

TO BE PROVIDED List and describe the specific tasks with target dates for completion.

Outcomes/Deliverables

  • Deliverable 1 description and target date
  • Deliverable 2 description

Members

  • Infrastructure providers
    • GWDG/P. Kasprzak
    • INFN Bari: M. Antonacci, G. Donvito
  • Technology providers
    • OneData/L. Dutka, CYFRONET
  • Use cases
    • ELIXIR Data Replication use cases/F. Psomopoulos (AUTH) (to be completed)
    • Human Brain Project - Neuroscience/ (to be completed)

Resources

TO BE PROVIDED How is the Project Team to be resourced and how will members work? How much effort will be required from each individual, and how will this effort impact contributing organisations? Do not confuse VT participants with stakeholders: VT participants do work for the project, while others don’t have to do any work for it. How will your funds be consumed?


Progress Reporting

TO BE PROVIDED How will you measure your progress and how will you report this? To whom? The objective here is to assure others that your project is on track, but if there are problems then this is the path to getting help and more resources (time, funding, effort).

The Project Leader will provide a short emailed progress report on a weekly basis. The report will be due by 17:00 on Fridays and is to contain details of:

  • Work achieved that week
  • Work planned for next week
  • Progress against the goals in the project plan
  • Issues that the virtual team leader needs help with (e.g. non-responsive partners, more resources, support from EGI.eu teams, etc.)