
VT Scalable Access to Federated Data



General Project Information

(VT under construction)

Motivation

Different solutions for federated storage management for High Throughput Computing of data, whether grid-style or on cloud, are possible, but they are not yet widely available in EGI as validated platforms capable of meeting the performance requirements of Research Infrastructures. The problem to be addressed is the processing and visualization of large datasets, where the volume of data makes transfer unfeasible and requires the migration of computation to the data.

For example, "large amounts of image stacks or volumetric data are produced daily at brain research sites around the world. This includes human brain imaging data in clinics, connectome data in research studies, whole brain imaging with light-sheet microscopy and tissue clearing methods or micro-optical sectioning techniques, two-photon imaging, array tomography, and electron beam microscopy." Similar requirements are emerging from other areas like structural biology and life sciences.

A key challenge in making such data available is to make it accessible without moving large amounts of data. Typical dataset sizes can reach the terabyte range, while a researcher may want to view or access only a small subset of the entire dataset.


Objectives

  • collect use cases and respective requirements for an "active repository platform" offering the capability to easily deploy an active repository that combines large data storage with a set of computational services (high throughput computing and cloud compute IaaS) for accessing and viewing large volume datasets
  • collect use cases for accessing and depositing data with PID identifiers
  • implement a distributed infrastructure offering different test environments for testing scalability of big data access in the EGI federated cloud/Grid infrastructure

Tasks

  • Invite TCB, OMB, competence centres and user communities to participate
  • Identify infrastructure providers contributing resources to the testbed
  • Define a list of relevant use cases for scalable big data access requiring co-location of compute and data
  • Performance testing in different test scenarios

Outcomes/Deliverables

  • May 2015. Distributed testbed, which can be incrementally developed with new technical solutions as needed by use cases
  • Dec 2015. Report on use cases and performance results

Members

  • Infrastructure providers
    • Germany. GWDG/P. Kasprzak
    • Italy. INFN Bari: M. Antonacci, G. Donvito
    • Poland. CYFRONET: L. Dutka
    • ...
  • Technology providers
    • OneData/L. Dutka, CYFRONET
    • ...
  • Use cases
    • ELIXIR Data Replication use cases/F. Psomopoulos (AUTH) (to be completed)
    • Human Brain Project - Neuroscience/ (to be completed)

Resources

  • unfunded participation
  • NGIs and user communities contributing to EGI-Engage competence centres