VT Scalable Access to Federated Data
Revision as of 19:51, 17 April 2015
General Project Information
- Leader: Lukas Dutka/CYFRONET
- Mailing List: vt-feddata at mailman.egi.eu
- Meetings: https://indico.egi.eu/indico/categoryDisplay.py?categId=164
- Status: Started
- Start Date: 01 April 2015
- Duration: 18 months
- Customer: NGIs/EIROs, user communities
(VT under construction)
Motivation
Several solutions exist for federated storage management in High Throughput Computing, whether grid-style or cloud-based, but none is yet widely available in EGI as a validated platform capable of meeting the performance requirements of Research Infrastructures. The problem to be addressed is the processing and visualization of large datasets whose volume makes transfer infeasible and therefore requires moving the computation to the data.
For example, "large amounts of image stacks or volumetric data are produced daily at brain research sites around the world. This includes human brain imaging data in clinics, connectome data in research studies, whole brain imaging with light-sheet microscopy and tissue clearing methods or micro-optical sectioning techniques, two-photon imaging, array tomography, and electron beam microscopy." Similar requirements are emerging from other areas like structural biology and life sciences.
A key challenge in making such data available is providing access without moving large amounts of data. Typical dataset sizes can reach the terabyte range, while a researcher may want to view or access only a small subset of the entire dataset.
Objectives
- collect use cases and respective requirements for an "active repository platform" offering the capability to easily deploy an active repository that combines large data storage with a set of computational services (high throughput computing and cloud compute IaaS) for accessing and viewing large volume datasets
- collect use cases for accessing and depositing data with persistent identifiers (PIDs)
- implement a distributed infrastructure offering different test environments for testing scalability of big data access in the EGI federated cloud/Grid infrastructure
Tasks
- Invite TCB, OMB, competence centres and user communities to participate
- Identify infrastructure providers contributing resources to the testbed
- Define a list of relevant use cases for scalable big data access requiring co-location of compute and data
- Performance testing in different test scenarios
Outcomes/Deliverables
- May 2015. Distributed testbed, which can be incrementally developed with new technical solutions as needed by use cases
- Dec 2015. Report on use cases and performance results
Members
- Infrastructure providers
  - France. IN2P3-IRES/J. Pansanel
  - Germany. GWDG/P. Kasprzak
  - Germany. DESY/C. Bernardt
  - Greece. GRNET/K. Koumantaros
  - Italy. INFN Bari: M. Antonacci, G. Donvito
  - Poland. CYFRONET: L. Dutka
  - ...
- Technology providers
  - OneData/L. Dutka, CYFRONET
  - dynamic HTTP federation ("dynafed")/O. Keeble, F. Furano, CERN Data Management group
  - iRODS/J. Pansanel, CNRS
  - dCache/C. Bernardt
- Use cases
  - ELIXIR Data Replication use cases/F. Psomopoulos (AUTH)
  - Human Brain Project - Neuroscience/S. Hill, J. Muller (EPFL) (use case under discussion)
Resources
- unfunded participation
- NGIs and user communities contributing to EGI-Engage competence centres