DCH-RP:PoC 2 Evaluate EUDAT services

From EGIWiki
Revision as of 14:36, 4 March 2014 by Jankowsk (talk | contribs) (Introduction)



Introduction

This experiment will look at the suitability of the EUDAT services for DCH.

The objective of this experiment is to verify the usability of EUDAT services for DCH communities in terms of the following requirements:

  • assuring long-term data sustainability
  • simple data access
  • data sharing

The EUDAT project provides two services that may be of interest to the DCH community:

  • B2SAFE - (previously Safe Replication) the central EUDAT service, which replicates datasets across different data centres in a safe and efficient way while maintaining all the information required to easily find and query the replica locations. Information about the replica locations, together with other important information, is stored in PID records, each managed in separate administrative domains. The B2SAFE service is implemented as an iRODS module that provides a set of iRODS rules or policies to interface with the EPIC handle API, and it uses the iRODS middleware to replicate datasets from a source data (or community) centre to a destination data centre. See the B2SAFE webpage for details.
  • B2SHARE - (previously Simple Store) a customised version of Invenio (invenio-software.org) designed to offer a simple mechanism for uploading and sharing scientific data with associated metadata. It is intended for a large number of small files, such as spreadsheet files with research data or analysis results, which may contain important data but do not easily fit in with regular data management. See the B2SHARE webpage and the B2SHARE User Documentation for details.

The B2SAFE service is interesting for DCH data from the point of view of assuring data sustainability, and it will be used as the base layer for this experiment. However, B2SAFE offers a rather low-level interface, which may prove impractical. DCH institutions usually need easy access to their data and the ability to share it between users and organizations. Thus, an additional layer that simplifies access is needed. At the moment, three possible scenarios are considered, each using a different tool or service as the upper layer of the software stack. They are shown in the pictures below:

B2SAFE workflow

Below we show the B2SAFE workflow and describe the steps necessary to replace the B2SAFE client (iRODS-based) with another client that uses a different protocol.

B2SAFE workflow
  • When an iRODS client is used to put the data (iRODS protocol), registration is done automatically by iRODS
  • When any other client is used (e.g. NDS2, SFTP protocol), registration has to be implemented separately
    • The Linux inotify mechanism may be used for this
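
The registration step for non-iRODS clients could be sketched roughly as follows. This is only an illustration under stated assumptions: the Python standard library has no inotify binding, so the sketch scans the storage directory instead of reacting to inotify events, and the `register` callback is a hypothetical placeholder for whatever actually registers a file in iRODS. The function names `find_unregistered` and `register_new_files` are invented for this example.

```python
import os
from typing import Callable, List, Set


def find_unregistered(storage_root: str, registered: Set[str]) -> List[str]:
    """Return, in sorted order, all file paths under storage_root
    that are not yet present in the `registered` set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in registered:
                found.append(path)
    return sorted(found)


def register_new_files(
    storage_root: str,
    registered: Set[str],
    register: Callable[[str], None],
) -> List[str]:
    """Call `register` once for every new file and remember it.

    `register` is a hypothetical hook: in a real deployment it would
    perform the iRODS registration that an iRODS client does implicitly.
    """
    new_paths = find_unregistered(storage_root, registered)
    for path in new_paths:
        register(path)
        registered.add(path)
    return new_paths
```

In an inotify-based module, the scan would be replaced by a handler invoked when a file written over SFTP is closed; the bookkeeping around `register` would stay the same.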

Metadata details:

  • Assumption: the domain-specific metadata is stored in separate files
  • The metadata files are placed on the storage alongside the data files
  • A module based on inotify (to be implemented) registers the files in iCAT
    • Step 1: only the general metadata is stored in iCAT
    • Step 2: a script is triggered that extracts the domain-specific metadata and stores it in iCAT; this will allow, e.g., contextual data search
  • B2SAFE automatically assigns a PID and replicates the files (both data and metadata)
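
The two metadata steps above might look like the sketch below. It rests on assumptions that are not part of the experiment description: the metadata file is taken to be a JSON sidecar named after the data file with a `.meta.json` suffix, the `domain.` key prefix is an arbitrary choice, and the resulting dictionary merely stands in for the attributes that would be written to iCAT. All function names are hypothetical.

```python
import json
import os
from typing import Dict

# Hypothetical naming convention for the separate metadata file.
SIDECAR_SUFFIX = ".meta.json"


def general_metadata(path: str) -> Dict[str, object]:
    """Step 1: collect only general, format-independent metadata."""
    return {
        "name": os.path.basename(path),
        "size_bytes": os.stat(path).st_size,
    }


def domain_metadata(path: str) -> Dict[str, str]:
    """Step 2: read domain-specific metadata from the sidecar file.

    Returns an empty dict when no sidecar exists. Values are converted
    to strings so they can be stored as plain attribute values.
    """
    sidecar = path + SIDECAR_SUFFIX
    if not os.path.exists(sidecar):
        return {}
    with open(sidecar, encoding="utf-8") as fh:
        data = json.load(fh)
    return {str(key): str(value) for key, value in data.items()}


def catalogue_record(path: str) -> Dict[str, object]:
    """Merge both steps into one record; domain keys get a prefix
    so they cannot clash with the general attributes."""
    record = general_metadata(path)
    for key, value in domain_metadata(path).items():
        record["domain." + key] = value
    return record
```

Keeping the two steps as separate functions mirrors the workflow: the general record can be written as soon as a file appears, while the domain-specific extraction can run later, once the metadata file has arrived.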