The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other platforms; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team at operations @ egi.eu.

DCH-RP:PoC 2 Evaluate EUDAT services

From EGIWiki
Latest revision as of 14:36, 4 March 2014

Introduction

This experiment will assess the suitability of EUDAT services for the Digital Cultural Heritage (DCH) community.

The objective of this experiment is to verify the usability of EUDAT services for DCH communities in terms of the following requirements:

  • long-term data sustainability
  • simple data access
  • data sharing

The EUDAT project provides two services that may be of interest to the DCH community:

  • B2SAFE (previously Safe Replication) is the central EUDAT service. It replicates datasets across different data centres in a safe and efficient way while maintaining all the information required to find and query the replica locations. This information, together with other important information, is stored in persistent identifier (PID) records, each managed in a separate administrative domain. The B2SAFE service is implemented as an iRODS module that provides a set of iRODS rules (policies) to interface with the EPIC handle API, and it uses the iRODS middleware to replicate datasets from a source data (or community) centre to a destination data centre. See the B2SAFE webpage (http://www.eudat.eu/b2safe) for details.
  • B2SHARE (previously Simple Store) is a customised version of Invenio (invenio-software.org) designed to offer a simple mechanism for uploading and sharing scientific data with associated metadata. It is intended for a large number of small files, such as spreadsheets with research data or analysis results, which may contain important data but do not fit easily into regular data management. See the B2SHARE webpage (http://www.eudat.eu/b2share) and the B2SHARE User Documentation (https://github.com/B2SHARE/b2share/wiki/User-Documentation) for details.
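To illustrate how PID records can keep track of replica locations across administrative domains, here is a minimal Python sketch. It is not the B2SAFE implementation or the EPIC handle API: the `PidRecord` and `PidRegistry` classes, their field names and the `replicate` helper are hypothetical, and the in-memory registries merely stand in for handle services run by different data centres.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PidRecord:
    """Hypothetical, simplified PID record (not the real EPIC handle record layout)."""
    pid: str                      # persistent identifier of this copy
    url: str                      # where this copy is stored
    checksum: str                 # checksum of the data object, shared by all copies
    parent: Optional[str] = None  # PID of the master copy (None for the master itself)
    replicas: List[str] = field(default_factory=list)  # PIDs of known replicas

    def add_replica(self, replica_pid: str) -> None:
        # Record each replica only once so repeated replication runs are idempotent.
        if replica_pid not in self.replicas:
            self.replicas.append(replica_pid)


@dataclass
class PidRegistry:
    """In-memory stand-in for a handle service run by one administrative domain."""
    records: Dict[str, PidRecord] = field(default_factory=dict)

    def register(self, record: PidRecord) -> None:
        self.records[record.pid] = record

    def resolve(self, pid: str) -> PidRecord:
        return self.records[pid]


def replicate(source: PidRegistry, dest: PidRegistry, pid: str,
              dest_pid: str, dest_url: str) -> PidRecord:
    """Register a replica in the destination domain and link it from the master record."""
    master = source.resolve(pid)
    replica = dest.records.get(dest_pid)
    if replica is None:
        replica = PidRecord(pid=dest_pid, url=dest_url,
                            checksum=master.checksum, parent=master.pid)
        dest.register(replica)
    master.add_replica(dest_pid)
    return replica


# Usage: a master copy at centre A replicated to centre B (all values are made up).
centre_a, centre_b = PidRegistry(), PidRegistry()
centre_a.register(PidRecord("prefix-a/0001", "irods://centre-a/dch/scan.tif", "md5:1234"))
copy = replicate(centre_a, centre_b, "prefix-a/0001", "prefix-b/0001",
                 "irods://centre-b/dch/scan.tif")
print(centre_a.resolve("prefix-a/0001").replicas)  # ['prefix-b/0001']
```

The bookkeeping shown here (the master record lists its replicas, and each replica carries the master's checksum and a back-reference) is an assumption made for illustration; in a real deployment the records live in a handle service and are maintained by the B2SAFE iRODS rules.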

The B2SAFE service is interesting for DCH data because it helps to assure data sustainability, and it will be used as the base layer for this experiment. However, B2SAFE offers a rather low-level interface, which may prove impractical. DCH institutions usually need easy access to their data and the ability to share it between users and organisations. Thus, an additional layer that simplifies access is needed. At the moment, three possible scenarios are considered, each using a different tool or service as the upper layer of the software stack:

  • B2SHARE and B2SAFE scenario
  • NDS2 GUI client and B2SAFE scenario
  • Community client and B2SAFE scenario

B2SAFE workflow

Below we show the B2SAFE workflow and describe the steps necessary to replace the iRODS-based B2SAFE client with another client that uses a different protocol.

[Figure: B2SAFE workflow]
  • When the iRODS client is used to put the data (iRODS protocol), iRODS performs the registration automatically.
  • When any other client is used (e.g. NDS2 via the SFTP protocol), the registration has to be implemented separately.
    • The Linux inotify mechanism may be used for this purpose.
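The inotify-based registration could look roughly like the following Linux-only Python sketch. Python has no inotify module in its standard library, so the sketch calls the kernel's `inotify_init` and `inotify_add_watch` through `ctypes`, reports files once they have been closed after writing or moved into the watched directory, and builds an iRODS `ireg` call that registers each file in iCAT without copying it. The watched directory, the target collection and the `register_loop` daemon are hypothetical; `ireg` only works if the file's physical path is visible to the iRODS server, and resource selection and permissions are deployment-specific.

```python
import ctypes
import ctypes.util
import os
import struct
import subprocess
from typing import List

# Event flags from <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was moved (e.g. renamed) into the directory

_EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, len of struct inotify_event

_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)
_libc.inotify_init.argtypes = []
_libc.inotify_init.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int


def watch_directory(path: str) -> int:
    """Return an inotify file descriptor watching `path` for completed uploads."""
    fd = _libc.inotify_init()
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    wd = _libc.inotify_add_watch(fd, os.fsencode(path), IN_CLOSE_WRITE | IN_MOVED_TO)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err), path)
    return fd


def read_new_files(fd: int, directory: str) -> List[str]:
    """Block until events arrive, then return full paths of newly completed files."""
    buf = os.read(fd, 64 * 1024)
    paths, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        # The name field is NUL-padded to `name_len` bytes.
        name = buf[offset:offset + name_len].rstrip(b"\0").decode()
        offset += name_len
        if mask & (IN_CLOSE_WRITE | IN_MOVED_TO) and name:
            paths.append(os.path.join(directory, name))
    return paths


def ireg_command(physical_path: str, irods_path: str) -> List[str]:
    """Build the iRODS `ireg` call that registers a file already on disk in iCAT."""
    return ["ireg", physical_path, irods_path]


def register_loop(directory: str, collection: str) -> None:
    """Hypothetical registration daemon: register each completed upload in iCAT."""
    fd = watch_directory(directory)
    try:
        while True:
            for path in read_new_files(fd, directory):
                target = f"{collection}/{os.path.basename(path)}"
                subprocess.run(ireg_command(path, target), check=True)
    finally:
        os.close(fd)
```

Watching IN_CLOSE_WRITE rather than file creation avoids registering a file that an SFTP client is still writing; IN_MOVED_TO covers clients that upload under a temporary name and rename the file at the end.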

Metadata details:

  • Assumption: the domain-specific metadata is stored in separate files.
  • The metadata files are placed on the storage together with the data files.
  • A module based on inotify (to be implemented) registers the files in iCAT:
    • Step 1: only the general metadata is stored in iCAT.
    • Step 2: a script is triggered that extracts the domain-specific metadata and stores it in iCAT; this will allow, e.g., contextual data search.
  • B2SAFE automatically assigns PIDs and replicates the files (both data and metadata).
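Step 2 could be sketched as follows, assuming (hypothetically) that each data file `name` is accompanied by a Dublin Core XML file `name.dc.xml`; the naming convention and the `dc:` attribute prefix are illustrative choices, not part of B2SAFE. The script reads the Dublin Core elements with Python's standard `xml.etree.ElementTree` and turns each one into an iRODS `imeta add -d` call that attaches it to the data object as an attribute-value pair in iCAT.

```python
import os
import subprocess
import xml.etree.ElementTree as ET
from typing import List, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
SIDECAR_SUFFIX = ".dc.xml"  # hypothetical naming convention: scan.tif -> scan.tif.dc.xml


def extract_dc_metadata(metadata_path: str) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs for every non-empty Dublin Core element."""
    prefix = "{" + DC_NS + "}"  # ElementTree writes namespaced tags as {uri}local
    pairs = []
    for elem in ET.parse(metadata_path).iter():
        if elem.tag.startswith(prefix) and elem.text and elem.text.strip():
            pairs.append(("dc:" + elem.tag[len(prefix):], elem.text.strip()))
    return pairs


def imeta_commands(data_object: str, pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Build one `imeta add -d <object> <attribute> <value>` call per pair."""
    return [["imeta", "add", "-d", data_object, attr, value] for attr, value in pairs]


def process_sidecar(metadata_path: str, collection: str, run: bool = False) -> List[List[str]]:
    """Attach the metadata in a sidecar file to the data object it describes.

    `collection` is the (hypothetical) iRODS collection the files were registered in.
    With run=False the commands are only returned, which is handy for a dry run.
    """
    name = os.path.basename(metadata_path)
    if not name.endswith(SIDECAR_SUFFIX):
        return []  # not a metadata file: nothing to extract
    data_object = f"{collection}/{name[:-len(SIDECAR_SUFFIX)]}"
    commands = imeta_commands(data_object, extract_dc_metadata(metadata_path))
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Once stored as attribute-value pairs, the values can be searched with iRODS metadata queries (for example `imeta qu -d dc:creator = <value>`), which is the kind of contextual search Step 2 aims at.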