The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports, new updates will be ignored and lost.
If needed you can get in touch with EGI SDIS team using operations @ egi.eu.

EGI-Engage:Data Plan

From EGIWiki
Revision as of 12:46, 9 September 2015 by Krakow

Help and support: quality@egi.eu

This page describes the data management plan for the research data that will be generated within EGI-Engage. For each dataset, it describes the type of data and its origin, the related metadata standards, the approach to sharing and target groups, and the approach to archiving and preservation.

Deliverable 2.4 Data management plan

This document will be further developed before the mid-term and final project reviews:

  • February 2016
  • February 2017
  • August 2017

with more detailed information related to the discoverability, accessibility and exploitation of the data.

Rules

The obligations arising from the project's Grant Agreement (see Article 29.3) are:

Regarding the digital research data generated in the action (‘data’), the beneficiaries must:

  1. deposit in a research data repository and take measures to make it possible for third parties to access, mine, exploit, reproduce and disseminate — free of charge for any user — the following: the data, including associated metadata, needed to validate the results presented in scientific publications as soon as possible; other data, including associated metadata, as specified and within the deadlines laid down in the 'data management plan';
  2. provide information — via the repository — about tools and instruments at the disposal of the beneficiaries and necessary for validating the results (and — where possible — provide the tools and instruments themselves).

As an exception, the beneficiaries do not have to ensure open access to specific parts of their research data if the achievement of the action's main objective, as described in Annex 1, would be jeopardised by making those specific parts of the research data openly accessible. In this case, the data management plan must contain the reasons for not giving access.

Datasets

For each task, the entries below give the contact, a short description, the data description, standards and metadata, data sharing, and archiving and preservation.
SA2.1/SA2.2 (contact: gergely.sipos@egi.eu)
Feedback and requirements from existing and new EGI users are collected at training events and through other face-to-face and electronic interactions. These data must be stored, managed, analysed and used efficiently because they are highly valuable to the EGI community for evolving its service portfolio.
  • Types of data: survey data - textual data, structured data (typically CSV or XLS) or graphics (usually survey summary or analysis)
  • Origin of data: collected from existing and potential users of EGI
  • Scale of data: few MB / year
Standards and metadata: The data is not in any standard format.
  • Target groups: technology provider, service developer and service provider teams who contribute to the EGI service portfolio
  • Scientific Impact: used for the further development of IT services offered by the EGI Community. These services are often the result of technological R&D and the subject of publications in conference proceedings and peer-reviewed journals
  • Approach to sharing: A public version of the collected requirements will be shared in the EGI-Engage milestones and deliverables. The most important documents in this respect will be M6.5 Joint training program for the second period (M15, May 2016) and the intermediate and annual project reports (every 6 months)

Archiving and preservation: Based on the nature of the data, these can be:

SA2.3 (contact: kimmo.mattila@csc.fi)
No scientific data will be generated within the EGI ELIXIR competence centre; however, ELIXIR, as an infrastructure, manages life science data produced by life scientists.
  • Types of data: life science data, in particular genomics data: marine metagenomics, plant genomics and phenotypes, and sensitive human data
  • Origin of data: produced and submitted by scientists. ELIXIR repositories collect, integrate and provide access to the data.
  • Scale of data: The largest data collections in the life sciences are in the order of petabytes (PB); however, the ELIXIR CC is likely to work with smaller datasets. The raw data for a single whole human genome is roughly 200 GB.
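The ELIXIR scale figures above can be put in perspective with a quick back-of-the-envelope calculation. The sketch below is illustrative only: it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and treats the quoted "roughly 200 GB" per raw whole human genome as exact; the function names are hypothetical.

```python
# Back-of-the-envelope scale check for the ELIXIR figures (illustrative only).
# Assumption: decimal units, and exactly 200 GB of raw data per whole human genome.
GB = 10**9
TB = 10**12
PB = 10**15

RAW_GENOME_BYTES = 200 * GB  # "roughly 200 GB" per raw whole human genome


def genomes_per_capacity(capacity_bytes: int, per_genome: int = RAW_GENOME_BYTES) -> int:
    """Number of complete raw genomes that fit in the given capacity (rounded down)."""
    return capacity_bytes // per_genome


def capacity_for_genomes(n_genomes: int, per_genome: int = RAW_GENOME_BYTES) -> int:
    """Raw storage in bytes needed for n_genomes whole human genomes."""
    return n_genomes * per_genome


if __name__ == "__main__":
    print(genomes_per_capacity(1 * PB))            # 5000
    print(capacity_for_genomes(100) // TB, "TB")  # 20 TB
```

Under these assumptions, a one-petabyte collection corresponds to about 5,000 raw whole human genomes, while a study of 100 genomes needs about 20 TB of raw storage.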
Standards and metadata: Some standards, such as the standard formats in the marine and plant domains, are still under development. Some of the standards for capturing and exchanging genomic data that might be used in the use cases are described in BioSharing [R3]. Part of the data may be stored in public data repositories (e.g. ENA) that have clearly defined metadata models.
  • Target groups: researchers interested in submitting or using metagenomics, plant and human data.
  • Scientific Impact: scientific discoveries such as comparative environmental metagenomic analyses or finding genes related to a disease
  • Approach to sharing: ELIXIR promotes open data access, but human data may be sensitive and therefore require authorised access.
Archiving and preservation: Services for archiving and preservation within ELIXIR are listed at https://www.elixir-europe.org/services.
SA2.5 (contact: Alexandre Bonvin, a.m.j.j.bonvin@uu.nl)

  • Types of data: The activity involves research data, but these data are produced by other EU projects rather than with EGI-Engage resources. The data produced by those projects are experimental NMR, X-ray, SAXS and cryo-EM data.
  • Origin of data: Biological samples (owned by the end users of the facilities).
  • Scale of data:
Standards and metadata: The end results are typically deposited in public databases such as the PDB, or the EMDB for cryo-EM data.
  • Target groups: The raw data are usually so complex that they are only of use to expert users in structural biology who have been trained in a specific technique. The processed and derived data, typically deposited in public databases, are of use to researchers in the life sciences in general and to biotech and pharmaceutical companies.
  • Scientific Impact: This research data can underpin scientific publications.
  • Approach to sharing: Data are shared via databases (e.g. the PDB and EMDB), possibly with an embargo period until publication. Other datasets (e.g. the results of computations) can be shared via EUDAT or other repositories such as SBGRID for structural biology. For an example, see: https://data.sbgrid.org/dataset/131/
Archiving and preservation: From a university perspective, data are to be kept for 10 years. Currently, there is no proper archiving mechanism in place at the particular site (Utrecht University). At the moment, policies and services rely on what is provided by the database service providers where the data are deposited.
SA2.6 (contact: Davor Davidović, davor.davidovic@irb.hr)
  • Types of data: the centre will generate/collect data that come from the research activities in the fields of Arts and Humanities. Common types of research data generated and collected in A&H are books, letters, emails, paintings, photographs, manuscripts, various digital collections, audio/video materials, etc. However, in the research activities related to EGI-Engage, the focus is on digitised data, i.e. the information/data stored in different digital formats, such as plain files (text, photo, audio and video), metadata, collections, and annotations.
  • Origin of data: The digitised data used in these research activities originate from physical objects/artefacts used in A&H research, for example books, audio and video materials, paintings and archaeological artefacts found in museums, libraries, etc. However, the focus is on existing digitised collections of these physical artefacts that are generated, operated and managed by members of the DARIAH community. Thus, the main sources of data for this research are the digitised data provided by various DARIAH members. Some DARIAH members already operate their own digital repositories, which will be used as data sources.
  • Scale of data: It is hard to estimate the scale of the research data because of the large number of different sources and the amount of information they store.
Standards and metadata: The community does not promote any specific metadata standard, and the adopted metadata formats vary from case to case. There is also no recommendation for a long-term preservation format, so no domain-specific data format is used or recommended. An individual approach is therefore required for each use case.
  • Target groups: The data collected within the DARIAH Competence Centre will be useful primarily to members of the DARIAH community. In addition, a wider audience with a strong interest in exploiting A&H data is expected to benefit from using these data.
  • Scientific Impact: This research data can underpin scientific publications.
  • Approach to sharing: No further information on how the data will be shared and accessed is available yet. Currently, the majority of data are stored and shared via various widely accessible data repositories. The repositories are mostly institutional (i.e. hosted by DARIAH member institutions such as computing/storage centres or research organisations).
Archiving and preservation: The implementation of the repositories, safety guarantees, number of copies, etc. are the responsibility of the individual data/repository providers. The plan is to implement several digital repositories for specific DARIAH use cases (e.g. Bavarian dialects) using the gLibrary framework, which allows data to be stored on different storage elements (local, grid and cloud storage).
SA2.7 (contact: Jesus Marco de Lucas, marco@ifca.unican.es)

  • Types of data: The LifeWatch competence centre will mainly generate/collect test datasets, taken as subsets of larger datasets, to analyse the LifeWatch-EGI Competence Centre framework. Examples are one month of data collected at a water reservoir, or six different simulation outcomes related to it.
  • Origin of data: Instruments installed at the water reservoir.
  • Scale of data: Gigabytes of data, stored in a database that can be exported in CSV format.
Standards and metadata: Under investigation.
  • Target groups: The data may be of interest to other research teams carrying out similar analyses at other water reservoirs.
  • Scientific Impact: The data can potentially underpin scientific publications.
  • Approach to sharing: The embargo period is usually two years, as the data are exploited by an SME. The datasets released are those connected to scientific publications or referenced in public management reports. The repository is located at the IFCA data centre and is freely accessible via the web [R5], but registration is required.
Archiving and preservation: Copies are kept on WORM tapes and on a separate company server located 400 km away, and the data are automatically synchronised across the servers. The main repository uses RAID technology and has not lost any data in the last 10 years.