EGI-Engage:Data Plan
Revision as of 13:49, 9 September 2015
Help and support: quality@egi.eu
This page describes the data management plan for the research data that will be generated within EGI-Engage. For each dataset, it describes the type of data and their origin, the related metadata standards, the approach to sharing and target groups, and the approach to archival and preservation.
Deliverable 2.4 Data management plan
This document will be further developed before the mid-term and final project reviews:
- February 2016
- February 2017
- August 2017
with more detailed information related to the discoverability, accessibility and exploitation of the data.
Rules
The obligations arising from the Grant Agreement of the project are (see article 29.3):
Regarding the digital research data generated in the action (‘data’), the beneficiaries must:
- deposit in a research data repository and take measures to make it possible for third parties to access, mine, exploit, reproduce and disseminate — free of charge for any user — the following: the data, including associated metadata, needed to validate the results presented in scientific publications as soon as possible; other data, including associated metadata, as specified and within the deadlines laid down in the 'data management plan';
- provide information — via the repository — about tools and instruments at the disposal of the beneficiaries and necessary for validating the results (and — where possible — provide the tools and instruments themselves).
As an exception, the beneficiaries do not have to ensure open access to specific parts of their research data if the achievement of the action's main objective, as described in Annex 1, would be jeopardised by making those specific parts of the research data openly accessible. In this case, the data management plan must contain the reasons for not giving access.
Datasets
Task | Contact | Short description | Data description | Standards and metadata | Data sharing | Archiving and preservation |
---|---|---|---|---|---|---|
SA2.1/ SA2.2 | gergely.sipos@egi.eu | Feedback and requirements from existing and new EGI users are collected at training events and other types of face-to-face and electronic interactions. These data must be stored, managed, analysed and used efficiently because they represent high value for the EGI community to evolve its service portfolio. |
|
The data are not in any standard format. |
|
Based on the nature of the data these can be:
|
SA2.3 | kimmo.mattila@csc.fi | No scientific data will be generated within the EGI ELIXIR competence centre; however, ELIXIR, as an infrastructure, does manage life science data produced by life scientists. |
|
Some standards, such as the standard formats in the marine or the plant domain, are still under development. Some of the standards for capturing and exchanging genomic data that might be used in the use cases are described in BioSharing [R3]. Part of the data may be stored in public data repositories (e.g. ENA) that have clearly defined metadata models. |
|
Services for archiving and preservation within ELIXIR are listed at https://www.elixir-europe.org/services. |
SA2.5 | Alexandre Bonvin (a.m.j.j.bonvin@uu.nl) |
|
The end results are typically deposited into public databases like the PDB or EMDB for cryo-EM data. |
|
From a university perspective, data are to be kept for 10 years. Currently, there is no proper archiving mechanism in place at the particular site (Utrecht University). At the moment, policies and services rely on what is provided by the database service providers where data are deposited. | |
SA2.6 | Davor Davidović (davor.davidovic@irb.hr) |
|
The community does not promote any specific metadata standard, and the adopted metadata formats vary from case to case. There is also no recommendation about a long-term preservation format, so no domain-specific data format is used or recommended. An individual approach is therefore required for each use case. |
|
The implementation of the repositories, safety guarantees, number of copies, etc. is up to the individual data/repository providers. The plan is to implement several digital repositories for specific DARIAH use cases (e.g. Bavarian dialects) using the gLibrary framework, which allows the data to be stored on different storage elements (local, grid and cloud storage elements). | |
SA2.7 | Jesus Marco de Lucas (marco@ifca.unican.es) |
|
Under investigation. |
|
Copies are kept on WORM tapes and on a separate company server 400 km away. The main repository uses RAID technology and has not lost any data in the last 10 years. The data are automatically synchronised across the servers. | |
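SA2.8 | Ingemar Häggström (ingemar.haggstrom@eiscat.se) |
- Types of data: Development of value-added products (e.g. processes, combined data, plots).
- Origin of data: EISCAT Incoherent Scatter radar low-level data.
- Scale of data: A few TB/year will be produced within EGI-Engage. EISCAT data are of a larger order of magnitude.
|
A mixture of standards is adopted depending on the data type. For long-term preservation, the HDF5 format will be used. |
- Target groups: Mainly space and environmental researchers.
- Scientific Impact: This research data can underpin scientific publications.
- Approach to sharing: Current value-added products are open to all from day zero, but low-level data is not. Discussions on the new products are still ongoing.
|
Data are stored on a few e-Infrastructures, mirrored and synchronised. There are two levels of storage: a large short-term store and a reduced long-term store. | |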
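SA2.10 | Eric Yen (Eric.Yen@twgrid.org) |
- Types of data: There are two main types of data:
  - Observation data from tidal gauges, weather stations, rainfall, radar data, satellite data and images, bathymetry, historical records of earthquakes and tsunamis, etc.
  - Primary simulation results: waveforms at any target site, potential sources of a historical tsunami event, changes in rainfall, wind fields and typhoon paths (or those of any special weather event), and dispersion paths of aerosols or volcanic ash.
- Origin of data: Government agencies responsible for weather, earthquakes, tsunamis and volcanoes, or research institutes that own the data needed by the CC.
- Scale of data: The whole collection plus the generated data is expected to amount to a few TB to tens of TB. Variation is possible depending on the resolution of the generated output.
|
The ISO 19156 standard (Observations and Measurements data model) was selected. For weather and climate data, the centre will also comply with the Climate and Forecast convention (CF) (e.g. NetCDF). Both of these specifications are included in the new metadata model called ADAGUC Data format standard. |
- Target groups: The data can potentially underpin scientific publications. Scientists working on tsunamis, earthquakes, volcanoes, weather and climate change; scientists and policy makers concerned with disaster mitigation strategies and studies.
- Scientific Impact: The data can support new discoveries, such as the sources and characteristics of potential tsunamis or new ways of simulating and analysing hazards. The data can also support new modelling schemes and the study of the change processes of climate and disaster events.
- Approach to sharing: Almost every government has strict regulations on announcements about weather and natural hazards, so the centre is focusing on research instead of releasing results to the public. Moreover, sharing still depends on clearance of dissemination rights by the originating agency. At least during the project years, the data collected or generated will be shared in a restricted way and for academic purposes only.
|
The data will be organised and managed in a repository over the distributed infrastructure. The CC plans to have no fewer than three copies of the data set at different sites. Academia Sinica (Taiwan) is in charge of the long-term data preservation. |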
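
Several entries in the archiving column rely on mirrored, synchronised copies held at different sites (for example, at least three copies for SA2.10). As an illustration only, the following minimal sketch shows one way such replicas could be checked for consistency by comparing checksums. It assumes the copies are reachable as local file paths; the function names `sha256_of` and `check_replicas` are hypothetical and are not part of any EGI or competence centre tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_replicas(replicas: list[Path], minimum_copies: int = 3) -> bool:
    """Return True if enough replicas exist and they are byte-identical.

    Assumption: each replica is a local (or mounted) file path. Replicas
    that are missing are ignored, but at least `minimum_copies` must exist.
    """
    existing = [p for p in replicas if p.is_file()]
    if len(existing) < minimum_copies:
        return False
    # All existing copies must share a single digest.
    return len({sha256_of(p) for p in existing}) == 1
```

A data set with three identical copies passes the check, while a corrupted copy or fewer than three available copies makes it fail. A production setup would more likely use the integrity features of the storage system itself.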