
VT Federated Open Research Data Meetings/2014-02-07

From EGIWiki
Revision as of 21:41, 20 May 2014 by Sergio


Meeting 7 February 2014 - 17

Zenodo: supports ingestion, deposition, and description of publications and datasets; it is aimed more at the long tail, with a limit of 2 GB per dataset; it supports the DataCite schema, which was designed to describe datasets at a very high level (e.g., version, MD5, soft signature); it does not use the rich metadata descriptions that are typically specific to user communities, e.g., geo-referential information; data are treated as digital objects.

How do Zenodo/Invenio manage associations between objects (e.g. Linked Data)? Paolo: Linked Data is an approach not yet supported by Zenodo; DataCite offers some elements, e.g. it supports relationship types and persistent identifiers; the OpenAIRE information space includes Zenodo, and OpenAIRE is working on making the links between data objects accessible through an API (to be developed in the last part of the project).

Bruce: In the CHAIN-REDS project, they have established an MoU with the biggest university in South Africa and need to consult them on how to publish data and link them to publications, so as to demonstrate that data are reused and generate more research, and that the infrastructure should therefore be supported. Paolo suggested looking into Research Objects (Carole Goble, Univ. Manchester, myExperiment.org); Research Objects include data and workflows that people can re-execute on the data; this enables repetition (repeat on the data) and repeatability (repeat with my data); it uses Taverna; the eventual plan is to explore Zenodo/Invenio as a simple store and also to explore doing more with the Taverna approach.

Sergio asked about experience with CKAN; Paolo has no direct experience, but he knows that it is much simpler and less customisable than Invenio; it installs with one click; it offers nice features for federations; EUDAT is thinking of offering it for aggregation.

OpenAIRE expects data to be exposed via DataCite; OpenAIRE imports DataCite records only if they have a link to an OpenAIRE paper; out of almost 2M datasets discovered, only 700K have a link to OpenAIRE publications.
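The import rule above can be sketched as a simple filter. This is only an illustration, not OpenAIRE's actual implementation: the record layout, the field names `doi` and `related_dois`, and the placeholder identifiers are all assumptions made for the example.

```python
def linked_to_known_publications(datasets, publication_dois):
    """Keep only dataset records that reference at least one known publication."""
    # DOIs are case-insensitive, so compare them in lower case.
    known = {doi.lower() for doi in publication_dois}
    return [
        record
        for record in datasets
        if any(doi.lower() in known for doi in record.get("related_dois", []))
    ]


# Hypothetical records; the identifiers are placeholders, not real DOIs.
datasets = [
    {"doi": "10.5072/data-1", "related_dois": ["10.5072/PAPER-A"]},
    {"doi": "10.5072/data-2", "related_dois": []},
    {"doi": "10.5072/data-3", "related_dois": ["10.5072/paper-z"]},
]
publications = ["10.5072/paper-a", "10.5072/paper-b"]

imported = linked_to_known_publications(datasets, publications)
print([record["doi"] for record in imported])  # ['10.5072/data-1']
```

Only the first record is kept: the second has no links at all and the third links to a publication outside the known set.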

Iban asked about the scalability of Invenio (e.g., dataset/metadata limitations); Paolo explained that the main limit is the resources available (note: Tim Smith explained that data are stored directly on the file system and Invenio supports different backends, e.g. at CERN they use AFS; he mentioned that, in terms of the number of objects being managed, Invenio is working on moving from 10M to 100M)

Bruce asked how to link data to publications so as to be able to demonstrate that people reuse data; Paolo said that there are organisations like GigaScience that force people to deposit their data and cite/link them in publications; other communities are starting to cite data; OpenAIRE offers an inference technique to discover which datasets are related to publications; OpenAIRE will offer users the capability to create links between datasets and publications through OpenAIRE by the end of the year; Zenodo generates a DOI for each deposited dataset; if you already have a DOI for a dataset, you have already deposited it somewhere else, so you do not need to deposit it in Zenodo, just link it to the publication; DataCite allows several persistent identifiers to co-exist.
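A dataset-to-publication link of the kind discussed above could be recorded in DataCite-style metadata roughly as follows. This is a minimal sketch: the property names `relatedIdentifier`, `relatedIdentifierType` and `relationType` and the relation values come from the DataCite schema, while the surrounding dictionary layout, the helper `related_identifier` and the DOIs are hypothetical and chosen only for illustration.

```python
# A small subset of the DataCite relationType vocabulary;
# the full controlled list is defined in the DataCite schema.
RELATION_TYPES = {
    "IsCitedBy", "Cites",
    "IsSupplementTo", "IsSupplementedBy",
    "IsReferencedBy", "References",
}


def related_identifier(identifier, relation_type, identifier_type="DOI"):
    """Build one related-identifier entry, rejecting unknown relation types."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unsupported relationType: {relation_type}")
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
    }


# Placeholder DOIs (assumed for the example, not real records).
dataset_metadata = {
    "identifier": {"identifier": "10.5072/example-dataset", "identifierType": "DOI"},
    "relatedIdentifiers": [
        related_identifier("10.5072/example-paper", "IsSupplementTo"),
    ],
}

print(dataset_metadata["relatedIdentifiers"][0]["relationType"])  # IsSupplementTo
```

Because the link is expressed between two persistent identifiers, the dataset does not have to live in the same repository as the publication.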

Jesus said that they are already working with Invenio for CMS in the area of long-term preservation; they are planning to use it for LifeWatch, and he was wondering whether the software is free; this was confirmed, Invenio is released as open source (http://invenio-software.org/)

Bruce said that in South Africa they have few installations of Invenio and many of DSpace; there is heavy use of iRODS to manage several petabytes of data; CHAIN-REDS is working on reproducibility of experiments (the GRNET team is leading this task); there will be a workshop about discoverability of data

Next steps: to organise a call of the task force to discuss how to move forward with the evaluation


EGI supporting Open Research Data Pilot:

Tiziana clarified that the scope of the EGI pilot is to explore opportunities for EGI to expand its service catalogue to cover the services needed by user communities

OpenAIRE has the scope of aggregating data repositories

Invenio has been developed for 10-12 years at CERN for CERN services, and is now well known through the INSPIRE repository. The code is GPL; everybody is welcome to contribute.

Support: some institutes have skilled admins/engineers, so they need no help; there is an admin support list for self-help; often it is the community of Invenio hosting providers that provides support; in the past CERN provided support contracts, but this became too much of a burden, so a support company was launched (CERN is released from support, but remains tightly linked).

Invenio is the full stack; CDS is a customisation of the look-and-feel and workflow; there are e.g. specific modules

Zenodo is running on the latest software, as it was not constrained by previous legacy production

Zenodo is Invenio run as a service;


Many different repositories are already available, subject- or region-based;

Invenio: http://invenio-software.org/


Demo of Invenio fresh install: http://invenio-demo.cern.ch/

Tim to send the list

Compliance with the OpenAIRE guidelines: Invenio is compliant; it offers the OAI-PMH harvesting interface
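As a rough illustration of what harvesting over OAI-PMH involves, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `set` and `from` arguments are part of the OAI-PMH protocol; the base URL and the helper name `list_records_url` are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; replace with the OAI-PMH base URL of a real repository.
url = list_records_url("https://repository.example.org/oai2d", from_date="2014-01-01")
print(url)
# https://repository.example.org/oai2d?verb=ListRecords&metadataPrefix=oai_dc&from=2014-01-01
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token the repository returns to page through the remaining records.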

Federation: yes, it is possible to federate different Invenio installations, e.g. federated search, hosted collection,

Type of data: objects are stored in the file system, therefore it depends on the file system; CERN used a local file system, then moved to NFS, and finally to AFS; Invenio has a file system abstraction layer, so it can link to ; the maximum file size depends on the data management capabilities; there are limits on the number of elements, as these are stored in the database (originally 1M, now 10M, in the future 100M);

DSpace and EPrints were designed mainly for publications, but are moving into data; the number of records they support is in the range of 10K to 100K; they are very popular as they are very easy to set up;

If data is structured as a database, how can it be hosted by Invenio? Not possible; Invenio focuses on data stored as objects

Time for install: the old Invenio required 30 steps; now it should be much lower, half a day to one day; customisation then follows; a couple of days to be fully functional;

it is a big software package with many options, so customising can take months;

Zenodo is a catch-all; EUDAT is customising for a given set of communities, and they were considering an iRODS backend as long as you do not


http://ckan.org/2013/11/28/ckan4rdm-st-andrews/
https://lists.okfn.org/pipermail/ckan4rdm/2013-December/000037.html