
AgINFRA

From EGIWiki
Revision as of 17:22, 30 July 2015 by Ychen (talk | contribs)


Community Information

Community Name

EGI Federated Cloud services for the agri-food research

Community Short Name

agINFRA

Community Website

http://www.aginfra.eu

Community Description

The agINFRA project, supported by the Agriculture Information Management Standards of the Food and Agriculture Organization of the United Nations (AIMS FAO) and the CIARD global initiative, introduces a set of recommendations applying to agri-food research community for data management, sharing and dissemination. Additionally, these recommendations aim to provide a framework for the research community of European agri-food research institutions that need to follow the H2020 Open Access mandate and share their metadata with their thematic aggregator in order to publish them in OpenAire. (from www.aginfra.eu)

Community Objectives

agINFRA aims to function as the thematic aggregator of the agri-food research domain and act as the main research community for OpenAire.

Main Contact Institutions

Agro-Know, FAO

Main Contact

Effie Tsiflidou, effie@agroknow.gr

Prior requirement capture activities

  • agINFRA D2.2 Revised stakeholders needs deliverable: http://www.aginfra.eu/project/images/DELIVERABLES/aginfra_d2.2_revised-review-of-stakeholder-needs_final_20131025.pdf
  • agINFRA D5.5 Report on agricultural data sources/repositories integration

Science Viewpoint

Scientific Challenges

  • High volume storage
  • Impossible to use centralized storage
  • Large, live, constantly updated data streams
  • Handling of heterogeneous data

Objectives

  • Raw data resources with agricultural data must be publicly available, using a unified search and discovery platform
  • Making such resources more broadly discoverable by humans and machines by registering them in shared public directories and providing all the technical information that allows applications to process those data
  • Reach out to entrepreneurs who can put their data to work in new services
  • Invite commercial entities into the conversation around the future of data

KPI inputs

  • Access: Increased access to and usage of e-Infrastructures by scientific communities, simplifying the “embracing” of e-Science. Number of users of the web portals: 10000 monthly; number of sites providing the services: 20
  • Visibility: Visibility of the project among scientists, technology providers and resource managers at a high level. Number of portal cloud installations/usage: 4

User Stories

Use cases taken from agINFRA public deliverable D1.3.3 agINFRA Scientific Vision: Part A

  • Data provider who needs to host and store a small-scale CMS

In this case, the data provider asks the system to set up a dedicated CMS instance to cover the needs of a small-scale CMS. Example: Open Educational Resources (http://www.oercommons.org/), which provides access to hundreds of course-related materials and collections on several themes.

  • Data provider who needs a large-scale CMS with hosting and replication

In this case, the data provider asks the system to allocate space or to set up accounts in a large-scale CMS. Example: Consiglio per la Ricerca e la Sperimentazione in Agricoltura - CRA (http://sito.entecra.it/portale/index2.php), which includes thousands of data sources in several research fields in agriculture and related domains.

  • Data provider who needs to host a CMS on its own or on external/commercial infrastructure

In this case, the content provider is interested in exposing (meta)data to the e-infrastructure. Example: Turkish Agricultural Learning Objects Repository - TrAgLOR (http://traglor.cu.edu.tr/), which serves as an organized collection of learning objects, stored on servers and delivered through networks.

Information Viewpoint

Data

Data Object types

Germplasm data

Data size

~ 10KB

Data collection size

~ 1PB
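
As a rough sanity check on the two figures above, the number of data objects implied by a ~1 PB collection of ~10 KB objects can be estimated as follows. This is only a back-of-the-envelope sketch: decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes) are an assumption, and the source gives both sizes only approximately.

```python
# Back-of-the-envelope estimate; decimal (SI) units are an assumption.
KB = 10**3
PB = 10**15

object_size_bytes = 10 * KB      # ~10 KB per data object ("Data size")
collection_size_bytes = 1 * PB   # ~1 PB in total ("Data collection size")

# Integer division is exact here because both values are powers of ten.
estimated_objects = collection_size_bytes // object_size_bytes
print(f"about {estimated_objects:.1e} objects")
```

An estimate on the order of 10^11 objects is consistent with the scientific challenges listed earlier, namely high-volume storage and the difficulty of using centralized storage.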

Data format

XML

Standards in use

MCPD (for Germplasm data)

Data management plan

  • agINFRA collects freely accessible data and makes them publicly available
  • agINFRA should ensure long-term preservation

Privacy policy

  • Publicly available, free to access

Metadata

Metadata object types

  • AGRIS bibliographic information: metadata for publications (scientific articles, theses, dissertations, journals)
  • GLN metadata for educational resources.
  • VocBench instances
  • VEST Registry
  • CIARD RING

Metadata Identifiers

ARN

Metadata Size

~10KB

Metadata format

RDF, OWL, XML

Standards in use

RDF, OWL, SKOS, OAI-PMH

Metadata generation

Custom Java code based on XML transformations
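
The page does not show the transformation code itself, which it describes as custom Java. Purely as an illustration of what an XML-to-RDF transformation step might look like, here is a minimal Python sketch that turns one XML record into N-Triples lines. The record layout, the `http://example.org/record/` base URI and the mapping to Dublin Core Terms properties are all assumptions made for this example, not the agINFRA schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical input record; element names are illustrative only.
RECORD_XML = """
<record>
  <identifier>EX0000001</identifier>
  <title>Rice yield under drought stress</title>
  <creator>Doe, J.</creator>
</record>
"""

DCT = "http://purl.org/dc/terms/"
# Assumed mapping from XML element names to RDF properties.
FIELD_TO_PROPERTY = {"title": DCT + "title", "creator": DCT + "creator"}
BASE_URI = "http://example.org/record/"  # placeholder namespace


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def record_to_ntriples(xml_text: str) -> list[str]:
    """Emit one N-Triples line per mapped, non-empty child element."""
    root = ET.fromstring(xml_text.strip())
    subject = f"<{BASE_URI}{root.findtext('identifier').strip()}>"
    triples = []
    for child in root:
        prop = FIELD_TO_PROPERTY.get(child.tag)
        if prop and child.text and child.text.strip():
            literal = escape_literal(child.text.strip())
            triples.append(f'{subject} <{prop}> "{literal}" .')
    return triples


TRIPLES = record_to_ntriples(RECORD_XML)
print("\n".join(TRIPLES))
```

Keeping the element-to-property mapping in a plain dictionary means new fields can be added without touching the transformation logic. A production pipeline would more likely rely on an RDF library and a vetted vocabulary mapping.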

Other aspects

Triple store with RDF files in order to preserve linked open data

Data Lifecycle

  • Data acquisition level (including manually sent raw XML files or harvesting via protocols like OAI-PMH)
  • Metadata records evaluation and mappings
  • Data transformation
  • Data identification – deduplication
  • Data triplification (XML to RDF)
  • Upload RDF files to the AllegroGraph triple store
  • Data indexing
  • Data publishing to the AGRIS portal, plus an FTP server providing the XML records and RDF files
  • Data curation
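
The acquisition and deduplication steps of this lifecycle can be sketched as follows. The example parses an OAI-PMH `ListRecords` response and keeps one entry per record identifier. The sample response and the `oai:example.org:*` identifiers are made up, and deduplicating by identifier while keeping the latest datestamp is an assumption: the page does not say how agINFRA identifies duplicates.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListRecords response, trimmed to record headers only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2015-01-10</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2015-02-01</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2015-03-05</datestamp>
    </header></record>
  </ListRecords>
</OAI-PMH>
"""


def harvest_headers(response_xml):
    """Yield (identifier, datestamp) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml.strip())
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        yield (
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        )


def deduplicate(headers):
    """Keep the most recent datestamp seen for each identifier."""
    latest = {}
    for identifier, datestamp in headers:
        # Same-granularity ISO 8601 datestamps sort correctly as strings.
        if identifier not in latest or datestamp > latest[identifier]:
            latest[identifier] = datestamp
    return latest


UNIQUE = deduplicate(harvest_headers(SAMPLE_RESPONSE))
for identifier, datestamp in sorted(UNIQUE.items()):
    print(identifier, datestamp)
```

A real harvester would fetch the response over HTTP and follow `resumptionToken` elements until the repository reports no further pages. Both are omitted here to keep the sketch self-contained.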