Difference between revisions of "AgINFRA"

Revision as of 17:41, 30 July 2015

Community Information

Community Name

EGI Federated Cloud services for agri-food research

Community Short Name

agINFRA

Community Website

http://www.aginfra.eu

Community Description

The agINFRA project, supported by the Agriculture Information Management Standards of the Food and Agriculture Organization of the United Nations (AIMS FAO) and the CIARD global initiative, introduces a set of recommendations for the agri-food research community on data management, sharing and dissemination. Additionally, these recommendations aim to provide a framework for European agri-food research institutions that need to follow the H2020 Open Access mandate and share their metadata with their thematic aggregator in order to publish them in OpenAIRE. (from www.aginfra.eu)

Community Objectives

agINFRA aims to function as the thematic aggregator of the agri-food research domain and to act as the main research community for OpenAIRE.

Main Contact Institutions

Agro-Know, FAO

Main Contact

Effie Tsiflidou, effie@agroknow.gr

Prior requirement capture activities

agINFRA D2.2 Revised stakeholders needs deliverable: http://www.aginfra.eu/project/images/DELIVERABLES/aginfra_d2.2_revised-review-of-stakeholder-needs_final_20131025.pdf
agINFRA D5.5 Report on agricultural data sources/repositories integration

Science Viewpoint

Scientific Challenges

High volume storage
Impossible to use centralized storage
Large, live, constantly updated data streams
Handling of heterogeneous data

Objectives

Make raw agricultural data resources publicly available through a unified search and discovery platform
Make such resources more broadly discoverable by humans and machines by registering them in shared public directories and providing all the technical information that applications need to process the data
Reach out to entrepreneurs who can put their data to work in new services
Invite commercial entities into the conversation about the future of data

KPI inputs

Access	Increased access to and usage of e-Infrastructures by scientific communities, simplifying the “embracing” of e-Science.	Number of users of the web portals: 10000 monthly; Number of sites providing the services: 20
Visibility	High-level visibility of the project among scientists, technology providers and resource managers.	Number of portal cloud installations/usage: 4

User Stories

Use cases taken from agINFRA public deliverable D1.3.3 agINFRA Scientific Vision: Part A

Data provider who needs to host and store a small scale CMS

In this case, the data provider asks the system to set up their own CMS instance to cover the needs of a small-scale CMS, e.g. Open Educational Resources (http://www.oercommons.org/), which provides access to hundreds of course-related materials and collections on several themes.

Data provider, who needs to host and store a large scale hosting & replication CMS

In this case, the data provider asks the system to allocate space or set up accounts in a large-scale CMS, e.g. Consiglio per la Ricerca e la Sperimentazione in Agricoltura - CRA (http://sito.entecra.it/portale/index2.php), which includes thousands of data sources in several research fields in agriculture and related domains.

Data provider, who needs to host a CMS on own or external / commercial infrastructure

In this case, the content provider is interested in exposing (meta)data to the e-infrastructure, e.g. Turkish Agricultural Learning Objects Repository - TrAgLOR (http://traglor.cu.edu.tr/), which serves as an organized collection of learning objects stored on servers and delivered through networks.

Information Viewpoint

Data

Data Object types

Germplasm data

Data size

~ 10KB

Data collection size

~ 1PB

Data format

XML

Standards in use

MCPD (for Germplasm data)

Data management plan

agINFRA collects freely accessible data and makes it publicly available
agINFRA should ensure long-term preservation of the data

Privacy policy

publicly available, free of access

Metadata

Metadata object types

AGRIS bibliographic information: metadata for publications (scientific articles, theses, dissertations, journals)
GLN metadata for educational resources
VocBench instances
VEST Registry
CIARD RING

Metadata Identifiers

ARN

Metadata Size

~10KB

Metadata format

RDF, OWL, XML

Standards in use

RDF, OWL, SKOS, OAI-PMH

Metadata generation

Custom Java code based on XML transformations

Other aspects

Triple store with RDF files in order to preserve linked open data

Data Lifecycle

Data acquisition (raw XML files sent manually, or harvesting via protocols such as OAI-PMH)
Evaluation and mapping of metadata records
Data transformation
Data identification and deduplication
Data triplification (XML to RDF)
Upload of the RDF files to the AllegroGraph triple store
Data indexing
Data publishing to the AGRIS portal, plus an FTP service offering the XML records and RDF files
Data curation
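
The deduplication and triplification steps above can be illustrated with a minimal Python sketch. It is not the project's custom Java code: the record layout, the `BASE_URI` namespace, the choice of Dublin Core predicates and all function names are assumptions made for illustration. It drops records whose identifier has already been seen and emits N-Triples lines that could then be loaded into a triple store.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: raw XML records as they might arrive by manual upload
# or from a harvest. The element names are illustrative only.
RAW_XML = """<records>
  <record id="rec-1"><title>Soil maps of Europe</title><creator>A. Author</creator></record>
  <record id="rec-1"><title>Soil maps of Europe</title><creator>A. Author</creator></record>
  <record id="rec-2"><title>Wheat germplasm survey</title><creator>B. Author</creator></record>
</records>"""

BASE_URI = "http://example.org/record/"   # assumed namespace, not an agINFRA URI
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def deduplicate(records):
    """Keep only the first record seen for each identifier."""
    seen = set()
    unique = []
    for record in records:
        record_id = record.get("id")
        if record_id not in seen:
            seen.add(record_id)
            unique.append(record)
    return unique


def triplify(record):
    """Turn each non-empty child element of a record into one N-Triples line."""
    subject = f"<{BASE_URI}{record.get('id')}>"
    lines = []
    for child in record:
        value = (child.text or "").strip()
        if value:
            lines.append(f'{subject} <{DC}{child.tag}> "{escape_literal(value)}" .')
    return lines


def run_pipeline(xml_text):
    """Parse, deduplicate and triplify a batch of records."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in deduplicate(root.findall("record")):
        triples.extend(triplify(record))
    return triples


if __name__ == "__main__":
    for line in run_pipeline(RAW_XML):
        print(line)
```

Deduplicating on the identifier alone is the simplest possible rule. A production workflow would probably compare further fields as well, which this sketch does not attempt.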

Technology Viewpoint

System Architecture

[Figure: agINFRA system architecture]

In the context of the agINFRA project, there are a number of data providers providing access to different data types, such as educational, bibliographic, germplasm, statistical, soil maps, cultural and other. The aggregation of metadata from these data sources, which use different metadata schemas in order to meet the specific requirements of each data type, would traditionally be carried out by individually transforming and then harvesting each data source. This approach would be most appropriate for serving the data integration as well as other services deployed by the agINFRA project. A more state-of-the-art methodology should apply the current advances in the context of the Semantic Web, including the publication of all available data as linked and open data. The first step in this process would be the development of a metadata model for each resource type, which would accommodate the most common and / or essential elements of the metadata schemas used in agINFRA by the data providers. (from agINFRA D5.4 public deliverable)

The agINFRA infrastructure pays special attention to topics such as efficient metadata management (checking mappings and transforming the targeted metadata schemas to a common schema), storage for hosting data components and for scaling up the handled metadata aggregations and their versions, and computing, in terms of the time and resources needed for harvesting and for the similar, often recurring workflows (validation, transformation, harvesting, auto-tagging and indexing). (from agINFRA D1.3.3 public deliverable)
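
As a rough illustration of the mapping check described above, the following Python sketch translates provider-specific records into a common schema and reports fields that have no mapping. The schema names, field names, the `REQUIRED` tuple and the `to_common` helper are all hypothetical. The real agINFRA metadata models are defined in the project deliverables and are not reproduced here.

```python
# Hypothetical mappings from provider-specific field names to the elements of
# a common metadata model. Neither side reflects the actual agINFRA schemas.
FIELD_MAPPINGS = {
    "bibliographic": {"dc:title": "title", "dc:creator": "creator", "dc:date": "date"},
    "educational": {"lom:title": "title", "lom:author": "creator", "lom:language": "language"},
}

# Elements assumed to be mandatory in the common model (an assumption).
REQUIRED = ("title",)


def to_common(record, schema):
    """Map one provider record to the common schema.

    Returns the mapped record, the source fields that had no mapping,
    and the required common elements that are still missing.
    """
    mapping = FIELD_MAPPINGS[schema]
    common = {}
    unmapped = []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)   # flag for manual review of the mapping
        else:
            common[target] = value
    missing = [name for name in REQUIRED if name not in common]
    return common, unmapped, missing


if __name__ == "__main__":
    source = {"lom:title": "Composting basics", "lom:author": "C. Author", "lom:keyword": "soil"}
    print(to_common(source, "educational"))
```

Returning the unmapped and missing fields, rather than discarding them silently, lets a validation step decide whether a record is accepted, fixed or sent back to the provider.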

Community data access protocols

web interface & FTP

Data management technology

Custom

Data access control

POSIX

Public data access protocol

HTTP

Public authentication mechanism

anonymous access

Computing capacities

CPU	3500 CPUs
GPU	no
RAM	4GB
Storage	30GB
e-Infrastructure	Cloud
Client	Desktop, laptop, mobile device

