The wiki is deprecated and due to be decommissioned by the end of September 2022.
The content is being migrated to other supports; new updates will be ignored and lost.
If needed, you can get in touch with the EGI SDIS team using operations @


From EGIWiki

Community Information

Community Name

EGI Federated Cloud services for the agri-food research

Community Short Name


Community Website

Community Description

The agINFRA project, supported by the Agriculture Information Management Standards of the Food and Agriculture Organization of the United Nations (AIMS FAO) and the CIARD global initiative, introduces a set of recommendations on data management, sharing and dissemination for the agri-food research community. These recommendations also aim to provide a framework for European agri-food research institutions that must comply with the H2020 Open Access mandate and share their metadata with their thematic aggregator so that it can be published in OpenAire. (from

Community Objectives

agINFRA aims to function as the thematic aggregator of the agri-food research domain and act as the main research community for OpenAire.

Main Contact Institutions

Agro-Know, FAO

Main Contact

  • Effie Tsiflidou
  • Nikolaos Marianos

Prior requirement capture activities

  • agINFRA D2.2 Revised stakeholders needs deliverable
  • agINFRA D5.5 Report on agricultural data sources/repositories integration

Science Viewpoint

Scientific Challenges

  • High volume storage
  • Impossible to use centralized storage
  • Large, live, constantly updated data streams
  • Handling of heterogeneous data


  • Raw data resources with agricultural data must be publicly available, using a unified search and discovery platform
  • Making such resources more broadly discoverable by humans and machines by registering them in shared public directories and providing all the technical information that allows applications to process those data
  • Reach out to entrepreneurs who can put their data to work in new services
  • Invite commercial entities into the conversation around the future of data

KPI inputs

Access: Increased access to and usage of e-Infrastructures by scientific communities, simplifying the "embracing" of e-Science. Number of users of the web portals: 10,000 per month; number of sites providing the services: 20
Visibility: Visibility of the project at a high level among scientists, technology providers and resource managers. Number of portal cloud installations/usage: 4

User Stories

Use cases are taken from the agINFRA public deliverable D1.3.3, "agINFRA Scientific Vision: Part A".

  • Data provider who needs to host and store a small-scale CMS

In this case, the data provider asks the system to set up their own CMS instance to cover the needs of a small-scale CMS, e.g. Open Educational Resources (, which provides access to hundreds of course-related materials and collections on several themes.

  • Data provider who needs to host and store data in a large-scale hosting and replication CMS

In this case, the data provider asks the system to allocate space or set up accounts in a large-scale CMS, e.g. Consiglio per la Ricerca e la Sperimentazione in Agricoltura - CRA (, which includes thousands of data sources in several research fields in agriculture and related domains.

  • Data provider who needs to host a CMS on their own or on an external / commercial infrastructure

In this case, the content provider is interested in exposing (meta)data to the e-infrastructure, e.g. Turkish Agricultural Learning Objects Repository - TrAgLOR (, which serves as an organized collection of learning objects, stored on servers and delivered through networks.

Information Viewpoint


Data Object types

Germplasm data

Data size

~ 10KB

Data collection size

~ 1PB
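As a rough order-of-magnitude check, dividing the collection size by the typical object size gives the number of germplasm objects that the figures above imply. This assumes decimal (SI) prefixes and treats both approximate sizes as exact:

```latex
N \approx \frac{S_{\text{collection}}}{S_{\text{object}}}
  = \frac{1\,\text{PB}}{10\,\text{KB}}
  = \frac{10^{15}\,\text{B}}{10^{4}\,\text{B}}
  = 10^{11}\ \text{objects}
```

With binary prefixes (1 PiB / 10 KiB), the estimate becomes about 1.1 × 10^11. The order of magnitude does not change.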

Data format


Standards in use

MCPD (Multi-Crop Passport Descriptors, for germplasm data)
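MCPD records are flat sets of named passport descriptors. The sketch below checks that a record carries a few identifying descriptors before it enters the pipeline. It assumes that records arrive as Python dictionaries keyed by MCPD descriptor codes. The required set (INSTCODE, ACCENUMB, GENUS), the function name `missing_mcpd_fields` and the sample values are illustrative assumptions. They are not agINFRA's validation rules.

```python
"""Completeness check for MCPD-style germplasm passport records (sketch)."""

# Assumption: records are dicts keyed by MCPD descriptor codes. This
# required set is illustrative only; it is not taken from agINFRA or
# from the MCPD specification.
REQUIRED = ("INSTCODE", "ACCENUMB", "GENUS")


def missing_mcpd_fields(record: dict[str, str]) -> list[str]:
    """Return the required descriptors that are absent or blank."""
    return [code for code in REQUIRED
            if not str(record.get(code) or "").strip()]


if __name__ == "__main__":
    # Hypothetical sample values, not real accessions.
    sample = {"INSTCODE": "XYZ001", "ACCENUMB": "ACC-0001", "GENUS": "Lactuca"}
    print(missing_mcpd_fields(sample))                  # []
    print(missing_mcpd_fields({"INSTCODE": "XYZ001"}))  # ['ACCENUMB', 'GENUS']
```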

Data management plan

  • agINFRA collects data that are free to access in order to make them publicly available
  • agINFRA should ensure long-term preservation

Privacy policy

  • publicly available, free to access


Metadata object types

  • AGRIS bibliographic information: metadata for publications (scientific articles, theses, dissertations, journals)
  • GLN metadata for educational resources
  • VocBench instances
  • VEST Registry

Metadata Identifiers


Metadata Size


Metadata format


Standards in use


Metadata generation

Custom Java code based on XML transformations
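The metadata generation step maps provider XML onto a common schema. agINFRA's actual Java code is not shown here. The Python sketch below only illustrates the idea using the standard-library `xml.etree.ElementTree`. It assumes that the source records use Dublin Core elements. The mapping in `FIELD_MAP` and the target field names (`title`, `authors`, `keywords`) are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical mapping from namespaced source elements to common-schema fields.
FIELD_MAP = {
    f"{{{DC}}}title": "title",
    f"{{{DC}}}creator": "authors",
    f"{{{DC}}}subject": "keywords",
}


def to_common_record(xml_text: str) -> dict[str, list[str]]:
    """Map each recognised child of the root element onto a common field.

    Unmapped elements are dropped; repeated elements become multi-valued.
    """
    root = ET.fromstring(xml_text)
    record: dict[str, list[str]] = {}
    for elem in root:
        field = FIELD_MAP.get(elem.tag)
        if field and elem.text and elem.text.strip():
            record.setdefault(field, []).append(elem.text.strip())
    return record


if __name__ == "__main__":
    source = (
        '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Soil survey</dc:title>"
        "<dc:creator>Doe, J.</dc:creator><dc:creator>Roe, R.</dc:creator>"
        "<dc:format>pdf</dc:format>"
        "</record>"
    )
    print(to_common_record(source))
```

A dictionary-driven mapping keeps each provider's schema differences in data rather than in code. In principle, onboarding a new source then means adding entries to the mapping rather than writing a new transformer.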

Other aspects

A triple store holding the RDF files, in order to preserve the data as linked open data
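To load records into a triple store, each record must be expressed as RDF triples. The minimal sketch below serialises such a record as N-Triples, the line-based RDF format in which each line holds a subject, a predicate and an object, followed by a terminating period. It assumes the record has already been mapped to a dictionary of field names and value lists. The base URI `http://example.org/agris/` and the Dublin Core terms predicates are placeholders, because the document does not specify agINFRA's actual URIs or vocabulary.

```python
from urllib.parse import quote

BASE = "http://example.org/agris/"     # placeholder namespace for record URIs
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms vocabulary

# Assumed mapping from common-schema fields to RDF predicates.
PREDICATES = {
    "title": DCTERMS + "title",
    "authors": DCTERMS + "creator",
    "keywords": DCTERMS + "subject",
}


def escape_literal(text: str) -> str:
    """Escape backslash, double quote, LF and CR inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(record_id: str, record: dict[str, list[str]]) -> list[str]:
    """Emit one N-Triples line per (field, value) pair with a known predicate."""
    subject = f"<{BASE}{quote(record_id, safe='')}>"
    lines: list[str] = []
    for field, values in record.items():
        predicate = PREDICATES.get(field)
        if predicate is None:
            continue  # fields without an RDF mapping are skipped
        for value in values:
            lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    for line in to_ntriples("rec-1", {"title": ["Soil survey"], "authors": ["Doe, J."]}):
        print(line)
```

The resulting lines can be written to a `.nt` file, a format that triple stores such as AllegroGraph can bulk-load.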

Data Lifecycle

  • Data acquisition (manually sent raw XML files, or harvesting via protocols such as OAI-PMH)
  • Metadata record evaluation and mappings
  • Data transformation
  • Data identification – deduplication
  • Data triplification (XML to RDF)
  • Upload of the RDF files to the AllegroGraph triple store
  • Data indexing
  • Data publishing to the AGRIS portal, plus an FTP server providing the XML records and RDF files
    • Data curation
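Harvesting over OAI-PMH works as a loop of `ListRecords` requests. The first request carries a `metadataPrefix`. Each later request carries only the `resumptionToken` returned by the previous response. The loop ends when the token is empty or absent. The sketch below shows this loop using only the standard library. The endpoint URL is a placeholder, and `fetch` is a hypothetical injected function that maps a URL to a response body. Error handling, such as checking for OAI-PMH `error` elements, is deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional
from urllib.parse import urlencode

# OAI-PMH 2.0 namespace, in ElementTree's {uri}tag notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so once a token
    is available it replaces metadataPrefix rather than joining it.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the record identifiers in one response and the next token.

    An absent or empty resumptionToken element marks the final page,
    reported here as None.
    """
    root = ET.fromstring(xml_text)
    ids = [header.findtext(f"{OAI}identifier", default="").strip()
           for header in root.iter(f"{OAI}header")]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None


def harvest_ids(base_url: str, fetch: Callable[[str], str]) -> list[str]:
    """Follow resumption tokens until the repository reports completion.

    `fetch` (hypothetical helper) maps a URL to the response body; it is
    injected so the loop can be exercised without network access.
    """
    ids: list[str] = []
    token: Optional[str] = None
    while True:
        batch, token = parse_list_records(fetch(list_records_url(base_url, token)))
        ids.extend(batch)
        if token is None:
            return ids
```

In a real harvester, `fetch` could wrap `urllib.request.urlopen(url).read().decode("utf-8")`. Keeping it injectable also makes it easy to add retries or rate limiting without touching the paging logic.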

Technology Viewpoint

System Architecture

agINFRA Architecture

In the context of the agINFRA project, there are a number of data providers providing access to different data types, such as educational, bibliographic, germplasm, statistical, soil maps, cultural and other. The aggregation of metadata from these data sources, which use different metadata schemas in order to meet the specific requirements of each data type, would traditionally be carried out by individually transforming and then harvesting each data source. This approach would be most appropriate for serving the data integration as well as other services deployed by the agINFRA project. A more state-of-the-art methodology should apply the current advances in the context of the Semantic Web, including the publication of all available data as linked and open data. The first step in this process would be the development of a metadata model for each resource type, which would accommodate the most common and / or essential elements of the metadata schemas used in agINFRA by the data providers. (from agINFRA D5.4 public deliverable)
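The paragraph above proposes a metadata model for each resource type, built from "the most common and / or essential elements" of the providers' schemas. One way to sketch that selection is as a frequency count over element names. The deliverable does not say how elements were actually chosen. The `min_share` threshold, the `essential` set and the example element names are therefore assumptions for illustration.

```python
from collections import Counter


def common_elements(schemas: dict[str, set[str]], essential: set[str],
                    min_share: float = 0.5) -> set[str]:
    """Select elements for a common metadata model (illustrative heuristic).

    An element is kept if at least `min_share` of the provider schemas use
    it, or if it is listed as essential regardless of how often it occurs.
    """
    counts = Counter(el for elements in schemas.values() for el in elements)
    threshold = min_share * len(schemas)
    frequent = {el for el, n in counts.items() if n >= threshold}
    return frequent | essential


if __name__ == "__main__":
    # Hypothetical element sets for three data providers.
    providers = {
        "provider_a": {"title", "creator", "date"},
        "provider_b": {"title", "date", "format"},
        "provider_c": {"title", "language"},
    }
    print(sorted(common_elements(providers, essential={"identifier"})))
```

Raising `min_share` produces a smaller and stricter core model. Lowering it produces a broader model that more providers can only partially fill.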

agINFRA infrastructure pays special attention to topics such as efficient metadata management (checking mappings and transforming the targeted metadata schemas to a common schema), storage issues for hosting data components and scaling up the handled metadata aggregations and their versions, and computing issues in terms of the time and resources needed for harvesting and for the often-recurring similar workflows (validation, transformation, harvesting, auto-tagging and indexing). (from agINFRA D1.3.3 public deliverable)
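The recurring workflows listed above (validation, transformation, auto-tagging and indexing) share a common shape. Each one applies a sequence of stages to every harvested record, and any stage may reject a record. The minimal sketch below shows that structure with hypothetical stage functions. The keyword-based tagger is a placeholder. It is not agINFRA's auto-tagging method.

```python
from typing import Callable, Iterable, Optional

Record = dict[str, object]
Stage = Callable[[Record], Optional[Record]]  # returning None drops the record


def run_pipeline(records: Iterable[Record], stages: list[Stage]) -> list[Record]:
    """Apply each stage in order; a record rejected by any stage is dropped."""
    out: list[Record] = []
    for record in records:
        current: Optional[Record] = record
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            out.append(current)
    return out


def validate(record: Record) -> Optional[Record]:
    """Hypothetical rule: a record must carry a non-empty title."""
    return record if record.get("title") else None


def normalise(record: Record) -> Record:
    """Hypothetical transformation: trim whitespace around the title."""
    return {**record, "title": str(record["title"]).strip()}


VOCABULARY = {"soil", "wheat", "germplasm"}  # placeholder tagging vocabulary


def auto_tag(record: Record) -> Record:
    """Placeholder tagger: keep vocabulary terms that occur in the title."""
    words = set(str(record["title"]).lower().split())
    return {**record, "tags": sorted(words & VOCABULARY)}


if __name__ == "__main__":
    harvested = [{"title": "  Wheat germplasm survey "}, {"title": ""}]
    print(run_pipeline(harvested, [validate, normalise, auto_tag]))
```

Because each stage is a plain function, the same runner can be reused for every provider's recurring workflow, with only the list of stages changing.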

Community data access protocols

web interface & FTP

Data management technology


Data access control


Public data access protocol


Public authentication mechanism

anonymous access

Computing capacities

CPU: 3500 CPUs
GPU: no
Storage: 30 GB
e-Infrastructure: Cloud
Client: Desktop, laptop, mobile device

Software and applications in use

Software/ applications/services

  • Software name: Apache Tomcat, Apache Solr, custom Java code
  • Software licensing: open source
  • Configuration: our web app is a Java WAR application running on Tomcat 6
  • Dependencies needed to run the application, indicating origin and requirements: cloud infrastructure, OpenJDK 6, Apache Tomcat 6

Operating system

CentOS 5 Linux

Runtime libraries/APIs

Java, SAX parser, Solr 1.4
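Solr 1.4 accepts documents for indexing as an XML update message posted to its `/update` handler. The message is an `<add>` element containing one `<doc>` per document, and each document holds `<field name=...>` children. The sketch below builds such a message from mapped records using the standard library. Field names must match whatever the Solr `schema.xml` declares, so `id` and `keywords` here are assumptions.

```python
import xml.etree.ElementTree as ET


def solr_add_message(docs: list[dict[str, list[str]]]) -> str:
    """Serialise documents as a Solr XML <add> update message.

    Multi-valued fields become repeated <field> elements with the same
    name; ElementTree escapes characters such as & and < in the values.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, values in doc.items():
            for value in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document; field names depend on the actual Solr schema.
    print(solr_add_message([{"id": ["rec1"], "keywords": ["soil", "wheat"]}]))
```

Posting this message to `/update` and then sending a `<commit/>` message makes the documents searchable.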

Typical processing time


e-Infrastructure in use

EGI, GEANT through the GRNET cloud (~okeanos and ViMa)

Requirements for EGI Testbed Establishments

Does the case include preferences on specific tools and technologies to use? Cloud infrastructure, such as virtual machine instances
Does the user have preferences on specific resource providers? No
Approximately how much compute and storage capacity is needed, and for how long? 2.2 GHz, long-term preservation, 100 GB
Does the user need access to an existing allocation, or does he/she need a new allocation? No
Does the user (or those he/she represents) have the resources, time and skills to manage an EGI VO? No