AgINFRA
Revision as of 17:41, 30 July 2015
Community Information
Community Name
EGI Federated Cloud services for the agri-food research
Community Short Name
agINFRA
Community Website
Community Description
The agINFRA project, supported by the Agriculture Information Management Standards of the Food and Agriculture Organization of the United Nations (AIMS FAO) and the CIARD global initiative, introduces a set of recommendations for data management, sharing and dissemination in the agri-food research community. These recommendations also aim to provide a framework for European agri-food research institutions that need to follow the H2020 Open Access mandate and share their metadata with their thematic aggregator in order to publish it in OpenAIRE. (from www.aginfra.eu)
Community Objectives
agINFRA aims to function as the thematic aggregator of the agri-food research domain and to act as the main research community for OpenAIRE.
Main Contact Institutions
Agro-Know, FAO
Main Contact
Effie Tsiflidou, effie@agroknow.gr
Prior requirement capture activities
- agINFRA D2.2 Revised stakeholders needs deliverable: http://www.aginfra.eu/project/images/DELIVERABLES/aginfra_d2.2_revised-review-of-stakeholder-needs_final_20131025.pdf
- agINFRA D5.5 Report on agricultural data sources/repositories integration
Science Viewpoint
Scientific Challenges
- High volume storage
- Impossible to use centralized storage
- Large, live, constantly updated data streams
- Handling of heterogeneous data
Objectives
- Raw agricultural data resources must be made publicly available through a unified search and discovery platform
- Make such resources more broadly discoverable by humans and machines by registering them in shared public directories and providing all the technical information that applications need to process the data
- Reach out to entrepreneurs who can put their data to work in new services
- Invite commercial entities into the conversation around the future of data
KPI inputs
KPI | Description | Indicators and targets |
Access | Increased access to and usage of e-Infrastructures by scientific communities, simplifying the “embracing” of e-Science. | Number of web portal users: 10000 monthly; Number of sites providing the services: 20 |
Visibility | High-level visibility of the project among scientists, technology providers and resource managers. | Number of portal cloud installations/usage: 4 |
User Stories
Use cases taken from the agINFRA public deliverable D1.3.3, agINFRA Scientific Vision: Part A
- Data provider who needs to host and store a small-scale CMS
In this case, the data provider asks the system to set up their own CMS instance to cover the needs of a small-scale CMS, e.g. Open Educational Resources (http://www.oercommons.org/), which provides access to hundreds of course-related materials and collections on several themes.
- Data provider who needs a large-scale CMS with hosting and replication
In this case, the data provider asks the system to allocate space or to set up accounts in a large-scale CMS, e.g. Consiglio per la Ricerca e la Sperimentazione in Agricoltura - CRA (http://sito.entecra.it/portale/index2.php), which includes thousands of data sources in several research fields in agriculture and related domains.
- Data provider who needs to host a CMS on their own or on external / commercial infrastructure
In this case, the content provider is interested in exposing (meta)data to the e-infrastructure, e.g. Turkish Agricultural Learning Objects Repository - TrAgLOR (http://traglor.cu.edu.tr/), which serves as an organized collection of learning objects, stored on servers and delivered through networks.
Information Viewpoint
Data
Data Object types
Germplasm data
Data size
~ 10KB
Data collection size
~ 1PB
Data format
XML
Standards in use
MCPD (for Germplasm data)
Data management plan
- agINFRA collects freely accessible data in order to make them publicly available
- agINFRA should ensure long-term preservation of the data
Privacy policy
- publicly available and freely accessible
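The data size and data collection size figures above together imply the number of germplasm objects the infrastructure has to handle. As a rough order-of-magnitude check, assuming decimal units (1 PB = 10^15 bytes, 10 KB = 10^4 bytes) and that ~10KB is a typical object size:

```latex
N \approx \frac{1\,\mathrm{PB}}{10\,\mathrm{KB}}
  = \frac{10^{15}\,\mathrm{B}}{10^{4}\,\mathrm{B}}
  = 10^{11}
\qquad\text{(binary units: } \frac{2^{50}}{10 \cdot 2^{10}} = \frac{2^{40}}{10} \approx 1.1 \times 10^{11}\text{)}
```

Either way the collection is on the order of a hundred billion small objects, which helps explain why high-volume storage and the impossibility of a single centralized store appear among the scientific challenges.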
Metadata
Metadata object types
- AGRIS Bibliographic information: metadata for publications (scientific articles, theses, dissertations, journals)
- GLN metadata for educational resources.
- VocBench instances
- VEST Registry
- CIARD RING
Metadata Identifiers
ARN
Metadata Size
~10KB
Metadata format
RDF, OWL, XML
Standards in use
RDF, OWL, SKOS, OAI-PMH
Metadata generation
Custom Java code based on XML transformations
Other aspects
Triple store with RDF files in order to preserve linked open data
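OAI-PMH, listed among the metadata standards above, is a request/response protocol in which a harvester pages through a repository with repeated ListRecords requests, following resumption tokens until the repository returns an empty one. The sketch below shows that loop with Python's standard library only. The base URL, the function names and the injected `fetch` callable are hypothetical illustrations, not part of agINFRA's code; the `oai_dc` metadata prefix is the Dublin Core format every OAI-PMH repository must support, while a real agINFRA source may expose other formats.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: a follow-up
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record elements, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element, following resumption tokens.

    `fetch` is any callable mapping a URL to response text (for example a
    thin wrapper around urllib.request.urlopen); injecting it keeps the
    paging logic testable without network access.
    """
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            break
```

Keeping the paging logic separate from the HTTP call also makes it easy to rerun the same harvest on a schedule, which matters for the constantly updated sources mentioned under the scientific challenges.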
Data Lifecycle
- Data acquisition level (including manually sent raw XML files, or harvesting via protocols such as OAI-PMH)
- Metadata records evaluation and mappings
- Data transformation
- Data identification – deduplication
- Data triplification (XML to RDF)
- Upload of RDF files to the AllegroGraph triple store
- Data indexing
- Data publishing to the AGRIS portal, with an FTP server also providing the XML records and RDF files
- Data curation
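Two of the lifecycle steps above, deduplication and triplification (XML to RDF), can be sketched in a few lines. This is a minimal illustration, not agINFRA's actual Java transformation code: it assumes records are XML elements carrying Dublin Core children, treats the trimmed, lower-cased `dc:identifier` as the deduplication key, and mints subject URIs under a hypothetical base `http://example.org/record/`. The output is N-Triples, a line-based RDF serialization that triple stores such as AllegroGraph can bulk-load.

```python
import urllib.parse
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.org/record/"  # hypothetical base URI for record subjects


def escape_literal(text):
    """Escape backslash, double quote and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def record_key(record):
    """Deduplication key (an assumption): dc:identifier, trimmed and lower-cased."""
    return (record.findtext(f"{{{DC}}}identifier") or "").strip().lower()


def deduplicate(records):
    """Keep the first record seen for each key; drop records without a key."""
    seen, unique = set(), []
    for rec in records:
        key = record_key(rec)
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def triplify(record):
    """Turn the Dublin Core children of one XML record into N-Triples lines."""
    subject = f"<{BASE}{urllib.parse.quote(record_key(record), safe='')}>"
    prefix = f"{{{DC}}}"  # ElementTree's "{namespace}" tag prefix
    lines = []
    for child in record:
        if child.tag.startswith(prefix) and child.text and child.text.strip():
            predicate = f"<{DC}{child.tag[len(prefix):]}>"
            literal = escape_literal(child.text.strip())
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines
```

Matching on an identifier alone is the simplest possible deduplication rule; a production pipeline would likely also compare titles or other fields, since the same publication can arrive from different providers under different identifiers.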
Technology Viewpoint
System Architecture
[Figure: agINFRA Architecture]
In the context of the agINFRA project, a number of data providers give access to different data types, such as educational, bibliographic, germplasm, statistical, soil maps, cultural and others. These data sources use different metadata schemas to meet the specific requirements of each data type, so aggregating their metadata would traditionally be carried out by individually transforming and then harvesting each data source. This approach would be most appropriate for serving the data integration as well as the other services deployed by the agINFRA project. A more state-of-the-art methodology should apply current advances in the Semantic Web, including the publication of all available data as linked open data. The first step in this process would be the development of a metadata model for each resource type, which would accommodate the most common and/or essential elements of the metadata schemas used by the agINFRA data providers. (from agINFRA D5.4 public deliverable)
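The metadata-model step described above amounts to a crosswalk from each provider's schema onto a small set of common elements. The sketch below illustrates the idea for two resource types. The `CommonRecord` fields and the `CROSSWALKS` table are hypothetical, not the D5.4 model; `ACCENUMB` (accession number), `ACCENAME` (accession name) and `GENUS` are real MCPD descriptors, and the `dc:` keys stand in for a bibliographic schema.

```python
from dataclasses import dataclass, field


@dataclass
class CommonRecord:
    """Hypothetical common model: a few shared elements plus leftovers."""
    resource_type: str
    identifier: str
    title: str
    extra: dict = field(default_factory=dict)


# Hypothetical crosswalks: source element -> common element, per resource type.
CROSSWALKS = {
    "germplasm": {"ACCENUMB": "identifier", "ACCENAME": "title"},
    "bibliographic": {"dc:identifier": "identifier", "dc:title": "title"},
}


def to_common(resource_type, source):
    """Map one flat source record (element -> value) onto CommonRecord."""
    mapping = CROSSWALKS[resource_type]
    core = {"identifier": "", "title": ""}
    extra = {}
    for key, value in source.items():
        target = mapping.get(key)
        if target:
            core[target] = value
        else:
            # Keep unmapped, type-specific elements instead of discarding them.
            extra[key] = value
    return CommonRecord(resource_type, core["identifier"], core["title"], extra)
```

For example, `to_common("germplasm", {"ACCENUMB": "ACC-001", "ACCENAME": "Example landrace", "GENUS": "Zea"})` yields a record whose `identifier` and `title` come from the MCPD fields, while `GENUS` is preserved in `extra`. Keeping the type-specific remainder means the common model can stay small without losing information needed later, for instance when the records are triplified.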
The agINFRA infrastructure pays special attention to topics such as efficient metadata management (checking mappings and transforming the targeted metadata schemas to a common schema); storage, both for hosting data components and for scaling up the handled metadata aggregations and their versions; and computing, in terms of the time and resources needed for harvesting and for the often-recurring similar workflows (validation, transformation, harvesting, auto-tagging and indexing). (from agINFRA D1.3.3 public deliverable)
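Because the same validation, transformation, auto-tagging and indexing steps recur for every aggregation, it helps to express them as small composable functions that can be rerun on each new batch. The following is a minimal sketch under that assumption; harvesting is left out, and the step functions, the required fields and the three-term `VOCAB` are hypothetical stand-ins (a real deployment would tag against a controlled vocabulary such as one managed in VocBench).

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary for auto-tagging: term -> concept label.
VOCAB = {"maize": "Maize", "soil": "Soil", "wheat": "Wheat"}


def validate(record):
    """Reject records that lack the (assumed) required elements."""
    missing = [k for k in ("identifier", "title") if not record.get(k)]
    if missing:
        raise ValueError(f"record missing {missing}")
    return record


def transform(record):
    """Stand-in transformation: normalise whitespace in the title."""
    return {**record, "title": " ".join(record["title"].split())}


def auto_tag(record):
    """Attach vocabulary concepts whose terms appear in the title."""
    words = set(re.findall(r"[a-z]+", record["title"].lower()))
    return {**record, "tags": sorted(VOCAB[w] for w in words if w in VOCAB)}


def run_workflow(records, steps=(validate, transform, auto_tag)):
    """Apply the steps to each record and build a tag -> identifiers index."""
    index = defaultdict(set)  # a tiny inverted index
    processed, rejected = [], []
    for rec in records:
        try:
            for step in steps:
                rec = step(rec)
        except ValueError:
            rejected.append(rec)
            continue
        processed.append(rec)
        for tag in rec.get("tags", []):
            index[tag].add(rec["identifier"])
    return processed, rejected, index
```

Since each step takes and returns a plain record, steps can be reordered, swapped for a different schema's transformation, or rerun on only the records that changed, which is where the recurring computing cost mentioned above could be reduced.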
Community data access protocols
web interface & FTP
Data management technology
Custom
Data access control
POSIX
Public data access protocol
HTTP
Public authentication mechanism
anonymous access
Computing capacities
CPU | 3500 CPUs |
GPU | no |
RAM | 4GB |
Storage | 30GB |
e-Infrastructure | Cloud |
Client | Desktop, laptop, mobile device |