CC-LifeWatch Meetings/Amsterdam EGI 2016

Amsterdam EGI 2016 (April 7th 2016)

SLIDES: https://indico.ifca.es/indico/event/223/

Introduction

Use of OpenProject: wiki, documents, and the definition of user stories and hardware requirements, so that everything is documented. It is also the communication channel for the WGs.

COOP+

Reinforce the collaboration with American Institutions and start linking with others from other countries.

WP1: Identification of global challenges in the fields of biodiversity, marine science and Arctic environments. Planetary boundaries.

WP2&3: Coordination framework in global challenges. Establish links with some other RIs existing in other countries.

WP4: Promoting best practices: RIs, techniques and procedures. Create networks of different types between RIs: satellites, sensor networks, etc.

Kick-off meeting on 19-20 May in Granada. First meeting at EGU 2016 in Vienna.

R Working group

Alexandre leads the R group. Highlights: started in September, first contacts, meeting in Bari, training course for RvLab in Ostend (VLIZ), deliverable in February, new additions to RvLab, preparing systems to engage.

Proposed services/tools from HCMR, VLIZ and IFCA: Scripts, Lifewatch Data Explorer, RvLab.

A number of people with different profiles need to be involved in this: system administrators, R developers, users, etc.

Approaches: IDEs and front ends for R (Shiny, RStudio, Jupyter), RvLab (from scratch).

Discussions: EGI issues (requirements), User policies like AAI.

Data produced: how to preserve it, how to make it discoverable...

Storage: users' data, for how long? Public? DOI minting.

WG on Python and WG on Workflows

Aim: support of workflow-based portals on FedCloud. Potential scenarios: all components deployed on the cloud, or working nodes statically or dynamically deployed on the cloud.

Shared directories: NFS is currently used. A cross-site solution is needed. Data access: performance.

Galaxy overview. Galaxy on the Cloud. Elastic Compute Cluster and Infrastructure Manager to manage the dynamic deployment.
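
The difference between statically and dynamically deployed working nodes can be illustrated with a small sketch. This is not how the Elastic Compute Cluster or the Infrastructure Manager are implemented; it only shows the kind of decision an elastic setup has to make. The function names, the `jobs_per_worker` ratio and the worker limits are all hypothetical.

```python
import math


def workers_needed(queued_jobs, jobs_per_worker, min_workers=0, max_workers=10):
    """Return the target number of working nodes for the current queue length.

    All parameters are illustrative assumptions, not values from the meeting.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(queued_jobs / jobs_per_worker)
    # Clamp to the allowed range so the cluster never grows without bound.
    return max(min_workers, min(max_workers, wanted))


def scaling_action(current_workers, target_workers):
    """Translate a target size into an action: start, stop or keep nodes."""
    delta = target_workers - current_workers
    if delta > 0:
        return ("start", delta)
    if delta < 0:
        return ("stop", -delta)
    return ("keep", 0)
```

A static deployment corresponds to setting `min_workers` equal to `max_workers`; in that case the action is always to keep the fixed set of nodes once they are running.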

Last months: support for VMIs that were not properly configured; management of long-lived proxies.

Working with Python: Input – Calculation/Analysis – Output. Pipelines.
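
As a minimal sketch of the input, calculation/analysis, output pattern, the three stages can be written as plain functions and chained into a pipeline. The CSV layout, the column name `value` and the function names are assumptions made for this example only.

```python
import csv
import io
import statistics


def read_input(text):
    """Input step: parse CSV text and return the 'value' column as floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["value"]) for row in reader]


def analyse(values):
    """Calculation/analysis step: compute a few summary statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def write_output(summary):
    """Output step: serialise the summary as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["metric", "value"])
    for metric, value in summary.items():
        writer.writerow([metric, value])
    return out.getvalue()


def run_pipeline(data, steps=(read_input, analyse, write_output)):
    """Feed the result of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical sample data, used only to exercise the pipeline.
SAMPLE = "site,value\nA,1.5\nB,2.5\nC,5.0\n"
```

Calling `run_pipeline(SAMPLE)` returns the summary as CSV text. Because each stage is an ordinary function, a stage can be replaced or run on a different node without touching the others.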


Geoserver WG

Not much work so far. Metadata available to analyze. Syncing biosensing data with ICT core.

LW data explorer: RStudio, Shiny... Connected with different data sources: GIS-based, databases, files. Produces: Leaflet maps, DataTables, dygraphs, CSVs.

Load balancers for working in different instances.

GeoServer provides many services that can be exploited, for instance by R, to analyze geospatial data.
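
One of those services is WFS, which returns vector features over plain HTTP, so any client that can issue a GET request can consume it (R included). The sketch below only builds a standard WFS 2.0.0 `GetFeature` URL and sends nothing; the host name and the layer name `lifewatch:observations` are placeholders, not real endpoints.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wfs_getfeature_url(base_url, type_name, max_features=None,
                       output_format="application/json"):
    """Build a WFS 2.0.0 GetFeature request URL for one layer."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,         # layer to query
        "outputFormat": output_format,  # GeoJSON by default
    }
    if max_features is not None:
        params["count"] = str(max_features)  # WFS 2.0.0 name for the limit
    return base_url.rstrip("?") + "?" + urlencode(params)


# Placeholder endpoint and layer, for illustration only.
EXAMPLE_URL = wfs_getfeature_url(
    "https://geoserver.example.org/geoserver/wfs",
    "lifewatch:observations",
    max_features=10,
)
```

The resulting URL can be passed to any HTTP client; with the JSON output format the response can be read directly by most GIS libraries.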

Some doubts: MongoDB to consume raster data from GeoServer. WPS using R? To be investigated.

Citizen Science WG

Models and predictions: ubiquitous, but only as good as the data put in. Good data = good models. Different types of data: From science: formal, reliable, integrable (+/-) accessible, technical. From public administrations: representative, current, focused on territories or species. Hard to integrate. From citizens: current, expanding, reliability under question. Increase in the data available.

Development: iNaturalist adapted as Natusfera. Why? Different languages, more flexibility in controlling users, customised data publication avenues for users; the iNaturalist subscription model was problematic to accept and to maintain.

iNaturalist is a data provider for GBIF (all data as a block). Species identification.

Future action lines: Linking citizen science biodiversity observation platforms with species identification. Linking biodiversity and environmental data (Natusfera-Lifewatch-Natusfera GBIF).

Promote app in different citizen science initiatives.


Use of Caffe as a pattern recognition tool in different environments: local, FedCloud. Four datasets: Oxford 102, Portuguese Flora, RJB Orchidees, Tiger Mosquito. Step 1: train on the dataset. Step 2: classify images.

Pybossa to crop only the target: mosquito, flower, etc.


Semantics WG

As leader, Nicola asks about previous experiences with the topic. Metadata standards are limited to describing a few aspects of data content.

Goals of the approaches of recent years. Based on that previous experience, LFW Italy designed the LFW Model core ontology, which tries to describe all the data gathered. Tested with a specific domain: phytoplankton, with the SKOS standard. Use of Protégé.

WG goals: state of the art, define the ontology creation process, create a community area as a decision forum, short-term application of the ontology to demonstrate the benefits of the approach.

Goal of Lifewatch: provide different tools in one place. Find all data about a specific issue in different places.

Storage WG

Missing

AAI WG

Goals:
- Identify the needs of the Lifewatch community in AAI
- Share the developments of the different national initiatives: local users? Institutional users?
- Learn about AAI from other projects (like INDIGO)
- Explore which services need final-user AAI
- Integrate different roles in AAI (infrastructure manager, developer, final user)

NoSQL WG

Status of Network of Life:

• Exploring/testing the ArangoDB functionalities and limitations
• Implementing a management system for taxonomies with the requirements needed in NoL
• Not to be taken as a final/definitive architecture
• Helps to better evaluate the design of the data model
• Modeling the taxonomy of relations in the network, as well as non-taxonomic ranks:
  • So that any query returns results in an efficient way
  • Takes advantage of ArangoDB graph traversal queries
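
To make the idea of traversal queries over a taxonomy concrete, here is a minimal sketch in plain Python. It is not ArangoDB code and does not reflect the NoL data model; the toy `PARENT_OF` table and the function names are assumptions for illustration. It shows the two traversals such a graph has to answer efficiently: walking up to the ancestors of a taxon and walking down to all its descendants.

```python
from collections import deque

# Toy taxonomy: each key is a taxon, each value is its parent taxon.
PARENT_OF = {
    "Aedes albopictus": "Aedes",
    "Culex pipiens": "Culex",
    "Aedes": "Culicidae",
    "Culex": "Culicidae",
    "Culicidae": "Diptera",
}


def ancestors(taxon):
    """Walk upwards from a taxon and return its ancestors, nearest first."""
    chain = []
    while taxon in PARENT_OF:
        taxon = PARENT_OF[taxon]
        chain.append(taxon)
    return chain


def descendants(taxon):
    """Breadth-first traversal downwards, returning every descendant."""
    children = {}
    for child, parent in PARENT_OF.items():
        children.setdefault(parent, []).append(child)
    found = []
    queue = deque([taxon])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            found.append(child)
            queue.append(child)
    return found
```

In a graph database the same questions would be expressed as traversal queries over edge collections rather than as loops over a dictionary, which is the feature the group wants to evaluate.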

ArangoDB seems to be a good choice, but it needs to be deployed and tested on cloud resources (next steps). Since ArangoDB is a multi-model DB, it may also be useful for other use cases.