Advancing FAIRness of soil water content (meta)data with the help of Semantic Web technologies

Abstract

Soil water content data are generated in numerous research institutions across Europe, each arranging, describing, and implementing its data collection in ways that fit the purposes, conventions, standards, terminologies, and limitations specific to its case. This practice drives data consumers into a maze of portals, services, people, and ad hoc approaches, while data, once obtained, is often interpretable in different ways depending on where it comes from.
The situation could improve by making data and metadata more FAIR across institutions. This is the aim of the ENVRI-FAIR effort to harmonise data and services, which includes the use case presented here, a collaboration of six research entities: LifeWatch ERIC, AnaEE, ICOS, SIOS, DANUBIUS-RI, and eLTER. One means of improving FAIRness is the stack of Semantic Web technologies, which add semantics to metadata schemata, dataset structures, and the data itself. Semantics allow data consumers to search for data using their preferred terms, to interpret search results correctly, and to link data of different provenance so that it can be reused. Querying and processing data also become more meaningful, because every field and value refers or maps to a common semantic model, creating a single network of (meta)data: a Knowledge Graph. Ultimately, all nodes and links of the graph carry meaning and are potentially queryable from a single endpoint, even if they partly reside in separate repositories.
The solution we implemented consists of a semantic model, a set of queries to retrieve information from the model, and a web application that runs the queries and serves their output. The model is constructed as a mash-up of new and existing semantic entities and relationships.
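The claim that users can search with their preferred terms, because every value maps to a common semantic model, can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: it assumes a hypothetical `http://example.org/` namespace, two hypothetical thesauri linked by `skos:exactMatch`, and a hypothetical `ex:soilTexture` property; only the SKOS property IRIs are real. A production system would keep such triples in a graph database and answer the lookup with SPARQL rather than Python loops.

```python
# Minimal, dependency-free sketch of a knowledge graph as a set of
# (subject, predicate, object) triples. Only the SKOS IRIs are real;
# every ex: resource and property is hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/"  # hypothetical namespace

PREF, ALT, EXACT = SKOS + "prefLabel", SKOS + "altLabel", SKOS + "exactMatch"
SOIL_TEXTURE = EX + "soilTexture"  # hypothetical property

triples = {
    # Concepts from two hypothetical thesauri, one of them mapped to the other.
    (EX + "thesA/Loam", PREF, "loam"),
    (EX + "thesA/Clay", PREF, "clay"),
    (EX + "thesB/Loamy", PREF, "loamy soil"),
    (EX + "thesB/Loamy", ALT, "loamy"),
    (EX + "thesA/Loam", EXACT, EX + "thesB/Loamy"),
    # Datasets annotated with whichever thesaurus their producer used.
    (EX + "dataset/1", SOIL_TEXTURE, EX + "thesA/Loam"),
    (EX + "dataset/2", SOIL_TEXTURE, EX + "thesB/Loamy"),
    (EX + "dataset/3", SOIL_TEXTURE, EX + "thesA/Clay"),
}


def concepts_for_label(term):
    """Concepts whose preferred or alternative label equals `term` (case-insensitive)."""
    term = term.lower()
    return {s for (s, p, o) in triples if p in (PREF, ALT) and o.lower() == term}


def expand_exact_matches(concepts):
    """Follow skos:exactMatch in both directions until no new concept appears."""
    found, frontier = set(concepts), list(concepts)
    while frontier:
        c = frontier.pop()
        neighbours = {o for (s, p, o) in triples if p == EXACT and s == c}
        neighbours |= {s for (s, p, o) in triples if p == EXACT and o == c}
        for n in neighbours - found:
            found.add(n)
            frontier.append(n)
    return found


def datasets_for_term(term):
    """Datasets annotated with any concept equivalent to the user's term."""
    concepts = expand_exact_matches(concepts_for_label(term))
    return sorted(s for (s, p, o) in triples if p == SOIL_TEXTURE and o in concepts)


print(datasets_for_term("loamy soil"))
# ['http://example.org/dataset/1', 'http://example.org/dataset/2']
```

A user searching with thesaurus B's label "loamy soil" also finds dataset 1, which its producer annotated with thesaurus A; this cross-thesaurus retrieval is the effect that mapping to a common model is meant to achieve.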
Entities of the graph were initially defined on the basis of the netCDF structure of the AnaEE data series, a number of complex search queries submitted in textual form by the project participants, and a list of concepts describing the context of soil data. The entities were then connected structurally, either with OWL and RDFS properties or with ad hoc relationships, in close agreement with the participants, who acted as domain experts, data producers, and data consumers. Finally, entities were mapped to, or replaced by, externally defined ones from well-established and recognised semantic artefacts. The reused components of the graph include ontologies, controlled vocabularies for application interoperability and for domain expertise (e.g., the AnaEE thesaurus, DCAT, GeoSPARQL), and the SKOS model for linking domain thesauri.
The final service is a desktop web application (a dashboard) built with the Angular framework, with a semantic graph database (GraphDB) as its backend; it serves as an entry point to the (meta)data in the model. Among other things, users may search for datasets by the soil texture type or pedological class of the site where the data was collected, or search using terms from the thesaurus of their choice. Spatial search and dataset location are also supported, and results are aggregated in tables and histograms and rendered visually on an interactive map.
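A dashboard search combining a pedological class term with a spatial filter could be expressed as a SPARQL query sent to the GraphDB repository. The Python sketch below only builds such a query and an HTTP GET request following the SPARQL 1.1 Protocol; it does not send it, and the actual dashboard queries are not given in this abstract. The repository URL, the `ex:` namespace, and the `ex:site` and `ex:pedologicalClass` properties are hypothetical, and the `geof:sfWithin` filter assumes the store evaluates GeoSPARQL functions (in GraphDB this requires its GeoSPARQL support to be enabled).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical repository URL: replace with the real endpoint.
ENDPOINT = "http://localhost:7200/repositories/soil-kg"

PREFIXES = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex: <http://example.org/>
"""


def sparql_string(text):
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Closed WKT polygon in lon/lat order (the default CRS84 axis order of geo:wktLiteral)."""
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def build_query(term, bbox):
    """Datasets whose site has a pedological class labelled `term` (in any linked
    thesaurus) and whose site geometry lies within `bbox`."""
    return PREFIXES + f"""
SELECT DISTINCT ?dataset WHERE {{
  ?concept skos:prefLabel|skos:altLabel ?label .
  FILTER(LCASE(STR(?label)) = {sparql_string(term.lower())})
  # Follow SKOS mappings in either direction to reach equivalent concepts.
  ?concept (skos:exactMatch|^skos:exactMatch)* ?mapped .
  ?dataset ex:site ?site .
  ?site ex:pedologicalClass ?mapped ;
        geo:hasGeometry/geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, {sparql_string(bbox_to_wkt(*bbox))}^^geo:wktLiteral))
}}"""


def build_request(endpoint, query):
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = build_query("Cambisol", (5.0, 43.0, 6.0, 44.0))
request = build_request(ENDPOINT, query)
```

Keeping the term lookup and the SKOS mapping inside the query, rather than in the client, means a user can type a label from any linked thesaurus and the graph database resolves the equivalences.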


Authors, date and publication:

Authors: Xeni Kechagioglou, Giuseppe Turrisi, Francesca De Pascalis, Claudio D’Onofrio, Luke Marsden, Christian Pichot, Christoph Wohner, Nicola Fiore, Dario Papale, Alberto Basset, Giovanni L’Abate, André Chanzy

Date: 2023