Elsevier

Journal of Web Semantics

Volumes 52–53, October 2018, Pages 16-32

GeoTriples: Transforming geospatial data into RDF graphs using R2RML and RML mappings

https://doi.org/10.1016/j.websem.2018.08.003

Abstract

In recent years, a lot of geospatial data has become available at no charge in many countries. Geospatial data currently made available by government agencies usually does not follow the linked data paradigm. In the few cases where government agencies do follow the linked data paradigm (e.g., Ordnance Survey in the United Kingdom), specialized scripts have been used for transforming geospatial data into RDF. In this paper we present the open source tool GeoTriples, which generates and processes extended R2RML and RML mappings that transform geospatial data from many input formats into RDF. GeoTriples allows the transformation of geospatial data stored in raw files (shapefiles, CSV, KML, XML, GML and GeoJSON) and spatially-enabled RDBMS (PostGIS and MonetDB) into RDF graphs using well-known vocabularies like GeoSPARQL and stSPARQL, without being tightly coupled to a specific vocabulary. GeoTriples has been developed in the European projects LEO and MELODIES and has been used to transform many geospatial data sources into linked data. We study the performance of GeoTriples experimentally using large publicly available geospatial datasets, and show that GeoTriples is very efficient and scalable, especially when its mapping processor is implemented using Apache Hadoop.

Introduction

In the last few years, the area of linked geospatial data has received attention as researchers and practitioners have started tapping the wealth of existing geospatial information and making it available on the Web [1, 2]. As a result, the linked open data (LOD) cloud has been slowly populated with geospatial data. For example, Ordnance Survey, Great Britain's national mapping agency, was the first national mapping agency to make various kinds of geospatial data from Great Britain available as linked open data. Similarly, the projects TELEIOS, LEO, MELODIES and Copernicus App Lab, in which our research groups participated, published a number of geospatial datasets that are Earth observation products, e.g., CORINE Land Cover and Urban Atlas. Also, the Spatial Data on the Web working group, created jointly by the Open Geospatial Consortium (OGC) and the World Wide Web Consortium (W3C), produced five relevant working notes in 2017 on best practices, use cases and requirements, Earth observation data, spatio-temporal data cubes and coverages as linked data.

Geospatial data can come in vector or raster form and are usually accompanied by metadata. Vector data, available in formats such as ESRI shapefiles, KML, and GeoJSON documents, can be accessed either directly or via Web Services such as the OGC Web Feature Service or the query language of a geospatial DBMS. Raster data, available in formats such as GeoTIFF, Network Common Data Form (netCDF) and Hierarchical Data Format (HDF), can be accessed either directly or via Web Services such as the OGC Web Coverage Processing Service (WCS) or the query language of an array DBMS, e.g., rasdaman or MonetDB/SciQL. Metadata about geospatial data are encoded in various formats ranging from custom XML schemas to domain specific standards like the OGC GML Application schema for EO products and the OGC Metadata Profile of Observations and Measurements. Automating the process of transforming input geospatial data to linked data has been addressed by only a few works so far [3, 4, 5, 6, 7]. In many cases, for example in the wildfire monitoring and management application that we developed in TELEIOS [5], custom Python scripts were used for transforming all the necessary geospatial data into linked data.

In this paper we extend the mapping languages R2RML and RML with some new constructs that help to specify ways of transforming geospatial data from its original format into RDF. We also present the tool GeoTriples, which automatically generates and processes extended R2RML and RML mappings for transforming geospatial data from various formats into RDF graphs. The input formats supported are spatially-enabled relational databases (PostGIS and MonetDB), ESRI shapefiles, XML documents following a given schema (hence GML documents as well), KML documents, JSON and GeoJSON documents and CSV documents. GeoTriples is a semi-automated tool that enables the automatic transformation of geospatial data into RDF graphs using state of the art vocabularies like GeoSPARQL [8], but at the same time it is not tightly coupled to a specific vocabulary. The transformation process comprises three steps. First, GeoTriples automatically generates extended R2RML or RML mappings for transforming data that reside in spatially-enabled databases or raw files into RDF. As an optional second step, the user may revise these mappings according to her needs, e.g., to utilize a different vocabulary. Finally, GeoTriples processes these mappings and produces an RDF graph.
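The three steps can be illustrated with a minimal Python sketch. It is not the GeoTriples implementation and does not use R2RML or RML syntax: the mapping is a plain dictionary, and the function names `generate_mapping` and `process_mapping`, the `http://example.org/` IRIs and the sample row are all assumptions made for illustration. Only the GeoSPARQL terms (`geo:Feature`, `geo:hasGeometry`, `geo:asWKT`, `geo:wktLiteral`) come from the standard vocabulary.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def generate_mapping(columns, geometry_column, id_column, base_iri):
    """Step 1 (hypothetical): derive a default mapping from the source schema."""
    return {
        # Template such as "http://example.org/country/{id}".
        "subject_template": base_iri + "{" + id_column + "}",
        "geometry_column": geometry_column,
        # One default predicate per thematic (non-spatial, non-key) column.
        "predicates": {
            col: base_iri + "has_" + col
            for col in columns
            if col not in (geometry_column, id_column)
        },
    }


def process_mapping(mapping, rows):
    """Step 3 (hypothetical): apply the mapping to each row, emitting N-Triples."""
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        geometry = subject + "/geometry"
        wkt = escape_literal(row[mapping["geometry_column"]])
        triples.append(f"<{subject}> <{RDF_TYPE}> <{GEO}Feature> .")
        for col, predicate in mapping["predicates"].items():
            triples.append(f'<{subject}> <{predicate}> "{escape_literal(row[col])}" .')
        triples.append(f"<{subject}> <{GEO}hasGeometry> <{geometry}> .")
        triples.append(f'<{geometry}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .')
    return triples


# Illustrative input: one row with a key, a thematic attribute and a WKT geometry.
rows = [{"id": 1, "name": "Greece", "wkt": "POINT (23.7 37.9)"}]
mapping = generate_mapping(["id", "name", "wkt"], "wkt", "id", "http://example.org/country/")

# Step 2 (optional): the user revises the mapping to use a different vocabulary.
mapping["predicates"]["name"] = "http://www.w3.org/2000/01/rdf-schema#label"

for triple in process_mapping(mapping, rows):
    print(triple)
```

Keeping the mapping as data separate from the processor is what makes the optional second step possible: the vocabulary can be changed without touching the code that produces the triples.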

Users can store and query an RDF graph generated by GeoTriples using a geospatial RDF store like Strabon. They can also interlink this graph with other linked geospatial data using tools like the temporal and geospatial extension of Silk developed in our group [9] or the more recent tool Radon developed with the participation of our group [10]. For example, it might be useful to infer links involving topological relationships, e.g., A geo:sfContains F, where A is the area covered by a remotely sensed multispectral image I, F is a geographical feature of interest (field, lake, city etc.) and geo:sfContains is a topological relationship from the topology vocabulary extension of GeoSPARQL. The existence of this link might indicate that I is an appropriate image for studying certain properties of F.
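To make the A geo:sfContains F example concrete, the following Python sketch discovers such links between image areas and features. It is a deliberate simplification and not how Silk or Radon work: geometries are approximated by axis-aligned bounding boxes, and the `BBox` class, the `discover_contains_links` function and the `http://example.org/` IRIs are hypothetical names introduced here.

```python
from dataclasses import dataclass

GEO_SF_CONTAINS = "http://www.opengis.net/ont/geosparql#sfContains"


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box, used here as a stand-in for a real geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, other):
        # True when `other` lies entirely within this box (boundaries included).
        return (
            self.min_x <= other.min_x
            and self.min_y <= other.min_y
            and self.max_x >= other.max_x
            and self.max_y >= other.max_y
        )


def discover_contains_links(image_areas, features):
    """Return (area, geo:sfContains, feature) triples for every contained feature."""
    links = []
    for area_iri, area_box in image_areas.items():
        for feature_iri, feature_box in features.items():
            if area_box.contains(feature_box):
                links.append((area_iri, GEO_SF_CONTAINS, feature_iri))
    return links


# Illustrative data: one image footprint A and two features.
image_areas = {"http://example.org/area/A": BBox(0, 0, 10, 10)}
features = {
    "http://example.org/feature/lake": BBox(2, 2, 3, 3),     # inside A
    "http://example.org/feature/city": BBox(9, 9, 12, 12),   # only overlaps A
}

for s, p, o in discover_contains_links(image_areas, features):
    print(f"<{s}> <{p}> <{o}> .")
```

The nested loop compares every area with every feature; a real link discovery tool over large datasets would need exact geometries and a spatial index or a similar strategy to avoid the quadratic number of comparisons.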

It is often the case in applications that relevant geospatial data is stored in spatially-enabled relational databases (e.g., PostGIS) or files (e.g., shapefiles), and its owners do not want to explicitly transform it into linked data [11, 12]. For example, this might be because these data sources get frequently updated and/or are very large. If this is the case, GeoTriples is still very useful. GeoTriples users can use the generated mappings in the system Ontop-spatial to view their data sources virtually as linked data. Ontop-spatial is a geospatial extension of the Ontology-Based Data Access (OBDA) system Ontop developed by our group [13]. Ontop performs on-the-fly SPARQL-to-SQL translation on top of relational databases using ontologies and mappings. Ontop-spatial extends Ontop by enabling on-the-fly GeoSPARQL-to-SQL translation on top of geospatial databases. The experimental evaluation of [13] has shown that this approach is not only simpler for the users, as it does not require transformation of data, but also more efficient in terms of query response time.

GeoTriples is an open source tool that has been developed in the context of the EU FP7 projects LEO and MELODIES mentioned at the beginning of this section. It is currently utilized in the EU Horizon 2020 project Copernicus App Lab, where data from three Copernicus Services (Land, Marine and Atmosphere) are made available as linked data to aid their take-up by mobile developers.

The organization of the paper is as follows. Section 2 presents background information and discusses related work. In Section 3 we present the extensions to the mapping languages R2RML and RML for the geospatial domain. In Section 4 we present the architecture of GeoTriples and discuss how GeoTriples automatically generates mappings, and how these mappings are subsequently processed for transforming a geospatial data source into an RDF graph. Section 5 gives an example of translating an input shapefile into RDF using the GeoTriples utilities. Section 6 presents an implementation of the mapping processor of GeoTriples that uses Apache Hadoop. In Section 7 we evaluate the performance of the implementations of GeoTriples using publicly available geospatial data. We also compare GeoTriples with the similar tool TripleGeo. Finally, in Section 8, we conclude the paper and discuss future work.

Section snippets

Background and related work

In this section we present related work on methodologies and tools for transforming data sources into RDF graphs. Currently, most similar approaches focus on mapping relational databases into RDF graphs. We will discuss two state-of-the-art approaches, direct mapping and R2RML, and a recent proposal for mapping heterogeneous data into RDF, the mapping language RML. We also include related work on transforming geospatial data into RDF graphs based on these mapping techniques.

Extending the mapping languages R2RML and RML for geospatial data

Much work has been done recently on extending RDF to represent and query geospatial information. The most mature results of this work are the data model stRDF and the query language stSPARQL [24, 25] and the OGC standard GeoSPARQL [8]. These data models and query languages have been implemented in many geospatial triple stores including Strabon, GraphDB, Oracle Spatial and Graph,

The tool GeoTriples

In this section we present the tool GeoTriples that we developed for transforming geospatial data sources into RDF. GeoTriples is an open-source tool that is distributed freely according to the Mozilla Public License v2.0. We will present the architecture of GeoTriples and discuss its main components and their respective implementation details. We will then describe how GeoTriples generates R2RML and RML mappings for transforming data that reside in

An example

Let us now show an example of RML mapping generation by GeoTriples for an input shapefile.

A shapefile is a vector data storage format for storing the location, shape, and attributes of geographic features. It is an open specification which has been developed by ESRI in the context of its ArcGIS product. Shapefiles can represent geographic features along with the spatial and non-spatial attributes that describe them. For example, they can store the geometry of a country in conjunction with its

Implementing the mapping processor of GeoTriples using Apache Hadoop

To enable the efficient transformation of large or numerous input geospatial files into RDF, we have developed an implementation of the GeoTriples mapping processor using Apache Hadoop. We call this implementation GeoTriples-Hadoop and present its architecture in Fig. 4. Apache Hadoop is an open source framework that allows the distributed processing of large datasets across clusters of computers. The main components of Apache Hadoop are HDFS (its distributed file

Performance evaluation of GeoTriples

In this section we present a performance evaluation of three versions of GeoTriples: the single-node implementation (called simply GeoTriples in this section), the GeoTriples-Hadoop implementation, and a version of the single-node implementation which uses the shell tool GNU Parallel and multiple threads to parallelize the work of processing the mappings (called GeoTriples-Multi in this section). For a fairer comparison of GeoTriples-Hadoop and

Summary and conclusions

We presented the tool GeoTriples, which is able to transform geospatial data stored in raw files and spatially-enabled RDBMS to RDF graphs using well-known vocabularies. The tool works in three steps. First, it automatically generates extended R2RML or RML mappings that can be used to transform the input data into RDF. As an optional second step, the user may revise these mappings according to her needs, e.g., to utilize a different vocabulary. Finally, GeoTriples processes these mappings and

Acknowledgments

This work has been funded in part by the European FP7 project LEO, European Commission (611141); the FP7 project MELODIES, European Commission (603525); the Dutch NWO project COMMIT, Netherlands; and the H2020 project Copernicus App Lab, European Commission (730124).

References (33)

  • Kyzirakos K. et al. Wildfire monitoring using satellite images, ontologies and linked geospatial data. J. Web Semant. (2014)
  • Rodriguez-Muro M. et al. Efficient SPARQL-to-SQL with R2RML mappings. J. Web Semant. (2015)
  • Koubarakis M. et al. Big, linked geospatial data and its applications in earth observation. IEEE Internet Comput. (2017)
  • Koubarakis M. et al. Data models and query languages for linked geospatial data
  • Auer S. et al. LinkedGeoData: Adding a spatial dimension to the web of data
  • Chentout K. et al. Adding spatial support to R2RML mappings
  • de León A. et al. Geographical linked data: A Spanish use case
  • Patroumpas K. et al. TripleGeo: an ETL Tool for Transforming Geospatial Data into RDF Triples
  • Open Geospatial Consortium. GeoSPARQL - A geographic query language for RDF data. OpenGIS Implementation Standard...
  • Smeros P. et al. Discovering spatial and temporal links among RDF Data
  • Sherif M.A. et al. Radon - rapid discovery of topological relations
  • Bereta K. et al. Ontop-spatial: Geospatial data integration using GeoSPARQL-to-SQL translation
  • Brüggemann S. et al. Ontology-based data access for maritime security
  • Bereta K. et al. Ontop of geospatial databases
  • A. Bertails, M. Arenas, E. Prud'hommeaux, J. Sequeda, A direct mapping of relational data to RDF. W3C Recommendation...
  • T. Berners-Lee, Relational databases on the semantic web....