Coupling OGC WPS and W3C PROV for provenance-aware geoprocessing workflows

https://doi.org/10.1016/j.cageo.2020.104419

Highlights

  • OGC WPS and W3C PROV are coupled to represent provenance in geoprocessing workflows.

  • The plan for workflow provenance is recorded in detail.

  • XML schemas for geoprocessing workflow provenance are defined.

  • Using the existing standards facilitates the interoperability of provenance information.

Abstract

With the advancement of cyberinfrastructure, an increasing number of geoprocessing functions are available on the Web. Scientific workflows are frequently used to orchestrate distributed services to address complex geospatial problems. In workflow systems, geospatial data provenance is extremely valuable for evaluating data reliability and usability and for reproducing data products, especially considering the heterogeneous data and computing resources in the Web environment. W3C PROV is an expressive model for provenance information in the general domain; in this work it is extended to support OGC WPS in describing provenance in geoprocessing workflows. A conceptual model that couples OGC WPS and W3C PROV is proposed, and XML schema definitions of the model are implemented. The proposed model provides more complete provenance information, including the geospatial data and geoprocessing services used and their plans, which helps advance provenance awareness in workflow systems. Coupling OGC WPS and W3C PROV benefits from the maturity and interoperability of the existing standards.

Introduction

With the advancement of Web service technologies, a growing volume of geospatial data and a growing number of geoprocessing functions are available on the Web (Hey and Trefethen, 2005; Zhao et al., 2012; Yue et al., 2015a). Scientific workflows are widely used to orchestrate these distributed resources into new, more powerful value-added services (Sheng et al., 2014; Lemos et al., 2016; Yue et al., 2015b). In such situations, a geospatial data product is generated by a series of geoprocessing steps, and provenance information becomes very important when consuming these data products. Provenance records the derivation of a dataset, including the process steps taken, their inputs and outputs, and the organization or person responsible for the product (Di et al., 2013b; He et al., 2015a; Yuan et al., 2013). It brings transparency and helps determine the usability and reliability of data products (Foster, 2005; Di et al., 2013b). Data provenance has been considered necessary for Earth science (Moreau, 2010; Di et al., 2013b; Iturbide et al., 2019; Spiekermann et al., 2019; Zhang et al., 2017b; Yue et al., 2016; Essawy et al., 2018), especially in distributed environments (He et al., 2015b; Yue et al., 2011; Di et al., 2013a), since distributed services can be offered by various providers.

Provenance representation is a key consideration for provenance-aware applications; it comprises the provenance model and its implementation syntax (Di et al., 2013b). The Provenance Working Group of the World Wide Web Consortium (W3C) provides the PROV family of documents, which defines a model (PROV-DM), corresponding serializations (e.g., PROV-XML, PROV-O), and other definitions that facilitate the interoperable interchange of provenance information in the general Web domain (Groth and Moreau, 2013). Although the use of W3C PROV in the geospatial domain has been widely investigated (Di et al., 2013b; He et al., 2015b; Jiang et al., 2018; Closa et al., 2017a; Zhang et al., 2017b), it has mainly been extended to describe execution information, and plan information is not generally addressed. A plan represents a set of actions or steps intended to achieve some goals (Moreau and Missier, 2013) and is increasingly considered an important part of provenance information. The plan provides a high-level description of what was executed, which helps users understand the workflows and their steps and facilitates future reuse or adjustment of the workflows. In practice, a workflow specification or language can play the role of a plan. It is the formalism that expresses the composition logic (Lemos et al., 2016) and provides basic information about processes, including their expected inputs and outputs, the method to invoke them, their execution sequences, and workflow metadata. The challenge is to integrate the provenance model with the conceptual model of the workflow specification to provide a more complete provenance representation.
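To make the notion of a plan more concrete, the following is a minimal Python sketch of a workflow specification that records each process with its expected inputs and outputs and derives an execution sequence from them. It is an illustration only, not the model proposed in the paper: the class names `ProcessStep` and `WorkflowPlan`, the step identifiers, and the data names are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ProcessStep:
    """One process in the plan (hypothetical structure)."""
    identifier: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class WorkflowPlan:
    """A workflow description acting as a plan (hypothetical structure)."""
    title: str
    steps: list[ProcessStep] = field(default_factory=list)

    def execution_order(self) -> list[str]:
        # Map every output name to the step that produces it.
        producers = {out: s.identifier for s in self.steps for out in s.outputs}
        # A step depends on the producers of its inputs; inputs that no
        # step produces are treated as external data.
        graph = {
            s.identifier: {producers[i] for i in s.inputs if i in producers}
            for s in self.steps
        }
        return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    plan = WorkflowPlan(
        title="Example plan",
        steps=[
            ProcessStep("threshold", inputs=("index",), outputs=("mask",)),
            ProcessStep("computeIndex", inputs=("image",), outputs=("index",)),
        ],
    )
    print(plan.execution_order())
```

The execution sequence is derived from data dependencies rather than stored explicitly, which is one possible design; a workflow language may equally record the sequence directly.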

W3C PROV already provides the term prov:Plan. However, it does not provide further elaboration on how plans should be described or related to other provenance elements. The Ontology for Provenance and Plans (P-Plan) (Garijo and Gil, 2012), Open Provenance Model for Workflows (OPMW) (Garijo and Gil, 2014), and ProvONE (Cuevas-Vicenttín et al., 2016) are all PROV extension models for scientific workflow provenance in the general domain. In the geospatial Web service community, the Open Geospatial Consortium (OGC) published a Web Processing Service (WPS) specification, which provides a standard method for sharing geoprocessing functions (Müller and Pross, 2015) that is extensively used and accepted in the geospatial domain (Qiao et al., 2019). The WPS specification provides a process description framework that can be used to enrich provenance information. For example, Closa et al. (2017b, 2019) proposed novel approaches to describe geospatial data provenance more precisely by integrating OGC WPS into a provenance model.

This paper proposes a conceptual provenance model for geoprocessing workflows by coupling OGC WPS and W3C PROV, which covers three stages of workflows, namely, construction, execution and provenance. An XML implementation of the proposed model in a workflow tool is given. A use case in the geospatial domain demonstrates the applicability of the model. This approach provides a more complete description of workflow provenance. The rest of the paper is organized as follows. Section 2 introduces the background of the provenance models and WPS description framework. The provenance model that couples OGC WPS and W3C PROV and its implementation are given in Section 3. Section 4 introduces a use case that demonstrates the application of the proposed method. Section 5 draws the conclusions.

Section snippets

Provenance models

W3C PROV and ISO 19115 are two popular provenance information models used in geospatial domains. W3C PROV defines a conceptual model and its serializations (e.g., ontology and XML), which improve the interoperability of provenance information in heterogeneous environments such as the Web. PROV-DM defines three core types and relations among them (Fig. 1). At the core, provenance describes the use and production of entities by activities, which may be influenced by agents. The seven core

Coupling OGC WPS and W3C PROV

The conceptual model for provenance-aware geoprocessing workflows is illustrated by the UML diagram in Fig. 3, which couples OGC WPS and W3C PROV. The workflow description plays the role of a plan, which is represented using OGC WPS. Workflow execution information is recorded by extending and complementing W3C PROV, including used geospatial data and geoprocessing services, and relations to the plans.
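As a rough illustration of how an execution record can point back to its plan, the sketch below builds a small PROV-XML document with Python's standard `xml.etree.ElementTree`. It uses only generic PROV-XML elements (`prov:entity`, `prov:activity`, `prov:plan`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`) and does not reproduce the XML schema defined in the paper. All identifiers (`inputImage`, `waterMask`, `run1`, `wpsServer`, `waterExtractionPlan`) are hypothetical, and the plan element is only a placeholder for a WPS-based workflow description.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
ET.register_namespace("prov", PROV_NS)


def q(local: str) -> str:
    """Return the ElementTree-qualified name for a PROV term."""
    return f"{{{PROV_NS}}}{local}"


def ref(parent: ET.Element, tag: str, target: str) -> None:
    """Append a child that points at an existing record via prov:ref."""
    ET.SubElement(parent, q(tag), {q("ref"): target})


def build_provenance() -> ET.Element:
    doc = ET.Element(q("document"))

    # Entities: the input image and the derived product (hypothetical ids).
    ET.SubElement(doc, q("entity"), {q("id"): "inputImage"})
    ET.SubElement(doc, q("entity"), {q("id"): "waterMask"})

    # Plan: placeholder for the WPS-based workflow description.
    ET.SubElement(doc, q("plan"), {q("id"): "waterExtractionPlan"})

    # Activity and agent: one execution and the service that ran it.
    ET.SubElement(doc, q("activity"), {q("id"): "run1"})
    ET.SubElement(doc, q("softwareAgent"), {q("id"): "wpsServer"})

    # The execution used the input image ...
    used = ET.SubElement(doc, q("used"))
    ref(used, "activity", "run1")
    ref(used, "entity", "inputImage")

    # ... and generated the derived product.
    gen = ET.SubElement(doc, q("wasGeneratedBy"))
    ref(gen, "entity", "waterMask")
    ref(gen, "activity", "run1")

    # The association ties the execution to its agent and to the plan.
    assoc = ET.SubElement(doc, q("wasAssociatedWith"))
    ref(assoc, "activity", "run1")
    ref(assoc, "agent", "wpsServer")
    ref(assoc, "plan", "waterExtractionPlan")
    return doc


if __name__ == "__main__":
    print(ET.tostring(build_provenance(), encoding="unicode"))
```

The identifiers are left unprefixed to keep the sketch short; a real document would normally declare a namespace for them.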

Use case

In this paper, a use case that extracts water bodies from remote sensing images illustrates how to realize provenance awareness in geoprocessing workflows.

Conclusions

This paper couples OGC WPS and W3C PROV for provenance-aware geoprocessing workflows. The WPS specification provides a comprehensive description approach for geospatial services. W3C PROV is extended in the following aspects: (1) mapping the core structures in PROV-DM to basic elements of geoprocessing workflows, (2) enriching the geospatial dataset representation, and (3) providing detailed plan information using OGC WPS. The XML schema definitions of the proposed model are implemented for its

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank the two anonymous reviewers for their very constructive comments, which helped improve the quality of the paper. This work was supported by the National Natural Science Foundation of China (No. 41901313 and 41901315) and the Major State Research Development Program of China (No. 2017YFB0504103).

References (40)

  • M. Zhang et al. GeoJModelBuilder: an open source geoprocessing workflow tool. Open Geospatial Data Softw. Stand. (2017)
  • M. Zhang et al. Model provenance tracking and inference for integrated environmental modelling. Environ. Model. Software (2017)
  • P. Zhao et al. The geoprocessing web. Comput. Geosci. (2012)
  • G. Closa et al. Web processing services to describe provenance and geospatial modelling.
  • G. Closa et al. A provenance metadata model integrating ISO geospatial lineage and the OGC WPS: conceptual model and implementation. (2019)
  • V. Cuevas-Vicenttín et al. ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance. (2016)
  • L. Di et al. Implementation of geospatial data provenance in a web service workflow environment with ISO 19115 and ISO 19115-2 lineage model. IEEE Trans. Geosci. Rem. Sens. (2013)
  • L. Di et al. Geoscience data provenance: an overview. IEEE Trans. Geosci. Rem. Sens. (2013)
  • I. Foster. Service-oriented science. Science (2005)
  • D. Garijo et al. The OPMW-PROV Ontology. (2014)