Coupling OGC WPS and W3C PROV for provenance-aware geoprocessing workflows
Introduction
With the advancement of Web service technologies, a growing number of geospatial datasets and geoprocessing functions are available on the Web (Hey and Trefethen, 2005; Zhao et al., 2012; Yue et al., 2015a). Scientific workflows are widely used to orchestrate these distributed resources into new, more powerful value-added services (Sheng et al., 2014; Lemos et al., 2016; Yue et al., 2015b). In such situations, a geospatial data product is generated by a series of geoprocessing steps, and provenance information becomes very important when consuming these data products. Provenance records the derivation of a dataset, including the process steps taken, their inputs and outputs, and the organization or person responsible for the product (Di et al., 2013b; He et al., 2015a; Yuan et al., 2013). It brings transparency and helps determine the usability and reliability of data products (Foster, 2005; Di et al., 2013b). Data provenance has been considered necessary for Earth science (Moreau, 2010; Di et al., 2013b; Iturbide et al., 2019; Spiekermann et al., 2019; Zhang et al., 2017b; Yue et al., 2016; Essawy et al., 2018), especially in distributed environments (He et al., 2015b; Yue et al., 2011; Di et al., 2013a), since distributed services can be offered by various providers.
Provenance representation, which includes the provenance model and its implementation syntax, is a key consideration for provenance-aware applications (Di et al., 2013b). The Provenance Working Group of the World Wide Web Consortium (W3C) provides the PROV family of documents, which define a model (PROV-DM), corresponding serializations (e.g., PROV-XML, PROV-O), and other definitions that facilitate the interoperable interchange of provenance information in the general Web domain (Groth and Moreau, 2013). Although the use of W3C PROV in the geospatial domain has been widely investigated (Di et al., 2013b; He et al., 2015b; Jiang et al., 2018; Closa et al., 2017a; Zhang et al., 2017b), it has mainly been extended to describe execution information, and plan information is not generally addressed. A plan represents a set of actions or steps intended to achieve some goals (Moreau and Missier, 2013) and is increasingly considered an important part of provenance information. The plan provides a high-level description of what was executed, which helps in understanding the workflows and their steps and facilitates future reuse or adjustment of the workflows. In practice, a workflow specification or language can play the role of a plan. It is the formalism that expresses the composition logic (Lemos et al., 2016) and provides basic information about processes, including their expected inputs and outputs, the method to invoke them, their execution sequences, and workflow metadata. The challenge is to integrate the provenance model with the conceptual model of the workflow specification to provide a more complete provenance representation.
W3C PROV already provides the term prov:Plan. However, it does not provide further elaboration on how plans should be described or related to other provenance elements. The Ontology for Provenance and Plans (P-Plan) (Garijo and Gil, 2012), Open Provenance Model for Workflows (OPMW) (Garijo and Gil, 2014), and ProvONE (Cuevas-Vicenttín et al., 2016) are all PROV extension models for scientific workflow provenance in the general domain. In the geospatial Web service community, the Open Geospatial Consortium (OGC) published a Web Processing Service (WPS) specification, which provides a standard method for sharing geoprocessing functions (Müller and Pross, 2015) that is extensively used and accepted in the geospatial domain (Qiao et al., 2019). The WPS specification provides a process description framework that can be used to enrich provenance information. For example, Closa et al. (2017b, 2019) proposed novel approaches to describe geospatial data provenance more precisely by integrating OGC WPS into a provenance model.
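To make the role of prov:Plan more concrete, the following minimal sketch builds a small PROV-XML-style document in which an activity is associated with an agent and a plan. It is an illustration only, not the implementation described in this paper: the `ex` namespace and the identifiers `ex:workflowPlan`, `ex:run1`, and `ex:workflowEngine` are hypothetical, and the output is not validated against the PROV-XML schema (for instance, the `xsi:type` annotation on `prov:type` is omitted).

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
EX_NS = "http://example.org/"  # hypothetical namespace for illustration

ET.register_namespace("prov", PROV_NS)


def q(tag: str) -> str:
    """Return a PROV-namespaced name in ElementTree's Clark notation."""
    return f"{{{PROV_NS}}}{tag}"


def build_document() -> ET.Element:
    """Build a tiny provenance document that links an activity to a plan."""
    doc = ET.Element(q("document"))
    # Declare the 'ex' prefix by hand because it only occurs inside
    # attribute values, which ElementTree does not inspect.
    doc.set("xmlns:ex", EX_NS)

    # The plan is an ordinary entity typed as prov:Plan (hypothetical id).
    plan = ET.SubElement(doc, q("entity"), {q("id"): "ex:workflowPlan"})
    ET.SubElement(plan, q("type")).text = "prov:Plan"

    # One workflow run and the software agent that carried it out.
    ET.SubElement(doc, q("activity"), {q("id"): "ex:run1"})
    ET.SubElement(doc, q("agent"), {q("id"): "ex:workflowEngine"})

    # The association relation carries the reference to the plan.
    assoc = ET.SubElement(doc, q("wasAssociatedWith"))
    ET.SubElement(assoc, q("activity"), {q("ref"): "ex:run1"})
    ET.SubElement(assoc, q("agent"), {q("ref"): "ex:workflowEngine"})
    ET.SubElement(assoc, q("plan"), {q("ref"): "ex:workflowPlan"})
    return doc


if __name__ == "__main__":
    print(ET.tostring(build_document(), encoding="unicode"))
```

The sketch shows that PROV itself only says that a plan exists and which activity followed it. What the plan contains (steps, expected inputs and outputs, invocation method) has to come from an additional description, which is the gap that the extension models mentioned above address.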
This paper proposes a conceptual provenance model for geoprocessing workflows by coupling OGC WPS and W3C PROV, which covers three stages of workflows, namely, construction, execution and provenance. An XML implementation of the proposed model in a workflow tool is given. A use case in the geospatial domain demonstrates the applicability of the model. This approach provides a more complete description of workflow provenance. The rest of the paper is organized as follows. Section 2 introduces the background of the provenance models and WPS description framework. The provenance model that couples OGC WPS and W3C PROV and its implementation are given in Section 3. Section 4 introduces a use case that demonstrates the application of the proposed method. Section 5 draws the conclusions.
Provenance models
W3C PROV and ISO 19115 are two popular provenance information models used in geospatial domains. W3C PROV defines a conceptual model and its serializations (e.g., ontology and XML), which improve the interoperability of provenance information in heterogeneous environments such as the Web. PROV-DM defines three core types and relations among them (Fig. 1). At the core, provenance describes the use and production of entities by activities, which may be influenced by agents. The seven core
Coupling OGC WPS and W3C PROV
The conceptual model for provenance-aware geoprocessing workflows is illustrated by the UML diagram in Fig. 3, which couples OGC WPS and W3C PROV. The workflow description plays the role of a plan, which is represented using OGC WPS. Workflow execution information is recorded by extending and complementing W3C PROV, including used geospatial data and geoprocessing services, and relations to the plans.
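The coupling can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the UML model of Fig. 3 or the paper's XML implementation: the class names `WpsProcess` and `StepExecution`, the attributes `ex:wpsProcess` and `ex:endpoint`, and all identifiers are hypothetical. The plan side holds a subset of what a WPS process description declares (identifier, input names, output names), the execution side binds concrete data to those names, and the function emits PROV-N-style statements that relate the two.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WpsProcess:
    """Plan-level description of one step (subset of a WPS process description)."""
    identifier: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class StepExecution:
    """Execution record binding concrete data to the declared inputs/outputs."""
    process: WpsProcess
    service_url: str
    input_data: Dict[str, str]   # declared input name -> data identifier
    output_data: Dict[str, str]  # declared output name -> data identifier


def to_prov_n(plan_id: str, executions: List[StepExecution]) -> List[str]:
    """Emit PROV-N-style statements linking each execution to the plan."""
    stmts = [f"entity({plan_id}, [prov:type='prov:Plan'])"]
    declared = set()  # data entities already declared

    def declare(data_id: str) -> None:
        if data_id not in declared:
            declared.add(data_id)
            stmts.append(f"entity({data_id})")

    for i, run in enumerate(executions, start=1):
        act, agent = f"ex:exec{i}", f"ex:service{i}"
        stmts.append(f'activity({act}, [ex:wpsProcess="{run.process.identifier}"])')
        stmts.append(
            f"agent({agent}, [prov:type='prov:SoftwareAgent', "
            f'ex:endpoint="{run.service_url}"])'
        )
        # The third argument of the association points back to the plan.
        stmts.append(f"wasAssociatedWith({act}, {agent}, {plan_id})")
        for name in run.process.inputs:
            declare(run.input_data[name])
            stmts.append(f"used({act}, {run.input_data[name]}, -)")
        for name in run.process.outputs:
            declare(run.output_data[name])
            stmts.append(f"wasGeneratedBy({run.output_data[name]}, {act}, -)")
    return stmts


if __name__ == "__main__":
    step = WpsProcess("hypothetical.Process", ["image"], ["mask"])
    run = StepExecution(step, "http://example.org/wps",
                        {"image": "ex:scene1"}, {"mask": "ex:mask1"})
    print("\n".join(to_prov_n("ex:plan", [run])))
```

Keeping the declared names in the plan and the bound data in the execution record means that the same plan can be reused across runs, while each run contributes its own activities and entities.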
Use case
A use case that extracts water bodies from remote sensing images illustrates how provenance awareness can be realized in geoprocessing workflows.
Conclusions
This paper couples OGC WPS and W3C PROV for provenance-aware geoprocessing workflows. The WPS specification provides a comprehensive description approach for geospatial services. W3C PROV is extended in the following aspects: (1) mapping the core structures in PROV-DM to basic elements of geoprocessing workflows, (2) enriching the geospatial dataset representation, and (3) providing detailed plan information using OGC WPS. The XML schema definitions of the proposed model are implemented for its
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank the two anonymous reviewers for their constructive comments, which helped improve the quality of the paper. This work was supported by the National Natural Science Foundation of China (No. 41901313 and 41901315) and the Major State Research Development Program of China (No. 2017YFB0504103).
References (40)
- et al. W3C PROV to describe provenance at the dataset, feature and attribute levels in a distributed environment. Comput. Environ. Urban Syst. (2017)
- et al. Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sens. Environ. (2012)
- et al. Integrating scientific cyberinfrastructures to improve reproducibility in computational hydrology: example for HydroShare and GeoTrust. Environ. Model. Software (2018)
- et al. The R-based climate4R open framework for reproducible climate data access and post-processing. Environ. Model. Software (2019)
- et al. Advancing interoperability of geospatial data provenance on the web: gap analysis and strategies. Comput. Geosci. (2018)
- et al. Simplifying the deployment of OGC web processing services (WPS) for environmental modelling–Introducing Tethys WPS Server. Environ. Model. Software (2019)
- et al. Web services composition: a decade's overview. Inf. Sci. (2014)
- et al. Implementations of fine-grained automated data provenance to support transparent environmental modelling. Environ. Model. Software (2019)
- et al. Sharing geospatial provenance in a service-oriented environment. Comput. Environ. Urban Syst. (2011)
- et al. A geoprocessing workflow system for environmental monitoring and integrated modelling. Environ. Model. Software (2015)
- GeoJModelBuilder: an open source geoprocessing workflow tool. Open Geospatial Data, Softw. Stand.
- Model provenance tracking and inference for integrated environmental modelling. Environ. Model. Software
- The geoprocessing web. Comput. Geosci.
- Web processing services to describe provenance and geospatial modelling
- A Provenance Metadata Model Integrating ISO Geospatial Lineage and the OGC WPS: Conceptual Model and Implementation
- ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance
- Implementation of geospatial data provenance in a web service workflow environment with ISO 19115 and ISO 19115-2 lineage model. IEEE Trans. Geosci. Rem. Sens.
- Geoscience data provenance: an overview. IEEE Trans. Geosci. Rem. Sens.
- Service-oriented science. Science
- The OPMW-PROV Ontology