A software framework for construction of process-based stochastic spatio-temporal models and data assimilation

https://doi.org/10.1016/j.envsoft.2009.10.004

Abstract

Process-based spatio-temporal models simulate changes over time using equations that represent real world processes. They are widely applied in geography and earth science. Software implementation of the model itself and integrating model results with observations through data assimilation are two important steps in the model development cycle. Most software frameworks provide tools for either implementation of the model or for data assimilation; this paper describes a software framework that integrates both steps. The software framework includes generic operations on 2D map and 3D block data that can be combined in a Python script using a framework for time iterations and Monte Carlo simulation. In addition, the framework contains components for data assimilation with the Ensemble Kalman Filter and the Particle Filter. Two case studies of distributed hydrological models show how the framework integrates model construction and data assimilation.

Introduction

Spatio-temporal numerical models simulating geographic change are one of the cornerstones of research in geography and earth science and are frequently used in management, planning, and risk assessment in application domains such as land use change (Ligtenberg et al., 2004, Moulin et al., 2004), hazards and evacuation (Helbing et al., 2000), ecosystem studies (Sydelko et al., 2001, Gimblett et al., 2003), spread of diseases (Breukers et al., 2006), criminology (Groff, 2007), land degradation and geomorphology (Karssenberg and Bridge, 2008, Wilkinson et al., 2009), or hydrology (Beven, 2002, Ajami et al., 2007, Blöschl et al., 2008, Brown et al., 2008). Although the application field may vary, spatio-temporal numerical models have in common that they simulate change over time using equations that represent real world processes (Wesseling et al., 1996, Burrough, 1998), whereby the state of the modelled system at each moment in time is a function of its state in the past. Another common characteristic is that processes are modelled in a spatially-explicit way, which means that spatial patterns and spatial interaction in the system are taken into account (Karssenberg and De Jong, 2005a). Spatio-temporal numerical models are either individual-based or field-based. Individual-based models, also referred to as agent-based or object-based models, consider the geographic space as a set of objects (Benenson and Torrens, 2004, Grimm and Railsback, 2005). Field-based models represent the geographic space using continuous fields of attributes that have a value at all locations (Burrough and McDonnell, 1998). The focus in this paper is on field-based models, although many concepts presented here apply to individual-based models, too.

Because models must be tailored to the research goals of a project, the available data, and the properties of the system being modelled (Karssenberg et al., 2006), model development is central to almost any research project that involves modelling. Three important steps in the model development cycle (Karssenberg et al., 2006) are the conversion of the conceptual model structure to computer code, i.e. the implementation or construction of the model; model calibration; and state estimation by assimilation of spatio-temporal observational data collected by remote sensing, automatic data loggers, or questionnaires, or retrieved from large databases. The term calibration is used for the process of finding model parameters that result in an optimal fit between modelled and observed state variables (e.g., Hill and Tiedeman, 2007). The term data assimilation refers here to sequential Bayesian estimation, a procedure that sequentially updates the model state at time steps when observations of state variables or parameters are available (e.g., Gelb, 1974, Simon, 2006). Data assimilation is increasingly used to integrate data with spatio-temporal models in a wide range of fields in the earth sciences, such as oceanography (van Leeuwen, 2003), hydrology (Clark et al., 2006, Moradkhani, 2008), ecology (Chen et al., 2008), or crop science (Naud et al., 2007). Below, we use the term optimization to refer to both calibration and data assimilation. Although both model development and optimization can be done by programming software from scratch in a system programming language, it is preferable to use software frameworks at a higher level of abstraction that can be used by scientists and modellers without specialist programming knowledge (van Deursen et al., 2000, Karssenberg, 2002).

A number of software frameworks exist for construction of temporal numerical models in geography and earth science. Widely used are graphical modelling languages (ModelMaker, 2009, STELLA, 2009), languages incorporated in Geographical Information Systems (GRASS, 2009, ESRI, 2009), technical computer languages (MATLAB, 2008), and modelling languages designed for spatio-temporal modelling in geography (SIMUMAP, Pullar, 2004, PCRaster, 2009). Karssenberg (2002) and Karssenberg and De Jong (2005a) evaluate and discuss these frameworks. Apart from technical computer languages, none of these frameworks come with integrated tools for calibration of models or data assimilation. This is mostly done by interfacing the model with an external framework that incorporates solution schemes for calibration (e.g., PEST, 2008) or data assimilation (e.g., BUGS, 2008, COSTA, 2008, ReBEL, 2009).

Using two different software frameworks for model construction and optimization has the disadvantage that the user needs knowledge of both, which can be a problem as the frameworks often have entirely different programming and visualisation environments. Also, implementing the interface between the model construction and optimization frameworks can be cumbersome and hinders modification of the model, because changing the model often changes its variables and parameters: as the optimization framework interfaces with the model through these variables and parameters, the interface that handles this needs to be adjusted as well. In many cases modifying the interface is not feasible within a project. As a result, exploratory model development, whereby a number of candidate models are developed and optimized in order to find the optimal model, is often not possible. A possible solution to these problems is a single framework that supports both model construction and optimization, which is the approach followed here. Such integrated frameworks have not yet been widely developed, as software development teams have focussed on frameworks for either model construction or model optimization. The proprietary MATLAB framework allows doing both when the external ReBEL toolkit for optimization is run inside MATLAB. In this paper we extend the PCRaster model construction framework (van Deursen, 1995, Wesseling et al., 1996, PCRaster, 2009). New modules for data assimilation with the widely used Ensemble Kalman Filter (e.g., Evensen, 2003) and the particle filter (van Leeuwen, 2003, Weerts and El Serafy, 2006) are added, resulting in an integrated framework for model construction and optimization. The modeller has access to these components and combines them using the generic Python scripting language (Python, 2009).
Stochastic spatio-temporal model inputs and outputs can be analysed with an integrated, interactive visualisation program. In addition to optimization of models built within the framework, the framework provides an interface to external models. The framework also integrates a calibration toolbox using genetic algorithms; for a description of this component the reader is referred to AMORI (2009).

The purpose of this paper is to explain how the integrated framework is used for model construction and data assimilation, and to evaluate the framework with two case studies of distributed models. The first case study is a simplified snowmelt model that is constructed inside the framework. We will assimilate distributed snow cover data into the model to improve estimation of snow cover and discharge. The assimilation of snow cover data is expected to improve the prediction of snow cover and discharge, as has been shown by others using remotely sensed snow cover data (e.g., Clark et al., 2006, Nagler et al., 2008). The second case study shows how the external LISFLOOD model (Van der Knijf et al., in press), a hydrological model that runs at the river basin scale, can be optimized with the framework. The paper mainly shows how the different filter techniques can be used; it does not aim to provide an extensive comparison of the performance of the filters, although we do give a preliminary comparison of the Ensemble Kalman Filter and the Particle Filter.

Section snippets

Monte Carlo simulation and concepts of the framework

We first outline modelling concepts and define notations for the case without data assimilation or calibration. Let the vector z_t be the state variables of the model at time index t = 1, 2, …, T. Given an initial state z_0, z_t evolves over time according to the governing equation:

z_t = f_t(z_{t-1}, i_t, p_t),  for each t.    (1)

In Eq. (1), f_t is a system transition function that mimics real world processes and p_t is a vector containing the parameters used in f_t. The vector i_t contains the inputs or boundary conditions
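The time iteration of Eq. (1) combined with Monte Carlo simulation can be sketched in a few lines of Python. The sketch below is a hypothetical illustration using plain NumPy arrays, not the framework's map operations; all function and parameter names (`run_monte_carlo`, `perturb`, etc.) are our own, chosen for illustration only:

```python
import numpy as np

def run_monte_carlo(f, z0, inputs, params, n_steps, n_samples, perturb, rng=None):
    """Iterate z_t = f(z_{t-1}, i_t, p_t) for an ensemble of realizations.

    f         transition function mimicking the real-world process
    z0        initial state vector z_0
    inputs    callable t -> i_t (inputs / boundary conditions)
    params    callable t -> p_t (model parameters)
    perturb   callable (z, rng) -> z, adding a stochastic model error
    Returns an array of shape (n_steps + 1, n_samples, len(z0)).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    ensemble = [np.array(z0, dtype=float) for _ in range(n_samples)]
    history = [np.stack(ensemble)]
    for t in range(1, n_steps + 1):
        # propagate every Monte Carlo sample one step with Eq. (1)
        ensemble = [perturb(f(z, inputs(t), params(t)), rng) for z in ensemble]
        history.append(np.stack(ensemble))
    return np.stack(history)
```

For instance, with `f = lambda z, i, p: p * z + i` (a single linear reservoir) and a small Gaussian perturbation, the returned array holds one stochastic trajectory per Monte Carlo sample.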

General theory and framework for data assimilation

In sequential data assimilation, the model Eq. (1) is updated at time indices when observational data are available, referred to here as update moments. We give a short outline of the basic data assimilation formulations here. For a more extensive explanation the reader is referred to Doucet et al. (2001) and Simon (2006). Data assimilation is mostly done with observations of the state variables z_t. In some cases, observations of model inputs i_t and parameters p_t are also assimilated. Let x_t (t

Theory

The Particle Filter approximates the posterior probability density function in Eq. (8) by a collection of Monte Carlo samples (i.e., particles), assigning a weight to each sample:

p(x_t | Y_t) ≈ Σ_{n=1..N} p_t^(n) δ(x_t − x_t^(n))    (10)

The weights p_t^(n), also referred to as probability masses, sum to one. In Eq. (10), δ(·) denotes the Dirac delta function. For a Gaussian measurement error v_t, the weights are proportional to (Simon, 2006, Chin et al., 2007):

a_t^(n) = exp(−[y_t − H_t(x_t^(n))]^T R_t^{−1} [y_t − H_t(x_t^(n))] / 2)
p_t^(n) ∝ a_t^(n)

where R_t is
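As a sketch (not the framework's actual code), the Gaussian weight computation above takes only a few lines of NumPy. The function name `particle_weights` and its signature are our own; the normalization at the end makes the weights sum to one, as required of probability masses:

```python
import numpy as np

def particle_weights(particles, y, H, R):
    """Gaussian importance weights, proportional to
    exp(-[y - H(x)]^T R^{-1} [y - H(x)] / 2), normalized to sum to one."""
    R_inv = np.linalg.inv(R)
    a = np.empty(len(particles))
    for n, x in enumerate(particles):
        d = y - H(x)                      # innovation for particle n
        a[n] = np.exp(-0.5 * d @ R_inv @ d)
    return a / a.sum()                    # probability masses p_t^(n)
```

A particle whose mapped state H(x) lies close to the observation y receives a large weight; particles far from y receive weights that decay exponentially with the squared, error-weighted distance.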

Theory

The Ensemble Kalman Filter is a Monte Carlo approximation of the Kalman filter (e.g., Evensen, 2003, Simon, 2006). The evaluation scheme is identical to the one given in Eq. (9a), (9b), (9c), (9d), and evaluation of Eq. (8) is done by:

u_t^(n),+ = u_t^(n),0 + P_t^0 H_t^T (H_t P_t^0 H_t^T + R_t)^{−1} (y_t^(n) − H_t u_t^(n),0),  for each n.

In a standard Ensemble Kalman Filter, u_t^(n) is equal to z_t^(n), the realizations of all state variables in the model. It contains the state variables for which observations are available, referred to
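A compact NumPy sketch of this per-member update is given below, assuming a linear observation operator H given as a matrix and perturbed observations y^(n) for each ensemble member (the stochastic EnKF variant). This is an illustration under those assumptions, not the framework's implementation, and the names are ours:

```python
import numpy as np

def enkf_update(U, H, y, R, rng=None):
    """Stochastic Ensemble Kalman Filter analysis step.

    U : (N, m) forecast ensemble, one state vector u^(n),0 per row
    H : (k, m) linear observation operator
    y : (k,) observation vector
    R : (k, k) observation error covariance
    Returns the (N, m) updated ensemble u^(n),+.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    N = U.shape[0]
    P = np.atleast_2d(np.cov(U, rowvar=False))    # sample covariance P^0
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    # each ensemble member assimilates its own perturbed observation y^(n)
    Y = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return U + (Y - U @ H.T) @ K.T
```

Perturbing the observations for each member keeps the analysis ensemble spread consistent with the Kalman filter's posterior covariance; without it the updated ensemble would be too tight.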

Results and discussion of filters applied to the snow model

When a small number of particles is copied a large number of times in the Particle Filter, the posterior probability density function of the model may be represented by too few distinct particles. This is known as particle collapse or impoverishment. In our model, this problem does not occur, as can be seen in the plots created from files stored by the framework (Fig. 5, Fig. 6). At each update moment, a relatively large number of samples is copied. As a
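A common way to monitor impoverishment, not specific to this paper, is the effective sample size N_eff = 1 / Σ (p^(n))^2, which drops towards one when nearly all probability mass concentrates on a single particle. The short sketch below, with names of our own choosing, shows this diagnostic next to a simple multinomial resampler of the kind that produces the copying behaviour described above:

```python
import numpy as np

def resample_indices(weights, rng=None):
    """Multinomial resampling: draw N particle indices with probability
    equal to their weights; high-weight particles are copied many times."""
    if rng is None:
        rng = np.random.default_rng(0)
    N = len(weights)
    return rng.choice(N, size=N, p=weights)

def effective_sample_size(weights):
    """N_eff = 1 / sum(w^2); a low value warns of particle impoverishment."""
    w = np.asarray(weights)
    return 1.0 / np.sum(w ** 2)
```

With uniform weights N_eff equals the number of particles; with all mass on one particle it equals one, and resampling then returns N copies of that single particle.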

Framework concepts

The software framework provides a close integration between the definition of the model itself, i.e. the code describing the model equations, and the code that integrates observations using data assimilation. However, in some cases Monte Carlo simulation or data assimilation must be performed with an existing model. This is also supported by the software framework, through functions that pass information from the software framework to the external model. The information that needs to be

Discussion and conclusions

We have built a software framework that integrates a framework for model construction and routines for visualisation of model data. The software framework for model construction includes a large set of spatial operations on raster maps. These operations can be used in various framework classes that provide control flow for a number of different modelling approaches and activities: static modelling, spatio-temporal modelling, deterministic modelling, stochastic modelling, and data assimilation.

Acknowledgements

Cees Wesseling and Willem van Deursen (PCRaster Environmental Software) are thanked for their inputs in the development of the PCRaster Python software. This research was funded by ‘Ruimte voor Geo-Informatie’, project ‘On-line coupling of spatial optimization tools and spatially distributed simulation models’, RGI 313.

References (71)

  • C. Naud et al.

    Application of an interacting particle filter to improve nitrogen nutrition index predictions for winter wheat

    Ecological Modelling

    (2007)
  • D. Pullar

    SimuMap: a computational system for spatial modelling

    Environmental Modelling and Software

    (2004)
  • P.J. Sydelko et al.

    An object-oriented framework for dynamic ecosystem modeling: application for integrated risk assessment

    Science of the Total Environment

    (2001)
  • S.N. Wilkinson et al.

    Modelling and testing spatially distributed sediment budgets to relate erosion processes to sediment yields

    Environmental Modelling and Software

    (2009)
  • J.H.M. Wösten et al.

    Development and use of a database of hydraulic properties of European soils

    Geoderma

    (1999)
  • N.K. Ajami et al.

    An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction

    Water Resources Research

    (2007)
  • AMORI, February 2009. Automatic Model Optimization Reference Implementation. Available from:...
  • I. Benenson et al.

    Geosimulation: Automata-based Modeling of Urban Phenomena

    (2004)
  • K.J. Beven

    Rainfall-runoff Modelling The Primer

    (2002)
  • T.R. Blandford et al.

    Seasonal and synoptic variations in near-surface air temperature lapse rates in a mountainous basin

    Journal of Applied Meteorology and Climatology

    (2008)
  • L. Brown et al.

    Using satellite imagery to validate snow distribution simulated by a hydrological model in large northern basins

    Hydrological Processes

    (2008)
  • BUGS, December 2008. The BUGS Project. Available from:...
  • P.A. Burrough

    Dynamic Modelling and Geocomputation

    (1998)
  • P.A. Burrough et al.

    Principles of Geographical Information Systems

    (1998)
  • T.M. Chin et al.

    An ensemble-based smoother with retrospectively updated weights for highly nonlinear systems

    Monthly Weather Review

    (2007)
  • COSTA, January 2008. Common Set of Tools for Assimilation of Data. Available from:...
  • A. Doucet et al.

    Sequential Monte Carlo Methods in Practice

    (2001)
  • EIONET, January 2009. The European Topic Centre Land Use and Spatial Information ETC-LUSI: Corine Land Cover Database...
  • ESRI, January 2009. Environmental Systems Research Institute, Available from:...
  • G. Evensen

    The Ensemble Kalman Filter: theoretical formulation and practical implementation

    Ocean Dynamics

    (2003)
  • A. Gelman et al.

    Bayesian Data Analysis

  • GRASS, January 2009. GRASS GIS, Available from:...
  • V. Grimm et al.

    Individual-Based Modelling and Ecology

    (2005)
  • E.R. Groff

    ‘Situating’ simulation to model human spatio-temporal interactions: an example using crime events

    Transactions in GIS

    (2007)
1 Tel.: +39 332 786013; fax: +39 332 786653. E-mail: [email protected].
