A software framework for construction of process-based stochastic spatio-temporal models and data assimilation
Introduction
Spatio-temporal numerical models simulating geographic change are one of the cornerstones of research in geography and earth science and are frequently used in management, planning, and risk assessment in application domains such as land use change (Ligtenberg et al., 2004, Moulin et al., 2004), hazards and evacuation (Helbing et al., 2000), ecosystem studies (Sydelko et al., 2001, Gimblett et al., 2003), spread of diseases (Breukers et al., 2006), criminology (Groff, 2007), land degradation and geomorphology (Karssenberg and Bridge, 2008, Wilkinson et al., 2009), or hydrology (Beven, 2002, Ajami et al., 2007, Blöschl et al., 2008, Brown et al., 2008). Although the application field may vary, spatio-temporal numerical models have in common that they simulate change over time using equations that represent real world processes (Wesseling et al., 1996, Burrough, 1998), whereby the state of the modelled system at each moment in time is a function of its state in the past. Another common characteristic is that processes are modelled in a spatially-explicit way, which means that spatial patterns and spatial interaction in the system are taken into account (Karssenberg and De Jong, 2005a). Spatio-temporal numerical models are either individual-based or field-based. Individual-based models, also referred to as agent-based or object-based models, consider the geographic space as a set of objects (Benenson and Torrens, 2004, Grimm and Railsback, 2005). Field-based models represent the geographic space using continuous fields of attributes that have a value at all locations (Burrough and McDonnell, 1998). The focus in this paper is on field-based models, although many concepts presented here apply to individual-based models, too.
Because models must be tailored to the research goals of a project, the available data, and the properties of the system being modelled (Karssenberg et al., 2006), model development is central to almost any research project that involves modelling. Three important steps in the model development cycle (Karssenberg et al., 2006) are (1) the conversion of the conceptual model structure to computer code, i.e. the implementation or construction of the model, (2) model calibration, and (3) state estimation by assimilation of spatio-temporal observational data collected by remote sensing, automatic data loggers, or questionnaires, or retrieved from large databases. The term calibration is used for the process that aims at finding model parameters that result in an optimal fit between modelled and observed state variables (e.g., Hill and Tiedeman, 2007). The term data assimilation refers here to sequential Bayesian estimation. This procedure sequentially updates the model state at time steps when observations of state variables or parameters are available (e.g., Gelb, 1974, Simon, 2006). Data assimilation is increasingly being used to integrate data with spatio-temporal models in a wide range of fields in the earth sciences, such as oceanography (van Leeuwen, 2003), hydrology (Clark et al., 2006, Moradkhani, 2008), ecology (Chen et al., 2008), or crop science (Naud et al., 2007). Below, we use the term optimization to refer to both calibration and data assimilation. Although both model development and optimization can be done by programming software from scratch using system programming languages, it is preferable to use software frameworks at a higher level of abstraction that can be used by scientists and modellers without specialist knowledge of programming (van Deursen et al., 2000, Karssenberg, 2002).
A number of software frameworks exist for construction of temporal numerical models in geography and earth science. Widely used are graphical modelling languages (ModelMaker, 2009, STELLA, 2009), languages incorporated in Geographical Information Systems (GRASS, 2009, ESRI, 2009), technical computer languages (MATLAB, 2008), and modelling languages designed for spatio-temporal modelling in geography (SIMUMAP, Pullar, 2004, PCRaster, 2009). Karssenberg (2002) and Karssenberg and De Jong (2005a) evaluate and discuss these frameworks. Apart from technical computer languages, none of these frameworks come with integrated tools for calibration of models or data assimilation. This is mostly done by interfacing the model with an external framework that incorporates solution schemes for calibration (e.g., PEST, 2008) or data assimilation (e.g., BUGS, 2008, COSTA, 2008, ReBEL, 2009).
The use of two different software frameworks for model construction and optimization has the disadvantage that the user requires knowledge of two different frameworks. This can be a problem as the frameworks will have totally different programming and visualisation environments. Also, the implementation of the interface between the model construction and optimization frameworks can be cumbersome and hinders modification of the model. The latter is because changing the model often comes with changes in the variables and parameters. As the optimization framework interfaces with the model through these variables and parameters, the interface that handles this needs to be adjusted. In many cases modifying the interface is not feasible within a project. As a result, exploratory model development whereby a number of candidate models are developed and optimized in order to find the optimal model is often not possible. A possible solution to these problems is the use of a single framework that supports model construction and optimization. This approach is followed here. Such integrated frameworks have not yet been widely developed as the focus of software development teams has been on either frameworks for model construction or model optimization. The proprietary MATLAB framework allows doing both when using the external ReBEL toolkit for optimization that runs inside MATLAB. In this paper we extend the PCRaster model construction framework (van Deursen, 1995, Wesseling et al., 1996, PCRaster, 2009). New modules for data assimilation with the widely used Ensemble Kalman Filter (e.g., Evensen, 2003) and the particle filter (van Leeuwen, 2003, Weerts and El Serafy, 2006) are added resulting in an integrated framework for model construction and optimization. The modeller has access to these components and combines them with the generic Python scripting language (Python, 2009). 
Stochastic spatio-temporal model inputs and outputs can be analysed with an integrated, interactive visualisation program. In addition to optimization of models built within the framework, the framework provides an interface to external models. The framework also integrates a calibration toolbox using Genetic Algorithms; for a description of this component the reader is referred to AMORI (2009).
The purpose of this paper is to explain how the integrated framework is used for model construction and data assimilation, and to evaluate the framework with two case studies of distributed models. The first case study is a simplified snowmelt model that is constructed inside the framework. We assimilate distributed snow cover data into the model, which is expected to improve the prediction of snow cover and discharge, as has been shown by others using remotely sensed snow cover data (e.g., Clark et al., 2006, Nagler et al., 2008). The second case study shows how the external LISFLOOD model (Van der Knijf et al., in press), a hydrological model that runs at the river basin scale, can be optimized with the framework. The paper mainly aims to show how the different filter techniques can be used and does not attempt an extensive comparison of filter performance. However, we provide a preliminary comparison of the Ensemble Kalman Filter and the Particle Filter.
Section snippets
Monte Carlo simulation and concepts of the framework
We first outline modelling concepts and define notations for the case without data assimilation or calibration. Let the vector zt be the state variables of the model at time index t = 1, 2, …, T. Given an initial state z0, zt evolves over time according to the governing equation:

$$z_t = f_t(z_{t-1}, i_t, p_t) \qquad (1)$$

In Eq. (1), ft is a system transition function that mimics real world processes and pt is a vector containing the parameters used in ft. The vector it contains the inputs or boundary conditions
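As an illustration of Monte Carlo simulation with a model of this form, the following is a minimal sketch in plain Python, not the framework's own classes. The toy transition function `transition`, the single-store interpretation, the initial state, and all distributions and numerical values are assumptions made for this example only.

```python
import random
import statistics


def transition(z_prev, i_t, p):
    """Toy system transition f_t: a store gains input i_t and loses a fraction p."""
    return max(0.0, z_prev + i_t - p * z_prev)


def monte_carlo(n_samples, n_steps, seed=0):
    """Run n_samples realizations of z_t = f_t(z_{t-1}, i_t, p_t)."""
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_samples):
        p = rng.uniform(0.05, 0.15)  # stochastic parameter, drawn once per realization
        z = 10.0                     # initial state z_0 (assumed value)
        path = [z]
        for _t in range(1, n_steps + 1):
            i_t = max(0.0, rng.gauss(1.0, 0.5))  # stochastic input for this time step
            z = transition(z, i_t, p)
            path.append(z)
        trajectories.append(path)
    return trajectories


def ensemble_stats(trajectories, t):
    """Sample mean and standard deviation of the state at time index t."""
    values = [path[t] for path in trajectories]
    return statistics.mean(values), statistics.stdev(values)


runs = monte_carlo(n_samples=200, n_steps=20)
mean_T, sd_T = ensemble_stats(runs, 20)
print(f"z_T: mean={mean_T:.2f}, sd={sd_T:.2f}")
```

Each realization draws its own parameter and input series, so the spread of the final states across `runs` approximates the uncertainty in zT that results from uncertain inputs and parameters.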
General theory and framework for data assimilation
In sequential data assimilation, the model Eq. (1) is updated at time indices when observational data are available, referred to here as update moments. We give a short outline of the basic data assimilation formulations here. For a more extensive explanation the reader is referred to Doucet et al. (2001) and Simon (2006). Data assimilation is mostly done with observations of the state variables zt. In some cases, observations of model inputs it and parameters pt are also assimilated. Let xt (t
Theory
The Particle Filter approximates the posterior probability density function in Eq. (8) by the collection of Monte Carlo samples (i.e., particles), assigning a weight to each sample:
The weights, also referred to as probability masses, sum to one. In Eq. (10), δ denotes the Dirac delta function. For Gaussian measurement error vt, the weights are proportional to (Simon, 2006, Chin et al., 2007): where Rt is
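For Gaussian measurement error, a commonly used form of the particle weights is $w^{(n)} \propto \exp\left(-\tfrac{1}{2}\, r^{(n)\,T} R_t^{-1} r^{(n)}\right)$, where $r^{(n)}$ is the difference between the observations and the values modelled by particle $n$. The sketch below computes such weights in plain Python. The symbols $w^{(n)}$ and $r^{(n)}$, the function name `gaussian_weights`, the restriction to a diagonal Rt, and the example numbers are assumptions for illustration and need not match the notation of Eq. (10).

```python
import math


def gaussian_weights(predicted_obs, observation, obs_variances):
    """Normalized particle weights for independent Gaussian measurement errors.

    predicted_obs: one list of modelled observation values per particle
    observation:   list of observed values
    obs_variances: diagonal of the measurement error covariance R_t (assumed diagonal)
    """
    log_w = []
    for pred in predicted_obs:
        misfit = sum((y - h) ** 2 / r
                     for y, h, r in zip(observation, pred, obs_variances))
        log_w.append(-0.5 * misfit)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    largest = max(log_w)
    w = [math.exp(lw - largest) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]  # weights sum to one


weights = gaussian_weights(
    predicted_obs=[[1.0, 2.0], [1.5, 2.5], [3.0, 4.0]],
    observation=[1.4, 2.4],
    obs_variances=[0.25, 0.25],
)
print(weights)
```

The second particle lies closest to the observations and therefore receives the largest weight; the third, which is far from them, receives almost none.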
Theory
The Ensemble Kalman Filter is a Monte Carlo approximation of the Kalman filter (e.g., Evensen, 2003, Simon, 2006). The evaluation scheme is identical to the one given in Eqs. (9a) to (9d), and evaluation of Eq. (8) is done by: In a standard Ensemble Kalman Filter, is equal to , the realizations of all state variables in the model. It contains the state variables for which observations are available, referred to
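The general idea of an ensemble update can be sketched as follows. This is a minimal, generic stochastic Ensemble Kalman Filter update (with perturbed observations) for a single scalar observation of one state component, written in plain Python; it is not the framework's implementation. The function name `enkf_update`, the two-component state, and all numerical values are assumptions for illustration.

```python
import random


def enkf_update(ensemble, obs_index, y, r, rng):
    """Update all members with one scalar observation y of component obs_index.

    ensemble: list of state vectors (lists of floats), one per realization
    r:        measurement error variance
    """
    n = len(ensemble)
    dim = len(ensemble[0])
    means = [sum(m[k] for m in ensemble) / n for k in range(dim)]
    # Sample covariance between each state component and the observed component.
    cov_xh = [
        sum((m[k] - means[k]) * (m[obs_index] - means[obs_index]) for m in ensemble)
        / (n - 1)
        for k in range(dim)
    ]
    var_h = cov_xh[obs_index]
    gain = [c / (var_h + r) for c in cov_xh]  # Kalman gain for a scalar observation
    updated = []
    for m in ensemble:
        y_perturbed = y + rng.gauss(0.0, r ** 0.5)  # perturbed observation
        innovation = y_perturbed - m[obs_index]
        updated.append([m[k] + gain[k] * innovation for k in range(dim)])
    return updated


rng = random.Random(42)
prior = [[rng.gauss(5.0, 2.0), rng.gauss(1.0, 0.5)] for _ in range(500)]
posterior = enkf_update(prior, obs_index=0, y=8.0, r=0.5, rng=rng)

prior_mean = sum(m[0] for m in prior) / len(prior)
post_mean = sum(m[0] for m in posterior) / len(posterior)
print(f"observed component: prior mean={prior_mean:.2f}, posterior mean={post_mean:.2f}")
```

Unlike the Particle Filter, which reweights and copies realizations, this update shifts every realization towards the observation, with unobserved components adjusted through their sample covariance with the observed one.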
Results and discussion of filters applied to the snow model
When a small number of particles is copied a large number of times in the Particle Filter, the posterior probability density function of the model may end up being represented by too few unique particles. This is known as particle collapse or impoverishment. In our model, this problem does not occur, as can be seen in the plots created from files stored by the framework (Fig. 5, Fig. 6). At each update moment, a relatively large number of samples is copied. As a
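Particle collapse can be checked numerically. The sketch below, in plain Python, resamples particles in proportion to their weights and counts how many unique particles survive. It also computes the effective sample size, a commonly used diagnostic that is not taken from this paper. The function names and the example weights are assumptions for illustration.

```python
import random


def resample(weights, rng):
    """Draw particle indices with replacement, with probability equal to the weights."""
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


def effective_sample_size(weights):
    """1 / sum(w^2): equals N for uniform weights, approaches 1 under collapse."""
    return 1.0 / sum(w * w for w in weights)


rng = random.Random(1)
uniform = [0.25] * 4
degenerate = [0.97, 0.01, 0.01, 0.01]

indices = resample(degenerate, rng)
n_unique = len(set(indices))
print("effective sample size (uniform):   ", effective_sample_size(uniform))
print("effective sample size (degenerate):", effective_sample_size(degenerate))
print("unique particles after resampling: ", n_unique)
```

When nearly all probability mass sits on one particle, the effective sample size approaches one and the resampled set consists almost entirely of copies of that particle.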
Framework concepts
The software framework provides a close integration between the definition of a model itself, i.e. the code describing the model equations, and the code to integrate observations using data assimilation. However, in some cases it is required to perform Monte Carlo simulation or data assimilation using an existing model. This is also supported by the software framework through functions that pass information from the software framework to the external model. The information that needs to be
Discussion and conclusions
We have built a software framework that integrates a framework for model construction and routines for visualisation of model data. The software framework for model construction includes a large set of spatial operations on raster maps. These operations can be used in various framework classes that provide control flow for a number of different modelling approaches and activities: static modelling, spatio-temporal modelling, deterministic modelling, stochastic modelling, and data assimilation.
Acknowledgements
Cees Wesseling and Willem van Deursen (PCRaster Environmental Software) are thanked for their inputs in the development of the PCRaster Python software. This research was funded by ‘Ruimte voor Geo-Informatie’, project ‘On-line coupling of spatial optimization tools and spatially distributed simulation models’, RGI 313.
References (71)
- et al. (2008). A spatially distributed flash flood forecasting model. Environmental Modelling and Software.
- et al. (2006). Individual-based models in the analysis of disease transmission in plant production chains: An application to potato brown rot. Agricultural Systems.
- et al. (2008). An improved state-parameter analysis of ecosystem models using data assimilation. Ecological Modelling.
- et al. (2006). Assimilation of snow covered area information into hydrologic and land-surface models. Advances in Water Resources.
- et al. (2007). Parameter optimisation and uncertainty assessment for large-scale streamflow simulation with the LISFLOOD model. Journal of Hydrology.
- et al. (2003). Deriving artificial models of visitors from dispersed patterns of use in the Sierra Nevada Wilderness, California. Journal for Nature Conservation.
- et al. (1994). Development of a soil geographical database from the soil map of the European Communities. Catena.
- et al. (2004). A design and application of a multi-agent system for simulation of multi-actor spatial planning. Journal of Environmental Management.
- et al. (2004). PADI-Simul: An agent-based geosimulation software supporting the design of geographic spaces. Computers, Environment and Urban Systems.
- et al. (2008). Assimilation of meteorological and remote sensing data for snowmelt runoff forecasting. Remote Sensing of Environment.
- Application of an interacting particle filter to improve nitrogen nutrition index predictions for winter wheat. Ecological Modelling.
- SimuMap: a computational system for spatial modelling. Environmental Modelling and Software.
- An object-oriented framework for dynamic ecosystem modeling: application for integrated risk assessment. Science of the Total Environment.
- Modelling and testing spatially distributed sediment budgets to relate erosion processes to sediment yields. Environmental Modelling and Software.
- Development and use of a database of hydraulic properties of European soils. Geoderma.
- An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resources Research.
- Geosimulation: Automata-based Modeling of Urban Phenomena.
- Rainfall-runoff Modelling: The Primer.
- Seasonal and synoptic variations in near-surface air temperature lapse rates in a mountainous basin. Journal of Applied Meteorology and Climatology.
- Using satellite imagery to validate snow distribution simulated by a hydrological model in large northern basins. Hydrological Processes.
- Dynamic Modelling and Geocomputation.
- Principles of Geographical Information Systems.
- An ensemble-based smoother with retrospectively updated weights for highly nonlinear systems. Monthly Weather Review.
- Sequential Monte Carlo Methods in Practice.
- The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics.
- Bayesian Data Analysis.
- Individual-Based Modelling and Ecology.
- ‘Situating’ simulation to model human spatio-temporal interactions: an example using crime events. Transactions in GIS.