Elsevier

Ecological Informatics

Volume 71, November 2022, 101764

Calibration of a complex hydro-ecological model through Approximate Bayesian Computation and Random Forest combined with sensitivity analysis

https://doi.org/10.1016/j.ecoinf.2022.101764

Highlights

  • ABC-RF can be used to calibrate complex, highly parametrized biogeochemical models.

  • ABC-RF must be coupled with a sensitivity analysis to obtain a good calibration.

  • At least 25,000 simulations are needed to calibrate a model with 133 parameters.

  • Using a preselected set of the closest simulations facilitates the calibration.

Abstract

An automated calibration method is proposed and applied to the complex hydro-ecological model Delft3D-BLOOM, which is calibrated against monitoring data from Lake Champs-sur-Marne, a small, shallow urban lake in the Paris region (France). This method (ABC-RF-SA) combines Approximate Bayesian Computation (ABC) with the machine learning algorithm Random Forest (RF) and a Sensitivity Analysis (SA) of the model parameters. Three target variables (total chlorophyll, cyanobacteria and dissolved oxygen concentrations) are used to calibrate 133 parameters. ABC-RF-SA is first applied to a set of simulated observations to validate the methodology. It is then applied to a real set of high-frequency observations recorded over about two weeks in Lake Champs-sur-Marne. The methodology is also compared with the standard ABC and ABC-RF formulations. Only ABC-RF-SA allowed the model to reproduce the observed biogeochemical dynamics. Coupling ABC with RF and SA thus appears crucial for its application to complex hydro-ecological models.

Introduction

Modelling biogeochemical cycling and phytoplankton dynamics in aquatic ecosystems is a complex task. It implies taking into account many processes that belong to different scientific fields, ranging from physics to biology to chemistry. Mechanistic hydro-ecological models, which seek to include all these processes, are often very complex and over-parameterized (Luo et al., 2018; Makler-Pick et al., 2011; Reichert and Omlin, 1997; Rigosi et al., 2011; Vinçon-Leite and Casenave, 2019). In addition, most parameters are difficult to measure directly in the field. Reference values for key model parameters can be found in the scientific literature, but they are uncertain and often span a wide range (e.g., Fenocchi et al., 2019; Makler-Pick et al., 2011), which affects the reliability of the models.

For these reasons, sensitivity analysis, calibration and validation of complex hydro-ecological models are important tasks. However, Shimoda and Arhonditsis (2016) showed that only half of the studies published between 1980 and 2012 include a proper sensitivity analysis, and that when calibration is performed, it is mostly done by trial and error. The results of trial-and-error calibration depend heavily on the skill and knowledge of the modeller (Luo et al., 2018), whereas automated calibration can reduce model uncertainty and, at the same time, allow a sensitivity analysis of the model parameters to be carried out. Nevertheless, automated calibration is rarely applied to complex hydro-ecological models, especially three-dimensional ones. In the literature, it has only been applied to 0D or 1D models, most often through optimization or through Monte Carlo and Bayesian inference (Vinçon-Leite and Casenave, 2019, and references therein).

There are several reasons for this. First, automated calibration strategies are generally computationally expensive. They often require a large number of model runs and their computational cost increases with the number of parameters to be estimated, which hinders their application to complex hydro-ecological models. Moreover, in limnological studies, data traditionally come from field campaigns which, although regular, lead at best to sparse datasets that are not well suited to automated calibration strategies.

If the available dataset is rich enough, a wide range of approaches and techniques can be applied for automated calibration. These include optimization algorithms, from Newton-type gradient methods to population-based metaheuristics such as genetic algorithms and Particle Swarm Optimization, as well as Bayesian parameter inference algorithms (Mahevas et al., 2019).

However, classical Bayesian parameter inference is often problematic for complex mechanistic models. For such models, the likelihood function is analytically intractable, and evaluating it numerically is computationally very demanding. Approximate Bayesian Computation (ABC) is an innovative and promising parameter inference technique, rooted in Bayesian statistics, whose great advantage is that it bypasses the computation of the likelihood function. It requires a large number of model runs with different parameter sets, drawn at random from user-defined prior distributions. This set of simulations is used as a training dataset to approximate the posterior probability distribution of the parameters. Different methods can be used for this purpose, including machine learning techniques. In particular, random forests have recently been proposed for this task and appear to be particularly advantageous (Raynal et al., 2019).
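The principle of ABC can be illustrated with its simplest variant, rejection ABC. The Python sketch below is not the calibration setup of this study: it assumes a hypothetical one-parameter toy model (logistic growth with rate `r`), a uniform prior, a root-mean-square distance, and an acceptance rule that keeps the 1% of draws closest to the observations. All of these choices are illustrative assumptions.

```python
import math
import random
import statistics

def toy_model(r, k=10.0, b0=0.5, n_steps=48, dt=0.25):
    """Hypothetical one-parameter toy model: logistic biomass growth, Euler-integrated."""
    b, trajectory = b0, []
    for _ in range(n_steps):
        b += dt * r * b * (1.0 - b / k)
        trajectory.append(b)
    return trajectory

def rmse(sim, obs):
    """Root-mean-square distance between a simulated and an observed trajectory."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def rejection_abc(obs, n_sims=5000, keep_frac=0.01, prior=(0.1, 2.0), seed=1):
    """Basic rejection ABC with a uniform prior on r and a quantile-based tolerance."""
    rng = random.Random(seed)
    # 1. Sample parameter values from the user-defined prior.
    thetas = [rng.uniform(*prior) for _ in range(n_sims)]
    # 2. Run the model for each draw; no likelihood is ever evaluated.
    dists = [rmse(toy_model(t), obs) for t in thetas]
    # 3. Accept the draws whose simulations lie closest to the observations.
    n_keep = max(1, round(keep_frac * n_sims))
    ranked = sorted(zip(dists, thetas))
    return [t for _, t in ranked[:n_keep]]

# Pseudo-observations generated with r = 0.8 plus Gaussian noise.
noise = random.Random(0)
obs = [b + noise.gauss(0.0, 0.1) for b in toy_model(0.8)]
posterior = rejection_abc(obs)
print(f"posterior mean r = {statistics.mean(posterior):.2f}, sd = {statistics.stdev(posterior):.3f}")
```

The accepted draws approximate the posterior distribution of `r`; a smaller acceptance fraction gives a closer approximation at the cost of more simulations.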

Like most calibration techniques, ABC is computationally expensive. However, it offers a good compromise between the number of parameters to identify and the number of model evaluations (Mahevas et al., 2019). Moreover, compared with other calibration techniques such as evolutionary algorithms, it has the advantage of not being iterative. All model evaluations can therefore be performed in parallel, which is particularly valuable for complex hydro-ecological models with long simulation times. ABC has already been applied to complex statistical models (Nott et al., 2018) and individual-based ecological models (e.g., Dominguez Almela et al., 2020; Lagarrigues et al., 2015; van der Vaart et al., 2015) with a few dozen parameters, but, to the best of our knowledge, never to a complex process-based model with more than 100 parameters.
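Because every parameter set is drawn from the prior before any simulation runs, building the ABC training set is embarrassingly parallel. The sketch below only shows that structure: the parameter names and bounds are hypothetical, and `run_model` is a placeholder standing in for a real simulator.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical priors: uniform bounds for two illustrative parameters.
BOUNDS = {"max_growth_rate": (0.5, 2.0), "mortality_rate": (0.01, 0.2)}

def sample_prior(rng, bounds):
    """Draw one parameter set from independent uniform priors."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def run_model(theta):
    """Hypothetical placeholder for one model run. A real implementation would write
    `theta` to the model's input files and launch the simulator as a separate process."""
    return theta["max_growth_rate"] - theta["mortality_rate"]

def build_reference_table(n_runs, n_workers=4, seed=42):
    rng = random.Random(seed)
    # All parameter sets are drawn up front: no run depends on another run's result.
    thetas = [sample_prior(rng, BOUNDS) for _ in range(n_runs)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        outputs = list(pool.map(run_model, thetas))  # map preserves input order
    return list(zip(thetas, outputs))

table = build_reference_table(n_runs=100)
```

Threads are enough when each worker mostly waits on an external simulator process. For a model implemented in Python itself, a `ProcessPoolExecutor` or a batch job array on a cluster would be the more natural choice.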

In this work, an innovative method for automated calibration is proposed and applied to the complex hydro-ecological model Delft3D-BLOOM (Deltares, 2018). This method (hereafter called ABC-RF with SA) is based on the ABC-RF (Approximate Bayesian Computation - Random Forest) method proposed by Raynal et al. (2019), combined with a sensitivity analysis (SA) of the model parameters.

The main computational cost of ABC lies in the large number of model simulations required to build a robust training dataset. In this study, the availability of high-frequency data, aggregated to an hourly time step, allowed the calibration effort to focus on a 16-day simulation, greatly reducing computational time while targeting the model's ability to simulate short-term variations. The aim of this study is to test the ability of ABC-RF with SA to reproduce a series of observations with a complex biogeochemical model that involves a large number of parameters (133). Three target variables are considered in this calibration procedure: total chlorophyll, phycocyanin and dissolved oxygen. These variables are representative of biological processes in aquatic ecosystems. Total chlorophyll is an indicator of total phytoplankton biomass and is the variable on which most alert guidelines for monitoring harmful algal blooms are based. Phycocyanin is a pigment specific to cyanobacteria and can be considered an indicator of their abundance. Finally, dissolved oxygen concentration, especially in a eutrophic environment, integrates the effects of several processes: growth, mortality, decomposition of organic matter and nutrient recycling.
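Aggregating a high-frequency record to the hourly calibration time step can be done with a time-based resample, as in the pandas sketch below. The 10-minute sampling interval, the start date, the column names and the values are hypothetical placeholders; only the 16-day window and the hourly step come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute sensor record (arbitrary start date, synthetic values)
# spanning the 16-day calibration window.
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=16 * 24 * 6, freq="10min")
raw = pd.DataFrame(
    {
        "total_chlorophyll": rng.normal(50.0, 5.0, idx.size),
        "phycocyanin": rng.normal(20.0, 3.0, idx.size),
        "dissolved_oxygen": rng.normal(9.0, 1.0, idx.size),
    },
    index=idx,
)

# Hourly means: 16 days x 24 h = 384 rows, one column per target variable.
hourly = raw.resample("h").mean()
print(hourly.shape)  # -> (384, 3)
```

Each of the three target variables then contributes 384 hourly values to the comparison between simulations and observations.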

ABC-RF with SA is first applied to a set of simulated data to validate the method and test its ability to reproduce both the simulated data and the parameter values. It is then applied to a real observation dataset from Lake Champs-sur-Marne, a small shallow lake in the Paris region. The standard ABC method and the ABC-Random Forest (ABC-RF) method are also applied for comparison.


Dataset and study site

Lake Champs-sur-Marne is a small, shallow lake located in the Greater Paris region. Its surface area is 0.12 km², and its average and maximum depths are about 2.5 m and 4 m, respectively. As shown in Fig. 1, the lake has no inflow or outflow and is fed primarily by groundwater from the Marne River, which flows north of the water body.
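For orientation, the stated surface area and mean depth imply an approximate lake volume, since mean depth is by definition the volume divided by the surface area:

$$
V \approx A\,\bar{z} = \left(0.12 \times 10^{6}\ \mathrm{m}^{2}\right)\left(2.5\ \mathrm{m}\right) = 3.0 \times 10^{5}\ \mathrm{m}^{3}
$$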

Lake Champs-sur-Marne suffers from severe eutrophication, which leads to a succession of serious harmful algal blooms between the months of

Validation of the methodology

The ABC methodology was first validated on a set of simulated observations generated by the best model run of the complete set of simulations in terms of total NMSE (run number 4022, see Fig. 8).
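Selecting a reference run by total NMSE can be sketched as follows. The exact NMSE definition is not given in this excerpt, so the sketch assumes NMSE is the mean squared error divided by the variance of the observations, and that the total NMSE is the sum over the three target variables. The variable names and toy values are hypothetical.

```python
import statistics

def nmse(sim, obs):
    """Normalized mean squared error, ASSUMED here to be MSE / variance(obs)."""
    mse = sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)
    return mse / statistics.pvariance(obs)

def total_nmse(run, observations):
    """Sum of the per-variable NMSE values (summation is an assumption)."""
    return sum(nmse(run[var], observations[var]) for var in observations)

def best_run(runs, observations):
    """Identifier of the run with the lowest total NMSE."""
    return min(runs, key=lambda run_id: total_nmse(runs[run_id], observations))

# Hypothetical toy observations for the three target variables.
observations = {
    "total_chlorophyll": [10.0, 12.0, 14.0, 13.0],
    "phycocyanin": [4.0, 5.0, 6.0, 5.5],
    "dissolved_oxygen": [8.0, 9.0, 10.0, 9.5],
}
# Two hypothetical runs, offset from the observations by 1.0 and 0.1 respectively.
runs = {
    "run_a": {var: [v + 1.0 for v in vals] for var, vals in observations.items()},
    "run_b": {var: [v + 0.1 for v in vals] for var, vals in observations.items()},
}
best = best_run(runs, observations)
pseudo_observations = runs[best]  # synthetic "truth" for validating the calibration
```

Because the parameter values that generated the selected run are known exactly, the calibration can be checked on whether it recovers both the synthetic observations and those parameters.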

Discussion

In this paper, Approximate Bayesian Computation with Random Forest (ABC-RF) has been tested for the calibration of a highly parametrized complex biogeochemical model. The calibration procedure focuses on three variables that are particularly relevant to aquatic ecology and water resource management: total chlorophyll, cyanobacteria and dissolved oxygen concentrations.

Conclusion

Biogeochemical models are often highly parameterized and complex. Their calibration is difficult and often neglected in the scientific literature. Our study shows that, among the various techniques available for automated calibration, ABC-RF can be successfully applied to calibrate a complex and highly parameterized biogeochemical model. Our work focuses on a short-term algal bloom, an event that could be missed by a traditional periodic survey. After calibration, the model was able to

Software

Version 5.01.03.000000 of Delft3D-DELWAQ and version 6.01.06.62914 of Delft3D-FLOW2D3D, both part of the Delft3D modelling suite, were used to simulate the concentrations of total chlorophyll, phycocyanin and dissolved oxygen in Lake Champs-sur-Marne. This software is open source, and the version used in this study can be downloaded at:

https://svn.oss.deltares.nl/repos/delft3d/tags/delft3d4/3426.

The R and Matlab scripts used to implement the

Declaration of Competing Interest

None.

Acknowledgement

This work was supported by the French National Research Agency [ANSWER research project, grant number ANR-16-CE32-0009-02].

The authors would like to thank Max Zinsou Debaly for his internship work, Sébastien Roux for his help in choosing the sensitivity analysis method, and Isabelle Sanchez, who helped us share the datasets and simulation codes.

References (51)

  • A. Rigosi et al.

    A calibration strategy for dynamic succession models including several phytoplankton groups

    Environ. Model. Softw.

    (2011)
  • Y. Shimoda et al.

    Phytoplankton functional type modelling: running before we can walk? A critical evaluation of the current state of knowledge

    Ecol. Model.

    (2016)
  • E. van der Vaart et al.

    Calibration and evaluation of individual-based models using approximate Bayesian computation

    Ecol. Model.

    (2015)
  • B. Vinçon-Leite et al.

    Modelling eutrophication in lake ecosystems: a review

    Sci. Total Environ.

    (2019)
  • B.A. Ward et al.

    When is a biogeochemical model too complex? Objective model reduction and selection for North Atlantic time-series sites

    Prog. Oceanogr.

    (2013)
  • R.F. Weiss

    The solubility of nitrogen, oxygen and argon in water and seawater

    Deep-Sea Res. Oceanogr. Abstr.

    (1970)
  • T.R. Anderson

    Plankton functional type modelling: running before we can walk?

    J. Plankton Res.

    (2005)
  • G.B. Arhonditsis et al.

    Addressing equifinality and uncertainty in eutrophication models

    Water Resour. Res.

    (2008)
  • M.A. Beaumont

    Approximate Bayesian computation in evolution and ecology

    Annu. Rev. Ecol. Evol. Syst.

    (2010)
  • M.A. Beaumont et al.

    Approximate Bayesian computation in population genetics

    Genetics

    (2002)
  • M.A. Beaumont et al.

    Adaptive approximate Bayesian computation

    Biometrika

    (2009)
  • M.B. Beck

    Water quality modeling: a review of the analysis of uncertainty

    Water Resour. Res.

    (1987)
  • R. Beck et al.

    Comparison of satellite reflectance algorithms for estimating phycocyanin values and cyanobacterial total biovolume in a temperate reservoir using coincident Hyperspectral aircraft imagery and dense coincident surface observations

    Remote Sens.

    (2017)
  • T. Burr et al.

    Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models

    Biomed. Res. Int.

    (2013)
  • M.P. Chapuis et al.

    A young age of subspecific divergence in the desert locust inferred by ABC random forest

    Mol. Ecol.

    (2020)