Elsevier

Water Research

Volume 121, 15 September 2017, Pages 290-301

Parameter estimation of hydrologic models using a likelihood function for censored and binary observations

https://doi.org/10.1016/j.watres.2017.05.038

Highlights

  • We propose a likelihood function for censored and binary observations.

  • This likelihood function accounts for uncertainty due to model structure and inputs.

  • We successfully use this function for parameter estimation of a hydrologic model.

  • We show that the parameter inference is sensitive to choice of prior and threshold.

Abstract

Observations of a hydrologic system response are needed to accurately model system behaviour. Nevertheless, often very few monitoring stations are operated because collecting such reference data adequately and accurately is laborious and costly. It has recently been suggested to use observations not only from dedicated flow meters but also from simpler sensors, such as level or event detectors, which are available more frequently but only provide censored information. Binary observations can be considered as extreme censoring. It is still unclear, however, how to use censored observations most effectively to learn about model parameters. To this end, we suggest a formal likelihood function that incorporates censored observations, while accounting for model structure deficits and uncertainty in input data. Using this likelihood function, the parameter inference is performed within the Bayesian framework. We demonstrate the implementation of our methodology on a case study of an urban catchment, where we estimate the parameters of a hydrodynamic rainfall-runoff model from binary observations of combined sewer overflows. Our results show, first, that censored observations make it possible to learn about model parameters, with an average decrease of 45% in parameter standard deviation from prior to posterior. Second, the inference substantially improves model predictions, providing higher Nash-Sutcliffe efficiency. Third, the gain in information largely depends on the experimental design, i.e. sensor placement. Given the advent of the Internet of Things, we foresee that the wealth of censored data expected to become available can be used for parameter estimation within a formal Bayesian framework.

Introduction

Traditionally, mathematical models of river catchments and urban drainage systems are calibrated on uncensored observations of physical variables, such as discharge or water levels (Gupta et al., 1998, Madsen, 2003, Refsgaard, 1997, Westerberg et al., 2011). By uncensored observations we mean that the monitoring device has an adequately large measurement range, i.e. it can measure any physically possible value of the hydrologic variable. However, the installation and management of such monitoring devices for hydrologic systems is generally laborious and costly (Maheepala et al., 2001). Therefore, typically only a few locations are equipped with such sensors. Other types of sensors and observation equipment collect data only within a certain range, so that any system response outside this interval cannot be measured (Borup et al., 2015). In addition, recent advances in sensor and data transfer technology reveal a trend towards more cost-efficient sensors, which simply detect the occurrence of overflow events (Rasmussen et al., 2008) or only the exceedance of some critical water level, thus providing binary information. Within this context, it has even been suggested to specifically develop binary monitoring devices based on robust and low-cost sensors such as temperature probes (Hofer et al., 2014, Montserrat et al., 2016, Montserrat et al., 2013), motion detectors (Siemers et al., 2011) and electrical switches (Rasmussen et al., 2008). Moreover, even though learning about the depth and velocity of flow from CCTV footage during floods may be challenging, such cameras easily capture the inundation and the duration of an event (Le Coz et al., 2016, Lo et al., 2015). In the context of the Internet of Things, we are witnessing a growing trend where more real-time information related to hydrologic systems is available from hitherto unconventional sources (Eggimann et al., 2017, Kerkez et al., 2016).
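
The distinction between uncensored, range-limited and binary observations can be illustrated with a short sketch. The function names, thresholds and water levels below are hypothetical and only show how the same underlying signal loses information under each sensor type. Representing out-of-range readings by clipping them to the range limit is one possible convention, assumed here for simplicity.

```python
import numpy as np

def censor_to_range(values, lower, upper):
    """Range-limited sensor: readings outside [lower, upper] are reported at the nearest limit."""
    return np.clip(values, lower, upper)

def binary_exceedance(values, threshold):
    """Event detector: reports only whether the value exceeds the threshold (1) or not (0)."""
    return (np.asarray(values) > threshold).astype(int)

# Hypothetical "true" water levels in metres (illustrative values only)
water_level = np.array([0.12, 0.48, 0.95, 1.40, 0.70])

clipped = censor_to_range(water_level, lower=0.2, upper=1.0)   # interval-censored record
overflow = binary_exceedance(water_level, threshold=0.9)       # binary record

print(clipped)
print(overflow)
```

The binary record keeps only the timing of threshold crossings, which is why it can be regarded as the most extreme form of censoring.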

Any relevant information about the output of a modelled system should, in theory, reduce uncertainty in the parameters and allow us to update our prior assumptions about their values (Riggelsen, 2006, Rinderknecht et al., 2014). As model predictions for hydrologic systems can be highly sensitive to various parameters (Gamerith et al., 2013, Song et al., 2015), reducing parametric uncertainty and identifying better parameter values has a direct bearing on the performance of models. It is therefore desirable that, in the future, all types of sensor data, both uncensored and censored, can be used to calibrate hydrologic models within one consistent and general mathematical framework.

The underlying idea that many less accurate sensors may provide as much information about a complex hydrologic system as a few very accurate ones is very compelling. A first step in testing this hypothesis is to evaluate the value of binary observations in isolation, which can be seen as the most extreme case of censored data. While there has been research on modifying data assimilation techniques, like ensemble Kalman filtering, in order to handle censored data for updating the system state (Borup et al., 2015), the next logical step is to explore techniques that can infer model parameters from censored observations as well. However, so far only ad-hoc approaches to model calibration and parameter estimation using censored data have been suggested, and it is currently not clear what the information content of such observations is and how we can use them most efficiently to learn about the system. Rasmussen et al. (2008) used the binary information measured during the occurrence of combined sewer overflows (CSO) to estimate the hydrologic reduction coefficient based on the mismatch between the predicted duration of the overflow and its observed duration. Similarly, measurements of overflow duration have been successfully used for calibration in other studies (Montserrat et al., 2016). Aronica et al. (2002) used a scalar performance measure and mapped it to the parameter space of the model with an informal likelihood function. The performance measure gives an aggregated indication of the performance of the inundation model. Thorndahl et al. (2008) applied a similar mathematical framework for calibration of an urban drainage model and the quantification of parametric uncertainty using data on CSOs. In these two studies, parameter sets that performed well on the chosen performance measure were identified as good, and those that performed below a defined value were termed non-behavioural and excluded from model simulations.

However, in the authors' opinion, this inference procedure is limited by the usage of an informal likelihood function (Stedinger et al., 2008). First, the criterion of acceptance/rejection of various parameter values is ad-hoc (Dotto et al., 2012). Second, having more observations does not reduce parametric uncertainty (Mantovan and Todini, 2006). And third, it assumes a perfect model structure and error-free input data, which is often not the case in rainfall-runoff studies (Del Giudice et al., 2015). To incorporate relevant error-generating processes, we need a formal likelihood function, which is generally lacking in the current treatment of censored observations. In this paper, we therefore suggest a likelihood function which makes it possible to estimate parameters from censored signals, while accounting for uncertainties in input variables, such as rainfall or land use, and model structure deficits. Based on the results from a case study, we demonstrate that censored data is surprisingly effective in reducing parametric uncertainty of rainfall-runoff models. Although uncensored observations are comparatively more informative than censored data, our results are still quite promising, because they show the way towards using all forms of available information in model calibration, which is often not done.
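
To make the idea of a formal likelihood for censored data concrete, the following is a minimal sketch under a simplifying assumption that is not taken from the paper: the deviations between model output and the true system response are independent and Gaussian. The symbols y_M (model output), σ (error standard deviation), c (detection threshold), z_t (binary observation), and l and u (limits of a censoring interval) are introduced here for illustration only. The function proposed in this paper additionally accounts for input uncertainty and model structure deficits, and its exact form is not reproduced in this excerpt.

```latex
% Assumed error model (illustrative only):
%   Y_t = y_M(t;\theta) + E_t,  E_t \sim N(0, \sigma^2), independent over t
\begin{align}
  P(z_t = 1 \mid \theta) &= P(Y_t > c \mid \theta)
      = \Phi\!\left(\frac{y_M(t;\theta) - c}{\sigma}\right), \\
  L(\theta \mid z_{1:n}) &= \prod_{t=1}^{n}
      \Phi\!\left(\frac{y_M(t;\theta) - c}{\sigma}\right)^{z_t}
      \left[1 - \Phi\!\left(\frac{y_M(t;\theta) - c}{\sigma}\right)\right]^{1 - z_t}, \\
  P(l < Y_t \le u \mid \theta) &= \Phi\!\left(\frac{u - y_M(t;\theta)}{\sigma}\right)
      - \Phi\!\left(\frac{l - y_M(t;\theta)}{\sigma}\right), \\
  p(\theta \mid z_{1:n}) &\propto L(\theta \mid z_{1:n})\, p(\theta).
\end{align}
```

Here Φ is the standard normal cumulative distribution function. The first two lines cover binary observations, the third gives the contribution of an interval-censored observation, and the last line is Bayes' theorem. In this form each additional observation contributes a further factor to the product, so more data can in principle tighten the posterior.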

In the following sections, we first present the mathematical formulation of our likelihood function. We then demonstrate its application in inferring parameters and discuss why it requires a Bayesian framework (Section 2.1). The proof of concept is given through parameter estimation experiments for a real-life case study (described in Section 2.3). For this, we first infer parameters for a hydrologic model using binary data collected in our dedicated measurement campaign and then test the sensitivity of inference to experimental design and choice of priors (Section 2.4). In the same section, we also conduct tests on synthetic data from a simple linear model to facilitate reproducibility of this research. Finally, we present the results from simulation experiments (Section 3), provide a critical overview of this technique's limitations and interpret the results (Section 4). At the end, we draw our main conclusions (Section 5).
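
A synthetic test of this kind can be sketched in a few lines. The example below is not the authors' code or experimental setup: it assumes a linear model y = a·x + b with independent Gaussian errors of known standard deviation, a single detection threshold, independent Gaussian priors, and a simple grid evaluation of the posterior instead of a sampler. All numerical values and names (`a_true`, `b_true`, `sigma`, `threshold`, the prior settings) are hypothetical. Because the data are synthetic, the uncensored response is known and can be used to compute the Nash-Sutcliffe efficiency (NSE), although the inference itself only sees the binary record.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(u):
    """Standard normal CDF for scalars or NumPy arrays."""
    return 0.5 * (1.0 + _erf(np.asarray(u) / math.sqrt(2.0)))

def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated against observed values."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic "truth" (all values are illustrative assumptions)
rng = np.random.default_rng(0)
a_true, b_true, sigma, threshold = 1.0, 1.0, 0.3, 3.0
x = rng.uniform(0.0, 5.0, size=500)
y = a_true * x + b_true + rng.normal(0.0, sigma, size=x.size)
z = (y > threshold).astype(int)   # the only data used for inference

# Independent Gaussian priors, deliberately centred away from the truth
a_prior, a_sd = 0.5, 0.5
b_prior, b_sd = 0.5, 1.0

# Grid over the two parameters
a_grid = np.linspace(0.0, 2.0, 61)
b_grid = np.linspace(-1.0, 3.0, 81)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

# Exceedance probability for every grid node and data point
p = norm_cdf((A[..., None] * x + B[..., None] - threshold) / sigma)
p = np.clip(p, 1e-12, 1.0 - 1e-12)   # guard against log(0)
log_lik = np.sum(z * np.log(p) + (1 - z) * np.log(1.0 - p), axis=-1)

log_prior = -0.5 * ((A - a_prior) / a_sd) ** 2 - 0.5 * ((B - b_prior) / b_sd) ** 2
log_post = log_lik + log_prior

# Maximum posterior density node on the grid
i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
a_map, b_map = a_grid[i], b_grid[j]

nse_prior = nse(a_prior * x + b_prior, y)
nse_post = nse(a_map * x + b_map, y)
print(f"prior mode:     a={a_prior:.2f}, b={b_prior:.2f}, NSE={nse_prior:.2f}")
print(f"posterior mode: a={a_map:.2f}, b={b_map:.2f}, NSE={nse_post:.2f}")
```

With a single threshold, the binary data mainly constrain where the model output crosses the threshold and how sharply it does so, which is one way to see why the inference can be sensitive to the choice of threshold and prior. A grid is used here only because the sketch has two parameters; for more parameters a sampling scheme would be the usual choice.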

Section snippets

Method and material

In this section we lay out the mathematical formulation of our likelihood function. Once the likelihood function is defined, we discuss how Bayesian inference can be used to combine censored information and prior belief about the parameter values to update the probabilities of parameters. We then describe the case study and the simulation experiments carried out on it. The simulation experiments first show the performance of the inference procedure on binary observations. Finally we describe

Parameter estimation for the linear model

We found that the synthetic binary data was able to guide the inference procedure towards the true parameter values. The parameter values at the maximum of the prior density, which gave a poor NSE (−0.94), were updated by inference on five hundred data points to values that produce a substantially better NSE (0.84) (Fig. 4). The maximum posterior density parameter vector is closer to the true values (a = 1.03, b = 0.97, θ1E = 0.075, θ1B = 0.29, θ2B

Discussion

The results of this research are in accordance with the theoretical framework that we employed. We expected noticeable improvements in model performance and our results confirm this. The main aim of this research was to formulate a formal Bayesian framework in which censored data can be used for parameter inference. It is difficult to properly associate probabilities with parameter values using ad-hoc methods. Also, it has not yet been suggested how to combine uncensored and censored observations

Conclusion

We see that monitoring the dynamic response of hydrologic systems is often constrained by inadequately available manpower, technology and financial resources. Data from traditional flow and water level observation devices is hence limited. In the future, Internet-of-Things type communication platforms will also make it possible to operate low-power sensors that can transfer comparably little and often aggregated or censored data, such as the duration of an overflow or whether a threshold was

Author contributions

OW, AS, FB and JR wrote and structured the main text. OW conceptualized the simulation experiments, wrote the code, and carried out the analysis. OW and AS formulated the likelihood function. JPC built and trained the emulator, and also wrote the supplementary section. FB planned and conducted the observations, and developed the rainfall-runoff model. JR and AS envisioned this project, whereas JR and FB actively supervised the research. All authors reviewed the paper.

Acknowledgement

We gratefully acknowledge the help of Tobias Doppler (Eawag, VSA) during the field monitoring in Lucerne. We thank MeteoSwiss, the Swiss Federal Office of Meteorology and Climatology and the city of Lucerne for providing us with the precipitation and infrastructure data. We furthermore would like to thank the Engineering Consultants from HOLINGER AG, Bern for assisting us with details on the hydraulic model and extracting operation data from the central operator database. We acknowledge the

References (46)

  • J.C. Refsgaard

    Parameterisation, calibration and validation of distributed hydrological models

    J. Hydrol.

    (1997)
  • C. Riggelsen

    Learning parameters of Bayesian networks from incomplete data via importance sampling

    Int. J. Approx. Reason.

    (2006)
  • S.L. Rinderknecht et al.

    The effect of ambiguous prior knowledge on Bayesian model parameter inference and prediction

    Environ. Model. Softw.

    (2014)
  • X. Song et al.

    Global sensitivity analysis in hydrological modeling: review of concepts, methods, theoretical framework, and applications

    J. Hydrol.

    (2015)
  • S. Thorndahl et al.

    Event based uncertainty assessment in urban drainage modelling, applying the GLUE methodology

    J. Hydrol.

    (2008)
  • G. Aronica et al.

    Assessing the uncertainty in distributed model predictions using observed binary pattern information within GLUE

    Hydrol. Process.

    (2002)
  • B.C. Bates et al.

    A Markov chain Monte Carlo scheme for parameter estimation and inference in conceptual rainfall-runoff modeling

    Water Resour. Res.

    (2001)
  • M. Borup et al.

    A partial ensemble Kalman filtering approach to enable use of range limited observations

    Stoch. Environ. Res. Risk Assess.

    (2015)
  • G.E.P. Box et al.

    An analysis of transformations

    J. R. Stat. Soc. Ser. B

    (1964)
  • CRAN, 2015....
  • D. Del Giudice et al.

    Improving uncertainty estimation in urban hydrological modeling by statistically describing bias

    Hydrol. Earth Syst. Sci.

    (2013)
  • S. Eggimann et al.

    The potential of knowing more – a review of data-driven urban water management

    Environ. Sci. Technol.

    (2017)
  • Environmental Protection Agency, United States, 2015....