Parameter estimation of hydrologic models using a likelihood function for censored and binary observations
Introduction
Traditionally, mathematical models of river catchments and urban drainage systems are calibrated on uncensored observations of physical variables, such as discharge or water levels (Gupta et al., 1998, Madsen, 2003, Refsgaard, 1997, Westerberg et al., 2011). By uncensored observations we mean that the monitoring device has an adequately large measurement range, i.e. it can measure any physically possible value of the hydrologic variable. However, the installation and management of such monitoring devices for hydrologic systems is generally laborious and costly (Maheepala et al., 2001). Therefore, typically only a few locations are equipped with such sensors. Other types of sensors and observation equipment collect data only within a certain range, so that any system response outside that range cannot be measured (Borup et al., 2015). Also, recent advances in sensor and data transfer technology reveal a trend towards more cost-efficient sensors, which simply detect the occurrence of overflow events (Rasmussen et al., 2008) or only the exceedance of a critical water level, thus providing binary information. Within this context, it has even been suggested to specifically develop binary monitoring devices based on robust and low-cost sensors such as temperature probes (Hofer et al., 2014, Montserrat et al., 2016, Montserrat et al., 2013), motion detectors (Siemers et al., 2011) and electrical switches (Rasmussen et al., 2008). Also, even though learning about the depth and velocity of flow from CCTV footage during floods may be challenging, such cameras easily capture the inundation and the duration of an event (Le Coz et al., 2016, Lo et al., 2015). In the context of the Internet of Things, we are witnessing a growing trend where more real-time information related to hydrologic systems is available from hitherto unconventional sources (Eggimann et al., 2017, Kerkez et al., 2016).
Any kind of relevant information related to the output of a modelled system should, in theory, reduce uncertainty in the parameters, allowing us to update our prior assumptions about their values (Riggelsen, 2006, Rinderknecht et al., 2014). As model predictions for hydrologic systems can be highly sensitive to various parameters (Gamerith et al., 2013, Song et al., 2015), the reduction in parametric uncertainty and the identification of better parameter values have a direct bearing on the performance of models. It is therefore desirable that in the future all types of data from different sensors, a) uncensored and b) censored, help to calibrate hydrologic models within one consistent and general mathematical framework.
The underlying idea that many less accurate sensors may provide as much information about a complex hydrologic system as a few very accurate ones is very compelling. A first step to test this hypothesis is to evaluate the value of binary observations in isolation, which can be seen as the most extreme case of censored data. While there has been research on modifying data assimilation techniques, like ensemble Kalman filtering, in order to handle censored data for updating the system state (Borup et al., 2015), the next logical step is to explore techniques that can infer model parameters from censored observations as well. However, so far only ad-hoc approaches to model calibration and parameter estimation using censored data have been suggested, and it is currently not clear what the information content of such observations is and how we can use them most efficiently to learn about the system. Rasmussen et al. (2008) used the binary information measured during the occurrence of combined sewer overflows (CSO) to estimate the hydrologic reduction coefficient based on the mismatch between the predicted duration of the overflow and its observed duration. Similarly, measurements of overflow duration have been successfully used for calibration in other studies (Montserrat et al., 2016). Aronica et al. (2002) used a scalar performance measure and mapped it to the parameter space of the model with an informal likelihood function. The performance measure gives an aggregated indication of the performance of the inundation model. Thorndahl et al. (2008) applied a similar mathematical framework for the calibration of an urban drainage model and the quantification of parametric uncertainty using data on CSOs. In these two studies, parameter sets that performed well on the chosen performance measure were identified as good, and those that performed below a defined value were termed non-behavioural and not considered for model simulations.
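The informal, threshold-based calibration described above can be illustrated with a minimal sketch. It is not the implementation used in any of the cited studies: the toy runoff relation, the function names, the uniform prior range and the acceptance threshold are all assumptions chosen for illustration.

```python
import random


def overflow_duration(rain, coeff, capacity):
    """Count time steps in which a toy runoff (coeff * rain) exceeds capacity."""
    return sum(1 for r in rain if coeff * r > capacity)


def behavioural_parameters(rain, observed_duration, capacity,
                           n_samples, max_mismatch, seed=0):
    """Keep sampled coefficients whose duration mismatch is within max_mismatch."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        # Assumed uniform prior on the (hypothetical) runoff coefficient.
        coeff = rng.uniform(0.0, 1.0)
        simulated = overflow_duration(rain, coeff, capacity)
        mismatch = abs(simulated - observed_duration)
        # Ad hoc acceptance rule: everything else is "non-behavioural".
        if mismatch <= max_mismatch:
            kept.append(coeff)
    return kept


# Hypothetical rainfall series and a synthetic "observed" overflow duration.
rain = [0.0, 2.0, 5.0, 8.0, 3.0, 0.5]
observed = overflow_duration(rain, 0.6, capacity=2.0)
kept = behavioural_parameters(rain, observed, capacity=2.0,
                              n_samples=1000, max_mismatch=0)
```

In this sketch, the accepted set depends directly on the value chosen for `max_mismatch`, and every accepted coefficient is treated as equally good regardless of how many observations are available.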
However, in the authors' opinion, this inference procedure is limited by the use of an informal likelihood function (Stedinger et al., 2008). First, the criterion for accepting or rejecting parameter values is ad-hoc (Dotto et al., 2012). Second, having more observations does not reduce parametric uncertainty (Mantovan and Todini, 2006). Third, it assumes a perfect model structure and error-free input data, which is often not the case in rainfall-runoff studies (Del Giudice et al., 2015). To incorporate relevant error-generating processes, we need a formal likelihood function, which is generally lacking in the current treatment of censored observations. In this paper, we therefore suggest a likelihood function which makes it possible to estimate parameters from censored signals, while accounting for uncertainties in input variables, such as rainfall or land use, and for model structure deficits. Based on the results from a case study, we demonstrate that censored data is surprisingly effective in reducing the parametric uncertainty of rainfall-runoff models. Although uncensored observations are more informative than censored data, our results are still quite promising, because they show the way towards using all forms of available information in model calibration, which is often not done.
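To make the idea of a likelihood for binary observations concrete, the following is a minimal sketch under stated assumptions. It assumes that the true system response equals the model output plus independent Gaussian noise with standard deviation `sigma`; this error model, the function names and the example numbers are illustrative assumptions and not the formulation developed in this paper.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binary_log_likelihood(simulated, exceeded, threshold, sigma):
    """Log-likelihood of binary threshold-exceedance observations.

    Assumption (not from the paper): true value = simulated value plus
    independent Gaussian noise with standard deviation sigma.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for y_sim, z in zip(simulated, exceeded):
        # Probability that the true value lies above the sensor threshold.
        p_exceed = 1.0 - normal_cdf((threshold - y_sim) / sigma)
        p_exceed = min(max(p_exceed, eps), 1.0 - eps)
        total += math.log(p_exceed) if z else math.log(1.0 - p_exceed)
    return total


# Hypothetical example: three time steps, sensor threshold at 1.0.
observed = [False, True, True]
good_fit = binary_log_likelihood([0.2, 1.5, 2.0], observed,
                                 threshold=1.0, sigma=0.3)
poor_fit = binary_log_likelihood([1.8, 0.4, 0.1], observed,
                                 threshold=1.0, sigma=0.3)
```

A simulation that reproduces the observed exceedance pattern (`good_fit`) receives a higher log-likelihood than one that does not (`poor_fit`). Because the contributions of individual observations add up, each additional observation can sharpen the resulting parameter estimates, in contrast to a fixed acceptance threshold.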
In the following sections, we first present the mathematical formulation of our likelihood function. We then demonstrate its application in inferring parameters and discuss why it requires a Bayesian framework (Section 2.1). The proof of concept is given through parameter estimation experiments for a real-life case study (described in Section 2.3). For this, we first infer parameters for a hydrologic model using binary data collected in our dedicated measurement campaign and then test the sensitivity of inference to experimental design and choice of priors (Section 2.4). In the same section, we also conduct tests on synthetic data from a simple linear model to facilitate reproducibility of this research. Finally, we present the results from simulation experiments (Section 3), provide a critical overview of this technique's limitations and interpret the results (Section 4). At the end, we draw our main conclusions (Section 5).
Method and material
In this section we lay out the mathematical formulation of our likelihood function. Once the likelihood function is defined, we discuss how Bayesian inference can be used to combine censored information and prior belief about the parameter values to update the probabilities of parameters. We then describe the case study and the simulation experiments carried out on it. The simulation experiments first show the performance of the inference procedure on binary observations. Finally we describe
Parameter estimation for the linear model
We found that the synthetic binary data was able to guide the inference procedure towards the true parameter values. The maximum prior probability density parameter values that had a poor NSE (−0.94) were updated to parameter values that produce a significantly better NSE (0.84), after inference where five hundred data points were used (Fig. 4). The maximum posterior density parameter vector produced after inference is closer to its true values (a = 1.03, b = 0.97, = 0.075, = 0.29
Discussion
The results of this research are in accordance with the theoretical framework that we employed. We expected noticeable improvements in model performance and our results confirm this. The main aim of this research was to formulate a formal Bayesian framework in which censored data can be used for parameter inference. It is difficult to properly associate probabilities to parameter values using ad-hoc methods. Also, it has not yet been suggested how to combine uncensored and censored observations
Conclusion
We see that monitoring the dynamic response of hydrologic systems is often constrained by inadequately available manpower, technology and financial resources. Data from traditional flow and water level observation devices is hence limited. In the future, Internet-of-Things type communication platforms will also make it possible to operate low-power sensors that can transfer comparably little and often aggregated or censored data, such as the duration of an overflow or whether a threshold was
Author contributions
OW, AS, FB and JR wrote and structured the main text. OW conceptualized the simulation experiments, wrote the code, and carried out the analysis. OW and AS formulated the likelihood function. JPC built and trained the emulator, and also wrote the supplementary section. FB planned and conducted the observations, and developed the rainfall-runoff model. JR and AS envisioned this project, whereas JR and FB actively supervised the research. All authors reviewed the paper.
Acknowledgement
We gratefully acknowledge the help of Tobias Doppler (Eawag, VSA) during the field monitoring in Lucerne. We thank MeteoSwiss, the Swiss Federal Office of Meteorology and Climatology and the city of Lucerne for providing us with the precipitation and infrastructure data. We furthermore would like to thank the Engineering Consultants from HOLINGER AG, Bern for assisting us with details on the hydraulic model and extracting operation data from the central operator database. We acknowledge the
References (46)
- et al., Appraisal of data-driven and mechanistic emulators of nonlinear simulators: the case of hydrodynamic urban drainage models, Environ. Model. Softw. (2017)
- et al., Model bias and complexity - understanding the effects of structural deficits and input errors on runoff predictions, Environ. Model. Softw. (2015)
- et al., Comparison of different uncertainty techniques in urban stormwater quantity and quality modelling, Water Res. (2012)
- et al., Applying global sensitivity analysis to the modelling of flow and water quality in sewers, Water Res. (2013)
- et al., Crowdsourced data for flood hydrology: feedback from recent citizen science projects in Argentina, France and New Zealand, J. Hydrol. (2016)
- Parameter estimation in distributed hydrological catchment modelling using automatic calibration with multiple objectives, Adv. Water Resour. (2003)
- et al., Hydrological data monitoring for urban stormwater drainage systems, J. Hydrol. (2001)
- et al., Hydrological forecasting uncertainty assessment: incoherence of the GLUE methodology, J. Hydrol. (2006)
- et al., Field validation of a new low-cost method for determining occurrence and duration of combined sewer overflows, Sci. Total Environ. (2013)
- et al., River flow forecasting through conceptual models part I - a discussion of principles, J. Hydrol. (1970)