Abstract
Both data-driven and conceptual models have been used for rainfall–runoff modelling. The aim of this study is to compare the performance of an artificial neural network (ANN) model, a wavelet-based artificial neural network (WANN) model and the GR4J lumped daily conceptual model for rainfall–runoff modelling of two rivers in the USA. The data-driven models (ANN, WANN) were found to perform better than the GR4J model when the streamflow of the preceding day (Qt-1) and of two days before (Qt-2) are used as inputs to the ANN and WANN models, particularly for the simulation of low and high flows. On the other hand, when only precipitation and potential evapotranspiration data are used as input variables, the GR4J model performs better than the data-driven models.
INTRODUCTION
Identifying the rainfall–runoff relationship is essential for water resources planning. For decades, many studies have sought to characterize this relationship using data-driven or conceptual models (Anctil et al. 2004; Sedki et al. 2009; Demirel et al. 2013). De Vos & Rientjes (2007) carried out multi-objective comparisons between an artificial neural network (ANN) model and the HBV conceptual rainfall–runoff model; they found that the ANN model performs better than HBV for one-hour-ahead forecasting, whereas HBV outperforms the ANN as the forecast horizon lengthens. Nayak et al. (2013) applied ANN, Nedbør-Afstrømnings-Model (NAM) and wavelet neural network (WNN) models to rainfall–runoff modelling and found that the WNN model is more successful than the conventional ANN and NAM models. They suggested that the WNN model captures the nonlinear rainfall–runoff relationship better, and that the decomposition performed by the wavelet transform could account for this outperformance. Demirel et al. (2015) analysed the performance of ANN-Ensemble (ANN-E), HBV and GR4J models for low-flow prediction using ensemble precipitation and evapotranspiration inputs, and concluded that ANN-E and HBV are useful for simulating low streamflow. Daliakopoulos & Tsanis (2016) compared ANN models with conventional conceptual models for high-flow forecasting and found the ANN models to be more useful. Humphrey et al. (2016) compared GR4J, a Bayesian artificial neural network (BANN) model, and a hybrid approach in which outputs of the GR4J model are used as inputs in addition to conventional inputs such as rainfall and evapotranspiration.
They concluded that the hybrid model performed better than the BANN and GR4J models. Mehr & Demirel (2016) investigated low-flow simulation in the Moselle River basin using genetic programming, ANN, HBV and GR4J models, and found that genetic programming performs well compared with the ANN, HBV and GR4J models for low-flow prediction. Makwana & Tiwari (2017) compared the Soil and Water Assessment Tool (SWAT) and ANN models for predicting daily streamflow and found the ANN model to be more useful than SWAT in both the calibration and validation periods.
This study aims to compare the performance of wavelet-based artificial neural network (WANN), ANN and GR4J models for rainfall–runoff modelling of two rivers in the USA. Its main novelty lies in comparing the GR4J conceptual hydrological model with the WANN data-driven model. Model performance was evaluated using the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE) and Kling–Gupta efficiency (KGE).
DATA AND METHODOLOGY
Data
In this study, the ANN, WANN and GR4J models were applied to the Saline River near Rye, Arkansas, and the Embarras River at Ste. Marie, Illinois, USA. The data cover the period 01/01/1981–30/09/2001 for both rivers. The training period runs from 01/01/1981 to 24/07/1996, and the test period from 25/07/1996 to 30/09/2001. The data are part of the MOPEX dataset (NOAA 2018). Summary statistics (minimum, mean, maximum and standard deviation) of the precipitation and streamflow data are given in Table 1, and the hydrographs of the Embarras River and Saline River are shown in Figure 1(a) and 1(b), respectively.
River | Period | P min (mm/d) | P mean (mm/d) | P max (mm/d) | P std (mm/d) | Q min (mm/d) | Q mean (mm/d) | Q max (mm/d) | Q std (mm/d) |
---|---|---|---|---|---|---|---|---|---|
Embarras River | 01.01.1981–30.09.2001 | 0 | 2.81 | 69.96 | 6.27 | 0.006 | 0.87 | 18.2 | 1.48 |
Saline River | 01.01.1981–30.09.2001 | 0 | 3.63 | 101.5 | 8.59 | 0.002 | 1.17 | 30.3 | 1.92 |
Methods
Artificial neural network
A feed-forward back-propagation ANN trained with the Levenberg–Marquardt (LM) algorithm was used. The benefits of the LM algorithm have been reported in several studies (Aqil et al. 2007a, 2007b; Badrzadeh et al. 2013; Tongal & Booij 2018). Aqil et al. (2007b) showed that the LM algorithm performs better than Bayesian regularization and than gradient descent with momentum and adaptive learning rate back-propagation in various ANN models. The LM algorithm is regarded as an efficient training algorithm because of its robustness and fast convergence (Demuth & Beale 1998; Aqil et al. 2007b). Furthermore, previous studies have shown that the tangent sigmoid activation function outperforms alternatives such as the logistic sigmoid and linear transfer functions (Maier et al. 1998; Zadeh et al. 2010); the tangent sigmoid was therefore chosen as the activation function. Different input combinations were applied for both the ANN and WANN models, as listed in Table 2. The candidate inputs for streamflow forecasting were precipitation on the current day (Pt), the preceding day (Pt-1) and two days before (Pt-2), potential evapotranspiration on the current day (PEt), and runoff on the preceding day (Qt-1) and two days before (Qt-2). The number of neurons in the hidden layer was set to one more than the number of inputs; for example, with two inputs the hidden layer has three neurons. Changing the number of hidden neurons did not noticeably change the results for the different input combinations.
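The training set-up described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the study does not name its ANN software). It uses one hidden layer of n_inputs + 1 tanh neurons and assumes a linear output neuron. Training follows a textbook Levenberg–Marquardt loop: the damping factor `mu` is divided by 10 after a successful step and multiplied by 10 after a failed one. All function names, the initial weight scale and the iteration count are hypothetical.

```python
import numpy as np


def init_params(n_in, rng):
    """Random initial weights for one hidden layer of n_in + 1 tanh neurons."""
    n_h = n_in + 1  # hidden-layer size rule used in the study
    return np.concatenate([
        rng.normal(scale=0.5, size=n_in * n_h),  # W1: input -> hidden
        np.zeros(n_h),                           # b1: hidden biases
        rng.normal(scale=0.5, size=n_h),         # W2: hidden -> output
        np.zeros(1),                             # b2: output bias
    ])


def unpack(theta, n_in):
    """Split the flat parameter vector into W1, b1, W2, b2."""
    n_h = n_in + 1
    i = n_in * n_h
    return (theta[:i].reshape(n_in, n_h), theta[i:i + n_h],
            theta[i + n_h:i + 2 * n_h], theta[-1])


def forward(theta, X):
    """Network output and hidden activations for inputs X (samples x inputs)."""
    W1, b1, W2, b2 = unpack(theta, X.shape[1])
    H = np.tanh(X @ W1 + b1)   # tangent sigmoid hidden layer
    return H @ W2 + b2, H      # linear output neuron (assumption)


def jacobian(theta, X):
    """Analytic derivative of each output w.r.t. every parameter (one row per sample)."""
    W2 = unpack(theta, X.shape[1])[2]
    _, H = forward(theta, X)
    dZ = (1.0 - H ** 2) * W2                                   # d out / d b1
    dW1 = (X[:, :, None] * dZ[:, None, :]).reshape(len(X), -1)  # d out / d W1
    return np.hstack([dW1, dZ, H, np.ones((len(X), 1))])


def train_lm(X, y, n_iter=200, mu=1e-2, seed=0):
    """Levenberg-Marquardt minimisation of the sum of squared errors."""
    theta = init_params(X.shape[1], np.random.default_rng(seed))
    sse = np.sum((forward(theta, X)[0] - y) ** 2)
    for _ in range(n_iter):
        r = forward(theta, X)[0] - y
        J = jacobian(theta, X)
        # Solve (J^T J + mu I) step = -J^T r
        step = np.linalg.solve(J.T @ J + mu * np.eye(theta.size), -J.T @ r)
        cand_sse = np.sum((forward(theta + step, X)[0] - y) ** 2)
        if cand_sse < sse:   # success: accept, lean towards Gauss-Newton
            theta, sse = theta + step, cand_sse
            mu = max(mu / 10, 1e-10)
        else:                # failure: reject, lean towards gradient descent
            mu = min(mu * 10, 1e10)
    return theta
```

In practice, inputs and targets would normally be scaled to a range suited to tanh before calling `train_lm`. The paper does not describe any scaling, so that step is left out here.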
Models | Input combination no. | Input combination |
---|---|---|
ANN, WANN | 1 | Pt, PEt |
ANN, WANN | 2 | Pt-1, PEt |
ANN, WANN | 3 | Pt, Pt-1, PEt |
ANN, WANN | 4 | Pt, Pt-1, Qt-1, PEt |
ANN, WANN | 5 | Pt, Pt-1, Qt-1, Qt-2, PEt |
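The input combinations of Table 2 can be assembled from daily series as in the sketch below. It assumes equal-length NumPy arrays `P`, `PE` and `Q` indexed by day; the function and dictionary names are hypothetical. Every row refers to the same day t. The first two days are dropped for all combinations so that the target vector is identical across combinations.

```python
import numpy as np

# Input combinations of Table 2 (Pt-1 = precipitation on the preceding day, etc.)
COMBINATIONS = {
    1: ["Pt", "PEt"],
    2: ["Pt-1", "PEt"],
    3: ["Pt", "Pt-1", "PEt"],
    4: ["Pt", "Pt-1", "Qt-1", "PEt"],
    5: ["Pt", "Pt-1", "Qt-1", "Qt-2", "PEt"],
}


def lagged(series, lag, max_lag):
    """Series shifted back by `lag` days; row k corresponds to day t = max_lag + k."""
    return series[max_lag - lag:len(series) - lag]


def build_inputs(P, PE, Q, combo, max_lag=2):
    """Return (X, y) for one input combination; the first max_lag days are dropped."""
    cols = {
        "Pt": lagged(P, 0, max_lag),
        "Pt-1": lagged(P, 1, max_lag),
        "PEt": lagged(PE, 0, max_lag),
        "Qt-1": lagged(Q, 1, max_lag),
        "Qt-2": lagged(Q, 2, max_lag),
    }
    X = np.column_stack([cols[name] for name in COMBINATIONS[combo]])
    y = Q[max_lag:]  # target: streamflow on day t
    return X, y
```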
Wavelet transformation
In Equation (4), $T_{m,n}$ is the wavelet coefficient of the discrete wavelet at scale $a = 2^m$ and location $b = 2^m n$. The Mallat pyramid algorithm was used for the multiresolution analysis; see Mallat (1989) for details of the algorithm. The Daubechies wavelet db2 was selected as the mother wavelet in this study. Daubechies wavelets are orthogonal wavelets that define a discrete wavelet transform and have the maximal number of vanishing moments for a given support. Many previous studies have used the db2 and db4 wavelets (Nourani et al. 2014). The common choice of db2 may be because the relatively simple db2 wavelet, which has two vanishing moments and a four-coefficient filter, successfully captures the significant information in the data (Shoaib et al. 2014).
In this study, wavelet analysis was combined with the ANN model. The precipitation and streamflow series (Pt, Pt-1, Qt-1, Qt-2) were decomposed using the discrete wavelet transform (Figure 2). For each variable separately, the wavelet components with high correlations with the streamflow data were summed, as illustrated in Figure 2. The resulting summed precipitation or streamflow series were then used together with PEt in the input combinations listed in Table 2 for the WANN model.
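The decomposition step can be sketched with a db2 discrete wavelet transform written directly in NumPy, following the Mallat pyramid (filter, then downsample by two, level by level). This is an illustrative stand-in, not the study's code. It assumes periodic extension at the series ends and a series length divisible by `2**levels`. The study does not state its software or boundary treatment, and library implementations use other extension modes, so values near the ends will differ. Each detail (D1, D2, ...) and the final approximation (labelled S10 in the study for ten levels) is mapped back to the time domain, so the components add up to the original series. Function names are hypothetical.

```python
import numpy as np

# db2 scaling (low-pass) filter: four coefficients, two vanishing moments
_S3 = np.sqrt(3.0)
H = np.array([1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3]) / (4 * np.sqrt(2.0))
# Quadrature-mirror wavelet (high-pass) filter: g[n] = (-1)^n h[3 - n]
G = np.array([H[3], -H[2], H[1], -H[0]])


def _windows(n_out, n_in):
    """Indices of the 4-tap window for each output sample, wrapping periodically."""
    return (2 * np.arange(n_out)[:, None] + np.arange(4)[None, :]) % n_in


def analysis(x):
    """One pyramid level: filter and downsample, returning (approximation, detail)."""
    idx = _windows(len(x) // 2, len(x))
    return x[idx] @ H, x[idx] @ G


def synthesis(a, d):
    """Inverse of `analysis` (transpose of the orthogonal analysis operator)."""
    n = 2 * len(a)
    x = np.zeros(n)
    np.add.at(x, _windows(len(a), n), np.outer(a, H) + np.outer(d, G))
    return x


def decompose(x, levels):
    """Time-domain components [D1, ..., D_levels, A_levels] that sum to x."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        a, d = analysis(a)
        details.append(d)
    components = []
    for j, d in enumerate(details):
        y = synthesis(np.zeros_like(d), d)
        for _ in range(j):                 # bring level j + 1 back to full length
            y = synthesis(y, np.zeros_like(y))
        components.append(y)
    y = a
    for _ in range(levels):                # reconstruct the approximation alone
        y = synthesis(y, np.zeros_like(y))
    components.append(y)
    return components
```

For ten levels as in the study, the series would have to be trimmed or padded to a multiple of 1024 days under this periodic scheme. That requirement comes from the sketch, not from the paper.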
GR4J model
GR4J is a daily lumped conceptual hydrological model that uses precipitation and evapotranspiration as input data for rainfall–runoff modelling. Perrin et al. (2003) showed that GR4J outperforms more complex models with more parameters (e.g., the HBV and Xinanjiang models). The GR4J model was chosen for this study because of this performance and its widespread use in the literature. The structure of the GR4J model is depicted in Figure 3. The model has four free parameters: X1 (maximum capacity of the production store), X2 (groundwater exchange coefficient), X3 (one-day-ahead maximum capacity of the routing store) and X4 (time base of the unit hydrograph), as indicated in Figure 3. See Perrin et al. (2003) for details of the model structure. GR4J was run using the airGR package (Coron et al. 2017, 2018) for the R software (R Development Core Team 2015).
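The role of X1 can be illustrated by writing out the production-store part of GR4J, as in the Python sketch below. It covers only the first module of the model as described by Perrin et al. (2003). The groundwater exchange term (X2), the routing store (X3) and the two unit hydrographs (X4) are omitted. The study itself ran the full model with airGR rather than with code like this. The function name is hypothetical.

```python
import math


def production_store_step(P, E, S, X1):
    """One daily step of the GR4J production store (after Perrin et al. 2003).

    P, E: precipitation and potential evapotranspiration (mm/d)
    S: production store level (mm); X1: store capacity (mm)
    Returns (new S, Pr), where Pr is the water passed on towards routing (mm/d).
    """
    # Net rainfall or net evapotranspiration after interception
    if P >= E:
        Pn, En = P - E, 0.0
    else:
        Pn, En = 0.0, E - P
    s = S / X1
    # Part of the net rainfall that fills the store
    tp = math.tanh(Pn / X1)
    Ps = X1 * (1 - s ** 2) * tp / (1 + s * tp)
    # Actual evaporation from the store
    te = math.tanh(En / X1)
    Es = S * (2 - s) * te / (1 + (1 - s) * te)
    S = S - Es + Ps
    # Percolation leakage from the store
    Perc = S * (1 - (1 + (4 / 9 * S / X1) ** 4) ** -0.25)
    S -= Perc
    # Water handed to the (omitted) unit hydrographs
    Pr = Perc + (Pn - Ps)
    return S, Pr
```

With these equations the store level stays between 0 and X1. The change in storage plus Pr equals the net rainfall minus the store evaporation, which gives a simple mass-balance check.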
Evaluation of the model performances
In Equations (5) and (6), $Q_{obs,i}$ and $Q_{sim,i}$ are the observed and simulated flows at time step $i$, and $N$ is the data length; $\bar{Q}_{obs}$ in Equation (6) is the mean of the observed values. In Equation (7), $r$ is the correlation coefficient between simulated and observed flows, $\alpha$ is the ratio of the simulated mean flow to the observed mean flow, and $\beta$ is the ratio of the standard deviation of simulated flow to that of observed flow.
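The three criteria can be computed as in the following NumPy sketch. It assumes that Equations (5)–(7) are the standard definitions of RMSE, NSE and KGE; the function names are illustrative. `alpha` and `beta` follow the labelling in the text above. KGE combines them symmetrically, so the value does not depend on which ratio carries which label.

```python
import numpy as np


def rmse(q_obs, q_sim):
    """Equation (5): root mean square error over N time steps."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    return float(np.sqrt(np.mean((q_obs - q_sim) ** 2)))


def nse(q_obs, q_sim):
    """Equation (6): Nash-Sutcliffe efficiency (1 = perfect, 0 = no better than the mean)."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    return float(1 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2))


def kge(q_obs, q_sim):
    """Equation (7): Kling-Gupta efficiency."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    r = np.corrcoef(q_obs, q_sim)[0, 1]
    alpha = q_sim.mean() / q_obs.mean()   # ratio of mean flows (label as in the text)
    beta = q_sim.std() / q_obs.std()      # ratio of standard deviations
    return float(1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```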
RESULTS AND DISCUSSION
First, the precipitation and streamflow data were decomposed into ten detail components and one approximation component using the Daubechies wavelet. The correlation coefficients between the decomposed Pt and Qt-1 series and Qt are shown in Figure 4 for the Embarras River basin as an example. The wavelet components of the precipitation and streamflow data that together give the highest correlation were summed to improve the performance of the WANN model; for instance, the highest correlation between Qt-1 and Qt was obtained by summing the D2, D3, D4, D5, D6, D7, D8, D9, D10 and S10 components of Qt-1. The same procedure was applied to the Saline River. Precipitation and potential evapotranspiration were used as input data in all models. For the ANN and WANN models, Qt-1 and Qt-2 were additionally included in some input combinations (Table 2) to test whether they improve performance. When Qt-1 and Qt-2 are added to the input combination, the WANN and ANN models perform better than the GR4J model for both the Embarras River and the Saline River. Figures 5, 6 and 7 indicate that the ANN and WANN models predict streamflow in the Embarras River more successfully than the GR4J model, especially for extreme (i.e., low and high) flows. Similarly, the WANN and ANN models perform better than the GR4J model for input combination 5 in the Saline River (Figure 8 and Table 3). Although the WANN model performs well compared with the ANN and GR4J models, the performances of the WANN and ANN models are very close to each other for the Saline River.
This may be related to the high autocorrelation between Qt and the streamflow of the preceding days in the Saline River (the correlation coefficients for Qt-1–Qt and Qt-2–Qt are 0.97 and 0.91, respectively). The relatively weak low-flow simulation performance of the GR4J model is consistent with previous studies (Le Moine 2008; Pushpalatha et al. 2011; Demirel et al. 2013). Demirel et al. (2013) used the HBV and GR4J conceptual models for low-flow forecasting in the Moselle River and reported that GR4J performed relatively poorly compared with HBV, mainly because of parameter uncertainty. Le Moine (2008) and Pushpalatha et al. (2011) extended the GR4J structure with additional parameters to improve flow simulation. These extensions are the Génie Rural à 5 paramètres Journalier (GR5J) and Génie Rural à 6 paramètres Journalier (GR6J) models, respectively, and both studies showed that they outperform GR4J.
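The component-selection step described at the start of this section can be sketched as follows. The example is summing the wavelet components of Qt-1 whose sum correlates best with Qt. The study does not say how the best subset was found. This sketch assumes an exhaustive search over all non-empty subsets, which is feasible for ten details plus one approximation (2047 subsets). It also assumes that the components and the target are already aligned in time, e.g. the components of Qt-1 set against Qt. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def best_component_sum(components, target, names=None):
    """Find the subset of components whose sum has the highest Pearson r with target.

    Returns (selected names, summed series, correlation).
    """
    names = names or [f"C{i}" for i in range(len(components))]
    best = (None, None, -np.inf)
    for k in range(1, len(components) + 1):
        for subset in combinations(range(len(components)), k):
            s = np.sum([components[i] for i in subset], axis=0)
            r = np.corrcoef(s, target)[0, 1]
            if r > best[2]:  # a constant sum gives r = nan, which is never selected
                best = ([names[i] for i in subset], s, r)
    return best
```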
Model | Correlation (R) between simulated and observed flows | NSE | RMSE (mm/d) | KGE |
---|---|---|---|---|
WANN | 0.9997 | 0.9995 | 0.035658 | 0.9910 |
ANN | 0.9973 | 0.9946 | 0.119182 | 0.9900 |
GR4J | 0.9123 | 0.8103 | 0.707 | 0.7447 |
To evaluate the high-flow simulation performance of the WANN, ANN and GR4J models, RMSE values (Table 4) were calculated for simulated and observed flows above a threshold of 50% of the maximum flow in the test period of each river (Figures 5 and 8). The WANN model yields more accurate high-flow estimates than the ANN and GR4J models. Tian et al. (2013) compared the GR4J, HBV and Xinanjiang models for daily high-flow simulation in the Jinhua River, China, and found that GR4J outperformed HBV and Xinanjiang for high flows, which they attributed to its simpler structure. However, they also emphasized that catchment characteristics affect how well hydrological models perform. In contrast, we found that the GR4J model does not perform as well as the data-driven models (i.e., ANN and WANN) for high-flow simulation (Figures 5 and 8, and Table 4).
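A sketch of the high-flow RMSE is given below. It assumes that the time steps are selected where the observed flow exceeds 50% of the maximum observed flow in the test period. The text does not say whether observed or simulated flows define the exceedance, nor whether a flow equal to the threshold is included. The function name and the `fraction` argument are illustrative.

```python
import numpy as np


def high_flow_rmse(q_obs, q_sim, fraction=0.5):
    """RMSE over time steps where observed flow exceeds fraction * max(observed flow)."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    mask = q_obs > fraction * q_obs.max()   # high-flow days (assumed: observed flows)
    return float(np.sqrt(np.mean((q_obs[mask] - q_sim[mask]) ** 2)))
```

For example, with observed flows `[1, 2, 10, 8, 3]` the threshold is 5. Only the days with 10 and 8 enter the error.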
River | WANN high-flow RMSE (mm/d) | ANN high-flow RMSE (mm/d) | GR4J high-flow RMSE (mm/d) |
---|---|---|---|
Embarras River | 0.93 | 1.33 | 3.74 |
Saline River | 0.18 | 0.56 | 2.78 |
When only P and PE are used as input data, the GR4J model outperforms the ANN and WANN models for both rivers (Table 5). This may be related to the low correlations between PEt and Qt (−0.12 for the Embarras River; −0.26 for the Saline River) and between Pt and Qt (0.08 for the Embarras River; 0.02 for the Saline River). The choice of input combination therefore affects the performance of the WANN and ANN models significantly. Zadeh et al. (2010) pointed out that the selection of input variables is very important for accurate simulation, and input variable selection (IVS) algorithms have been proposed to increase the efficiency of data-driven models (Galelli et al. 2014). This topic will be addressed in future studies aimed at improving data-driven model performance.
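The correlations quoted above can be computed with a simple lagged-correlation screen, sketched below. Examples are Pt–Qt, PEt–Qt, and the Qt-1–Qt autocorrelation discussed for the Saline River. This is only a basic filter, not one of the IVS algorithms reviewed by Galelli et al. (2014). The function names and the candidate dictionary layout are hypothetical.

```python
import numpy as np


def lagged_corr(x, y, lag):
    """Pearson correlation between x on day t - lag and y on day t."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if lag:
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])


def rank_inputs(candidates, q):
    """Rank candidate inputs {name: (series, lag)} by |r| with streamflow q."""
    scores = {name: lagged_corr(series, q, lag)
              for name, (series, lag) in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Usage might look like `rank_inputs({"Pt": (P, 0), "PEt": (PE, 0), "Qt-1": (Q, 1), "Qt-2": (Q, 2)}, Q)`.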
River | Model | Correlation (R) | NSE | RMSE (mm/d) | KGE |
---|---|---|---|---|---|
Embarras River | WANN | 0.5237 | 0.25 | 1.15 | 0.20 |
Embarras River | ANN | 0.0377 | −0.03 | 1.35 | −0.29 |
Embarras River | GR4J | 0.8481 | 0.69 | 0.74 | 0.63 |
Saline River | WANN | 0.5294 | 0.27 | 1.38 | 0.32 |
Saline River | ANN | 0.3333 | 0.10 | 1.54 | 0.03 |
Saline River | GR4J | 0.9123 | 0.81 | 0.71 | 0.74 |
CONCLUSION
Applying different types of hydrological models (e.g., data-driven and conceptual models) is important for understanding their relative performance. This study compared data-driven models (WANN and ANN) with a lumped conceptual model (GR4J) for forecasting daily streamflow in the Embarras and Saline Rivers in the USA. The WANN and ANN models performed better than the GR4J conceptual model in forecasting daily streamflow when the streamflow of the preceding days (Qt-1, Qt-2) was included in the input combination. In addition, the GR4J model performed worse than the WANN and ANN models for low- and high-flow simulation. When only precipitation and evapotranspiration data were used as input variables, however, the GR4J model performed better than the WANN and ANN models. This shows that the selection of input data is important for obtaining accurate forecasts from data-driven models. The performances of WANN and ANN were very close to each other, particularly for the Saline River. This probably arises from the high autocorrelation between the streamflow of the preceding days (Qt-1, Qt-2) and the streamflow data (Qt). Further studies are needed to understand the behaviour of hydrological models in different catchments. In future work, the authors will focus on improving hydrological models to obtain better rainfall–runoff simulation performance.