Abstract
Wastewater flow forecasts are key components in the short- and long-term management of sewer systems. Forecasting flows in sewer networks involves considerable uncertainty for operators because of the nonlinear relationship between causal variables and wastewater flows. This work aimed to fill gaps in wastewater flow forecasting research by proposing a novel wastewater flow forecasting model (WWFFM) based on the nonlinear autoregressive with exogenous inputs (NARX) neural network and on real-time and forecasted water consumption, with an application to the sewer system of Casablanca in Morocco. Furthermore, this research compared two approaches of the forecasting model. The first approach forecasts wastewater flows on the basis of real-time water consumption and infiltration flows, and the second approach considers the same inputs in addition to water distribution flow forecasts. The results indicate that both approaches show accurate and similar performances in predicting wastewater flows as long as the forecasting horizon does not exceed the watershed lag time. For prediction horizons that exceed the lag time, the WWFFM with water distribution forecasts provided more reliable forecasts. The proposed WWFFM could benefit operators by providing valuable input data for predictive models to enhance sewer system efficiency.
HIGHLIGHTS
Implementation of a novel wastewater flow forecasting model based on the NARX neural network.
New tool for flow forecasting in urban drainage catchments.
The wastewater flow forecasting model provides accurate input data for predictive modeling.
INTRODUCTION
Wastewater flow forecasts are key components in the short- and long-term management of sewer systems. In wastewater treatment plants (WWTPs), a wastewater flow forecasting model (WWFFM) could benefit operators by providing valuable input data for predictive models to simulate plant behavior and optimize performances and costs through the control of biological processes (Fernandez et al. 2009). For pumping stations, selecting the best pump scheduling configuration and running the pumps with an appropriate adjustment of rotation speed could help save energy (Wei et al. 2013). These forecasts could also enhance the performance and cost-effectiveness of real-time chemical dosing controllers, thereby preventing hydrogen sulfide formation (Chen et al. 2014).
Several data-driven models for forecasting wastewater flows have been developed during the last decade to address these challenges. Wei et al. (2013) developed a multilayer perceptron (MLP) neural network model for the short-term prediction of influent flow rates in WWTPs. This model takes influent flow rate, rainfall rate, and radar reflectivity as inputs and returns an accurate flow forecast with a prediction horizon of up to 180 min. Boyd et al. (2019) proposed a model based on an autoregressive integrated moving average for daily influent flow forecasts, which was tested at five WWTPs across North America; this work was complemented by a multilayer perceptron neural network proposed by Zhang et al. (2019). Although these models are efficient, they remain limited in their approach: for forecasting wastewater flows, they rely only on historical sewer flow data with no external inputs. Moreover, they do not integrate drinking water consumption, which is the main causal variable that may influence forecasted flows in the case of a water shutdown in a sector or water consumption variation due to a given event.
The current work aimed to fill gaps in wastewater flow forecasting research by proposing a novel WWFFM based on the nonlinear autoregressive with exogenous inputs neural network (NARX-NN) and on real-time and forecasted water consumption, with an application to the sewer system of Casablanca in Morocco.
MATERIALS AND METHODS
The WWFFM aims at predicting instantaneous dry weather flows at specific points of watersheds. Dry weather flow usually corresponds to flows with no rainfall influence or at a maximum rainfall intensity of 0.3 mm and without inflows (Staufer et al. 2012). Given that the wastewater flow production function is nonlinear and depends on the spatial and temporal variations of water consumption through watersheds, using a model that can handle nonlinear problems for forecasting purposes is important. The proposed WWFFM is based on the NARX neural network, which has shown its efficiency in various nonlinear time-series forecasting applications (Abou Rjeily et al. 2017; Koschwitz et al. 2018; Wunsch et al. 2018; Marcjasz et al. 2019; Di Nunno et al. 2021). The WWFFM takes real-time water consumption and previous infiltration flow records as inputs and returns predicted wastewater flows with forecast horizons that vary from 30 to 240 min as outputs. These horizons offer sufficient lead time for real-time and predictive control models to process and apply optimal control strategies.
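For orientation, the generic form of a NARX model can be written as below. This is the standard textbook formulation rather than the exact configuration used in this study; the symbols $n_y$ and $n_u$ (numbers of output and input delays) and the grouping of all exogenous inputs into a single vector $u$ are notational assumptions introduced here.

```latex
\hat{y}(t) = f\bigl( y(t-1), \dots, y(t-n_y),\; u(t-1), \dots, u(t-n_u) \bigr)
```

Here $y$ stands for the wastewater flow, $u$ for the exogenous inputs (water consumption of the DMAs and infiltration flow), and $f$ for the nonlinear mapping learned by the neural network.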
The early stopping method was used to improve generalization, and the divide block method was employed to split the dataset into three subsets. The first subset, representing 70% of the data, is the training set, which was used to compute the gradient and update the network weights and biases. The second subset is the validation set (15%). The error on the validation set was monitored during the training process to detect overfitting. When the validation error increases for a specified number of iterations (six iterations in our case), the training is stopped, and the weights and biases at the minimum of the validation error are returned. Furthermore, the total number of allowed epochs was set to 1,000. The remaining 15% of the dataset was employed as a test set to assess the generalization error of the final model.
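The splitting and stopping rules described above can be sketched as follows. This is a minimal illustration, not the training code used in the study: the function names are hypothetical, the blocks are assumed to be contiguous and taken in chronological order, and an epoch is counted as a failure whenever the validation error does not strictly improve.

```python
def block_split(n_samples, train=0.70, val=0.15):
    """Split indices 0..n_samples-1 into three contiguous blocks.

    The remaining share (15% with the defaults) becomes the test set.
    """
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    idx = list(range(n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def early_stopping(val_errors, patience=6, max_epochs=1000):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached. The
    weights kept would be those of `best_epoch`.
    """
    best_err = float("inf")
    best_epoch = -1
    stop_epoch = -1
    fails = 0
    for epoch, err in enumerate(val_errors[:max_epochs]):
        stop_epoch = epoch
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                break
    return stop_epoch, best_epoch
```

With a validation error that reaches its minimum at epoch 3 and then stagnates, the loop stops six epochs later and reports epoch 3 as the one whose weights should be restored.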
In the present work, two approaches of the forecasting model were compared (Figure 3):
The first approach consists of forecasting wastewater flows on the basis of real-time water distribution flows for eight district metering areas (DMAs) and infiltration flows.
The second approach comprises forecasting wastewater flows according to infiltration flow, water demand flow, and short-term water demand forecasts for the eight DMAs.
The water consumption forecasting model is based on a feed-forward back-propagation neural network that has shown its efficiency in forecasting water consumption on the campus of Lille University (Farah et al. 2019). The input dataset comprises historical temperature, water consumption, and days of specification data. The model outputs water demand forecasts, which are used as inputs for the WWFFM.
In the model, days of specifications are represented as vectors containing information about the following:
Day of the week (i.e., Monday to Sunday, where values range from 1 to 7).
Holidays and special days (New Year's Day and religious celebrations such as Aid El-Adha) are represented with a vector where the values are either 0 or 1.
Special consumption periods such as Ramadan, where consumption patterns differ from normal ones. The vector values are either 0 or 1, where 1 corresponds to the Ramadan period.
The daily time is represented with 288 5-min timesteps, where values range between 1 and 288.
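The four day-specification entries listed above could be encoded as in the following sketch. The function name, the argument layout, and the way holiday dates and Ramadan periods are supplied by the caller are assumptions made for illustration; the study does not describe its implementation.

```python
from datetime import date, datetime


def day_specification(ts, holidays=frozenset(), ramadan_periods=()):
    """Encode a timestamp as [day_of_week, holiday, ramadan, timestep].

    holidays        : set of `date` objects flagged as holidays or special days
    ramadan_periods : iterable of (start_date, end_date) pairs, inclusive
    """
    day_of_week = ts.isoweekday()  # Monday = 1 ... Sunday = 7
    is_holiday = 1 if ts.date() in holidays else 0
    in_ramadan = 1 if any(start <= ts.date() <= end
                          for start, end in ramadan_periods) else 0
    # 288 five-minute steps per day, numbered from 1 to 288.
    timestep = (ts.hour * 60 + ts.minute) // 5 + 1
    return [day_of_week, is_holiday, in_ramadan, timestep]
```

For example, `day_specification(datetime(2016, 9, 12, 23, 55), holidays={date(2016, 9, 12)})` flags the Aid El-Adha day mentioned in the results and assigns the last timestep of the day.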
EXPERIMENTAL DATA
Site description
The data were collected from a watershed of 3,315 ha, which covers the townships of the eastern part of Casablanca (Figure 4). The urbanization of the area is fairly heterogeneous and comprises industrial and residential areas. The urban drainage system (UDS) is a combined system in the historical part of the townships and a separate sewer system in the newly urbanized areas.
Data collection and processing
The area is equipped with a monitoring system based on quantitative sensors that measure sewer flows at the watershed outlet and water consumption at the eight DMAs. The monitoring system of the DMAs is composed of insertion and electromagnetic flowmeters that conduct measurements at a 5-min time step. The UDS is equipped with a depth meter to measure the water level and a flow meter to measure the discharge at the watershed outlet. The measurement for the UDS is conducted at a 15-min time step.
In the framework of the current study, wastewater flow (Qw), precipitation (P), water consumption (Wc), and temperature (T) data were collected for 3 years between March 2014 and July 2017.
The mean dry weather flow rate pattern presented in Figure 5 shows that wastewater flows vary between 390 L/s for the minimum night flow (MNF) and 900 L/s for the peak flow that occurs around 12:00 pm. Figure 6 illustrates the diurnal patterns for days of the week, average diurnal, seasonal patterns, and special diurnal patterns for specific periods. For normal days, the flow rates of water consumption vary from 270 to 1,100 L/s with an average flow rate of 650 L/s and can reach a value of 1,600 L/s during the Aid El-Adha celebration. Furthermore, Figure 6(a) and 6(b) display similar variations of the diurnal patterns for each day of the week and each season, with a rise in summer of approximately 70 L/s for the MNF and of nearly 150 L/s for the peak flow. For all the consumption patterns, the peak flow is recorded between 11:00 am and 12:00 pm, and the flow then decreases to reach the MNF between 2:00 am and 4:00 am. However, the water consumption diurnal pattern changes during Ramadan, where we observe an increase in water consumption during the night with a peak flow around 4:00 am before the beginning of the fast and an MNF that shifts to 6:00 am. We can also observe a fast drop and variation in water consumption at roughly 7:00 pm, which corresponds to the fast break time.
Given that the main sewer system is combined, the first processing step consisted of identifying rainy days on the basis of the rainfall records of the rain gauges and removing the corresponding data to keep only dry weather flows in the dataset.
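A filter of this kind might look like the sketch below. The function name and data layout are hypothetical, the 0.3 mm threshold is borrowed from the dry weather definition quoted earlier and applied here to daily totals as an assumption, and days without a rainfall record are treated as dry, which is a simplification.

```python
from datetime import date, datetime


def keep_dry_weather(flow_records, daily_rain_mm, max_rain_mm=0.3):
    """Drop flow samples recorded on rainy days.

    flow_records  : list of (datetime, flow) tuples
    daily_rain_mm : dict mapping a `date` to its rainfall total in mm
    Days missing from `daily_rain_mm` are assumed to be dry.
    """
    return [(ts, q) for ts, q in flow_records
            if daily_rain_mm.get(ts.date(), 0.0) <= max_rain_mm]
```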
For model predictive control systems and forecasting models, missing data constitute a major issue because incomplete records do not fulfill the requirements of the algorithms (Yuri et al. 2016). These gaps can result from several factors, such as a power outage or a communication failure between the remote terminal units and the SCADA system (Walski et al. 2003). Many filling methods have been proposed in the literature (Li et al. 2006; Qin et al. 2009; Fan et al. 2012), such as artificial filling, average value filling, special value filling, and regression. In this study, the missing values of the dataset were reconstituted through linear interpolation.
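Linear interpolation of gaps in an evenly sampled series can be sketched as follows. The function name is hypothetical, missing samples are assumed to be marked as `None` or NaN, and gaps at the very start or end of the series are filled by holding the nearest valid value, a choice the study does not specify.

```python
import math


def fill_linear(values):
    """Fill None/NaN gaps in an evenly sampled series by linear interpolation."""
    y = list(values)
    known = [i for i, v in enumerate(y) if v is not None and not math.isnan(v)]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    # Interior gaps: straight line between the two surrounding valid samples.
    for left, right in zip(known, known[1:]):
        step = (y[right] - y[left]) / (right - left)
        for i in range(left + 1, right):
            y[i] = y[left] + step * (i - left)
    # Edge gaps have a single neighbour: hold the nearest valid value.
    for i in range(known[0]):
        y[i] = y[known[0]]
    for i in range(known[-1] + 1, len(y)):
        y[i] = y[known[-1]]
    return y
```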
In addition to missing values, data from field measurements usually include noise (Ruiz et al. 2016) that can affect the efficiency of machine learning algorithms (Lucas 2010; Munawar et al. 2011). The LOESS nonparametric regression method proposed by Cleveland (1979) and further developed by Cleveland et al. (1988), Cleveland & Grosse (1991), and Cleveland et al. (1992) was employed to smooth the collected data (Figure 7).
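The idea behind LOESS can be illustrated with a compact local linear regression using tricube weights. This is a simplified sketch without the robustness iterations of the full method; the function name and the window share `frac=0.3` are assumptions, not settings reported in the study.

```python
import math


def loess(x, y, frac=0.3):
    """Smooth y(x) with locally weighted linear regression (tricube kernel).

    `frac` is the share of points used in each local fit.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per local window
    smoothed = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        h = max(h, 1e-12)  # guard against repeated x values
        sw = swu = swy = swuu = swuy = 0.0
        for xi, yi in zip(x, y):
            u = xi - x0  # centre on x0 so the fitted value is the intercept
            d = abs(u) / h
            w = (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0  # tricube weight
            sw += w
            swu += w * u
            swy += w * yi
            swuu += w * u * u
            swuy += w * u * yi
        denom = sw * swuu - swu * swu
        if abs(denom) < 1e-12:
            smoothed.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swuy - swu * swy) / denom
            smoothed.append((swy - slope * swu) / sw)
    return smoothed
```

Because each local fit is linear, a perfectly linear series is returned unchanged, while an isolated spike is pulled toward its neighbours.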
Data analysis
The visualization of the total distributed water and the wastewater flows (Figure 9) shows that the maximum lag time between the peaks of these two variables is around 80 min. Additional lag time analysis was performed using the cross-correlation between distributed water and wastewater flows (Figure 10). The results show a high correlation between these two variables as long as the lag is less than 80 min. Above this value, the correlation drops below 80%, indicating a weaker relation between both variables. Thus, the lag value for the NARX model is set to 80 min, which corresponds to 16 time-step delays at the 5-min time step.
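A lag analysis of this kind can be sketched by computing the Pearson correlation between the consumption series and the wastewater series shifted by an increasing number of steps. The function names are hypothetical, both series are assumed to have the same length and the same 5-min sampling, and picking the lag with the highest correlation is only one possible reading of the cross-correlation plot.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(consumption, wastewater, max_lag):
    """Correlation between consumption[t] and wastewater[t + lag], lag = 0..max_lag."""
    n = len(consumption)
    return [pearson(consumption[:n - lag], wastewater[lag:n])
            for lag in range(max_lag + 1)]


def best_lag(consumption, wastewater, max_lag):
    """Lag (in time steps) at which the correlation is highest."""
    corr = lagged_correlations(consumption, wastewater, max_lag)
    return max(range(len(corr)), key=corr.__getitem__)
```

With 5-min samples, a lag of 16 steps corresponds to 16 × 5 = 80 min.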
RESULTS
During the training stage, the NARX neural network minimizes the error between the model results and the observed data. Different numbers of neurons were tested and, after several trials, the best training, validation, and testing results were obtained with a hidden layer of 10 neurons, for which the mean squared error (MSE) decreased from 105 at the beginning of the training stage to 0.17 after 302 iterations. Tables 1 and 2 present the performance statistics of the NARX-NN architectures. The results show that increasing the number of neurons up to 10 increases the efficiency of the model, whereas using more than 10 neurons results in poor performance in multistep-ahead forecasts. Figure 11 shows the performance of the trained ANN on the training, validation, and testing sets. In addition, Figure 12 highlights the efficiency of the trained network, with high regression values (R) of 0.999 for the training, validation, and testing parts.
Hidden layer size | Qt+6 | Qt+9 | Qt+12 | Qt+15 | Qt+18 | Qt+24 | Qt+48 |
---|---|---|---|---|---|---|---|
1 neuron | 0.397 | 6.311 | 4.119 | 2.287 | 8.081 | 24.108 | 35.447 |
5 neurons | 0.6181 | 5.809 | 3.823 | 2.180 | 7.676 | 17.659 | 25.646 |
10 neurons | 1.828 | 5.603 | 4.828 | 1.729 | 7.238 | 16.922 | 17.868 |
15 neurons | 1.495 | 4.727 | 1.762 | 6.413 | 14.678 | 29.529 | 29.528 |
Hidden layer size | Qt+6 | Qt+9 | Qt+12 | Qt+15 | Qt+18 | Qt+24 | Qt+48 |
---|---|---|---|---|---|---|---|
1 neuron | 0.9998 | 0.9934 | 0.9918 | 0.9989 | 0.9852 | 0.8348 | 0.6881 |
5 neurons | 0.9999 | 0.9944 | 0.9933 | 0.9990 | 0.9866 | 0.9113 | 0.8367 |
10 neurons | 0.9995 | 0.9948 | 0.9957 | 0.9994 | 0.9806 | 0.9756 | 0.9207 |
15 neurons | 0.9997 | 0.9963 | 0.9924 | 0.9916 | 0.9511 | 0.7521 | 0.7835 |
Once the model had been trained, further validation of the accuracy of the WWFFM was performed through multistep-ahead predictions for 5 days, from September 8, 2016 to September 12, 2016, with data withheld from the training process. Figure 13 exhibits the water consumption of the eight DMAs and the BIF employed for forecasting wastewater flows over this 5-day period. During this period, high water consumption was recorded on September 12, which corresponded to the Aid El-Adha celebration day. The predictions of the WWFFM were conducted for different horizons Qt+k, where Qt designates the wastewater flow at timestep t and Qt+k stands for the wastewater flow at timestep t + k (k = 6, 9, 12, 15, 18, 24, and 48) with a 5-min time step.
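The relation between the step index k and the forecast horizon in minutes, and the pairing of each Qt with its target Qt+k, can be made explicit with a short sketch. The names below are hypothetical and the pairing only illustrates how observed values line up with a k-step-ahead target; it is not the evaluation code of the study.

```python
TIME_STEP_MIN = 5  # sampling interval of the series, in minutes
HORIZONS = (6, 9, 12, 15, 18, 24, 48)  # values of k used in the study


def horizon_minutes(k, step_min=TIME_STEP_MIN):
    """Forecast horizon in minutes for a k-step-ahead prediction."""
    return k * step_min


def align_targets(flow, k):
    """Pair each Q_t with its k-step-ahead value Q_{t+k}.

    The last k samples have no target and are dropped.
    """
    if k <= 0:
        raise ValueError("k must be a positive number of time steps")
    return list(zip(flow[:-k], flow[k:]))
```

The seven values of k therefore span horizons from 30 min (k = 6) to 240 min (k = 48), and k = 15 (75 min) is the longest horizon that stays below the 80-min lag time.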
Tables 3 and 4 present the performance statistics of the WWFFM without water demand forecasts and the WWFFM with water demand forecasts, respectively. Figure 14(a)–14(g) depicts the predicted and observed flows for both approaches.
Statistic | Qt+6 | Qt+9 | Qt+12 | Qt+15 | Qt+18 | Qt+24 | Qt+48 |
---|---|---|---|---|---|---|---|
RMSE (m³ s⁻¹) | 3.300 | 5.492 | 10.383 | 16.166 | 18.918 | 43.487 | 82.855 |
NSE | 0.999 | 0.999 | 0.998 | 0.995 | 0.993 | 0.967 | 0.881 |
Statistic | Qt+6 | Qt+9 | Qt+12 | Qt+15 | Qt+18 | Qt+24 | Qt+48 |
---|---|---|---|---|---|---|---|
RMSE (m³ s⁻¹) | 3.504 | 4.367 | 4.135 | 4.7915 | 11.711 | 11.888 | 12.017 |
NSE | 0.999 | 0.999 | 0.999 | 0.999 | 0.997 | 0.997 | 0.997 |
The analysis of the error statistics and of Figure 14 demonstrates that both approaches of the WWFFM perform well in forecasting dry weather flow as long as the forecasting horizon remains less than 80 min. The forecast results are highly accurate, with an RMSE ranging between 3.3 and 16.16 and an NSE ranging between 0.995 and 0.999. Nonetheless, for prediction horizons exceeding 80 min, the performance of the WWFFM without water distribution forecasts degrades as the forecasting horizon increases, and the model fails to predict peaks, especially on September 12, where the NARX-NN overestimates the peak flow by more than 550 L/s. Conversely, the WWFFM with water distribution forecasts supports long-horizon forecasts, with RMSEs that vary only slightly over the different forecasting horizons, between 3.5 and 12.
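For reference, the two statistics reported above follow their standard definitions: the root mean squared error (RMSE) and the Nash-Sutcliffe efficiency (NSE), the latter being 1 for a perfect forecast and 0 for a forecast no better than the mean of the observations. The sketch below uses hypothetical function names and is not the evaluation code of the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted flows."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0.0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / sst
```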
DISCUSSION
The current study explored a new approach for predicting instantaneous dry weather flows in the UDS on the basis of the NARX-NN and drinking water consumption, and such an approach was tested on a part of the sewer system of Casablanca, which comprises approximately five million people. The construction of the model required essential steps to reconstitute data through linear interpolation because most modeling techniques cannot deal with missing values and discard the whole instance if one of the variable values is missing. In addition, the LOESS nonparametric regression method was used to smooth the data lying far from the bulk of the data range, and a cross-correlation analysis was conducted to determine the suitable lagged information for the model.
The findings of this study confirm that both tested approaches of the WWFFM display accurate results and similar performances in predicting dry weather flows, with RMSEs below 16.16 and high NSEs, as long as the forecasting horizon does not exceed 80 min. Nonetheless, the results further confirm that for prediction horizons that exceed 80 min, the performance of the WWFFM without water distribution forecasts degrades as the forecasting horizon increases because of the lack of appropriate causal input variables, thereby making it unsuitable for long-horizon forecasts in model predictive systems. Conversely, the WWFFM with water distribution forecasts is continuously updated with appropriate lagged input data, thereby enabling it to perform highly accurate forecasts for long-time horizons while representing all the flow ranges. The findings also highlight the importance of the WWFFM, which could benefit operators and water engineers by providing valuable input data for predictive model control to enhance the efficiency of sewer systems.
To our knowledge, this is the first study that has explored this new approach of forecasting dry weather flows on the basis of real-time water consumption and the BIF, which thus improves the knowledge of and complements previous research works in forecasting dry weather flows. The currently known models proposed in the literature (Wei et al. 2013; Boyd et al. 2019; Zhang et al. 2019) rely only on historical data with no external inputs. Additionally, they do not integrate drinking water consumption, which is the main causal variable that may influence forecasted flows in case of a water shutdown in a sector or water consumption variation due to a given event.
The limitation of the proposed WWFFM lies in its use of real-time data, which can pose a problem in the event of data unavailability due to a sensor failure or a communication problem. Therefore, ensuring the good maintenance of the flow meters and continuous data transmission for the needs of the NARX-NN is essential. Moreover, defining strategies for filling in data in case of communication failures would be valuable. At present, the proposed model only forecasts wastewater flows; future work is planned to extend the model by integrating the forecasts of combined sewer flows, considering the fraction of stormwater flows.
CONCLUSION
The present work aims to fill gaps in wastewater flow forecasting research by proposing a novel WWFFM based on the NARX neural network. The proposed model considers real-time and forecasted water consumption as the main causal input variable of wastewater flow production. This study differs from the approaches presented in the literature, which remain limited because they consider only historical sewer flow data and would fail to forecast sewer flows in the case of a water shutdown in a sector or water consumption variation due to a given event. This research compares two approaches of the forecasting model. The first approach forecasts wastewater flows on the basis of real-time water consumption and infiltration flows, and the second approach considers the same inputs in addition to the water distribution flow forecasts. Both approaches display accurate results and similar performances in predicting wastewater flows as long as the forecasting horizon does not exceed 80 min. Nonetheless, for prediction horizons that exceed 80 min, the performance of the WWFFM without water distribution forecasts degrades as the forecasting horizon increases. Conversely, the WWFFM with water distribution forecasts is continuously updated with the appropriate lagged input data, thereby making it able to perform highly accurate forecasts for long-time horizons. Hence, the WWFFM developed in this study could benefit operators and water engineers, providing valuable input data for predictive model control and thus enhancing UDS efficiency.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.