Prediction of rainfall time series using modular soft computing methods

https://doi.org/10.1016/j.engappai.2012.05.023

Abstract

In this paper, several soft computing approaches were employed for rainfall prediction. Two aspects were considered to improve the accuracy of rainfall prediction: (1) carrying out a data-preprocessing procedure and (2) adopting a modular modeling method. The proposed preprocessing techniques included moving average (MA) and singular spectrum analysis (SSA). The modular models were composed of local support vector regression (SVR) models and/or local artificial neural network (ANN) models. In the process of rainfall forecasting, the ANN was first used to choose the data-preprocessing method between MA and SSA. The modular models split the training data into three crisp subsets (low, medium and high levels) according to the magnitudes of the training data; two SVRs were then applied to the medium- and high-level subsets, whereas either an ANN or an SVR was used to train on and predict the low-level subset. For daily rainfall records, the low-level subset was modeled by the ANN because this subset dominated the training data and the ANN is very efficient in training on large samples owing to its parallel information processing configuration. Four rainfall time series consisting of two monthly rainfalls and two daily rainfalls from different regions were utilized to evaluate modular models at 1-day, 2-day, and 3-day lead-time with the persistence method and the global ANN as benchmarks. Results showed that the MA was superior to the SSA when they were coupled with the ANN. Comparison results indicated that modular models (referred to as ANN-SVR for daily rainfall simulations and MSVR for monthly rainfall simulations) outperformed other models. The ANN-MA also displayed considerable accuracy in rainfall forecasts compared with the benchmark.

Introduction

An accurate and timely rainfall forecast is crucial for reservoir operation and flooding prevention because it can provide an extension of lead-time of the flow forecast, larger than the response time of the watershed, in particular for small and medium-sized mountainous basins.

Rainfall prediction is a very complex problem. Simulating the response using conventional approaches in modeling rainfall time series is far from a trivial task since the hydrologic processes are complex and involve various inherently complex predictors such as geomorphologic and climatic factors, which are still not well understood. As such, the artificial neural network algorithm becomes an attractive inductive approach in rainfall prediction owing to its high nonlinearity, flexibility and data-driven learning in building models without any prior knowledge about catchment behavior and flow processes. Such models are purely based on the information retrieved from the hydro-meteorological data and act as black boxes.

Many studies have been conducted for the quantitative precipitation forecast (QPF) using diverse techniques including numerical weather prediction (NWP) models and remote sensing observations (Davolio et al., 2008, Diomede et al., 2008, Ganguly and Bras, 2003, Sheng et al., 2006, Yates et al., 2000), statistical models (Chan and Shi, 1999, Chu and He, 1995, DelSole and Shukla, 2002, Li and Zeng, 2008, Munot and Kumar, 2007, Nayagam et al., 2008), chaos-based approach (Jayawardena and Lai, 1994), non-parametric nearest-neighbors method (Toth et al., 2000), and soft computing-based methods including artificial neural networks (ANN), support vector regression (SVR) and fuzzy logic (FL) (Brath et al., 2002, Dorum et al., 2010, Guhathakurta, 2008, Nasseri et al., 2008, Pongracz et al., 2001, Sedki et al., 2009, Silverman and Dracup, 2000, Sivapragasam et al., 2001, Surajit and Goutami, 2007, Talei et al., 2010, Toth et al., 2000, Venkatesan et al., 1997). The contemporary studies focused on soft computing-based methods. Several examples of such methods can be mentioned. Venkatesan et al. (1997) employed the ANN to predict the all India summer monsoon rainfall with different meteorological parameters as model inputs. Chattopadhyay and Chattopadhyay (2008a) constructed an ANN model to predict monsoon rainfall in India depending on the rainfall series alone. The fuzzy logic theory was applied to monthly rainfall prediction by Pongracz et al. (2001). Toth et al. (2000) applied three time series models, auto-regressive moving average (ARMA), ANN and k-nearest-neighbors (KNN) method, to short-term rainfall prediction. The results showed that the ANN performed the best in the improvement of the runoff forecasting accuracy when the predicted rainfall was used as inputs of the rainfall-runoff model. ANN has also been applied to general circulation model (GCM) outputs. Chadwick et al. (2011) employed an artificial neural network approach to downscale GCM temperature and rainfall fields to regional model scale over Europe. Sachindra et al. (2011) developed a model with various soft computing techniques capable of statistically downscaling monthly GCM outputs to catchment scale monthly streamflows, accounting for the climate change.

Recently, models based on combining concepts have received more attention in hydrologic forecasting. Depending on the combination method, combining models can be categorized into ensemble models and modular (or hybrid) models. The basic idea behind the ensemble models is to build several different or similar models for the same process and to merge their outputs through a combining method (Abrahart and See, 2002, Kim et al., 2006, Shamseldin et al., 1997, Shamseldin and O'Connor, 1999, Xiong et al., 2001). For example, Xiong et al. (2001) used a Takagi-Sugeno fuzzy technique to combine several conceptual rainfall-runoff models. Coulibaly et al. (2005) employed an improved weighted-average method to coalesce forecasted daily reservoir inflows from the KNN model, conceptual model and ANN model. Kim et al. (2006) investigated five combining methods for improving ensemble streamflow prediction.

Physical processes in rainfall and/or runoff are generally composed of a number of sub-processes, so accurately modeling them with a single global model is often not possible. Modular models are therefore proposed, in which sub-processes are first identified and then separate models (also called local or expert models) are established for each of them (Solomatine and Ostfeld, 2008). In these modular models, the split of training data can be soft or crisp. In a soft split, the subsets can overlap and the overall forecasting output is the weighted average of the local model outputs (Shrestha and Solomatine, 2006, Zhang and Govindaraju, 2000, Wang et al., 2006, Wu et al., 2008). Zhang and Govindaraju (2000) examined the performance of modular networks in predicting monthly discharges based on the Bayesian concept. Wu et al. (2008) employed a distributed SVR for daily river stage prediction. In contrast, there is no overlap of data in the crisp split and the final forecasting output is generated explicitly from one of the local models (Corzo and Solomatine, 2007, Jain and Srinivasulu, 2006, See and Openshaw, 2000, Sivapragasam and Liong, 2005, Solomatine and Xue, 2004). Solomatine and Xue (2004) used M5 model trees and neural networks in a flood-forecasting problem. Sivapragasam and Liong (2005) divided the flow range into three regions, and employed different SVR models to predict daily flows in high, medium and low regions.
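The difference between the two ways of producing the final output can be illustrated with a minimal Python sketch. The function names and the use of membership degrees as combination weights are assumptions made for illustration; the cited studies each define their own weighting schemes.

```python
def soft_combine(local_outputs, memberships):
    """Soft split: the forecast is the membership-weighted average of all local outputs."""
    total = sum(memberships)
    return sum(w * y for w, y in zip(memberships, local_outputs)) / total


def crisp_select(local_outputs, memberships):
    """Crisp split: the forecast comes from the single local model with the largest membership."""
    best = max(range(len(memberships)), key=memberships.__getitem__)
    return local_outputs[best]


# Hypothetical outputs of three local models (low, medium, high) for one input,
# together with hypothetical membership degrees of that input in each subset.
outputs = [2.0, 10.0, 40.0]
weights = [0.2, 0.7, 0.1]
print(soft_combine(outputs, weights))   # blends all three local forecasts
print(crisp_select(outputs, weights))   # uses only the medium-level model
```

With a crisp split only one local model is ever evaluated for a given input, which is the arrangement adopted for the modular models in this study.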

Apart from the adoption of the modular model, predictions may also be improved by suitable data preprocessing techniques. Besides the conventional rescaling or standardization of training data, preprocessing methods from the perspective of signal analysis are also crucial because a rainfall time series may be viewed as a quasi-periodic signal contaminated by various noises. Hence techniques such as singular spectrum analysis (SSA) were recently introduced to the hydrology field by some researchers (Marques et al., 2006, Partal and Kişi, 2007, Sivapragasam et al., 2001). Sivapragasam et al. (2001) established a hybrid model of support vector machine (SVM) and the SSA for rainfall and runoff predictions. The hybrid model resulted in a considerable improvement in the model performance in comparison with the original SVM model. The application of wavelet analysis to precipitation was undertaken by Partal and Kişi (2007). Their results indicated that the wavelet analysis was highly promising. In addition, the issue of lagged predictions in the ANN model was mentioned by some researchers (Dawson and Wilby, 2001, Jain and Srinivasulu, 2004, De Vos and Rientjes, 2005, Muttil and Chau, 2006). A main reason for lagged predictions was the use of previous observed data as ANN inputs (De Vos and Rientjes, 2005). An effective solution was to obtain new model inputs by applying a moving average to the original data series.

The scope of this study was to investigate the effect of the MA and SSA as data-preprocessing techniques, coupled with modular models, on model performance for rainfall prediction. The modular model included three local models which were associated with three crisp subsets (low-, medium- and high-intensity rainfall) clustered by the fuzzy C-means (FCM) method. The ANN was first used to choose the data-preprocessing method between MA and SSA. Depending on the selected data-preprocessing technique, modular models were employed to perform rainfall prediction. Generally, the ANN is very efficient in processing large training samples due to its parallel information processing configuration. Its biggest drawback is that the model outputs are variable because of the random initialization of weights and biases. The SVR generalizes well and gives more stable model outputs. However, it is best suited to small training samples (e.g. below 200) because the training time increases exponentially with the size of the training sample. For the current rainfall data, the majority of subsets after the data split are small samples, except for the low-intensity daily rainfall. Therefore, three local SVRs (hereafter referred to as MSVR) were employed for monthly rainfall data whereas two local SVRs and one ANN (hereafter referred to as ANN-SVR) were adopted for daily rainfall data. For daily rainfall records, the low-intensity subset was modeled by the ANN because this subset dominated the training data. For comparison purposes, the global ANN and the persistence model were used as benchmarks. To ensure the generality of this study, four cases consisting of two monthly rainfall series and two daily rainfall series from India and China were explored.
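The crisp split into low-, medium- and high-intensity subsets can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the study: it clusters the rainfall magnitudes only, and the fuzzifier value of 2, the evenly spaced initial centres and the fixed iteration count are all assumptions. Assigning each value to its nearest centre is equivalent to choosing the cluster with the highest FCM membership.

```python
def fcm_centres_1d(values, n_clusters=3, fuzzifier=2.0, n_iter=100):
    """Fuzzy C-means on scalar data; returns the cluster centres sorted ascending."""
    lo, hi = min(values), max(values)
    # Deterministic start (assumption): centres spread evenly across the data range.
    centres = [lo + (hi - lo) * (j + 0.5) / n_clusters for j in range(n_clusters)]
    power = -2.0 / (fuzzifier - 1.0)
    for _ in range(n_iter):
        num = [0.0] * n_clusters
        den = [0.0] * n_clusters
        for x in values:
            # Membership of x in each cluster; the small offset avoids division by zero.
            inv = [(abs(x - c) + 1e-9) ** power for c in centres]
            total = sum(inv)
            for j in range(n_clusters):
                weight = (inv[j] / total) ** fuzzifier
                num[j] += weight * x
                den[j] += weight
        # Each centre becomes the weighted mean of the data.
        centres = [n / d for n, d in zip(num, den)]
    return sorted(centres)


def crisp_split(values, centres):
    """Assign each value to its nearest centre, i.e. its highest-membership cluster."""
    subsets = [[] for _ in centres]
    for x in values:
        nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
        subsets[nearest].append(x)
    return subsets


# Hypothetical rainfall magnitudes, for illustration only.
rain = [0.0, 1.0, 2.0, 20.0, 21.0, 22.0, 80.0, 82.0, 85.0]
low, medium, high = crisp_split(rain, fcm_centres_1d(rain))
print(low, medium, high)
```

Each subset would then be passed to its own local model: an SVR for the small subsets, and an ANN for the low-intensity daily subset when it is large.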

Section snippets

Moving average (MA)

The moving average method smoothes data by replacing each data point with the average of the K neighboring data points, where K may be called the length of memory window. The basic idea behind the method is that any large irregular component at any point in time will exert a smaller effect if we average the point with its immediate neighbors (Newbold et al., 2003). The most common moving average method is the unweighted moving average, in which each value of the data carries the same weight in
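As an illustration of the unweighted moving average, the following Python sketch smooths a series with a memory window of length K. It assumes a trailing (backward-looking) window, which uses only current and past values as a forecasting model input would require; the excerpt above does not state which window alignment was used, and the treatment of the first K-1 points is likewise an assumption.

```python
def moving_average(series, k):
    """Unweighted trailing moving average with memory window k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    smoothed = []
    window_sum = 0.0
    for i, value in enumerate(series):
        window_sum += value
        if i >= k:
            # Drop the value that has just left the window.
            window_sum -= series[i - k]
        # Early points are averaged over however many values are available (assumption).
        smoothed.append(window_sum / min(i + 1, k))
    return smoothed


# Hypothetical daily rainfall values, for illustration only.
rain = [0.0, 12.0, 3.0, 0.0, 30.0, 6.0]
print(moving_average(rain, 3))  # [0.0, 6.0, 5.0, 5.0, 11.0, 12.0]
```

The isolated peak of 30 is reduced to 11 in the smoothed series, which is the damping of large irregular components described above.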

Case study

Two daily mean rainfall series (at Zhenwan and Wuxi raingauge stations, respectively) from Zhenshui and Da'ninghe watersheds of China, and two monthly mean rainfall series from India and Zhongxian raingauge station of China, were analyzed in this study.

The Zhenshui basin is located in the north of Guangdong province and adjoined by Hunan province and Jiangxi province. The basin belongs to a second-order tributary of the Pearl River and has an area of 7554 km². The daily rainfall time series data

Decomposition of rainfall data

The decomposition of the daily average rainfall series requires identifying the window length m (or the singular number) if the interval of neighboring points in the discrete time series is taken as the lag time (i.e. τ = 1 day for daily rainfall data or 1 month for monthly rainfall data). A reasonable value of m should give rise to a clear resolution of the original signal. The present study does not need to accurately resolve any trends or oscillations in the raw rainfall signal. A rough
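The role of the window length m can be illustrated with a basic SSA decomposition in Python using NumPy. This is a sketch of the standard procedure (embedding with lag 1, singular value decomposition, diagonal averaging), not necessarily the exact variant used in the study; the function name and the restriction 2 ≤ m ≤ N/2 are assumptions.

```python
import numpy as np


def ssa_decompose(series, m):
    """Decompose a 1-D series into m reconstructed components (one per row)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if not 2 <= m <= n // 2:
        raise ValueError("window length m must satisfy 2 <= m <= n // 2")
    k = n - m + 1
    # Trajectory matrix with lag 1: column j holds x[j], ..., x[j + m - 1].
    trajectory = np.column_stack([x[j:j + m] for j in range(k)])
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    components = np.empty((m, n))
    for i in range(m):
        elementary = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: entries with equal row + column index map to the
        # same time step, so average along each anti-diagonal.
        flipped = elementary[::-1]
        components[i] = [flipped.diagonal(d).mean() for d in range(-(m - 1), k)]
    return components


# Hypothetical series: a slow oscillation plus a faster one, for illustration only.
t = np.arange(60)
signal = np.sin(2 * np.pi * t / 12) + 0.3 * np.sin(2 * np.pi * t / 3)
parts = ssa_decompose(signal, m=6)
print(parts.shape)                               # one row per component
print(np.allclose(parts.sum(axis=0), signal))    # the components sum back to the series
```

A filtered model input would then be formed by summing only a chosen subset of the rows, so m controls how finely the signal can be separated.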

Results

The overall performance of each model in terms of RMSE, CE, and PI is presented in Table 2 for the two monthly rainfall series and Table 3 for the two daily rainfall series. It can be seen that the two benchmark models, persistence and ANN, demonstrated very poor performance for all four cases except for India. The performances of ANN-MA and ANN-SSA indicate that data-preprocessing methods resulted in considerable improvement in the accuracy of the rainfall forecasting. Moreover, the MA seems
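For reference, the three indices can be computed as in the following Python sketch. It assumes the commonly used definitions: RMSE as the root mean square error, CE as the Nash-Sutcliffe coefficient of efficiency, and PI as the persistence index, which compares the model error with that of a forecast equal to the observation one lead time earlier. The excerpt above does not reproduce the formulas, so these definitions are assumptions.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def ce(obs, pred):
    """Coefficient of efficiency: 1 minus model error over variance about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    model_error = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - model_error / variance


def pi(obs, pred, lead=1):
    """Persistence index: 1 minus model error over the error of the persistence forecast."""
    steps = range(lead, len(obs))
    model_error = sum((obs[t] - pred[t]) ** 2 for t in steps)
    naive_error = sum((obs[t] - obs[t - lead]) ** 2 for t in steps)
    return 1.0 - model_error / naive_error


# Hypothetical observed and predicted rainfall, for illustration only.
obs = [1.0, 3.0, 2.0, 5.0]
pred = [1.5, 2.5, 2.5, 4.0]
print(rmse(obs, pred), ce(obs, pred), pi(obs, pred, lead=1))
```

Under these definitions a perfect forecast gives CE = PI = 1, and a model that is no better than the persistence benchmark gives PI = 0.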

Conclusions

The purpose of this study was to investigate the effect of modular models coupled with data-preprocessing techniques in improving the accuracy of rainfall forecasting. The modular models consisted of three local SVR and/or ANN models. A three-layer feed-forward ANN was used to examine two data-preprocessing techniques, MA and SSA. Results show that the MA was superior to the SSA. Four rainfall records, India, Zhongxian, Wuxi and Zhenwan, from India and China, were used as testing cases.

With the help of

References (66)

  • A.Y. Shamseldin et al. Methods for combining the outputs of different rainfall-runoff models. J. Hydrol. (1997)
  • A. Talei et al. A novel application of a neuro-fuzzy computational technique in event-based rainfall–runoff modeling. Expert Syst. Appl. (2010)
  • E. Toth et al. Comparison of short-term rainfall prediction models for real-time flood forecasting. J. Hydrol. (2000)
  • R. Vautard et al. Singular-spectrum analysis: a toolkit for short, noisy and chaotic signals. Physica D (1992)
  • W. Wang et al. Forecasting daily streamflow using hybrid ANN models. J. Hydrol. (2006)
  • C.L. Wu et al. River stage prediction based on a distributed support vector regression. J. Hydrol. (2008)
  • L.H. Xiong et al. A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi-Sugeno fuzzy system. J. Hydrol. (2001)
  • P.S. Yu et al. Support vector regression for real-time flood stage forecasting. J. Hydrol. (2006)
  • R.J. Abrahart et al. Multi-model data fusion for river flow forecasting: an evaluation of six alternative methods based on two contrasting catchment. Hydrol. Earth Syst. Sci. (2002)
  • A. Brath et al. Neural networks and non-parametric methods for improving real time flood forecasting through conceptual hydrological models. Hydrol. Earth Syst. Sci. (2002)
  • R. Chadwick et al. An artificial neural network technique for downscaling GCM outputs to RCM spatial scale. Nonlinear Processes Geophys. (2011)
  • J.C.L. Chan et al. Prediction of the summer monsoon rainfall over South China. Int. J. Climatol. (1999)
  • S. Chattopadhyay et al. Identification of the best hidden layer size for three-layered neural net in predicting monsoon rainfall in India. J. Hydroinformatics (2008)
  • S. Chattopadhyay et al. Comparative study among different neural net learning algorithms applied to rainfall time series. Meteorol. Appl. (2008)
  • B. Cheng et al. Neural networks: a review from a statistical perspective. Stat. Sci. (1994)
  • P.S. Chu et al. Long-range prediction of Hawaiian winter rainfall using canonical correlation-analysis. Int. J. Climatol. (1995)
  • G. Corzo et al. Baseflow separation techniques for modular artificial neural network modelling in flow forecasting. Hydrol. Sci. J. (2007)
  • P. Coulibaly et al. Improving daily reservoir inflow forecasts with model combination. J. Hydrol. Eng. (2005)
  • C.W. Dawson et al. Hydrological modeling using artificial neural networks. Prog. Phys. Geography (2001)
  • S. Davolio et al. A meteo-hydrological prediction system based on a multi-model approach for precipitation forecasting. Nat. Hazards Earth Syst. Sci. (2008)
  • T. DelSole et al. Linear prediction of Indian monsoon rainfall. J. Climate (2002)
  • N.J. De Vos et al. Constraints of artificial neural networks for rainfall-runoff modeling: trade-offs in hydrological state representation and model evaluation. Hydrol. Earth Syst. Sci. (2005)
  • T. Diomede et al. Discharge prediction based on multi-model precipitation forecasts. Meteorol. Atmos. Phys. (2008)