Prediction of rainfall time series using modular soft computing methods

doi:10.1016/j.engappai.2012.05.023

Engineering Applications of Artificial Intelligence

Volume 26, Issue 3, March 2013, Pages 997-1007

https://doi.org/10.1016/j.engappai.2012.05.023

Abstract

In this paper, several soft computing approaches were employed for rainfall prediction. Two aspects were considered to improve the accuracy of rainfall prediction: (1) carrying out a data-preprocessing procedure and (2) adopting a modular modeling method. The proposed preprocessing techniques included moving average (MA) and singular spectrum analysis (SSA). The modular models were composed of local support vector regression (SVR) models and/or local artificial neural network (ANN) models. In the process of rainfall forecasting, the ANN was first used to choose the data-preprocessing method from MA and SSA. Modular models involved splitting the training data into three crisp subsets (low, medium and high levels) according to the magnitudes of the training data; two SVRs were then applied to the medium- and high-level subsets, whereas an ANN or SVR was used for training and predicting the low-level subset. For daily rainfall records, the low-level subset tended to be modeled by the ANN because it dominated the training data, and the ANN is very efficient in training on large samples due to its parallel information processing configuration. Four rainfall time series consisting of two monthly rainfalls and two daily rainfalls from different regions were utilized to evaluate modular models at 1-day, 2-day, and 3-day lead-time with the persistence method and the global ANN as benchmarks. Results showed that the MA was superior to the SSA when they were coupled with the ANN. Comparison results indicated that modular models (referred to as ANN-SVR for daily rainfall simulations and MSVR for monthly rainfall simulations) outperformed other models. The ANN-MA also displayed considerable accuracy in rainfall forecasts compared with the benchmark.

Introduction

An accurate and timely rainfall forecast is crucial for reservoir operation and flood prevention because it can extend the lead-time of the flow forecast beyond the response time of the watershed, in particular for small and medium-sized mountainous basins.

Rainfall prediction is a very complex problem. Simulating the response using conventional approaches in modeling rainfall time series is far from a trivial task since the hydrologic processes are complex and involve various inherently complex predictors such as geomorphologic and climatic factors, which are still not well understood. As such, the artificial neural network algorithm becomes an attractive inductive approach in rainfall prediction owing to its high nonlinearity, flexibility and data-driven learning in building models without any prior knowledge about catchment behavior and flow processes. Such models are purely based on the information retrieved from the hydro-meteorological data and act as a black box.

Many studies have been conducted for the quantitative precipitation forecast (QPF) using diverse techniques including numerical weather prediction (NWP) models and remote sensing observations (Davolio et al., 2008, Diomede et al., 2008, Ganguly and Bras, 2003, Sheng et al., 2006, Yates et al., 2000), statistical models (Chan and Shi, 1999, Chu and He, 1995, DelSole and Shukla, 2002, Li and Zeng, 2008, Munot and Kumar, 2007, Nayagam et al., 2008), chaos-based approach (Jayawardena and Lai, 1994), non-parametric nearest-neighbors method (Toth et al., 2000), and soft computing-based methods including artificial neural networks (ANN), support vector regression (SVR) and fuzzy logic (FL) (Brath et al., 2002, Dorum et al., 2010, Guhathakurta, 2008, Nasseri et al., 2008, Pongracz et al., 2001, Sedki et al., 2009, Silverman and Dracup, 2000, Sivapragasam et al., 2001, Surajit and Goutami, 2007, Talei et al., 2010, Toth et al., 2000, Venkatesan et al., 1997). Contemporary studies have focused on soft computing-based methods, several examples of which follow. Venkatesan et al. (1997) employed the ANN to predict the all India summer monsoon rainfall with different meteorological parameters as model inputs. Chattopadhyay and Chattopadhyay (2008a) constructed an ANN model to predict monsoon rainfall in India depending on the rainfall series alone. The fuzzy logic theory was applied to monthly rainfall prediction by Pongracz et al. (2001). Toth et al. (2000) applied three time series models, auto-regressive moving average (ARMA), ANN and k-nearest-neighbors (KNN) method, to short-term rainfall prediction. The results showed that the ANN performed the best in improving the runoff forecasting accuracy when the predicted rainfall was used as input to the rainfall-runoff model. ANNs have also been applied to general circulation model (GCM) outputs. Chadwick et al. (2011) employed an artificial neural network approach to downscale GCM temperature and rainfall fields to regional model scale over Europe. Sachindra et al. (2011) developed a model with various soft computing techniques capable of statistically downscaling monthly GCM outputs to catchment scale monthly streamflows, accounting for the climate change.

Recently, models based on combining concepts have received more attention in hydrologic forecasting. Depending on the combination method, combining models can be categorized into ensemble models and modular (or hybrid) models. The basic idea behind the ensemble models is to build several different or similar models for the same process and to merge their outputs with a combining method (Abrahart and See, 2002, Kim et al., 2006, Shamseldin et al., 1997, Shamseldin and O'Connor, 1999, Xiong et al., 2001). For example, Xiong et al. (2001) used a Takagi-Sugeno fuzzy technique to combine several conceptual rainfall-runoff models. Coulibaly et al. (2005) employed an improved weighted-average method to coalesce forecasted daily reservoir inflows from the KNN model, conceptual model and ANN model. Kim et al. (2006) investigated five combining methods for improving ensemble streamflow prediction.

Physical processes in rainfall and/or runoff are generally composed of a number of sub-processes so that their accurate modeling by the building of a single global model is often not possible. Modular models are therefore proposed where sub-processes are first of all identified and then separate models (also called local or expert model) are established for each of them (Solomatine and Ostfeld, 2008). In these modular models, the split of training data can be soft or crisp. The soft split means the dataset can be overlapped and the overall forecasting output is the weighted-average of each local model (Shrestha and Solomatine, 2006, Zhang and Govindaraju, 2000, Wang et al., 2006, Wu et al., 2008). Zhang and Govindaraju (2000) examined the performance of modular networks in predicting monthly discharges based on the Bayesian concept. Wu et al. (2008) employed a distributed SVR for daily river stage prediction. On the contrary, there is no overlap of data in the crisp split and the final forecasting output is generated explicitly from one of the local models (Corzo and Solomatine, 2007, Jain and Srinivasulu, 2006, See and Openshaw, 2000, Sivapragasam and Liong, 2005, Solomatine and Xue, 2004). Solomatine and Xue (2004) used M5 model trees and neural networks in a flood-forecasting problem. Sivapragasam and Liong (2005) divided the flow range into three regions, and employed different SVR models to predict daily flows in high, medium and low regions.
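The difference between the two kinds of split can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function names, the threshold list `boundaries` and the membership functions are hypothetical, and the local models are stand-in callables rather than trained SVRs or ANNs.

```python
def crisp_forecast(x, local_models, boundaries):
    """Crisp split: exactly one local model produces the output.

    boundaries holds the upper limits of all regions except the last one
    (hypothetical thresholds, in ascending order).
    """
    for upper, model in zip(boundaries, local_models):
        if x < upper:
            return model(x)
    return local_models[-1](x)


def soft_forecast(x, local_models, memberships):
    """Soft split: the output is the weighted average of all local models."""
    weights = [membership(x) for membership in memberships]
    total = sum(weights)
    return sum(w * model(x) for w, model in zip(weights, local_models)) / total


# Stand-in local models for low, medium and high regions (illustration only).
models = [lambda x: x + 1.0, lambda x: x + 2.0, lambda x: x + 3.0]

# Crisp: an input of 10.0 falls in the medium region [5.0, 20.0).
crisp_value = crisp_forecast(10.0, models, boundaries=[5.0, 20.0])

# Soft: the same input belongs partly to the low and partly to the medium region.
soft_value = soft_forecast(
    10.0, models, memberships=[lambda x: 0.5, lambda x: 0.5, lambda x: 0.0]
)
print(crisp_value, soft_value)
```

In the crisp case only the medium-region model contributes, whereas in the soft case the low- and medium-region models are blended according to their weights.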

Apart from the adoption of the modular model, the improvement of predictions may be expected by suitable data preprocessing techniques. Besides the conventional rescaling or standardization of training data, preprocessing methods from the perspective of signal analysis are also crucial because a rainfall time series may also be viewed as a quasi-periodic signal contaminated by various noises. Hence techniques such as singular spectrum analysis (SSA) were recently introduced to the hydrology field by some researchers (Marques et al., 2006, Partal and Kişi, 2007, Sivapragasam et al., 2001). Sivapragasam et al. (2001) established a hybrid model of support vector machine (SVM) and the SSA for rainfall and runoff predictions. The hybrid model resulted in a considerable improvement in the model performance in comparison with the original SVM model. The application of wavelet analysis to precipitation was undertaken by Partal and Kişi (2007). Their results indicated that the wavelet analysis was highly promising. In addition, the issue of lagged predictions in the ANN model was mentioned by some researchers (Dawson and Wilby, 2001, Jain and Srinivasulu, 2004, De Vos and Rientjes, 2005, Muttil and Chau, 2006). A main reason for lagged predictions was the use of previous observed data as ANN inputs (De Vos and Rientjes, 2005). An effective solution was to obtain new model inputs by taking a moving average over the original data series.

The scope of this study was to investigate the effect of the MA and SSA as data-preprocessing techniques and to couple them with modular models to improve model performance for rainfall prediction. The modular model included three local models which were associated with three crisp subsets (low-, medium- and high-intensity rainfall) clustered by the fuzzy C-means (FCM) method. The ANN was first used to choose the data-preprocessing method from MA and SSA. Depending on the selected data-preprocessing technique, modular models were employed to perform rainfall prediction. Generally, the ANN is very efficient in processing large training samples due to its parallel information processing configuration. Its biggest drawback is that the model outputs are variable because of the random initialization of weights and biases. The SVR offers good generalization and more stable model outputs. However, it is suitable for a small training sample (e.g. below 200) because the training time exponentially increases with the size of training samples. For the current rainfall data, the majority of subsets after data split belong to a small-size sample except for the low-intensity daily rainfall. Therefore, three local SVRs (hereafter referred to as MSVR) were employed for monthly rainfall data whereas two local SVRs and one ANN (hereafter referred to as ANN-SVR) were adopted for daily rainfall data. For daily rainfall records, the low-intensity subset was modeled by the ANN because it dominated the training data. For comparison purposes, the global ANN and the persistence model were used as benchmarks. To ensure generalization of this study, four cases consisting of two monthly rainfall series and two daily rainfall series from India and China were explored.
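The clustering step behind the three crisp subsets can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: it runs a one-dimensional fuzzy C-means on rainfall magnitudes with a fuzzifier of 2 and a deterministic, evenly spaced initialization (both assumed here), then assigns each value to the cluster with the highest membership, which for FCM is the nearest center. The function names and the sample data are hypothetical.

```python
def fuzzy_c_means_1d(values, n_clusters=3, fuzzifier=2.0, n_iter=100, tol=1e-9):
    """Return the sorted cluster centers of a 1-D fuzzy C-means run.

    Assumptions: n_clusters >= 2, fuzzifier > 1, and centers initialized
    evenly between the minimum and maximum of the data.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]
    power = 2.0 / (fuzzifier - 1.0)
    for _ in range(n_iter):
        numer = [0.0] * n_clusters
        denom = [0.0] * n_clusters
        for v in values:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dists = [max(abs(v - c), 1e-12) for c in centers]
            for j in range(n_clusters):
                membership = 1.0 / sum((dists[j] / d) ** power for d in dists)
                weight = membership ** fuzzifier
                numer[j] += weight * v
                denom[j] += weight
        new_centers = [n / d for n, d in zip(numer, denom)]
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return sorted(centers)


def crisp_label(value, centers):
    """Crisp split: index of the nearest center (0 = low, 1 = medium, 2 = high)."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))


# Made-up rainfall magnitudes, for illustration only.
rain = [0.0, 0.5, 1.0, 0.2, 0.8, 10.0, 12.0, 11.0, 40.0, 45.0, 42.0]
centers = fuzzy_c_means_1d(rain)
subsets = {0: [], 1: [], 2: []}
for value in rain:
    subsets[crisp_label(value, centers)].append(value)
print(subsets)
```

Each of the three subsets would then be used to train its own local model, and a new input would be routed to the local model of the subset it falls into.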

Section snippets

Moving average (MA)

The moving average method smoothes data by replacing each data point with the average of the K neighboring data points, where K may be called the length of memory window. The basic idea behind the method is that any large irregular component at any point in time will exert a smaller effect if we average the point with its immediate neighbors (Newbold et al., 2003). The most common moving average method is the unweighted moving average, in which each value of the data carries the same weight in
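The unweighted moving average described above can be sketched as follows. This is a minimal illustration, assuming a backward-looking window that covers the current point and the K-1 points before it, and a shortened window for the first K-1 points; the excerpt does not state the window alignment, so both choices are assumptions, and the function name `moving_average` is hypothetical.

```python
def moving_average(series, k):
    """Unweighted moving average with a backward-looking window of length k.

    Assumption: the window holds the current point and the k-1 points
    before it; for the first k-1 points the window is shortened.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(series):
        running_sum += value
        if i >= k:
            # Drop the point that just left the window.
            running_sum -= series[i - k]
        window = min(i + 1, k)
        smoothed.append(running_sum / window)
    return smoothed


rain = [0.0, 12.0, 0.0, 3.0, 30.0, 0.0]
print(moving_average(rain, 3))  # [0.0, 6.0, 4.0, 5.0, 11.0, 11.0]
```

The isolated peak of 30.0 is spread over its neighbors, which is the damping effect on large irregular components that the method relies on.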

Case study

Two daily mean rainfall series (at Zhenwan and Wuxi raingauge stations, respectively) from Zhenshui and Da'ninghe watersheds of China, and two monthly mean rainfall series from India and Zhongxian raingauge station of China, were analyzed in this study.

The Zhenshui basin is located in the north of Guangdong province and adjoined by Hunan province and Jiangxi province. The basin belongs to a second-order tributary of the Pearl River and has an area of 7554 km². The daily rainfall time series data

Decomposition of rainfall data

The decomposition of the daily average rainfall series requires identifying the window length m (or the singular number) if the interval of neighboring points in discrete time series is defaulted as the lag time (i.e. τ=1 day for daily rainfall data or 1 month for monthly rainfall data). The reasonable value of m should give rise to a clear resolution of the original signal. The present study does not need to accurately resolve any trends or oscillations in the raw rainfall signal. A rough
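The role of the window length m can be made concrete with a sketch of basic SSA in Python. It assumes NumPy and follows the textbook steps (embedding with lag 1, singular value decomposition, diagonal averaging); it is not necessarily the exact procedure used in the paper, and the function name `ssa_decompose` and the synthetic series are hypothetical.

```python
import numpy as np


def ssa_decompose(series, m):
    """Basic SSA: return an (m, n) array with one reconstructed component per row.

    m is the window length; neighboring points are one time step apart.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    if not 1 < m <= n // 2:
        raise ValueError("window length m must satisfy 1 < m <= n // 2")
    k = n - m + 1
    # Trajectory (Hankel) matrix: column j holds x[j], ..., x[j + m - 1].
    trajectory = np.column_stack([x[j:j + m] for j in range(k)])
    u, s, vt = np.linalg.svd(trajectory, full_matrices=False)
    components = np.zeros((s.size, n))
    for i in range(s.size):
        elementary = s[i] * np.outer(u[:, i], vt[i])
        # Diagonal averaging: average all entries whose row + column index equals t.
        flipped = elementary[::-1]
        for t in range(n):
            components[i, t] = flipped.diagonal(t - (m - 1)).mean()
    return components


# Synthetic monthly-like signal: mean level, annual cycle and noise (illustration only).
rng = np.random.default_rng(0)
t = np.arange(120)
rain = 5.0 + 3.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 1.0, t.size)

parts = ssa_decompose(rain, m=12)
filtered = parts[:3].sum(axis=0)  # keep only the leading components
print(parts.shape, np.allclose(parts.sum(axis=0), rain))
```

Summing all components recovers the original series, so filtering amounts to choosing which components to keep; a larger m gives a finer decomposition at a higher computational cost.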

Results

The overall performances of each model in terms of RMSE, CE, and PI are presented in Table 2 for two monthly rainfall series and Table 3 for two daily rainfall series. It can be seen that two benchmark models of persistence and ANN demonstrated very poor performances for all four cases except for India. The performances from ANN-MA and ANN-SSA indicate that data-preprocessing methods resulted in considerable improvement in the accuracy of the rainfall forecasting. Moreover, the MA seems
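For reference, the three scores can be computed as in the Python sketch below. The excerpt does not give the formulas, so the sketch assumes the definitions commonly used in hydrology: root mean square error (RMSE), the Nash-Sutcliffe coefficient of efficiency (CE), and a persistence index (PI) that compares the model with a forecast that repeats the observation from `lead` steps earlier. The function names are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def coefficient_of_efficiency(obs, pred):
    """Nash-Sutcliffe CE: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def persistence_index(obs, pred, lead=1):
    """PI: 1 is a perfect fit, 0 matches the persistence forecast obs[t - lead]."""
    steps = range(lead, len(obs))
    sse_model = sum((obs[t] - pred[t]) ** 2 for t in steps)
    sse_naive = sum((obs[t] - obs[t - lead]) ** 2 for t in steps)
    return 1.0 - sse_model / sse_naive


# Made-up values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(rmse(observed, predicted))
print(coefficient_of_efficiency(observed, predicted))
print(persistence_index(observed, predicted, lead=1))
```

Under these definitions a model is only useful at a given lead time if its PI is above zero, that is, if it beats the persistence benchmark.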

Conclusions

The purpose of this study was to investigate the effect of modular models coupled with data-preprocessing techniques in improving the accuracy of rainfall forecasting. The modular models consisted of three local SVR and/or ANN models. A three-layer feed-forward ANN was used to examine two data-preprocessing techniques, MA and SSA. Results show that the MA was superior to the SSA. Four rainfall records, India, Zhongxian, Wuxi and Zhenwan, from India and China, were used as testing cases.

With the help of

References (66)

G.J. Bowden et al.
Input determination for neural network models in water resources applications: Part 1—background and methodology
J. Hydrol.
(2005)
A. Dorum et al.
Optimized scenario for rainfall forecasting using genetic algorithm coupled with artificial neural network
Expert Syst. Appl.
(2010)
A. Jain et al.
Integrated approach to model decomposed flow hydrograph using artificial neural network and conceptual techniques
J. Hydrol.
(2006)
A.W. Jayawardena et al.
Analysis and prediction of chaos in rainfall and stream-flow time-series
J. Hydrol.
(1994)
O. Kişi
Constructing neural network sediment estimation models using a data-driven algorithm
Math. Comput. Simulation
(2008)
C.A.F. Marques et al.
Singular spectral analysis and forecasting of hydrological time series
Phys. Chem. Earth
(2006)
M. Nasseri et al.
Optimized scenario for rainfall forecasting using genetic algorithm coupled with artificial neural network
Expert Syst. Appl.
(2008)
T. Partal et al.
Wavelet and Neuro-fuzzy conjunction model for precipitation forecasting
J. Hydrol.
(2007)
R. Pongracz et al.
Fuzzy rule-based prediction of monthly precipitation
Phys. Chem. Earth Part B-Hydrol. Oceans Atmos.
(2001)
A. Sedki et al.
Evolving neural network using real coded genetic algorithm for daily rainfall–runoff forecasting
Expert Syst. Appl.
(2009)

A.Y. Shamseldin et al.
Methods for combining the outputs of different rainfall-runoff models
J. Hydrol.
(1997)
A. Talei et al.
A novel application of a neuro-fuzzy computational technique in event-based rainfall–runoff modeling
Expert Syst. Appl.
(2010)
E. Toth et al.
Comparison of short-term rainfall prediction models for real-time flood forecasting
J. Hydrol.
(2000)
R. Vautard et al.
Singular-spectrum analysis: a toolkit for short, noisy and chaotic signals
Physica D
(1992)
W. Wang et al.
Forecasting daily streamflow using hybrid ANN models
J. Hydrol.
(2006)
C.L. Wu et al.
River stage prediction based on a distributed support vector regression
J. Hydrol.
(2008)
L.H. Xiong et al.
A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi-Sugeno fuzzy system
J. Hydrol.
(2001)
P.S. Yu et al.
Support vector regression for real-time flood stage forecasting
J. Hydrol.
(2006)
R.J. Abrahart et al.
Multi-model data fusion for river flow forecasting: an evaluation of six alternative methods based on two contrasting catchment
Hydrol. Earth Syst. Sci.
(2002)
A. Brath et al.
Neural networks and non-parametric methods for improving real time flood forecasting through conceptual hydrological models
Hydrol. Earth Syst. Sci.
(2002)
R. Chadwick et al.
An artificial neural network technique for downscaling GCM outputs to RCM spatial scale
Nonlinear Processes Geophys.
(2011)
J.C.L. Chan et al.
Prediction of the summer monsoon rainfall over South China
Int. J. Climatol.
(1999)
S. Chattopadhyay et al.
Identification of the best hidden layer size for three-layered neural net in predicting monsoon rainfall in India
J. Hydroinformatics
(2008)
S. Chattopadhyay et al.
Comparative study among different neural net learning algorithms applied to rainfall time series
Meteorol. Appl.
(2008)
B. Cheng et al.
Neural networks: a review from a statistical perspective
Stat. Sci.
(1994)
P.S. Chu et al.
Long-range prediction of Hawaiian winter rainfall using canonical correlation-analysis
Int. J. Climatol.
(1995)
G. Corzo et al.
Baseflow separation techniques for modular artificial neural network modelling in flow forecasting
Hydrol. Sci. J.
(2007)
P. Coulibaly et al.
Improving daily reservoir inflow forecasts with model combination
J. Hydrol. Eng.
(2005)
C.W. Dawson et al.
Hydrological modeling using artificial neural networks
Prog. Phys. Geography
(2001)
S. Davolio et al.
A meteo-hydrological prediction system based on a multi-model approach for precipitation forecasting
Nat. Hazards Earth Syst. Sci.
(2008)
T. DelSole et al.
Linear prediction of Indian monsoon rainfall
J. Climate
(2002)
N.J. De Vos et al.
Constraints of artificial neural networks for rainfall-runoff modeling: trade-offs in hydrological state representation and model evaluation
Hydrol. Earth Syst. Sci.
(2005)
T. Diomede et al.
Discharge prediction based on multi-model precipitation forecasts
Meteorol. Atmos. Phys.
(2008)

