A randomized-algorithm-based decomposition-ensemble learning methodology for energy price forecasting

doi:10.1016/j.energy.2018.05.146

Energy

Volume 157, 15 August 2018, Pages 526-538

https://doi.org/10.1016/j.energy.2018.05.146

Highlights

• A randomized-algorithm-based decomposition-ensemble learning method is proposed.
• Randomized algorithms are used to construct extremely fast individual predictors.
• The proposed method outperforms popular single methods and ensemble variants in accuracy.
• EEMD-RVFL is an extremely efficient and fast technique for energy price forecasting.

Abstract

Inspired by the idea of randomization, some powerful but time-consuming decomposition-ensemble learning paradigms can be extended into extremely efficient and fast variants by using randomized algorithms as individual forecasting tools. The proposed methodology comprises three major steps: (1) data decomposition via ensemble empirical mode decomposition (EEMD), (2) individual prediction via a randomized algorithm (using randomization to reduce training time and parameter sensitivity), and (3) results ensemble to produce the final prediction. Unlike existing decomposition-ensemble models that use traditional econometric approaches or computational intelligence methods for individual prediction, this study employs emerging randomized algorithms, namely extreme learning machine and random vector functional link network (using randomly fixed weights and biases in neural networks) and random kitchen sinks (using randomly mapped features to approximate kernels), to dramatically reduce computational time and enhance prediction accuracy. With Brent oil prices and Henry Hub natural gas prices as study samples, the empirical study statistically confirms that the proposed randomized-algorithm-based decomposition-ensemble learning models are highly efficient and fast relative to popular single techniques (including computational intelligence methods and randomized algorithms) and similar decomposition-ensemble counterparts (using the aforementioned single techniques as individual forecasting tools).

Introduction

In the era of big data, energy price prediction, which aims to capture the evolution laws of energy systems from sufficient historical observations (a typical case of big data) and thus provide a reliable assessment of the future, has become an increasingly hot but challenging issue [1]. On the one hand, with the rapid development of the Internet and big data techniques, a wealth of data concerning energy markets is now available, which urgently calls for innovation in energy price forecasting techniques toward fast algorithms. Taking oil prices as an example, besides historical time series data from different oil markets (e.g., Brent and West Texas Intermediate), there is also a large amount of information on influencing factors, such as market factors (e.g., supply and demand) [2] and external factors (e.g., substitutability with other energy resources, weather, stock levels, economic growth, political changes, demographics, emergency events, and even psychological expectations) [3]. Accordingly, a fast learning algorithm is highly desirable for processing these big data effectively and producing prediction results rapidly. On the other hand, given that a high level of noise is unavoidable in energy systems, how to capture the true information and enhance prediction accuracy remains a key issue in energy price prediction. For example, an accurate prediction of oil prices can help to improve production, marketing and investment plans, control potential risks and increase future profits in oil-related sectors [4]. For this purpose, this paper focuses on improving existing forecasting techniques toward efficient and fast algorithms in the context of big data.

According to existing studies, there is an abundance of forecasting techniques for energy prices, which generally fall into three major groups: econometric approaches, computational intelligence methods (CIs) and hybrid algorithms (integrating two or more single models of any of the aforementioned types) [5]. In energy price prediction, for example, popular econometric models include auto-regressive integrated moving average (ARIMA) [6], generalized autoregressive conditional heteroskedasticity (GARCH) [7], random walk (RW) [8], vector auto-regression (VAR) [9] and error correction models (ECM) [10]. However, these models rest on data assumptions of stationarity and linearity, which real energy systems contradict. In this context, CIs have become the dominant approaches in energy price forecasting, such as artificial neural networks (ANNs) [5], support vector regression (SVR) [11], least squares support vector regression (LSSVR) [12] and various CI-based optimization tools. However, these conventional CI techniques have two intrinsic shortcomings: long training time and parameter sensitivity [13]. For example, ANNs, which use gradient descent methods to tune parameters (such as weights and biases), take a long time to train and frequently fall into local optima [14]. Similarly, SVR and LSSVR, which rely on iterative search procedures (such as grid search or trial and error) to determine regularization and kernel parameters, cannot avoid the twin problems of long training time and parameter sensitivity [15]. Because of the respective disadvantages of the first two types, the third type, i.e., hybrid techniques combining two or more single algorithms, has emerged and offered excellent performance in energy price prediction.

In particular, the decomposition-ensemble learning paradigms, based on the promising concept of “decomposition and ensemble”, have been widely regarded as an excellent case among hybrid methods [16]. A typical decomposition-ensemble model includes three major steps: data decomposition, which splits the original complex data into relatively simple components to reduce data complexity; individual prediction, which models each extracted component independently; and results ensemble, which aggregates the individual predictions into the final prediction [12,16]. The superiority of decomposition-ensemble techniques in terms of prediction accuracy has been demonstrated in forecasting energy prices such as oil prices [6] and gas prices [17]. However, the “decomposition and ensemble” strategy poses a big challenge, i.e., a large computational burden from modeling all the decomposed components individually. Furthermore, most existing decomposition-ensemble models employ CI-based individual predictors with iterative tuning processes, such as ANN, SVR and LSSVR [5,11,12], which greatly aggravates the time-consuming problem [14]. In addition, the performance of these CI algorithms depends heavily on the predesigned parameters of the iterative learning process, and setting any one of them inappropriately can make a great difference to the final prediction [15]. In this sense, the emerging decomposition-ensemble learning methodology suffers severely from the twin problems of long training time and parameter sensitivity. This study therefore tries to address both issues.
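The three steps of a decomposition-ensemble model can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: a cascade of moving averages stands in for the decomposition step (it is not EEMD), a least-squares autoregression stands in for the individual predictor, and simple addition serves as the ensemble step. All function names, window sizes and lag orders are hypothetical choices made for the example.

```python
import numpy as np


def decompose(series, windows=(5, 20)):
    """Stand-in for the decomposition step (NOT EEMD): peel off fast-varying
    parts with successive moving averages. The components sum to the series."""
    residual = np.asarray(series, dtype=float)
    components = []
    for w in windows:
        # Pad the edges so the smoothed series keeps the original length.
        padded = np.pad(residual, (w // 2, w - 1 - w // 2), mode="edge")
        smooth = np.convolve(padded, np.ones(w) / w, mode="valid")
        components.append(residual - smooth)  # fast-varying component
        residual = smooth
    components.append(residual)  # remaining slow trend
    return components


def lag_matrix(x, n_lags):
    """Each row holds n_lags consecutive values; y is the value that follows."""
    X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    return X, x[n_lags:]


def predict_component(component, n_lags=4):
    """Individual prediction: fit a linear autoregression by least squares
    and return a one-step-ahead forecast for this component."""
    X, y = lag_matrix(component, n_lags)
    X1 = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    last = np.append(component[-n_lags:], 1.0)
    return float(last @ coef)


def decomposition_ensemble_forecast(series, n_lags=4):
    """Decompose, forecast every component independently, then add up."""
    components = decompose(series)
    return sum(predict_component(c, n_lags) for c in components)


# Usage on a synthetic random-walk "price" series (illustrative data only).
rng = np.random.default_rng(0)
prices = 50 + np.cumsum(rng.normal(size=300))
forecast = decomposition_ensemble_forecast(prices)
```

Because every component is modeled separately, the cost of the individual predictor is multiplied by the number of components, which is the computational burden the paragraph above refers to.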

Fortunately, the twin issues of long training time and parameter sensitivity can be overcome by using the idea of randomization, so that an extremely efficient and fast decomposition-ensemble learning methodology can be developed. Based on randomization, some randomized algorithms have recently been presented and have shown excellent capabilities in terms of speed and prediction accuracy. In particular, the randomized algorithms employ randomly fixed parameters, randomly mapped features, randomly generated samples or randomly selected variables rather than the iteratively tuned ones of conventional CIs, which effectively ensures an extremely fast learning speed and excellent generalization performance [14]. Furthermore, because no stopping criteria, learning rate, learning epochs or other learning-process parameters need to be set, the problem of parameter sensitivity is greatly alleviated. Typical cases include extreme learning machine (ELM) [18] and random vector functional link (RVFL) network [19], which use randomly fixed input weights and hidden biases in neural networks; random kitchen sinks (RKS) [20], which use randomly mapped features to approximate shift-invariant kernels; and random forest (RF) [21], which uses randomly bootstrapped samples and randomly selected variables to grow decision trees. These randomized algorithms have been applied extensively to various complex systems, such as electricity load [22,23], electricity production [24], pedestrian detection [25], wave energy flux [26], water demand [27], wind farm power ramp rates [28] and mineral prospectivity [29]. Therefore, this paper tries to introduce such emerging randomized algorithms to formulate efficient and fast decomposition-ensemble learning models. To give a better overview of the conventional CIs and randomized algorithms, Table 1 summarizes the advantages and disadvantages of these intelligent approaches.
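To make the idea of "randomly fixed input weights and hidden biases" concrete, the following is a minimal NumPy sketch of a single-hidden-layer randomized network. It is an illustration under assumptions, not the configuration used in the paper: the class name `RandomizedNet`, the tanh activation, the uniform weight range and the ridge penalty are all hypothetical choices. With `direct_links=True` the inputs are also fed straight to the output layer, as in an RVFL network; with `direct_links=False` the model reduces to an ELM-style network.

```python
import numpy as np


class RandomizedNet:
    """Single-hidden-layer network whose hidden weights are drawn once at
    random and never trained; only the output weights are solved for."""

    def __init__(self, n_hidden=50, ridge=1e-3, direct_links=True, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge                # illustrative regularization value
        self.direct_links = direct_links  # True: RVFL-style, False: ELM-style
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        # Hidden activations from the fixed random weights and biases.
        H = np.tanh(X @ self.W + self.b)
        blocks = [H, np.ones((len(X), 1))]  # hidden features + output bias
        if self.direct_links:
            blocks.insert(0, X)             # input-to-output direct links
        return np.hstack(blocks)

    def fit(self, X, y):
        n_inputs = X.shape[1]
        # Random, fixed input weights and hidden biases (no iterative tuning).
        self.W = self.rng.uniform(-1.0, 1.0, size=(n_inputs, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        D = self._design(X)
        # Output weights in closed form via ridge-regularized least squares.
        A = D.T @ D + self.ridge * np.eye(D.shape[1])
        self.beta = np.linalg.solve(A, D.T @ y)
        return self

    def predict(self, X):
        return self._design(X) @ self.beta


# Usage on a toy regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])
model = RandomizedNet(n_hidden=50).fit(X, y)
train_mse = np.mean((model.predict(X) - y) ** 2)
```

Training amounts to one linear solve, with no learning rate, epoch count or stopping criterion to set, which is the source of the speed and reduced parameter sensitivity described above.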

By using randomized algorithms, some randomized-algorithm-based decomposition-ensemble learning models have recently been developed and have obtained satisfactory forecasting results. For example, Tang et al. [35] introduced ELM as the individual predictor into the “decomposition and ensemble” framework and observed the effectiveness of the proposed methodology in oil price prediction in terms of time saving and accuracy. Wang et al. [36] used a two-phase decomposition technique and a modified extreme learning machine to forecast the air quality index. Lu and Shao [37] developed an ensemble learning approach with ELM as the individual forecasting tool for computer product sales forecasting. Shrivastava [38] built a wavelet-based ELM decomposition-ensemble model for electricity price forecasting. Tang et al. [15] used RVFL to develop a decomposition-ensemble learning paradigm for oil price forecasting. However, to the best of our knowledge, few decomposition-ensemble learning paradigms have used other, possibly more competitive, randomized algorithms such as RKS. Therefore, this study fills this gap in the literature by introducing various promising randomized algorithms and conducting a thorough comparison to explore whether the idea of randomization does improve the existing decomposition-ensemble learning paradigms in terms of speed and accuracy.

Generally speaking, this study aims to formulate efficient and fast decomposition-ensemble learning models for energy price forecasting by using emerging randomized algorithms in individual prediction, thereby addressing the twin problems of long training time and parameter sensitivity. The major contributions of this study are twofold. First, by introducing various randomized algorithms, a series of randomized-algorithm-based decomposition-ensemble models are formulated. Second, a thorough comparison is conducted to check whether the idea of randomization does improve the existing decomposition-ensemble learning paradigms in terms of speed and accuracy, and to identify the most efficient and fast one for energy price prediction.

The rest of this study is organized as follows. Section 2 describes the formulation process of the proposed methodology. For illustration and verification, the proposed methodology is applied to predict the Brent crude oil spot prices and the Henry Hub natural gas prices, with the results presented in Section 3. Finally, Section 4 summarizes the major contributions of the paper and discusses some interesting directions for future research.

Section snippets

Methodology formulation

This section presents the formulation process of the proposed methodology. In particular, Section 2.1 designs the general model framework, and Sections 2.2 (EEMD) and 2.3 (Randomized algorithms) describe the related techniques in detail.
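As one illustration of the randomized algorithms this section refers to, the sketch below shows the core of random kitchen sinks: randomly drawn cosine features whose inner products approximate a shift-invariant kernel, here the Gaussian (RBF) kernel exp(-gamma * ||x - y||^2). It is a minimal example under assumptions, not the paper's implementation: the function names, the choice of the RBF kernel, and the values of `gamma`, `n_features` and `ridge` are hypothetical.

```python
import numpy as np


def random_fourier_features(X, n_features=500, gamma=1.0, seed=0):
    """Map X to random cosine features z(x) such that z(x) @ z(y)
    approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    # Frequencies sampled from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_inputs, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def rbf_kernel(X, gamma=1.0):
    """Exact RBF kernel matrix, used here only to check the approximation."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def fit_ridge(Z, y, ridge=1e-2):
    """Linear ridge regression on the random features (closed form)."""
    A = Z.T @ Z + ridge * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y)


# Usage: compare the approximate and exact kernels on synthetic points.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
Z = random_fourier_features(X, n_features=2000, gamma=0.5)
max_error = np.abs(Z @ Z.T - rbf_kernel(X, gamma=0.5)).max()
```

Once the features are computed, a kernel-style model can be trained with an ordinary linear solver such as `fit_ridge`, provided the same seed (and hence the same random map) is used for training and prediction inputs.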

Empirical study

For illustration, the Brent crude oil spot prices and the Henry Hub natural gas prices are selected as the study samples. The experiment is designed in Section 3.1, and Section 3.2 presents the empirical results and discusses whether the proposed model, by using randomized algorithms, statistically improves energy price prediction in terms of speed, accuracy and robustness.

Conclusions

To solve the time-consuming issue in existing decomposition-ensemble learning paradigms, this study introduces the emerging randomized algorithms to develop an efficient and fast decomposition-ensemble learning methodology for energy price forecasting. In particular, the randomized algorithms use randomization, in terms of randomly fixed parameters, randomly mapping features, randomly generated samples or randomly selected variables rather than iteratively tuned ones, which effectively ensures

Acknowledgements

This work is supported by grants from the National Natural Science Foundation of China (NSFC Nos. 71622011, 71433001 and 71301006), the National Program for Support of Top Notch Young Professionals, and Beijing Advanced Innovation Center for Soft Matter Science and Engineering.

References (45)

M. Monge et al. Crude oil price behaviour before and after military conflicts and geopolitical events. Energy (2017)
Y. Zhao et al. A deep learning ensemble approach for crude oil price forecasting. Energy Econ (2017)
H. Van Goor et al. Modeling natural gas price volatility: the case of the UK gas market. Energy (2014)
A. Murat et al. Forecasting oil price movements with crack spread futures. Energy Econ (2009)
F.P. Chiu et al. Modeling the price relationships between crude oil, energy crops and biofuels. Energy (2016)
A. Lanza et al. Modeling and forecasting cointegrated relationships among heavy oil and product prices. Energy Econ (2005)
B. Zhu et al. Forecasting carbon price using empirical mode decomposition and evolutionary least squares support vector regression. Appl Energy (2017)
L. Zhang et al. A survey of randomized algorithms for training neural networks. Inf Sci (2016)
X. Qiu et al. Short-term electricity price forecasting with empirical mode decomposition based ensemble kernel machines. Procedia Comput Sci (2017)
H.T. Nguyen et al. Short-term electricity demand and gas price forecasts using wavelet transforms and adaptive models. Energy (2010)
G.B. Huang et al. Extreme learning machine: theory and applications. Neurocomputing (2006)
Y.H. Pao et al. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing (1994)
Y. Ren et al. Random vector functional link network for short-term electricity load demand forecasting. Inf Sci (2016)
R. Nedellec et al. GEFCom2012: electric load forecasting and backcasting with semi-parametric models. Int J Forecast (2014)
M. Zamo et al. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production, part I: deterministic forecast of hourly production. Sol Energy (2014)
Z. Wang et al. A high accuracy pedestrian detection system combining a cascade AdaBoost detector and random vector functional-link net. Sci World J (2014)
G. Ibarra-Berastegi et al. Short-term forecasting of the wave energy flux: analogues, random forests, and physics-based models. Ocean Eng (2015)
M. Herrera et al. Predictive models for forecasting hourly urban water demand. J Hydrol (2010)
V. Rodriguez-Galiano et al. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev (2015)
J.V. Tu. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol (1996)
J.R. Zhang et al. A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training. Appl Math Comput (2007)
D. Wang et al. A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci Total Environ (2017)

