Elsevier

Energy

Volume 157, 15 August 2018, Pages 526-538
A randomized-algorithm-based decomposition-ensemble learning methodology for energy price forecasting

https://doi.org/10.1016/j.energy.2018.05.146

Highlights

  • A randomized-algorithm-based decomposition-ensemble learning method is proposed.

  • Randomized algorithms are used to construct extremely fast individual predictors.

  • The proposed method outperforms popular single methods and ensemble variants in accuracy.

  • EEMD-RVFL is an extremely efficient and fast technique for energy price forecasting.

Abstract

Inspired by the idea of randomization, some powerful but time-consuming decomposition-ensemble learning paradigms can be extended into extremely efficient and fast variants by using randomized algorithms as individual forecasting tools. The proposed methodology includes three major steps: (1) data decomposition via ensemble empirical mode decomposition, (2) individual prediction via a randomized algorithm (using randomization to reduce training time and parameter sensitivity), and (3) results ensemble to produce the final prediction. Unlike existing decomposition-ensemble models that use traditional econometric approaches or computational intelligence methods for individual prediction, this study employs emerging randomized algorithms, namely extreme learning machine and random vector functional link network (both using randomly fixed weights and biases in neural networks) and random kitchen sinks (using randomly mapped features to approximate kernels), to dramatically save computational time and enhance prediction accuracy. With the Brent oil prices and the Henry Hub natural gas prices as study samples, the empirical study statistically confirms that the proposed randomized-algorithm-based decomposition-ensemble learning models are highly efficient and fast relative to popular single techniques (including computational intelligence methods and randomized algorithms) and similar decomposition-ensemble counterparts (using the aforementioned single techniques as individual forecasting tools).

Introduction

In the era of big data, energy price prediction, which aims to capture the evolution laws of energy systems from sufficient historical observations (a typical case of big data) and thus provide a reliable evaluation of the future, has become an increasingly hot but challenging issue [1]. On the one hand, with the rapid development of the Internet and big data techniques, a wealth of data on energy markets is now available, which calls for urgent innovation of energy price forecasting techniques toward fast algorithms. Taking oil prices as an example, besides historical time series data in different oil markets (e.g., the Brent and West Texas Intermediate), there is also a great deal of information on influencing factors such as market factors (e.g., supply and demand) [2] and external factors (e.g., substitutability with other energy resources, weather, stock levels, economic growth, political changes, demographics, emergency events, and even psychological expectations) [3]. Accordingly, a fast learning algorithm is highly desirable to process these big data effectively and produce prediction results rapidly. On the other hand, given that a high level of noise cannot be avoided within energy systems, how to capture the true information and enhance prediction accuracy remains a key issue in energy price prediction. For example, an accurate prediction of oil prices can help to improve the corresponding plans for production, marketing and investment, control potential risks and increase future profits in oil-related sectors [4]. For this purpose, this paper focuses on improving the existing forecasting techniques toward efficient and fast algorithms, in the context of big data.

According to the existing studies, there is an abundance of forecasting techniques for energy prices, which generally fall into three major groups: econometric approaches, computational intelligence methods (CIs) and hybrid algorithms (integrating two or more single models of any of the aforementioned types) [5]. In energy price prediction, for example, popular econometric models are auto-regressive integrated moving average (ARIMA) [6], generalized autoregressive conditional heteroskedasticity (GARCH) [7], random walk (RW) [8], vector auto-regression (VAR) [9] and error correction models (ECM) [10]. However, these models rely on data assumptions of stationarity and linearity, which contradict real energy systems. In such a context, CIs have become the dominant approaches in energy price forecasting, such as artificial neural networks (ANNs) [5], support vector regression (SVR) [11], least squares support vector regression (LSSVR) [12] and various CI-based optimization tools. However, these conventional CI techniques have two intrinsic shortcomings: long computation time and parameter sensitivity [13]. For example, ANNs, which use gradient descent methods to tune parameters (such as weights and biases), take a long time to train and frequently fall into local optima [14]. Similarly, SVR and LSSVR, which use iterative search procedures (such as the grid search method or the trial-and-error method) to determine regularization and kernel parameters, cannot avoid the double problems of time consumption and parameter sensitivity [15]. Owing to the respective disadvantages of the first two types, the third type, i.e., hybrid techniques combining two or more single algorithms, has emerged and offered excellent performance in energy price prediction.

In particular, the decomposition-ensemble learning paradigm, based on the promising concept of “decomposition and ensemble”, has been widely considered an excellent case among hybrid methods [16]. A typical decomposition-ensemble model includes three major steps: data decomposition, which decomposes the original complex data into relatively simple components to reduce data complexity; individual prediction, which models each extracted component independently; and results ensemble, which aggregates the individual predictions into the final prediction [12,16]. The superiority of decomposition-ensemble techniques in terms of prediction accuracy has been demonstrated in the forecasting of energy prices such as oil prices [6] and gas prices [17]. However, the “decomposition and ensemble” strategy poses a big challenge, i.e., a large computational burden for modeling all the decomposed components individually. Furthermore, most existing decomposition-ensemble models employ CI-based individual predictors with iterative tuning processes, such as ANN, SVR and LSSVR [5,11,12], which greatly aggravates the time-consuming problem [14]. In addition, the performance of these CI algorithms depends heavily on the predesigned parameters of the iterative learning process; setting any one of them inappropriately can make a great difference in the final prediction [15]. In this sense, the emerging decomposition-ensemble learning methodology suffers severely from the double problems of time consumption and parameter sensitivity. This study therefore tries to address both issues.
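The three steps described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the paper's implementation: a centered moving average stands in for the decomposition step (the paper uses ensemble empirical mode decomposition), a linear autoregression stands in for the individual predictor, the ensemble is a simple sum, and the price series is synthetic. All function names and parameters are hypothetical.

```python
import numpy as np

def decompose(series, window=5):
    """Stand-in for EEMD (assumption): split a series into a smooth trend
    and a residual. The two components sum back to the original series."""
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="valid")
    return [trend, series - trend]

def fit_predict_component(component, lags, train_size):
    """Fit a linear autoregression on the first `train_size` points and
    return one-step-ahead predictions for the remaining points."""
    X = np.array([component[i:i + lags] for i in range(len(component) - lags)])
    y = component[lags:]
    split = train_size - lags  # row k of X predicts component[k + lags]
    A_train = np.column_stack([X[:split], np.ones(split)])
    coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
    A_test = np.column_stack([X[split:], np.ones(len(X) - split)])
    return A_test @ coef

def decomposition_ensemble_forecast(series, lags=4, train_size=150):
    # Step 1: data decomposition. For brevity the whole series is decomposed
    # at once; a careful experiment would decompose only past data.
    components = decompose(series)
    # Step 2: individual prediction, one independent model per component.
    individual = [fit_predict_component(c, lags, train_size) for c in components]
    # Step 3: results ensemble, here a simple additive aggregation (assumption).
    return np.sum(individual, axis=0)

# Synthetic "price" series, purely for illustration.
rng = np.random.default_rng(0)
t = np.arange(200)
prices = 50 + 0.05 * t + 3 * np.sin(t / 8) + rng.normal(0, 0.5, t.size)

forecast = decomposition_ensemble_forecast(prices)
actual = prices[150:]
mae = float(np.mean(np.abs(forecast - actual)))
print(forecast.shape, mae)
```

The sketch also makes the computational burden visible: step 2 trains one model per component, so any cost in the individual predictor is multiplied by the number of components.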

Fortunately, the double issues of time consumption and parameter sensitivity can be overcome by using the idea of randomization, so that an extremely efficient and fast decomposition-ensemble learning methodology can be developed. Based on randomization, some randomized algorithms have recently been presented and have shown excellent capabilities in terms of speed and prediction accuracy. In particular, the randomized algorithms employ randomly fixed parameters, randomly mapped features, randomly generated samples or randomly selected variables rather than the iteratively tuned ones of conventional CIs, which ensures an extremely fast learning speed and an excellent generalization performance [14]. Furthermore, because stopping criteria, learning rates, learning epochs and other parameters of the learning process no longer need to be set, the problem of parameter sensitivity can be largely alleviated. Typical cases include extreme learning machine (ELM) [18] and random vector functional link (RVFL) network [19], which use randomly fixed input weights and hidden biases in neural networks; random kitchen sinks (RKS) [20], which use randomly mapped features to approximate shift-invariant kernels; and random forest (RF) [21], which uses randomly bootstrapped samples and randomly selected variables to grow decision trees. These randomized algorithms have been extensively applied to various complex systems such as electricity load [22,23], electricity production [24], pedestrian detection [25], wave energy flux [26], water demand [27], wind farm power ramp rates [28], mineral prospectivity [29], etc. Therefore, this paper tries to introduce such emerging randomized algorithms to formulate some efficient and fast decomposition-ensemble learning models. To give a better overview of the conventional CIs and randomized algorithms, Table 1 summarizes the advantages and disadvantages of these intelligent approaches.
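To make the idea of randomly fixed parameters concrete, the following is a minimal sketch of an RVFL-style regressor, assuming a tanh activation, uniform random weights in [-1, 1] and a small ridge penalty; none of these settings are taken from the paper, and the class and parameter names are hypothetical. The hidden weights and biases are drawn once and never tuned, so training reduces to a single closed-form least-squares solve for the output weights.

```python
import numpy as np

class RVFL:
    """Illustrative random vector functional link regressor (assumed settings)."""

    def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = np.random.default_rng(seed)

    def _features(self, X):
        # Random hidden layer: weights and biases are fixed, not trained.
        H = np.tanh(X @ self.W + self.b)
        # Direct input-output links, hidden features and a bias column.
        return np.hstack([X, H, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        self.W = self.rng.uniform(-1, 1, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        D = self._features(X)
        # Only the output weights are learned, via ridge-regularized least squares.
        gram = D.T @ D + self.ridge * np.eye(D.shape[1])
        self.beta = np.linalg.solve(gram, D.T @ y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta

# Toy regression problem, purely for illustration.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
model = RVFL().fit(X, y)
pred = model.predict(X)
mse = float(np.mean((pred - y) ** 2))
print(mse)
```

Dropping the direct links (the `X` block in `_features`) gives an ELM-style network under the same assumptions. In both cases there is no learning rate, epoch count or stopping criterion to set.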

By using randomized algorithms, some randomized-algorithm-based decomposition-ensemble learning models have recently been developed and have obtained satisfactory forecasting results. For example, Tang et al. [35] introduced ELM as the individual predictor into the “decomposition and ensemble” framework and observed the effectiveness of the proposed methodology in oil price prediction in terms of time saving and accuracy. Wang et al. [36] used a two-phase decomposition technique and a modified extreme learning machine to forecast the air quality index. Lu and Shao [37] developed an ensemble learning approach with ELM as the individual forecasting tool for computer product sales forecasting. Shrivastava [38] built a wavelet-based ELM decomposition-ensemble model for electricity price forecasting. Tang et al. [15] developed an RVFL-based decomposition-ensemble learning paradigm for oil price forecasting. However, to the best of our knowledge, few decomposition-ensemble learning paradigms have used other, possibly more competitive, randomized algorithms such as RKS. Therefore, this study fills this gap in the literature by introducing various promising randomized algorithms and conducting a thorough comparison to explore whether the idea of randomization does improve the existing decomposition-ensemble learning paradigms in terms of speed and accuracy.

Generally speaking, this study aims to formulate some efficient and fast decomposition-ensemble learning models by using the emerging randomized algorithms in individual prediction for energy price forecasting, which addresses the double problems of time consumption and parameter sensitivity. The major contributions of this study are twofold. First, by introducing various randomized algorithms, a series of randomized-algorithm-based decomposition-ensemble models is formulated. Second, a thorough comparison is conducted to check whether the idea of randomization does improve the existing decomposition-ensemble learning paradigms from the perspectives of speed and accuracy, and to identify the most efficient and fast model for energy price prediction.

The rest of this study is organized as follows. Section 2 describes the formulation process of the proposed methodology. For illustration and verification, the proposed methodology is applied to predict the Brent crude oil spot prices and the Henry Hub natural gas prices, with the results presented in Section 3. Finally, Section 4 summarizes the major contributions of the paper and discusses some interesting directions for future research.

Section snippets

Methodology formulation

This section presents the formulation process of the proposed methodology. In particular, Section 2.1 designs the general model framework, and Sections 2.2 (EEMD) and 2.3 (Randomized algorithms) describe the related techniques in detail.
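As one illustration of the randomized algorithms considered, random kitchen sinks replace a shift-invariant kernel with an explicit random feature map, after which only a linear model has to be fitted. The sketch below assumes a Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) and the standard random Fourier feature construction; the function names, the value of gamma and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def sample_rks_params(n_inputs, n_features, gamma, rng):
    """Draw the random map for the RBF kernel exp(-gamma * ||x - y||^2):
    frequencies from N(0, 2 * gamma), phases uniform on [0, 2 * pi)."""
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(n_inputs, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b

def rks_features(X, W, b):
    """Random Fourier features z(x) = sqrt(2 / D) * cos(W^T x + b)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def fit_ridge(Z, y, ridge=1e-3):
    """Closed-form ridge regression on the random features."""
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(Z.shape[1]), Z.T @ y)

rng = np.random.default_rng(0)
gamma = 0.5

# Part 1: inner products of random features approximate the exact kernel.
points = rng.normal(size=(10, 3))
W, b = sample_rks_params(3, 5000, gamma, rng)
Z = rks_features(points, W, b)
sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
exact_kernel = np.exp(-gamma * sq_dists)
max_error = float(np.abs(Z @ Z.T - exact_kernel).max())

# Part 2: a linear model on the random features acts as a fast predictor.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
W1, b1 = sample_rks_params(1, 200, gamma, rng)
Z1 = rks_features(X, W1, b1)
beta = fit_ridge(Z1, y)
mse = float(np.mean((Z1 @ beta - y) ** 2))
print(max_error, mse)
```

The random map is drawn once and left untouched, so the only training cost is the linear solve in `fit_ridge`; the number of random features trades approximation quality against speed.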

Empirical study

For illustration, the Brent crude oil spot prices and the Henry Hub natural gas prices are selected as the study samples. The experiment is designed in Section 3.1, and Section 3.2 presents the empirical results and discusses whether the proposed model using randomized algorithms statistically improves energy price prediction in terms of speed, accuracy and robustness.

Conclusions

To solve the time-consuming issue in existing decomposition-ensemble learning paradigms, this study introduces the emerging randomized algorithms to develop an efficient and fast decomposition-ensemble learning methodology for energy price forecasting. In particular, the randomized algorithms use randomization, in terms of randomly fixed parameters, randomly mapping features, randomly generated samples or randomly selected variables rather than iteratively tuned ones, which effectively ensures

Acknowledgements

This work is supported by grants from the National Natural Science Foundation of China (NSFC Nos. 71622011, 71433001 and 71301006), the National Program for Support of Top Notch Young Professionals, and Beijing Advanced Innovation Center for Soft Matter Science and Engineering.

References (45)

  • G.B. Huang et al. Extreme learning machine: theory and applications. Neurocomputing (2006)

  • Y.H. Pao et al. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing (1994)

  • Y. Ren et al. Random vector functional link network for short-term electricity load demand forecasting. Inf Sci (2016)

  • R. Nedellec et al. GEFCom2012: electric load forecasting and backcasting with semi-parametric models. Int J Forecast (2014)

  • M. Zamo et al. A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production, part I: deterministic forecast of hourly production. Sol Energy (2014)

  • Z. Wang et al. A high accuracy pedestrian detection system combining a cascade AdaBoost detector and random vector functional-link net. Sci World J (2014)

  • G. Ibarra-Berastegi et al. Short-term forecasting of the wave energy flux: analogues, random forests, and physics-based models. Ocean Eng (2015)

  • M. Herrera et al. Predictive models for forecasting hourly urban water demand. J Hydrol (2010)

  • V. Rodriguez-Galiano et al. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol Rev (2015)

  • J.V. Tu. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol (1996)

  • J.R. Zhang et al. A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training. Appl Math Comput (2007)

  • D. Wang et al. A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci Total Environ (2017)