Online hierarchical forecasting for power consumption data

doi:10.1016/j.ijforecast.2021.05.011

International Journal of Forecasting

Volume 38, Issue 1, January–March 2022, Pages 339-351

https://doi.org/10.1016/j.ijforecast.2021.05.011

Abstract

This paper proposes a three-step approach to forecasting time series of electricity consumption at different levels of household aggregation. These series are linked by hierarchical constraints—global consumption is the sum of regional consumption, for example. First, benchmark forecasts are generated for all series using generalized additive models. Second, for each series, the aggregation algorithm ML-Poly, introduced by Gaillard, Stoltz, and van Erven in 2014, finds an optimal linear combination of the benchmarks. Finally, the forecasts are projected onto a coherent subspace to ensure that the final forecasts satisfy the hierarchical constraints. By minimizing a regret criterion, we show that the aggregation and projection steps improve the root mean square error of the forecasts. Our approach is tested on household electricity consumption data; experimental results suggest that successive aggregation and projection steps improve the benchmark forecasts at different levels of household aggregation.

Introduction

New opportunities come with the recent deployment of smart grids and the installation of meters: they record consumption quasi-instantaneously in households. From these records, time series of demand are obtained at various levels of aggregation, such as consumption profiles and regions. For privacy reasons, household records may not be used directly. Moreover, consumption at the individual level is erratic and difficult to predict. This is why we focus on household aggregations. For demand management, it is useful to predict the global consumption. Furthermore, to dispatch the electricity into the grid correctly, forecasting demand at a regional level is also an important goal. Finally, a good estimation of the consumption of some groups of consumers (with the same profile) could be helpful for the electricity provider, which may adapt its offer to perform effective demand-side management. Thus, forecasts at various aggregated levels (the entire population, particular geographical areas, or groups of the same consumption profile) are useful for an efficient management of consumption. In this work, we first build benchmark forecasts independently at each aggregation level using generalized additive models. Because these time series may be correlated (e.g. the consumption of a given region may be close to that of a neighboring region) and are connected to each other through summation constraints (e.g. the global consumption is the sum of each region’s consumption), the problem considered falls under the umbrella of hierarchical time series forecasting (see, among others, Hyndman, Ahmed, Athanasopoulos, & Shang, 2011). Using these hierarchical relationships may improve the benchmark forecasts that were generated. Our approach consists in combining two methods: benchmark aggregation and projection in a constrained space. Our aim is to improve forecasts at both the global and local levels.

Traditionally, two types of methods have been used for hierarchical forecasting: bottom-up and top-down approaches. In the bottom-up approaches (see Dunn, Williams, & DeChaine, 1976), forecasts are constructed for lower-level quantities and are then summed up to obtain forecasts at the upper levels. In contrast, top-down approaches (see Gross & Sohl, 1990) work by forecasting aggregated quantities and then determining disaggregation proportions to compute lower-level predictions. Shlifer and Wolff (1979) compare these two families of methods and conclude that bottom-up approaches work better. Recently, bottom-up approaches have indeed proven successful for load forecasting to improve the global consumption prediction error (see, among others, Auder, Cugliari, Goude, & Poggi, 2018). Other approaches (neither bottom-up nor top-down) were recently introduced. For example, Wickramasuriya, Athanasopoulos, and Hyndman (2019) forecast all nodes in the hierarchy and reconcile them (i.e. make them satisfy the hierarchical constraints) by projection. Their general minimum trace (MinT) approach attempts to capture some cross-sectional information between time series via the covariance matrix of the errors of the base forecasts. It includes both oblique and orthogonal projections (this is discussed from a geometric perspective in Panagiotelis, Athanasopoulos, Gamakumara, & Hyndman, 2020). Moreover, Van Erven and Cugliari (2015) introduce a game-theoretically optimal reconciliation method to improve a given set of forecasts. First, one comes up with some forecasts for the time series without worrying about hierarchical constraints, and then a reconciliation procedure is used to make the forecasts aggregate-consistent. This generalizes the previous orthogonal projection to other possible projections in the constrained space, and thus ensures that the forecasts satisfy the hierarchy. Most work on hierarchical forecasting concentrates on the mean, but some recent work has addressed probabilistic forecasting, including that of Ben Taieb et al. (2017), Ben Taieb et al. (2020), and Panagiotelis et al. (2020).
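To make the two traditional families concrete, the following minimal sketch applies a bottom-up and a top-down rule to a single time step. The two-region hierarchy, the numerical values, and the fixed disaggregation proportions are illustrative assumptions and are not taken from the data or methods of this paper.

```python
# Hypothetical toy hierarchy: total = region_a + region_b.
# Base forecasts for one time step (illustrative numbers only).
base = {"total": 100.0, "region_a": 55.0, "region_b": 40.0}


def bottom_up(base):
    """Keep the lower-level forecasts and sum them to obtain the top level."""
    out = {"region_a": base["region_a"], "region_b": base["region_b"]}
    out["total"] = out["region_a"] + out["region_b"]
    return out


def top_down(base, proportions):
    """Keep the top-level forecast and split it with fixed proportions."""
    out = {"total": base["total"]}
    for name, share in proportions.items():
        out[name] = share * base["total"]
    return out


bu = bottom_up(base)
# The proportions are assumed here; in practice they would be estimated.
td = top_down(base, {"region_a": 0.6, "region_b": 0.4})
```

Both outputs satisfy the summation constraint, but each discards part of the base forecasts: the bottom-up rule ignores the top-level forecast, and the top-down rule ignores the regional ones.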

Aggregation methods (also called ensemble methods) for individual sequence forecasting originate from theoretical works by Vovk (1990), Cover (1991), and Littlestone and Warmuth (1994). Their distinguishing feature with respect to classical ensemble methods is that they do not rely on any stochastic modeling of the observations, and thus are able to combine forecasts independently of their generating process. They have proven to be very effective at predicting time series (see, for instance, Mallet, Stoltz, & Mauricette, 2009 and Devaine, Gaillard, Goude, & Stoltz, 2013) and have been used to win forecasting competitions (see Gaillard, Goude, & Nedellec, 2016). This aggregation approach has recently been extended to the hierarchical setting by Goehry, Goude, Massart, and Poggi (2020). They used a bottom-up forecasting approach that consists in aggregating the consumption forecasts of small clusters of customers.

In this article we combine the reconciliation approach based on orthogonal projection with some aggregation algorithms to propose a three-stage meta-algorithm, which is as follows:

1. Generate base forecasts for all time series in the hierarchy.

2. Apply, for each series, the aggregation algorithm that finds an optimal linear combination of the base forecasts.

3. Project the combination forecasts onto a coherent subspace to ensure that the final forecasts satisfy the hierarchical constraints.
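Step 3 can be illustrated with an orthogonal projection onto the coherent subspace. The sketch below assumes a hypothetical two-region hierarchy described by a summing matrix `S` (a name introduced here for illustration) and obtains the projection by least squares; it is one possible way to write the step, not the authors' implementation.

```python
import numpy as np

# Hypothetical hierarchy: total = region_a + region_b.
# S maps the bottom-level series (region_a, region_b) to every node,
# in the order (total, region_a, region_b).
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])


def project_coherent(y_hat, S):
    """Orthogonal projection of y_hat onto the column space of S."""
    # Least squares finds the bottom-level values whose aggregation
    # is closest to y_hat in Euclidean norm.
    bottom, *_ = np.linalg.lstsq(S, y_hat, rcond=None)
    return S @ bottom


y_hat = np.array([100.0, 55.0, 40.0])   # incoherent: 55 + 40 != 100
y_tilde = project_coherent(y_hat, S)    # coherent forecasts
```

The projected vector spreads the initial discrepancy over all three nodes instead of trusting only the top or only the bottom level.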

The second step here provides the innovation (Steps 1 and 3 on their own are equivalent to the ordinary least squares version of the MinT algorithm—see Wickramasuriya et al., 2019). By including an aggregation algorithm between these steps, much more of the cross-sectional information can be captured, thus improving the forecasts. A theoretical result is provided for the regret bound of the meta-algorithm, which ensures that the aggregation and projection steps improve the root mean square error of the forecasts. We then illustrate the proposed methods using smart meter data collected in Great Britain by multiple energy providers (see Schellong, 2011 and AECOM, 2018). The data were provided by the Energy Demand Research Project, which gathered power consumption data from multiple households. We consider two population segmentations: a spatial segmentation based on the location of the households, and a behavioral one based on household consumption profiles. For all aggregation levels, we generate benchmark forecasts using generalized additive models (see Wood, 2006) and use the polynomially weighted average forecaster with multiple learning rates (ML-Poly, see Gaillard, Stoltz, & van Erven, 2014 and Gaillard, 2015) as an aggregation algorithm to combine these predictions. We evaluate the performance of four types of predictions: benchmarks, aggregated benchmarks, projected benchmarks, and, finally, aggregated and projected benchmarks. The results show that the proposed approach improves the root mean square error of the forecasts at the different levels of household aggregation.

Without further indications, $∥x∥$ denotes the Euclidean norm of a vector $x$. For the other norms, a subscript is used: e.g. the $L_1$-norm and the infinity norm of $x$ are denoted by $∥x∥_{1}$ and $∥x∥_{\infty}$, respectively. Moreover, vectors are in bold type and, unless stated otherwise, they are column vectors. Matrices are underlined in bold. We denote the inner product of two vectors $x$ and $y$ of the same size by $x \cdot y = x^{T} y$. Finally, the cardinality of a finite set $D$ is denoted by $|D|$.

Section snippets

Methodology

With $Γ$, a set of aggregation levels (e.g. the entire population, particular regions, or behavioral clusters of households), we consider the set of time series $\{(y_{t}^{γ})_{t > 0}, γ \in Γ\}$ connected to each other by some summation constraints: a few of them are equal to the sum of several others. To forecast these time series, a set of benchmark forecasts is generated. At any time step $t$, we want to forecast the vector of the values of the $|Γ|$ time series at $t$, denoted by $y_{t} \overset{\mathrm{def}}{=} (y_{t}^{γ})_{γ \in Γ}$. For each node $γ$ and
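The summation constraints can be encoded as linear equations on the stacked vector $y_{t}$. The following sketch assumes a hypothetical set of five nodes (the total, two regions, and two behavioral profiles) and a constraint matrix named `K` here for illustration; the node order and the numbers are assumptions.

```python
import numpy as np

# Hypothetical node order: total, region_1, region_2, profile_1, profile_2.
# Each row of K encodes one summation constraint as K @ y = 0.
K = np.array([[1.0, -1.0, -1.0,  0.0,  0.0],   # total = region_1 + region_2
              [1.0,  0.0,  0.0, -1.0, -1.0]])  # total = profile_1 + profile_2


def is_coherent(y, K, tol=1e-9):
    """True when the vector y satisfies every summation constraint."""
    return bool(np.all(np.abs(K @ y) <= tol))


y_obs = np.array([10.0, 6.0, 4.0, 7.0, 3.0])   # observations are coherent
y_hat = np.array([10.5, 6.0, 4.0, 7.0, 3.2])   # independent forecasts need not be
```

Observations always satisfy the constraints by construction, whereas forecasts generated independently for each node generally do not, which is what motivates a reconciliation step.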

Main theoretical result

Here, we introduce the following notation concerning the regret bound of Algorithm $A$.

Assumption 1

We assume that, for any set $D \subset \mathbb{R}^{|Γ|}$, for any $γ \in Γ$ with the initialization parameter vector $s_{0}^{γ}$, for $T > 0$, any $x_{1:T} = x_{1}, \dots, x_{T}$, and any $y_{1:T}^{γ} = y_{1}^{γ}, \dots, y_{T}^{γ}$, Algorithm $A^{γ}$ provides a regret bound of the following form: $R_{T}^{γ}(D) \overset{\mathrm{def}}{=} \sum_{t=1}^{T} (y_{t}^{γ} - \hat{y}_{t}^{γ})^{2} - \min_{u^{γ} \in D} \sum_{t=1}^{T} (y_{t}^{γ} - u^{γ} \cdot x_{t})^{2} \leqslant B(x_{1:T}^{γ}, y_{1:T}^{γ}, s_{0}^{γ}).$
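The regret of Assumption 1 can be evaluated numerically once the comparison set $D$ is fixed. The sketch below assumes that $D$ is the whole space, so that the minimum is an ordinary least-squares fit; the function name and the toy data are illustrative assumptions.

```python
import numpy as np


def regret_vs_best_linear(y, y_hat, X):
    """Cumulative square loss of y_hat minus that of the best fixed u.

    Assumes D is the whole space, so the minimum over u is a
    least-squares problem. X has one row x_t per time step.
    """
    algo_loss = np.sum((y - y_hat) ** 2)
    u_star, *_ = np.linalg.lstsq(X, y, rcond=None)
    best_loss = np.sum((y - X @ u_star) ** 2)
    return algo_loss - best_loss, u_star


# Toy data: two benchmarks over four time steps; the observations are
# an exact linear combination of the benchmarks.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]])
y = X @ np.array([0.3, 0.7])
y_hat = X[:, 0]                      # a forecaster that copies benchmark 1
regret, u_star = regret_vs_best_linear(y, y_hat, X)
```

A restricted set $D$ (for example, a bounded ball or the simplex) would require a constrained solver instead of `np.linalg.lstsq`.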

As getting a linear bound is trivial (by using the common assumption that prediction errors are bounded), the bounds $B (\dots)$ must be

An example of an aggregation algorithm: Polynomially weighted average forecaster with multiple learning rates (ML-Poly)

At a time step $t$, for a node $γ \in Γ$, a copy $A^{γ}$ of an aggregation algorithm $A$ takes as input the benchmark vector $x_{t}$, which contains the predictions of all the nodes (including that of the considered node), and outputs a weight vector $u_{t}^{γ}$ and thus the forecast $\hat{y}_{t}^{γ} = u_{t}^{γ} \cdot x_{t}$. Recall that the benchmark forecasts ${(x_{t}^{γ})}_{γ \in Γ}$ are generated independently with possibly different exogenous variables but that the observations ${(y_{t}^{γ})}_{γ \in Γ}$ may be strongly correlated. This is why we consider aggregation to
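The weight-then-forecast loop described above can be sketched for a single node as follows. This is a simplified variant of ML-Poly, assuming convex weights and linearized (gradient) square losses, and written after the update rule of Gaillard, Stoltz, & van Erven (2014); it omits the extension to general linear combinations and is not the implementation used in the experiments.

```python
import numpy as np


def ml_poly(X, y):
    """Simplified ML-Poly with convex weights and gradient-based losses.

    X: (T, K) benchmark forecasts, y: (T,) observations of one node.
    Returns the (T, K) weights used at each step and the (T,) forecasts.
    """
    T, K = X.shape
    regret = np.zeros(K)        # cumulative regret against each benchmark
    sq_sum = np.zeros(K)        # cumulative squared instantaneous regrets
    weights = np.zeros((T, K))
    forecasts = np.zeros(T)
    for t in range(T):
        eta = 1.0 / (1.0 + sq_sum)             # one learning rate per benchmark
        raw = eta * np.maximum(regret, 0.0)
        # Uniform weights when no benchmark has positive regret (e.g. at t = 0).
        u = raw / raw.sum() if raw.sum() > 0 else np.full(K, 1.0 / K)
        y_hat = u @ X[t]
        # Gradient of the square loss (y_hat - y)^2 with respect to the weights.
        grad = 2.0 * (y_hat - y[t]) * X[t]
        r = u @ grad - grad                    # instantaneous regrets
        regret += r
        sq_sum += r ** 2
        weights[t], forecasts[t] = u, y_hat
    return weights, forecasts


# Toy check: benchmark 0 is exact, benchmark 1 is biased by +1.
y = np.linspace(1.0, 2.0, 50)
X = np.column_stack([y, y + 1.0])
weights, forecasts = ml_poly(X, y)
```

In this toy run the weight moves entirely to the exact benchmark after the first observation. Running one such copy per node, each fed with the benchmarks of all nodes, gives the aggregation step.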

Experiments

Our application relies on the electricity consumption data of a large number of households to which we have added meteorological data. The regions of the households are also provided. The full data set is presented in Section 5.1. From these temporal and non-temporal data, we dispatch the households into two segmentations: the first one is based on the household location information; the second is behavioral and relies on the method presented in Section 5.2. We describe the experiments and

Conclusion

We proposed a three-step approach to forecasting electricity consumption time series at different levels of household aggregation and linked by hierarchical constraints. After generating benchmark forecasts using generalized additive models, our method aggregates them with the ML-Poly algorithm. Finally, the forecasts are projected onto a coherent subspace to ensure that the final forecasts satisfy the hierarchical constraints. A theoretical result ensures, via a regret bound, that this

Acknowledgments

Many thanks to Gilles Stoltz for helpful discussions and many re-reads of the paper. We also thank Yannig Goude and Pierre Gaillard for spotting technical imprecisions and typographical errors.

References (30)

Amat, C. et al. (2018). Fundamentals and exchange rate forecastability with simple machine learning methods. Journal of International Money and Finance.

Gaillard, P. et al. (2016). Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting. International Journal of Forecasting.

Hyndman, R.J. et al. (2011). Optimal combination forecasts for hierarchical time series. Computational Statistics & Data Analysis.

Kivinen, J. et al. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation.

Littlestone, N. et al. (1994). The weighted majority algorithm. Information and Computation.

AECOM (2018). Energy demand research project: Early smart meter trials, 2007–2010.

Auder, B. et al. (2018). Scalable clustering of individual electrical curves for profiling and bottom-up forecasting. Energies.

Ben Taieb, S. et al. Coherent probabilistic forecasts for hierarchical time series.

Ben Taieb, S. et al. (2020). Hierarchical probabilistic forecasting of electricity demand with smart meter data. Journal of the American Statistical Association.

Cesa-Bianchi, N. et al. (2006). Prediction, learning, and games.

Cover, T.M. (1991). Universal portfolios. Mathematical Finance.

Deswarte, R., Gervais, V., Stoltz, G., & Da Veiga, S. (2018). Sequential model aggregation for production forecasting....

Devaine, M. et al. (2013). Forecasting electricity consumption by aggregating specialized experts. Machine Learning.

Dunn, D.M. et al. (1976). Aggregate versus subaggregate models in local area forecasting. Journal of the American Statistical Association.

Fan, S. et al. (2011). Short-term load forecasting based on a semi-parametric additive model. IEEE Transactions on Power Systems.

