Elsevier

International Journal of Forecasting

Volume 38, Issue 1, January–March 2022, Pages 339-351

Online hierarchical forecasting for power consumption data

https://doi.org/10.1016/j.ijforecast.2021.05.011

Abstract

This paper proposes a three-step approach to forecasting time series of electricity consumption at different levels of household aggregation. These series are linked by hierarchical constraints—global consumption is the sum of regional consumption, for example. First, benchmark forecasts are generated for all series using generalized additive models. Second, for each series, the aggregation algorithm ML-Poly, introduced by Gaillard, Stoltz, and van Erven in 2014, finds an optimal linear combination of the benchmarks. Finally, the forecasts are projected onto a coherent subspace to ensure that the final forecasts satisfy the hierarchical constraints. By minimizing a regret criterion, we show that the aggregation and projection steps improve the root mean square error of the forecasts. Our approach is tested on household electricity consumption data; experimental results suggest that successive aggregation and projection steps improve the benchmark forecasts at different levels of household aggregation.

Introduction

New opportunities come with the recent deployment of smart grids and the installation of meters that record consumption quasi-instantaneously in households. From these records, time series of demand are obtained at various levels of aggregation, such as consumption profiles and regions. For privacy reasons, household records may not be used directly. Moreover, consumption at the individual level is erratic and difficult to predict. This is why we focus on household aggregations. For demand management, it is useful to predict the global consumption. Furthermore, to dispatch the electricity into the grid correctly, forecasting demand at a regional level is also an important goal. Finally, a good estimation of the consumption of some groups of consumers (with the same profile) could be helpful for the electricity provider, which may adapt its offer to perform effective demand-side management. Thus, forecasts at various aggregated levels (the entire population, particular geographical areas, or groups of the same consumption profile) are useful for an efficient management of consumption. In this work, we first build benchmark forecasts independently at each aggregation level, using generalized additive models. Noticing that these time series may be correlated (e.g. the consumption of a given region may be close to that of a neighboring region) and connected to each other through summation constraints (e.g. the global consumption is the sum of each region’s consumption), the problem considered falls under the umbrella of hierarchical time series forecasting (see, among others, Hyndman, Ahmed, Athanasopoulos, & Shang, 2011). Using these hierarchical relationships may improve the benchmark forecasts that were generated. Our approach consists of combining two methods: benchmark aggregation and projection in a constrained space. Our aim is to improve forecasts at both the global and local levels.

Traditionally, two types of methods have been used for hierarchical forecasting: bottom-up and top-down approaches. In bottom-up approaches (see Dunn, Williams, & DeChaine, 1976), forecasts are constructed for lower-level quantities and are then summed up to obtain forecasts at the upper levels. In contrast, top-down approaches (see Gross & Sohl, 1990) work by forecasting aggregated quantities and then determining disaggregation proportions to compute lower-level predictions. Shlifer and Wolff (1979) compare these two families of methods and conclude that bottom-up approaches work better. Recently, bottom-up approaches have indeed proven successful in load forecasting at reducing the prediction error for global consumption (see, among others, Auder, Cugliari, Goude, & Poggi, 2018). Other approaches (neither bottom-up nor top-down) were recently introduced. For example, Wickramasuriya, Athanasopoulos, and Hyndman (2019) forecast all nodes in the hierarchy and reconcile them (i.e. enforce the hierarchical constraints) by projection. Their general minimum trace (MinT) approach attempts to capture some cross-sectional information between time series via the covariance matrix of the errors of the base forecasts. It includes both oblique and orthogonal projections (this is discussed from a geometric perspective in Panagiotelis, Athanasopoulos, Gamakumara, & Hyndman, 2020). Moreover, Van Erven and Cugliari (2015) introduce a game-theoretically optimal reconciliation method to improve a given set of forecasts. First, forecasts are produced for the time series without regard to hierarchical constraints, and then a reconciliation procedure is used to make the forecasts aggregate-consistent. This generalizes the previous orthogonal projection to other possible projections in the constrained space, and thus ensures that the forecasts satisfy the hierarchy.
Most work on hierarchical forecasting concentrates on the mean, but some recent work has addressed probabilistic forecasting, including that of Ben Taieb et al. (2017), Ben Taieb et al. (2020), and Panagiotelis et al. (2020).
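
As a minimal illustration of the two traditional families described above, the following sketch contrasts a bottom-up forecast (summing lower-level forecasts) with a top-down one (splitting a total forecast by proportions). The function names, the fixed proportions, and the numbers are hypothetical and chosen only for illustration; they are not taken from the cited works.

```python
def bottom_up(region_forecasts):
    """Forecast the upper level as the sum of the lower-level forecasts."""
    return sum(region_forecasts)


def top_down(total_forecast, proportions):
    """Split an upper-level forecast across lower-level series.

    `proportions` are assumed fixed and known here; in practice they
    would have to be estimated, e.g. from historical shares.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total_forecast * p for p in proportions]


# Hypothetical regional forecasts and a hypothetical total forecast.
regions = [2.0, 3.0, 5.0]
total_from_regions = bottom_up(regions)
regions_from_total = top_down(8.0, [0.25, 0.25, 0.5])
```

Both outputs are coherent by construction, but each uses the forecasts of only one level of the hierarchy, which is the limitation that reconciliation methods address.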

Aggregation methods (also called ensemble methods) for individual sequence forecasting originate from theoretical works by Vovk (1990), Cover (1991), and Littlestone and Warmuth (1994). Their distinguishing feature with respect to classical ensemble methods is that they do not rely on any stochastic modeling of the observations, and thus are able to combine forecasts independently of their generating process. They have proven to be very effective at predicting time series (see, for instance, Mallet, Stoltz, & Mauricette, 2009 and Devaine, Gaillard, Goude, & Stoltz, 2013) and have been used to win forecasting competitions (see Gaillard, Goude, & Nedellec, 2016). This aggregation approach has recently been extended to the hierarchical setting by Goehry, Goude, Massart, and Poggi (2020). They used a bottom-up forecasting approach that consists in aggregating the consumption forecasts of small clusters of customers.

In this article we combine the reconciliation approach based on orthogonal projection with some aggregation algorithms to propose a three-stage meta-algorithm, which is as follows:

1. Generate base forecasts for all time series in the hierarchy.

2. Apply, for each series, the aggregation algorithm that finds an optimal linear combination of the base forecasts.

3. Project the combination forecasts onto a coherent subspace to ensure that the final forecasts satisfy the hierarchical constraints.
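
The projection in step 3 can be sketched for the simplest possible hierarchy, in which one total equals the sum of its regions. This is an assumption made for illustration: the function name and the numbers are hypothetical, and a general hierarchy would require a projection matrix built from all summation constraints. For a single constraint, the orthogonal projection has a closed form: the constraint violation is spread equally over all nodes.

```python
def project_coherent(forecasts):
    """Orthogonally project (total, r_1, ..., r_n) onto {total = r_1 + ... + r_n}.

    The constraint reads k . y = 0 with k = (1, -1, ..., -1), so the
    projection is y - k * (k . y) / (k . k), where k . k = 1 + n.
    """
    total, regions = forecasts[0], forecasts[1:]
    gap = total - sum(regions)           # k . y, the constraint violation
    step = gap / (1 + len(regions))      # divide by the squared norm of k
    return [total - step] + [r + step for r in regions]


# Hypothetical step-2 outputs: the total (10) exceeds the regional sum (7).
combined = [10.0, 3.0, 4.0]
coherent = project_coherent(combined)
```

In this example the gap of 3 is split evenly over the three nodes, giving a total of 9 and regional forecasts of 4 and 5, which now satisfy the summation constraint.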

The second step here provides the innovation (Steps 1 and 3 on their own are equivalent to the ordinary least squares version of the MinT algorithm—see Wickramasuriya et al., 2019). By including an aggregation algorithm between these steps, much more of the cross-sectional information can be captured, thus improving the forecasts. A theoretical result is provided for the regret bound of the meta-algorithm, which ensures that the aggregation and projection steps improve the root mean square error of the forecasts. We then illustrate the proposed methods using smart meter data collected in Great Britain by multiple energy providers (see Schellong, 2011 and AECOM, 2018). The data were provided by the Energy Demand Research Project, which gathered power consumption data from multiple households. We consider two population segmentations: a spatial segmentation based on the location of the households, and a behavioral one based on household consumption profiles. For all aggregation levels, we generate benchmark forecasts using generalized additive models (see Wood, 2006) and use the polynomially weighted average forecaster with multiple learning rates (ML-Poly, see Gaillard, Stoltz, & van Erven, 2014 and Gaillard, 2015) as an aggregation algorithm to combine these predictions. We evaluate the performance of four types of predictions: benchmarks, aggregated benchmarks, projected benchmarks, and, finally, aggregated and projected benchmarks. The results show that the proposed approach improves the root mean square error of the forecasts at the different levels of household aggregation.

Without further indications, $\|\mathbf{x}\|$ denotes the Euclidean norm of a vector $\mathbf{x}$. For the other norms, a subscript is used: e.g. the $L_1$-norm and the infinity norm of $\mathbf{x}$ are denoted by $\|\mathbf{x}\|_1$ and $\|\mathbf{x}\|_\infty$, respectively. Moreover, vectors are in bold type and unless stated otherwise, they are column vectors. Matrices are underlined in bold. We denote the inner product of two vectors $\mathbf{x}$ and $\mathbf{y}$ of the same size by $\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^{\mathrm{T}} \mathbf{y}$. Finally, the cardinality of a finite set $D$ is denoted by $|D|$.

Section snippets

Methodology

With $\Gamma$, a set of aggregation levels (e.g. the entire population, particular regions, or behavioral clusters of households), we consider the set of time series $\{(y_t^\gamma)_{t>0},\ \gamma \in \Gamma\}$ connected to each other by some summation constraints: a few of them are equal to the sum of several others. To forecast these time series, a set of benchmark forecasts is generated. At any time step $t$, we want to forecast the vector of the values of the $|\Gamma|$ time series at $t$, denoted by $\mathbf{y}_t \overset{\text{def}}{=} (y_t^\gamma)_{\gamma \in \Gamma}$. For each node $\gamma$ and

Main theoretical result

Here, we introduce the following notation concerning the regret bound of Algorithm A.

Assumption 1

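We assume that, for any set $D \subset \mathbb{R}^{|\Gamma|}$, for any $\gamma \in \Gamma$ with the initialization parameter vector $\mathbf{s}_0^\gamma$, for $T > 0$, any $\mathbf{x}_{1:T} = \mathbf{x}_1, \dots, \mathbf{x}_T$, and any $y_{1:T}^\gamma = y_1^\gamma, \dots, y_T^\gamma$, Algorithm $\mathcal{A}^\gamma$ provides a regret bound of the following form: $$R_T^\gamma(D) \overset{\text{def}}{=} \sum_{t=1}^{T} \big(y_t^\gamma - \hat{y}_t^\gamma\big)^2 - \min_{\mathbf{u}^\gamma \in D} \sum_{t=1}^{T} \big(y_t^\gamma - \mathbf{u}^\gamma \cdot \mathbf{x}_t\big)^2 \leq B\big(\mathbf{x}_{1:T}^\gamma, y_{1:T}^\gamma, \mathbf{s}_0^\gamma\big).$$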
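
To make the regret concrete, the sketch below computes it for one node on toy data. It assumes, purely for illustration, that $D$ is a small finite set of candidate weight vectors, so that the minimum can be found by enumeration; the assumption above allows any set $D$. All names and numbers are hypothetical.

```python
def dot(u, x):
    """Inner product of two vectors of the same size."""
    return sum(ui * xi for ui, xi in zip(u, x))


def squared_loss(y, y_hat):
    """Cumulative squared error between observations and forecasts."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def regret(y, y_hat, x, candidates):
    """Loss of the forecasts minus the loss of the best fixed u in `candidates`."""
    best = min(squared_loss(y, [dot(u, xt) for xt in x]) for u in candidates)
    return squared_loss(y, y_hat) - best


# Toy data: three time steps, two benchmark forecasts per step.
y = [1.0, 2.0, 3.0]                       # observations for one node
x = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]  # benchmark vectors x_t
y_hat = [0.0, 2.0, 3.0]                   # hypothetical algorithm forecasts
D = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # finite candidate set (assumption)

r = regret(y, y_hat, x, D)
```

Here the first candidate reproduces the observations exactly, so the regret equals the cumulative squared error of the algorithm's own forecasts.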

As getting a linear bound is trivial (by using the common assumption that prediction errors are bounded), the bounds $B(\cdot)$ must be

An example of an aggregation algorithm: Polynomially weighted average forecaster with multiple learning rates (ML-Poly)

At a time step $t$, for a node $\gamma \in \Gamma$, a copy $\mathcal{A}^\gamma$ of an aggregation algorithm $\mathcal{A}$ takes the benchmark vector $\mathbf{x}_t$, which contains the predictions of all the nodes (including that of the considered node), as an input and outputs a weight vector $\mathbf{u}_t^\gamma$, and thus forecasts $y_t^\gamma$ with $\mathbf{u}_t^\gamma \cdot \mathbf{x}_t$. Recall that the benchmark forecasts $(x_t^\gamma)_{\gamma \in \Gamma}$ are generated independently with possibly different exogenous variables but that the observations $(y_t^\gamma)_{\gamma \in \Gamma}$ may be strongly correlated. This is why we consider aggregation to

Experiments

Our application relies on the electricity consumption data of a large number of households to which we have added meteorological data. The regions of the households are also provided. The full data set is presented in Section 5.1. From these temporal and non-temporal data, we dispatch the households into two segmentations: the first one is based on the household location information; the second is behavioral and relies on the method presented in Section 5.2. We describe the experiments and

Conclusion

We proposed a three-step approach to forecasting electricity consumption time series at different levels of household aggregation and linked by hierarchical constraints. After generating benchmark forecasts using generalized additive models, our method aggregates them with the ML-Poly algorithm. Finally, the forecasts are projected onto a coherent subspace to ensure that the final forecasts satisfy the hierarchical constraints. A theoretical result ensures, via a regret bound, that this

Acknowledgments

Many thanks to Gilles Stoltz for helpful discussions and many re-reads of the paper. We also thank Yannig Goude and Pierre Gaillard for spotting technical imprecisions and typographical errors.

References (30)

  • Cover, T. M. (1991). Universal portfolios. Mathematical Finance.
  • Deswarte, R., Gervais, V., Stoltz, G., & Da Veiga, S. (2018). Sequential model aggregation for production forecasting....
  • Devaine, M., et al. (2013). Forecasting electricity consumption by aggregating specialized experts. Machine Learning.
  • Dunn, D. M., et al. (1976). Aggregate versus subaggregate models in local area forecasting. Journal of the American Statistical Association.
  • Fan, S., et al. (2011). Short-term load forecasting based on a semi-parametric additive model. IEEE Transactions on Power Systems.