doi:10.1016/j.csda.2007.01.001
Copyright © 2007 Elsevier B.V. All rights reserved.
Learning and approximate inference in dynamic hierarchical models
a High Tech Campus 11, Prof. Holstlaan 4, 5656 AE Eindhoven, The Netherlands
b Radboud University Nijmegen, Toernooiveld 1, Room A4026, 6525 ED Nijmegen, The Netherlands
Available online 12 January 2007.
Abstract
A new variant of the dynamic hierarchical model (DHM) that describes a large number of parallel time series is presented. The separate series, which may be interdependent, are modeled through dynamic linear models (DLMs). Their interdependence is included in the model through the definition of a ‘top-level’ or ‘average’ DLM. The model features explicit dependences between the latent states of the parallel DLMs and the states of the average model, and thus links the many parallel time series to each other. However, the combination of dependences within each time series and dependences between the different DLMs makes the computation time required for exact inference cubic in the number of parallel time series, which is unacceptable for practical tasks that involve large numbers of series. Therefore, two methods for fast, approximate inference are proposed: a variational approximation and a factorial approach. Under these approximations, inference can be performed in time linear in the number of series while still yielding exact means. Learning is implemented through maximum likelihood (ML) estimation of the model parameters, realized through an expectation maximization (EM) algorithm with approximate inference in the E-step. Examples of learning and forecasting on two data sets show that the addition of direct dependences has a ‘smoothing’ effect on the evolution of the states of the individual time series and leads to better prediction results. The use of approximate instead of exact inference is further shown not to lead to inferior results on either data set.
Keywords: Time series; Dynamic linear model; Maximum likelihood estimation; Variational approximation; Expectation propagation
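As a point of reference for the abstract above, the following is a minimal sketch of exact filtering in the simplest possible DLM, a scalar local-level model, using the standard Kalman recursions. It is not the hierarchical model of the article: the function names, the parameters `state_var` and `obs_var`, and the choice of a random-walk state are all assumptions made for illustration. In a joint model over N coupled series the same recursions act on a state whose dimension grows with N, and the matrix operations on its covariance are what make exact inference cubic in N.

```python
import random


def simulate_local_level(n_steps, state_var, obs_var, seed=0):
    """Simulate a scalar local-level DLM (illustrative assumption, not the article's model).

    theta_t = theta_{t-1} + w_t,  w_t ~ N(0, state_var)
    y_t     = theta_t + v_t,      v_t ~ N(0, obs_var)
    """
    rng = random.Random(seed)
    theta = 0.0
    states, observations = [], []
    for _ in range(n_steps):
        theta += rng.gauss(0.0, state_var ** 0.5)
        states.append(theta)
        observations.append(theta + rng.gauss(0.0, obs_var ** 0.5))
    return states, observations


def kalman_filter_local_level(observations, state_var, obs_var,
                              prior_mean=0.0, prior_var=1e6):
    """Exact filtering for the scalar local-level DLM above.

    Returns the filtered means and variances of theta_t given y_1..y_t.
    """
    mean, var = prior_mean, prior_var
    means, variances = [], []
    for y in observations:
        # Predict: the random-walk state keeps its mean and gains variance.
        var_pred = var + state_var
        # Update: blend prediction and observation according to the gain.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    _, ys = simulate_local_level(100, state_var=0.1, obs_var=1.0)
    filtered_means, filtered_vars = kalman_filter_local_level(ys, 0.1, 1.0)
    print(filtered_means[-1], filtered_vars[-1])
```

Running one such scalar filter per series costs time linear in the number of series, which is the regime the approximate schemes in the abstract aim for; how the article couples the series and restores the dependences is not reproduced here.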
Fig. 1. (Graphical model) The shaded areas represent the observations $y_{i,t}$; the open ellipses represent the latent states. The top-level states (upper ellipses) are connected to all of the lower-level states (lower ellipses). Covariates $x_{i,t}$ are left out for clarity.
Fig. 2. Graphical structure of the approximate models used in (a) the variational approximation (left) and (b) the factorial approximation (right). Ellipses represent latent states $M_t$ for the top-level DLM and $\theta_{i,t}$ for the lower-level DLMs. Observations are left out for clarity. The dashed lines in the right graph indicate that although the approximate model is fully factorized, the connections between states are incorporated in each iteration step.
Fig. 3. Means for the first elements of the latent states over time for four outlets from the newspaper data set, and the corresponding top-level DLM. Dotted lines correspond to lower-level states; solid lines represent top-level states. The left panel plots the exact means for the hierarchical model presented in this article; the right panel plots the means inferred from the standard hierarchical model.
Fig. 4. Top: average variance-dependent parts of the KL-divergences between the approximated marginals and the exact marginals. The left panel plots the average divergences between marginals of the top-level DLM, the right panel plots the average divergences between the marginals of the lower-level DLMs. The dash-dotted line plots the divergence between the variational approach and the exact model, the dashed line the divergence for the factorial approach. Bottom: the variance in the first dimension of the top-level state (left) and one from a set of 10 lower-level states (right) over time. Solid lines correspond to the exact model, dash-dotted lines to the variational approximation and dashed lines to the factorial approximation.
Fig. 5. Average squared error (left) and computation time (right) as a function of the number of parallel tasks for the variational approximation (dash-dotted line), the factorial approximation (dashed line) and the exact model (solid line).
Fig. 6. The average squared error on the newspaper data for the variational approximation (bar 2), the factorial approximation (bar 3) and the standard DHM (bar 4). The performance of the extended model with exact inference is represented by bar 1. Each error is an average over 16 parallel tasks; a larger number of parallel tasks did not further decrease the sum-squared error.