
Signal Processing

Volume 116, November 2015, Pages 13-28

Multivariate time-series analysis and diffusion maps

https://doi.org/10.1016/j.sigpro.2015.04.003

Highlights

  • We build a class of Bayesian models to learn the evolving statistics of time series.

  • We construct diffusion maps based on the time-evolving distributional information.

  • The proposed method recovers the underlying process controlling the time series.

  • The proposed framework is applied to the analysis of music and icEEG recordings.

Abstract

Dimensionality reduction in multivariate time series analysis has broad applications, ranging from financial data analysis to biomedical research. However, high levels of ambient noise and various interferences result in nonstationary signals, which may lead to inefficient performance of conventional methods. In this paper, we propose a nonlinear dimensionality reduction framework using diffusion maps on a learned statistical manifold, which gives rise to the construction of a low-dimensional representation of the high-dimensional nonstationary time series. We show that diffusion maps, with affinity kernels based on the Kullback–Leibler divergence between the local statistics of samples, allow for efficient approximation of pairwise geodesic distances. To construct the statistical manifold, we estimate time-evolving parametric distributions by designing a family of Bayesian generative models. The proposed framework can be applied to problems in which the time-evolving distributions (of temporally localized data), rather than the samples themselves, are driven by a low-dimensional underlying process. We provide efficient parameter estimation and dimensionality reduction methodologies, and apply them to two applications: music analysis and epileptic-seizure prediction.

Introduction

In the study of high-dimensional data, it is often of interest to embed the high-dimensional observations in a low-dimensional space, where hidden parameters may be discovered, noise suppressed, and interesting and significant structures revealed. Due to high dimensionality and nonlinearity in many real-world applications, nonlinear dimensionality reduction techniques have become increasingly popular [1], [2], [3]. These manifold-learning algorithms build data-driven models, organizing data samples according to local affinities on a low-dimensional manifold. Such methods have broad applications to, for example, analysis of financial data, computer vision, hyperspectral imaging, and biomedical engineering [4], [5], [6].

The notion of dimensionality reduction is useful in multivariate time series analysis. In the corresponding low-dimensional space, hidden states may be revealed, change points detected, and temporal trajectories visualized [7], [8], [9], [10]. Recently, various nonlinear dimensionality reduction techniques have been extended to time series, including spatio-temporal Isomap [11] and temporal Laplacian eigenmap [12]. In these methods, besides local affinities in the space of the data, available temporal covariate information is incorporated, leading to significant improvements in discovering the latent states of the series.

The basic assumption in dimensionality reduction is that the observed data samples do not fill the ambient space uniformly, but rather lie on a low-dimensional manifold. Such an assumption does not hold for many types of signals, for example, data with high levels of noise [4], [13], [14], [15]. In [14], [15], the authors consider a different, relaxed dimensionality reduction problem on the domain of the underlying probability distributions. The main idea is that the varying distributions, rather than the samples themselves, are driven by few underlying controlling processes, yielding a low-dimensional smooth manifold in the domain of the distribution parameters. An information-geometric dimensionality reduction (IGDR) approach is then applied to obtain an embedding of high-dimensional data using Isomap [1], thereby preserving the geodesic distances on the manifold of distributions.

Two practical problems arise in these methods, limiting their applicability to time series analysis. First, in [14], [15] multiple datasets were assumed to be available, where the data in each set are drawn from the same distributional form with fixed distribution parameters. The embedding was then inferred in the space of the distribution parameters. By taking into account the time dependency in the evolution of the distribution parameters from a single time series, we may substantially reduce the number of required datasets. A second limitation of previous work concerns how geodesic distances were computed. In [14], [15] the approximation of the geodesic distance between all pairs of samples was computed using a step-by-step walk on the manifold, requiring O(N^3) operations, which may be intractable for large N.

In this paper, we present a dimensionality-reduction approach using diffusion maps for nonstationary high-dimensional time series, which addresses the above shortcomings. Diffusion maps constitute an effective data-driven method to uncover the low-dimensional manifold, and provide a parametrization of the underlying process [16]. The main idea in diffusion maps resides in aggregating local connections between samples into a global parameterization via a kernel. Many kernels implicitly induce a mixture of local statistical models in the domain of the measurements. In particular, it has been shown that using distributional information outperforms using sample information when the distributions are available [14]. We build on this observation and assume that the observed multivariate time series X_t ∈ ℝ^N, t = 1, …, T, is generated from a smoothly varying parametric distribution p(X_t | β_t), where β_t is a local parameterization of the time-evolving distribution. We propose to construct a Bayesian generative model with constraints on β_t, and use Markov chain Monte Carlo (MCMC) to estimate β_t. Diffusion maps are then applied to reveal the statistical manifold (of the estimated distributions) using a kernel with the Kullback–Leibler (KL) divergence as the distance measure. Since the parametric form of the distributions significantly affects the structure of the mapped data, the Bayesian generative model should avoid strongly informative priors without substantial evidence.
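
For illustration, the following sketch computes the closed-form KL divergence between two zero-mean Gaussian estimates and turns its symmetrization into a pairwise affinity; the symmetrization and the exponential kernel with scale eps are illustrative choices consistent with the construction described above, not necessarily the exact kernel used in the paper.

```python
import numpy as np

def kl_zero_mean_gaussians(S1, S2):
    """KL( N(0, S1) || N(0, S2) ) for covariance matrices S1, S2 (N x N)."""
    n = S1.shape[0]
    S2_inv = np.linalg.inv(S2)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (np.trace(S2_inv @ S1) - n + logdet2 - logdet1)

def kl_affinity(S1, S2, eps=1.0):
    """Symmetrized KL divergence mapped to an affinity with an exponential kernel."""
    d = kl_zero_mean_gaussians(S1, S2) + kl_zero_mean_gaussians(S2, S1)
    return np.exp(-d / eps)
```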

Diffusion maps rely on the construction of a Laplace operator, whose eigenvectors approximate the eigenfunctions of the backward Fokker–Planck operator. These eigenfunctions describe the dynamics of the system [17]. Hence, the trajectories embedded in the coordinate system formed by the principal eigenvectors of the Laplace operator can be regarded as a representation of the underlying controlling process θ_t of the time series X_t.
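
A minimal sketch of the standard diffusion-maps eigendecomposition on a precomputed affinity matrix W (e.g., built from the KL-based affinities above) is given below; the function name and defaults are ours, and the density normalization often applied for nonuniformly sampled data is omitted for brevity.

```python
import numpy as np

def diffusion_map(W, n_coords=2, t=1):
    """Return the first n_coords nontrivial diffusion coordinates for a
    symmetric affinity matrix W (n x n), scaled by eigenvalues to the power t."""
    d = W.sum(axis=1)
    A = W / np.sqrt(np.outer(d, d))        # symmetric conjugate of the Markov matrix D^-1 W
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]    # right eigenvectors of the Markov matrix
    psi = psi / psi[:, [0]]                # rescale so the trivial eigenvector is the constant 1
    return (eigvals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]
```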

One of the main benefits of embedding the time series samples into a low-dimensional domain is the ability to define meaningful distances. In particular, diffusion maps embody the property that the Euclidean distance between samples in the embedding domain corresponds to the diffusion distance in the distribution domain. The diffusion distance measures the similarity between two samples according to their connectivity on the low-dimensional manifold [3] and is closely related to the geodesic distance. Thus, diffusion maps circumvent the step-by-step walk on the manifold [14], computing an approximation to the geodesic distance in a single low-cost operation. Another practical advantage of the proposed method is that we may first reveal the low-dimensional coordinate system based on reference data, and then extend the model to newly acquired data in an online manner at low computational cost. This is demonstrated further when considering applications in Section 4.
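
Concretely, once the embedding is computed, all pairwise diffusion distances reduce to Euclidean distances between embedded points; a short illustration, assuming the diffusion_map sketch and the KL-based affinity matrix W from above:

```python
from scipy.spatial.distance import pdist, squareform

emb = diffusion_map(W, n_coords=5, t=1)   # diffusion coordinates of all samples
D_diffusion = squareform(pdist(emb))      # all pairwise diffusion distances in one pass,
                                          # with no step-by-step walk on the manifold
```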

The proposed framework is applied to two applications in which the data are best characterized by temporally evolving local statistics, rather than by measures applied directly to the data themselves: music analysis and epileptic-seizure prediction based on intracranial electroencephalography (icEEG) recordings. In the first application, we show that the proposed approach uncovers the key underlying processes: the human voice and the instrumental sounds. In particular, we exploit the efficient computation of diffusion distances to obtain intra-piece similarity measures on well-known music, which are compared with state-of-the-art techniques.

In the second application, one goal is to map the recordings to the unknown underlying “brain activity states”. This is especially crucial in epileptic-seizure prediction, where preseizure (dangerous) states must be distinguished from interictal (safe) states, so that patients can be warned prior to seizures [18]. In this application, the observed time series is the icEEG recording and the underlying process is the brain state, e.g., preseizure or interictal. icEEG recordings tend to be noisy; hence, the mapping between the state of the patient's brain and the available measurements is not deterministic, and the measurements do not lie on a smooth manifold. Thus, the intermediate step of mapping the observations to a time-evolving parametric family of distributions is essential to overcome this challenge. We use the proposed approach to infer a parameterization of the signal, viewed as a model summarizing the signal's distributional information. Based on the inferred parameterization, we show that preseizure state intervals can be distinguished from interictal state intervals. In particular, we show the possibility of predicting seizures by visualization and simple detection algorithms, tested on an anonymous patient.

This paper makes three principal contributions. First, we present a data-driven method to fit flexible statistical models adapted to time series. In particular, we propose a class of Bayesian models with various prior specifications to learn the time-evolving statistics from a single trajectory realization, accurately modeling the local distributional parameters β. Second, we complement the analysis with diffusion maps based on the distributional information embodied in the time-series dynamics. By relying on a kernel that enables the comparison of segments from the entire time series, our nonlinear method allows for the association of similar patterns and, in turn, gives rise to the recovery of an underlying process that consists of the global controlling factors θ. In addition, diffusion maps enable the computation of meaningful distances (i.e., diffusion distances [3]), which approximate the geodesic distances between the time-series samples on the statistical manifold. Finally, we apply the proposed framework to two applications: music analysis and the analysis of icEEG recordings.

The remainder of the paper is organized as follows. In Section 2 we review the diffusion-maps technique, propose an extended construction and examine its theoretical and practical properties. We propose in Section 3 multiple approaches to model multivariate time series with time-evolving distributions. In Section 4, results on two real-world applications are discussed. Conclusions and future work are outlined in Section 5.

Section snippets

Underlying parametric model

Let X_t ∈ ℝ^N be the raw data or extracted features at time t. The key concept is that the high-dimensional representation of X_t exhibits a characteristic geometric structure. This structure is assumed to be governed by an underlying process on a low-dimensional manifold, denoted by θ_t, that propagates over time as a diffusion process according to a stochastic differential equation (SDE).
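
The SDE itself is not reproduced in this snippet. In this line of work the latent process is commonly assumed to take the form dθ_t = a(θ_t) dt + dw_t, with w_t a Brownian motion; under that assumed form, a brief Euler–Maruyama simulation illustrates how such a latent process propagates (the drift and dimension below are purely illustrative).

```python
import numpy as np

def simulate_latent_process(drift, theta0, n_steps, dt=1e-2, seed=None):
    """Euler-Maruyama simulation of d(theta) = drift(theta) dt + dw,
    an assumed generic form of the latent diffusion process."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    path = np.empty((n_steps + 1, theta.size))
    path[0] = theta
    for k in range(1, n_steps + 1):
        theta = theta + drift(theta) * dt + np.sqrt(dt) * rng.standard_normal(theta.size)
        path[k] = theta
    return path

# Example: a two-dimensional latent process with a mean-reverting (illustrative) drift.
path = simulate_latent_process(lambda th: -th, theta0=[1.0, -0.5], n_steps=1000)
```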

Modeling time evolving covariance matrices

To calculate the KL divergence, we need to estimate the local/intermediate parametric distribution p(X_t | β_t) at each time. The amount of data in each time window is limited, and therefore, assumptions have to be made to constrain the parametric space. In this paper, we assume that the data sample at each time, X_t, is drawn from a multivariate Gaussian distribution with time-evolving parameters. For simplicity, we focus on zero-mean Gaussian distributions. We assume that the time evolving
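
In the paper these time-evolving covariances are estimated with Bayesian generative models and MCMC; as a much simpler stand-in for illustration, the sketch below computes windowed empirical covariance estimates with diagonal shrinkage (the window length and shrinkage weight are arbitrary choices). The resulting sequence of covariance matrices can then be fed to the KL-based affinity above.

```python
import numpy as np

def local_covariances(X, win=50, shrink=0.1):
    """Sliding-window covariance estimates for a zero-mean multivariate series X (T x N).

    Each window's sample covariance is shrunk toward a scaled identity to keep it
    well conditioned; this is a crude surrogate for the Bayesian estimates in the paper.
    """
    T, N = X.shape
    covs = []
    for t in range(win, T + 1):
        seg = X[t - win:t]                                  # samples in the current window
        S = seg.T @ seg / win                               # zero-mean sample covariance
        S = (1 - shrink) * S + shrink * (np.trace(S) / N) * np.eye(N)
        covs.append(S)
    return np.stack(covs)
```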

Applications

The proposed framework is applied to a toy example and two real-world applications. In the synthetic toy example, we show that the estimated diffusion distance between data points recovers the geodesic distance on the statistical manifold, where IGDR [14] is used as a baseline method. In the first real-world application, we analyze a well-known music piece by estimating the diffusion distance between time points to discover the intra-piece similarities as a function of time. In the second

Conclusions

A dimensionality-reduction method for high-dimensional time series is presented. The method exhibits two key components. First, multiple approaches to estimate time evolving covariance matrices are presented and compared. Second, using the Kullback–Leibler divergence as a distance metric, diffusion maps are applied to the probability distributions estimated from samples, instead of to the samples themselves, to obtain a low-dimensional embedding of the high-dimensional time series. Theoretical

References (47)

  • J.B. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000).
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003).
  • D. Durante, B. Scarpa, D.B. Dunson, Locally adaptive Bayesian covariance regression, arXiv e-prints...
  • E.B. Fox, M. West, Autoregressive models for variance matrices: stationary inverse Wishart processes, arXiv e-prints...
  • A.F. Zuur et al., Estimating common trends in multivariate time series using dynamic factor analysis, Environmetrics (2003).
  • R. Talmon et al., Empirical intrinsic geometry for intrinsic modeling and nonlinear filtering, Proc. Natl. Acad. Sci. USA (2013).
  • R. Talmon, R.R. Coifman, Intrinsic modeling of stochastic dynamical systems using empirical geometry, Applied and...
  • R. Talmon, S. Mallat, H. Zaveri, R.R. Coifman, Manifold learning for latent variable inference in dynamical systems...
  • O.C. Jenkins, M.J. Matarić, A spatio-temporal extension to Isomap nonlinear dimension reduction, in: Proceedings of the...
  • M. Lewandowski, J. Martinez-del Rincon, D. Makris, J.C. Nebel, Temporal extension of Laplacian eigenmaps for...
  • E. Fox, D. Dunson, Bayesian nonparametric covariance regression, arXiv preprint...
  • K.M. Carter et al., Information-geometric dimensionality reduction, IEEE Signal Process. Mag. (2011).
  • K.M. Carter et al., FINE: Fisher information nonparametric embedding, IEEE Trans. Pattern Anal. Mach. Intell. (2009).