Multivariate time-series analysis and diffusion maps
Introduction
In the study of high-dimensional data, it is often of interest to embed the high-dimensional observations in a low-dimensional space, where hidden parameters may be discovered, noise suppressed, and interesting and significant structures revealed. Due to high dimensionality and nonlinearity in many real-world applications, nonlinear dimensionality reduction techniques have become increasingly popular [1], [2], [3]. These manifold-learning algorithms build data-driven models, organizing data samples according to local affinities on a low-dimensional manifold. Such methods have broad applications to, for example, analysis of financial data, computer vision, hyperspectral imaging, and biomedical engineering [4], [5], [6].
The notion of dimensionality reduction is useful in multivariate time series analysis. In the corresponding low-dimensional space, hidden states may be revealed, change points detected, and temporal trajectories visualized [7], [8], [9], [10]. Recently, various nonlinear dimensionality reduction techniques have been extended to time series, including spatio-temporal Isomap [11] and temporal Laplacian eigenmap [12]. In these methods, besides local affinities in the space of the data, available temporal covariate information is incorporated, leading to significant improvements in discovering the latent states of the series.
The basic assumption in dimensionality reduction is that the observed data samples do not fill the ambient space uniformly, but rather lie on a low-dimensional manifold. Such an assumption does not hold for many types of signals, for example, data with high levels of noise [4], [13], [14], [15]. In [14], [15], the authors consider a different, relaxed dimensionality reduction problem on the domain of the underlying probability distributions. The main idea is that the varying distributions, rather than the samples themselves, are driven by few underlying controlling processes, yielding a low-dimensional smooth manifold in the domain of the distribution parameters. An information-geometric dimensionality reduction (IGDR) approach is then applied to obtain an embedding of high-dimensional data using Isomap [1], thereby preserving the geodesic distances on the manifold of distributions.
Two practical problems arise in these methods, limiting their applicability to time series analysis. First, in [14], [15] multiple datasets were assumed to be available, where the data in each set were drawn from the same distributional form with fixed distribution parameters. The embedding was then inferred in the space of the distribution parameters. By taking into account the time dependency in the evolution of the distribution parameters from a single time series, we may substantially reduce the number of required datasets. A second limitation of previous work concerns how geodesic distances were computed. In [14], [15], the geodesic distance between all pairs of samples was approximated using a step-by-step walk on the manifold, whose number of operations grows quickly with the number of samples and may become intractable for large datasets.
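To show why a step-by-step walk becomes expensive, the sketch below approximates geodesic distances in the Isomap style that [14], [15] build on. It connects each sample to its nearest neighbours and runs all-pairs shortest paths. This is not the IGDR code. The helper names `knn_graph` and `geodesic_distances` are hypothetical, and Floyd–Warshall is used only for brevity. Its cubic cost in the number of samples is therefore a property of this sketch, not a figure quoted from [14], [15].

```python
import numpy as np

def knn_graph(points: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-nearest-neighbour graph with Euclidean edge lengths; non-edges are +inf."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        nbrs = np.argsort(d[i])[1:k + 1]   # skip index 0, which is the point itself
        graph[i, nbrs] = d[i, nbrs]
        graph[nbrs, i] = d[i, nbrs]        # keep the graph undirected
    return graph

def geodesic_distances(graph: np.ndarray) -> np.ndarray:
    """All-pairs shortest paths (Floyd-Warshall): O(N^3) time for N samples."""
    g = graph.copy()
    for m in range(len(g)):
        # relax every pair (i, j) through the intermediate vertex m
        g = np.minimum(g, g[:, m:m + 1] + g[m:m + 1, :])
    return g
```

On samples from a unit half-circle, the graph distance between the two endpoints comes out close to π. The straight-line distance between them is 2, so the walk along the manifold recovers the longer, intrinsic length.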
In this paper, we present a dimensionality-reduction approach using diffusion maps for nonstationary high-dimensional time series, which addresses the above shortcomings. Diffusion maps constitute an effective data-driven method to uncover the low-dimensional manifold, and they provide a parametrization of the underlying process [16]. The main idea in diffusion maps is to aggregate local connections between samples into a global parameterization via a kernel. Many kernels implicitly induce a mixture of local statistical models in the domain of the measurements. In particular, it was shown in [14] that using distributional information outperforms using sample information when the distributions are available. We build on this idea and posit that the observed multivariate time series is generated from a smoothly varying parametric distribution whose parameters form a local parameterization of the time-evolving distribution. We propose to construct a Bayesian generative model with constraints on these parameters, and to use Markov chain Monte Carlo (MCMC) to estimate them. Diffusion maps are then applied to reveal the statistical manifold (of the estimated distributions) using a kernel with the Kullback–Leibler (KL) divergence as the distance measure. Because the parametric form of the distributions significantly affects the structure of the mapped data, the Bayesian generative model should avoid a strongly informative prior unless there is substantial evidence to support it.
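The KL divergence between two zero-mean Gaussians has a closed form. A KL-based kernel is therefore cheap to evaluate once the local covariances have been estimated. The sketch below assumes zero-mean Gaussian local models, the choice made later in the paper. It symmetrizes the divergence and applies an exponential kernel with a hypothetical scale `eps`. The exact kernel form and scale used by the authors are not given in this excerpt, and the function names are illustrative.

```python
import numpy as np

def kl_zero_mean_gauss(s1: np.ndarray, s2: np.ndarray) -> float:
    """KL( N(0, s1) || N(0, s2) ) = 0.5 * [tr(s2^{-1} s1) - d + ln det s2 - ln det s1]."""
    d = s1.shape[0]
    s2_inv_s1 = np.linalg.solve(s2, s1)    # s2^{-1} s1 without forming an explicit inverse
    _, logdet1 = np.linalg.slogdet(s1)
    _, logdet2 = np.linalg.slogdet(s2)
    return 0.5 * (np.trace(s2_inv_s1) - d + logdet2 - logdet1)

def kl_affinity(covs, eps: float):
    """Exponential kernel on symmetrized KL divergences between local Gaussian models."""
    n = len(covs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # KL is asymmetric, so sum both directions to obtain a symmetric dissimilarity
            dist[i, j] = dist[j, i] = (kl_zero_mean_gauss(covs[i], covs[j])
                                       + kl_zero_mean_gauss(covs[j], covs[i]))
    return np.exp(-dist / eps), dist
```

For zero-mean Gaussians, the log-determinant terms cancel in the symmetrized sum. What remains is ½[tr(Σ₂⁻¹Σ₁) + tr(Σ₁⁻¹Σ₂)] − d.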
Diffusion maps rely on the construction of a Laplace operator, whose eigenvectors approximate the eigenfunctions of the backward Fokker–Planck operator. These eigenfunctions describe the dynamics of the system [17]. Hence, the trajectories embedded in the coordinate system formed by the principal eigenvectors of the Laplace operator can be regarded as a representation of the underlying controlling process of the time series.
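The sketch below shows the diffusion-maps step. It assumes a symmetric, positive affinity matrix W, such as a KL-based kernel. The steps are:

1. Normalize W into the Markov matrix P = D⁻¹W.
2. Obtain the right eigenvectors of P through the symmetric conjugate D^(-1/2) W D^(-1/2).
3. Use the leading non-trivial eigenvectors, scaled by their eigenvalues raised to the diffusion time `t`, as embedding coordinates.

This is the standard construction. It is not necessarily the extended one proposed in Section 2, and the name `diffusion_map` is illustrative.

```python
import numpy as np

def diffusion_map(W: np.ndarray, n_coords: int, t: int = 1):
    """Return eigenvalues of P = D^{-1} W and coordinates lambda_k^t * psi_k for k = 1..n_coords."""
    deg = W.sum(axis=1)                      # degrees d_i
    pi = deg / deg.sum()                     # stationary distribution of P
    A = W / np.sqrt(np.outer(deg, deg))      # D^{-1/2} W D^{-1/2}: symmetric, same spectrum as P
    evals, evecs = np.linalg.eigh(A)
    order = np.argsort(evals)[::-1]          # sort eigenpairs by decreasing eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(pi)[:, None]       # right eigenvectors of P, orthonormal in L2(pi)
    # column 0 is the trivial constant eigenvector (eigenvalue 1), so it is skipped
    coords = psi[:, 1:n_coords + 1] * evals[1:n_coords + 1] ** t
    return evals, coords
```

Working with the symmetric conjugate lets `numpy.linalg.eigh` return real, orthonormal eigenvectors. Dividing by √π then maps them back to right eigenvectors of P.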
One of the main benefits of embedding the time series samples into a low-dimensional domain is the ability to define meaningful distances. In particular, diffusion maps have the property that the Euclidean distance between samples in the embedding domain corresponds to a diffusion distance in the distribution domain. Diffusion distance measures the similarity between two samples according to their connectivity on the low-dimensional manifold [3] and is closely related to the geodesic distance. Thus, diffusion maps circumvent the step-by-step walk on the manifold [14], computing an approximation to the geodesic distance in a single low-cost operation. Another practical advantage of the proposed method is that we may first reveal the low-dimensional coordinate system based on reference data, and then extend the model online to newly acquired data at low computational cost. This is demonstrated in the applications in Section 4.
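The correspondence between Euclidean distances in the embedding and diffusion distances can be checked numerically. The sketch below uses the standard diffusion-map setting: a symmetric positive affinity W, with stationary distribution π proportional to the degrees. It computes the diffusion distance at time t in two ways:

- directly, from the rows of Pᵗ weighted by 1/π;
- as the Euclidean distance between full-spectrum diffusion coordinates.

Both function names are hypothetical. Keeping only the leading coordinates gives an approximation, because the k-th term decays as λ_k^(2t).

```python
import numpy as np

def diffusion_distance_direct(W: np.ndarray, t: int) -> np.ndarray:
    """D_t(i, j)^2 = sum_l (P^t[i, l] - P^t[j, l])^2 / pi_l, computed from P itself."""
    deg = W.sum(axis=1)
    pi = deg / deg.sum()
    Pt = np.linalg.matrix_power(W / deg[:, None], t)
    diff = Pt[:, None, :] - Pt[None, :, :]           # diff[i, j, l] = P^t[i, l] - P^t[j, l]
    return np.sqrt(np.sum(diff ** 2 / pi, axis=-1))

def diffusion_distance_spectral(W: np.ndarray, t: int) -> np.ndarray:
    """Euclidean distance between coordinates lambda_k^t * psi_k using all eigenpairs."""
    deg = W.sum(axis=1)
    pi = deg / deg.sum()
    evals, evecs = np.linalg.eigh(W / np.sqrt(np.outer(deg, deg)))
    psi = evecs / np.sqrt(pi)[:, None]               # right eigenvectors of P in L2(pi)
    coords = psi * evals ** t                        # the constant psi_0 cancels in differences
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))
```

The direct form costs O(N³) per pair set and needs matrix powers. The spectral form reuses a single eigendecomposition for any t. This is why distances in the embedding are cheap once the coordinates are available.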
The proposed framework is applied to two applications in which the data are best characterized by temporally evolving local statistics rather than by measures applied directly to the data themselves: music analysis and epileptic seizure prediction based on intracranial electroencephalography (icEEG) recordings. In the first application, we show that the proposed approach uncovers the key underlying processes: human voice and instrumental sounds. In particular, we exploit the efficient computation of diffusion distances to obtain intra-piece similarity measures on well-known music, which are compared with state-of-the-art techniques.
In the second application, one goal is to map the recordings to the unknown underlying “brain activity states”. This is especially crucial in epileptic seizure prediction, where distinguishing preseizure (dangerous) states from interictal (safe) states allows patients to be warned prior to seizures [18]. In this application, the observed time series consists of icEEG recordings, and the underlying process is the brain state, e.g., preseizure or interictal. The icEEG recordings tend to be noisy. Hence, the mapping between the state of the patient's brain and the available measurements is not deterministic, and the measurements do not lie on a smooth manifold. Thus, the intermediate step of mapping the observations to a time-evolving parametric family of distributions is essential to overcome this challenge. We use the proposed approach to infer a parameterization of the signal, viewed as a model summarizing the signal's distributional information. Based on the inferred parameterization, we show that preseizure state intervals can be distinguished from interictal state intervals. In particular, we show that seizures can potentially be predicted by visualization and simple detection algorithms, tested on an anonymous patient.
This paper makes three principal contributions. First, we present a data-driven method to fit flexible statistical models adapted to time series. In particular, we propose a class of Bayesian models with various prior specifications. These models learn the time-evolving statistics from a single trajectory realization and accurately model the local distributional parameters. Second, we complement the analysis with diffusion maps based on the distributional information embodied in time-series dynamics. Our nonlinear method relies on a kernel that enables segments from the entire time series to be compared. It therefore allows similar patterns to be associated and, in turn, recovers an underlying process that consists of the global controlling factors. In addition, diffusion maps enable the computation of meaningful distances (i.e., diffusion distances [3]), which approximate the geodesic distances between the time series samples on the statistical manifold. Finally, we apply the proposed framework to two applications: music analysis and the analysis of icEEG recordings.
The remainder of the paper is organized as follows. In Section 2 we review the diffusion-maps technique, propose an extended construction and examine its theoretical and practical properties. We propose in Section 3 multiple approaches to model multivariate time series with time-evolving distributions. In Section 4, results on two real-world applications are discussed. Conclusions and future work are outlined in Section 5.
Underlying parametric model
Consider the raw data or extracted features observed at each time. The key concept is that the high-dimensional representation of these observations exhibits a characteristic geometric structure. This structure is assumed to be governed by an underlying process on a low-dimensional manifold that propagates over time as a diffusion process according to the following stochastic differential equation (SDE)
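The SDE itself is not reproduced in this excerpt. The sketch below is only a toy illustration of a latent diffusion process driving the observations. It assumes a one-dimensional Ornstein–Uhlenbeck process (drift −θx, noise level σ) and integrates it with Euler–Maruyama. The latent state then rotates the principal axis of a 2 × 2 covariance, from which zero-mean Gaussian samples are drawn. Every modelling choice here is a hypothetical toy setup, not the authors' model: the OU drift, the rotation map, the eigenvalues 4 and 0.25, and the function name.

```python
import numpy as np

def simulate_latent_and_observations(T: int, dt: float = 0.01, theta: float = 1.0,
                                     sigma: float = 0.5, seed: int = 0):
    """Hypothetical toy model: OU latent process -> time-varying covariance -> Gaussian samples."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    for t in range(1, T):
        # Euler-Maruyama step for dx = -theta * x dt + sigma dW
        x[t] = x[t - 1] - theta * x[t - 1] * dt + sigma * np.sqrt(dt) * rng.normal()
    y = np.empty((T, 2))
    for t in range(T):
        c, s = np.cos(x[t]), np.sin(x[t])
        R = np.array([[c, -s], [s, c]])            # the latent state sets the rotation angle
        cov = R @ np.diag([4.0, 0.25]) @ R.T       # anisotropic covariance rotated by x[t]
        y[t] = rng.multivariate_normal(np.zeros(2), cov)
    return x, y
```

In this toy, the observations do not lie on a smooth curve, but their local covariance does. That is the situation the distribution-level embedding is designed for.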
Modeling time evolving covariance matrices
To calculate the KL divergence, we need to estimate the local (intermediate) parametric distribution at each time. The amount of data in each time window is limited; therefore, assumptions have to be made to constrain the parameter space. In this paper, we assume that the data sample at each time is drawn from a multivariate Gaussian distribution with time-evolving parameters. For simplicity, we focus on zero-mean Gaussian distributions. We assume that the time evolving
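The paper estimates these time-evolving covariances with Bayesian models and MCMC. The sketch below is a much simpler stand-in, useful only for experimenting with the rest of the pipeline. It computes a zero-mean sample covariance in a sliding window around each time index. Three elements are assumptions:

- the window half-width;
- the small ridge term, which keeps every estimate positive definite so that the KL divergence stays finite;
- the helper `windowed_covariances`, which is hypothetical.

```python
import numpy as np

def windowed_covariances(y: np.ndarray, half_width: int, ridge: float = 1e-6) -> np.ndarray:
    """Zero-mean sample covariance of y[t - half_width : t + half_width + 1] for each t."""
    T, d = y.shape
    covs = np.empty((T, d, d))
    for t in range(T):
        lo, hi = max(0, t - half_width), min(T, t + half_width + 1)   # window clipped at the ends
        seg = y[lo:hi]
        # zero-mean model: average the outer products without subtracting a sample mean
        covs[t] = seg.T @ seg / len(seg) + ridge * np.eye(d)
    return covs
```

Unlike the Bayesian models, this estimator imposes no explicit smoothness prior. The window width alone trades estimation variance against temporal resolution.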
Applications
The proposed framework is applied to a toy example and two real-world applications. In the synthetic toy example, we show that the estimated diffusion distance between data points recovers the geodesic distance on the statistical manifold, where IGDR [14] is used as a baseline method. In the first real-world application, we analyze a well-known music piece by estimating the diffusion distance between time points to discover the intra-piece similarities as a function of time. In the second
Conclusions
A dimensionality-reduction method for high-dimensional time series is presented. The method has two key components. First, multiple approaches to estimating time-evolving covariance matrices are presented and compared. Second, using the Kullback–Leibler divergence as a distance measure, diffusion maps are applied to the probability distributions estimated from samples, instead of to the samples themselves, to obtain a low-dimensional embedding of the high-dimensional time series. Theoretical
References (47)
- Diffusion maps. Appl. Comput. Harmon. Anal. (2006)
- Diffusion maps for changing data. Appl. Comput. Harmon. Anal. (2014)
- Non-linear independent component analysis with diffusion maps. Appl. Comput. Harmon. Anal. (2008)
- Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal. (2006)
- Controversies in epilepsy: debates held during the fourth international workshop on seizure prediction. Epilepsy Behav. (2010)
- On the Kullback–Leibler information divergence of locally stationary processes. Stoch. Process. Appl. (1996)
- Anisotropic diffusion on sub-manifolds with application to earth structure classification. Appl. Comput. Harmon. Anal. (2012)
- Texture separation via a reference set. Appl. Comput. Harmon. Anal. (2014)
- Seizure detection, seizure prediction, and closed-loop warning systems in epilepsy. Epilepsy Behav. (2014)
- Seizure prediction in patients with mesial temporal lobe epilepsy using EEG measures of state similarity. Clin. Neurophysiol. (2013)
- A global geometric framework for nonlinear dimensionality reduction. Science
- Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput.
- Estimating common trends in multivariate time series using dynamic factor analysis. Environmetrics
- Empirical intrinsic geometry for intrinsic modeling and nonlinear filtering. Proc. Natl. Acad. Sci. USA
- Information-geometric dimensionality reduction. IEEE Signal Process. Mag.
- FINE: Fisher information nonparametric embedding. IEEE Trans. Pattern Anal. Mach. Intell.
Cited by (27)
- Dynamic artist-based embeddings with application to playlist generation. Engineering Applications of Artificial Intelligence (2024)
- Performance degradation assessment for aircraft environmental control system: A method based on visual cognition. ISA Transactions (2021)
  Citation excerpt: Manifold learning is a practical method based on the MSC, which simulates the processing mechanism of the MSC in constructing the intrinsic manifold with a low-dimensional structure [38]. Among the applied manifold learning methods, DM, proposed by Coifman et al. [39], preserve the global properties, which has a wide application in data representation and dimensionality reduction [40,41]. The DM is a geometrically motivated method in manifold learning that provides a global representation of properties and structures by integrating the local geometry.
- Visual knowledge discovery and machine learning for investment strategy. Cognitive Systems Research (2017)
  Citation excerpt: Difficulties of defining investment strategy algorithmically are well documented in the literature (Ellis & Parbery, 2005; Kovalerchuk & Vityaev, 2000; Bingham, 2014; Li, Deng, & Luo, 2009; Martin, 2001; Wilinski, Bera, Nowicki, & Blaszynski, 2014; Guo, Wang, Liu, & Yang, 2014; Hoffman, 2014). One of them is multivariate and multidimensional nature of data that complicated both knowledge representation and discovery including: (1) identifying a class of predictive models (SVM, regression, ANN, kNN and so on) with associated trading strategies with parameters to be learned, and (2) analyzing multidimensional data with a naked eye to stimulate both intuitive discovery of patterns and formal models (Lian, Talmon, Zaveri, Carin, & Coifman, 2015; Wichard & Ogorzalek, 2004). The most efficient strategy should take into account the proper balance between both directions of investment (long and short positions) typical for foreign exchange markets (the pair EURUSD belongs to them).
- Dynamical system classification with diffusion embedding for ECG-based person identification. Signal Processing (2017)
  Citation excerpt: The diffusion maps algorithm has been shown to be efficient in recovering the underlying states of synthetic systems [12,14]. It has also been employed in several studies to analyze real complex systems [31,32]. In particular, the work in [14] presented an application of the ideas described in this section to the prediction of epileptic seizures from intracranial electroencephalographic (iEEG) signals.
- Diffusion-based kernel methods on Euclidean metric measure spaces. Applied and Computational Harmonic Analysis (2016)
- Functional diffusion maps. Statistics and Computing (2024)