Elsevier

Neurocomputing

Volume 167, 1 November 2015, Pages 8-17

A multi-scale smoothing kernel for measuring time-series similarity

https://doi.org/10.1016/j.neucom.2014.08.099

Abstract

In this paper, a kernel for time-series data is introduced that can be used in any data mining task relying on a similarity or distance metric. The main idea of our kernel is that it should recognize as highly similar those time-series that are essentially the same but slightly perturbed from each other: for example, if one series is shifted with respect to the other or if it is slightly misaligned. Namely, our kernel focuses on the shape of the time-series and ignores small perturbations such as misalignments or shifts. First, a recursive formulation of the kernel, directly based on its definition, is proposed. Then it is shown how to compute the kernel efficiently using an equivalent matrix-based formulation. To validate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository and the KDD challenge of 2007 with the purpose of validating the kernel-derived distance over shifted time-series. Also, the kernel has been applied to the original UCR time-series to analyze its potential in time-series classification in conjunction with Support Vector Machines. Finally, two real-world applications related to ozone concentration in atmosphere and electricity demand have been considered.

Introduction

Time-series analysis is an important problem with applications in domains as diverse as engineering, medicine, astronomy and finance [11], [29]. In particular, time-series classification and prediction are attracting a lot of attention among researchers. Among the most successful and popular methods for classification and prediction are kernel-based methods such as support vector machines (SVM) [26], [12], [35], [25]. Despite their popularity, there seem to be only a handful of kernels designed for time-series. This paper tries to fill this gap and proposes a kernel designed specifically for time-series. Moreover, using a standard trick, we are able to convert our kernel into a distance for time-series, which allows our kernel to be used in distance-based algorithms as well.
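
The "standard trick" is presumably the usual feature-space construction: for a positive definite kernel k, the quantity d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)) is the distance between the images of x and y in the kernel's feature space, and it satisfies the triangle inequality. The sketch below assumes equal-length series and uses illustrative function names that do not come from the paper; with the linear kernel, the induced distance reduces to the Euclidean distance.

```python
import math
from typing import Callable, Sequence

Series = Sequence[float]


def linear_kernel(x: Series, y: Series) -> float:
    """Plain inner product <x, y>; assumes equal-length series (illustrative helper)."""
    return sum(a * b for a, b in zip(x, y))


def kernel_distance(k: Callable[[Series, Series], float], x: Series, y: Series) -> float:
    """Distance induced by a kernel: the norm of phi(x) - phi(y) in feature space."""
    sq = k(x, x) + k(y, y) - 2.0 * k(x, y)
    # Guard against tiny negative values caused by floating-point round-off.
    return math.sqrt(max(sq, 0.0))


if __name__ == "__main__":
    x = [0.0, 1.0, 3.0, 1.0, 0.0]
    y = [0.0, 0.0, 1.0, 3.0, 1.0]
    # For the linear kernel both values coincide (up to round-off): sqrt(10).
    print(kernel_distance(linear_kernel, x, y), math.dist(x, y))
```

Any other positive definite kernel can be passed as `k`, which is what lets a kernel be reused inside distance-based algorithms such as nearest-neighbor classification.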

A crucial aspect when dealing with time-series is to find a good measure, either a kernel similarity or a distance, that captures the essence of the time-series according to the domain of application.

For example, the Euclidean distance between time-series is commonly used due to its computational efficiency; however, it is very brittle, and small shifts in one time-series can result in huge changes in the Euclidean distance. Therefore, more sophisticated distances have been designed to be more robust to small fluctuations of the input time-series. Notably, Dynamic Time Warping (DTW) [30] is held as the state-of-the-art method for measuring the similarity between time-series. DTW is very powerful in the sense that it can deal optimally with contractions, expansions and shifts in time-series, in addition to being able to handle time-series of different lengths. Unfortunately, computing the DTW distance, whose standard dynamic-programming formulation takes time proportional to the product of the two series lengths, is prohibitively costly for many practical applications [33]. Moreover, it cannot be used to define a positive definite kernel since it violates the triangle inequality [3].
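
The contrast between the two distances can be seen on a toy example. The following minimal sketch compares the lock-step Euclidean distance with the classic unconstrained DTW recurrence (squared point-wise cost, square root at the end) on a single bump and a copy shifted by one step; the series and function names are illustrative assumptions, not data or code from the paper.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Lock-step distance: x[i] is only ever compared with y[i]."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic unconstrained DTW with squared point-wise cost.

    Fills an (n+1) x (m+1) table, so the running time is O(n * m);
    this quadratic cost is what makes DTW expensive on long series.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of: diagonal match, insertion, deletion.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return math.sqrt(D[n][m])


if __name__ == "__main__":
    # A single bump, and the same bump shifted one step to the right.
    x = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
    y = [0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
    print("Euclidean:", euclidean(x, y))  # sqrt(34), about 5.83, although the shapes match
    print("DTW:", dtw(x, y))              # 0.0: the warping path absorbs the shift
```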

Therefore, researchers have proposed distances for time-series that approximate DTW at a lower computational cost, either by adding global path constraints [30], [36] or by reducing the number of instances, e.g., in nearest neighbor classification [35].
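
One common form of global path constraint is the Sakoe-Chiba band, which only allows warping paths that stay within a fixed distance of the diagonal. A minimal sketch, assuming a band of half-width `window` (a hypothetical parameter name) and squared point-wise cost; for simplicity the full table is still allocated, so only the running time, not the memory, is reduced.

```python
import math
from typing import Sequence


def dtw_band(x: Sequence[float], y: Sequence[float], window: int) -> float:
    """DTW restricted to a Sakoe-Chiba band of half-width `window`.

    Only cells with |i - j| <= window are filled, so the running time drops
    from O(n * m) to roughly O(n * window). Memory is still O(n * m) here.
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide to reach the final cell (n, m).
    w = max(window, abs(n - m))
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return math.sqrt(D[n][m])


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
    y = [0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
    print(dtw_band(x, y, window=0))  # diagonal only: equals the Euclidean distance
    print(dtw_band(x, y, window=1))  # 0.0: a band of width 1 absorbs a one-step shift
```

The band width trades accuracy for speed: shifts larger than `window` can no longer be fully absorbed.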

In this paper we introduce a new kernel, called MUlti-Scale Smoothing Kernel (MUSS). The basic idea behind our kernel is to consider many smoothed versions of the time-series and to compute the similarity of two time-series as the aggregation of the similarities of their multiple smoothed versions. The underlying idea is that smoothing the original time-series removes slight perturbations, so that the basic trends become apparent and more easily detected. The main strength of this kernel is the integration of multiple time-scales: at a high level, the MUSS kernel is a combination of linear kernels computed on several smoothed versions of the original time-series at different scales. In a sense, the kernel-derived distance proposed here tries to fix the brittleness of the Euclidean distance without incurring the high computational cost of DTW. Moreover, our kernel can easily be adapted to multidimensional time-series by considering multivariate versions of the point-wise distance between time-series. In addition, we can derive from the kernel definition a distance metric that satisfies the triangle inequality.
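
To make the high-level description concrete, the sketch below builds a kernel of this general kind. It is hypothetical: the smoothing operator (a moving average of width s over valid positions), the set of scales (1, 2, 4) and the unweighted sum are assumptions made for illustration, not the paper's definitions. Because each term is a linear kernel applied to a linear transformation of the series, the sum is positive semidefinite, which is what makes a kernel-derived distance with the triangle inequality available.

```python
import math
from functools import partial
from typing import Callable, List, Sequence

Series = Sequence[float]


def moving_average(x: Series, width: int) -> List[float]:
    """Hypothetical smoother: mean of every length-`width` window (valid positions only)."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def multiscale_kernel(x: Series, y: Series, scales: Sequence[int] = (1, 2, 4)) -> float:
    """Sum of linear kernels between smoothed copies of x and y (illustrative, not MUSS itself).

    Assumes len(x) == len(y) >= max(scales). Width 1 leaves a series unchanged,
    so the plain linear kernel is one of the summands.
    """
    total = 0.0
    for s in scales:
        xs, ys = moving_average(x, s), moving_average(y, s)
        total += sum(a * b for a, b in zip(xs, ys))
    return total


def normalized(k: Callable[[Series, Series], float], x: Series, y: Series) -> float:
    """Cosine-style normalization k(x, y) / sqrt(k(x, x) * k(y, y))."""
    return k(x, y) / math.sqrt(k(x, x) * k(y, y))


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
    y = [0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]  # x shifted one step to the right
    linear = partial(multiscale_kernel, scales=(1,))
    print("linear only:", normalized(linear, x, y))
    print("multi-scale:", normalized(multiscale_kernel, x, y))  # larger: smoothing softens the shift
```

On this one-step shift of a single bump, the normalized similarity rises from about 0.37 with the linear kernel alone to about 0.55 with the three scales, because the smoothed copies overlap more than the raw series do.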

The main goal of the proposed kernel is to recognize as similar those time-series that may be slightly perturbed from one another. Namely, it focuses on the shape of the time-series rather than on the details. Small errors in the measurement or delivery of data may plausibly result in slight shifts or misalignments of time-series. Consequently, any data sent through complicated machinery, astronomical data for example, can suffer from this type of misalignment and could benefit from our kernel.

In this work, two ways of computing the kernel are presented: a recursive formulation and an equivalent matrix-based formulation. To evaluate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository [20] and the KDD challenge of 2007 [19], with the purpose of validating our kernel-derived distance over shifted time-series. In particular, a comparison with the DTW and Euclidean distances shows that our kernel-derived distance outperforms the Euclidean distance and is competitive with the DTW distance while having a much lower computational cost. The DTW distance is designed to deal with misalignments and shifts optimally; therefore, our objective is not to beat DTW, but to approach its performance without incurring its high computational cost. The Euclidean distance, in turn, serves as a baseline. In the second experiment, the proposed kernel has been applied to the original UCR time-series [20] to analyze its potential in time-series classification using an SVM. In this case, the proposed kernel performs remarkably well compared with a kernel based on DTW [10] and a linear kernel. Finally, two real-world applications related to ozone concentration in atmosphere and electricity demand have been considered to show the performance of the MUSS kernel on very specific datasets. In this case, accuracies ranging from 97% to 99% have been obtained.

The paper is structured as follows. Section 2 presents the most relevant related work found in the literature. Section 3 describes our time-series kernel and its corresponding derived distance. The experimental results are presented in Sections 4 (Results), 5 (Time-series classification) and 6 (Real-world applications: ozone concentration in atmosphere and electricity demand classification). Finally, Section 7 concludes with a summary of our main contributions and possible directions for future work.


Related work

Similarity and distance measures for time-series are a crucial ingredient in solving time-series classification and forecasting problems [29], [15]. For this reason, many distances have been proposed. For example, [1] defines a distance between two time-series representing the convexities/concavities of two shape contours. In [4] the authors modify the Euclidean distance with a correction factor based on the complexity of the input time-series.

The success and popularity of Support Vector

Kernel description

This section presents the notation used in this paper and also provides the definitions underlying the proposed kernel.
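
The paper states that the kernel can be computed either recursively or through an equivalent matrix-based formulation; the details are not reproduced in this excerpt. The sketch below only illustrates the general idea for a hypothetical variant in which each scale s applies a linear smoothing matrix S_s (here a width-s moving average): a sum of linear kernels on smoothed series collapses into a single quadratic form, k(x, y) = sum_s <S_s x, S_s y> = x^T M y with M = sum_s S_s^T S_s, so M can be precomputed once per series length and reused for every pair. The smoother and all names are assumptions, not the authors' definitions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def smoothing_matrix(n: int, width: int) -> Matrix:
    """Hypothetical smoother S_s: row i averages samples i .. i+width-1 (valid windows only)."""
    return [[1.0 / width if i <= j < i + width else 0.0 for j in range(n)]
            for i in range(n - width + 1)]


def kernel_matrix(n: int, scales: Sequence[int]) -> Matrix:
    """M = sum over scales of S_s^T S_s, so that k(x, y) = x^T M y."""
    M = [[0.0] * n for _ in range(n)]
    for s in scales:
        for row in smoothing_matrix(n, s):
            # S^T S equals the sum of the outer products row^T row over the rows of S.
            for a in range(n):
                if row[a] != 0.0:
                    for b in range(n):
                        M[a][b] += row[a] * row[b]
    return M


def quadratic_form(x: Sequence[float], M: Matrix, y: Sequence[float]) -> float:
    """x^T M y for a precomputed kernel matrix M."""
    return sum(x[a] * M[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))


def direct_kernel(x: Sequence[float], y: Sequence[float], scales: Sequence[int]) -> float:
    """The same kernel computed scale by scale, smoothing both series explicitly."""
    total = 0.0
    for s in scales:
        xs = [sum(x[i:i + s]) / s for i in range(len(x) - s + 1)]
        ys = [sum(y[i:i + s]) / s for i in range(len(y) - s + 1)]
        total += sum(a * b for a, b in zip(xs, ys))
    return total


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
    y = [0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
    scales = (1, 2, 4)
    M = kernel_matrix(len(x), scales)  # computed once, reused for every pair of series
    print(quadratic_form(x, M, y), direct_kernel(x, y, scales))  # agree up to round-off
```

For a whole dataset stored as the rows of a matrix X, the same identity gives the Gram matrix as X M X^T, which maps naturally onto batched linear algebra.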

Results

This section presents the results obtained by applying the MUSS kernel to forty datasets in order to measure the similarity of shifted time-series. Section 4.1 provides a detailed description of all datasets used in the experiments. In Section 4.2 the kernel is applied to these forty datasets to validate its potential for separating classes in shifted time-series. Finally, a statistical analysis of the results is reported in Section 4.3.

Time-series classification

In our second experiment, we use Support Vector Machines to perform classification with three different kernels: our MUSS kernel, the GA-DTW kernel based on global alignments in [8], and a linear kernel. We compare the prediction accuracy achieved by these three kernels over the 20 datasets of the UCR repository [20]. The results in this section show that our kernel performs well on general time-series (not necessarily shifted). We have used the implementation LIBSVM [5] of
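
LIBSVM accepts user-defined kernels through its precomputed-kernel mode (svm-train -t 4), in which each line of the data file holds a label, the instance's serial number as feature 0, and the kernel values against all L training instances as features 1 to L. A minimal sketch of producing such lines in Python follows; the helper name is hypothetical, the linear kernel is only a placeholder for any time-series kernel, and this is not necessarily how the authors interfaced with LIBSVM.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]
Kernel = Callable[[Series, Series], float]


def linear_kernel(x: Series, y: Series) -> float:
    """Placeholder kernel; any positive definite time-series kernel can be used instead."""
    return sum(a * b for a, b in zip(x, y))


def precomputed_lines(rows: Sequence[Series], labels: Sequence[int],
                      train: Sequence[Series], k: Kernel) -> List[str]:
    """Format kernel values in LIBSVM's precomputed-kernel text format (hypothetical helper).

    Each line reads: <label> 0:<id> 1:k(x, t_1) ... L:k(x, t_L), where t_1..t_L are the
    training series. For the training file, pass rows == train so that ids run 1..L;
    for a test file LIBSVM ignores the value of feature 0.
    """
    lines = []
    for idx, (x, label) in enumerate(zip(rows, labels), start=1):
        values = " ".join(f"{j}:{k(x, t)}" for j, t in enumerate(train, start=1))
        lines.append(f"{label} 0:{idx} {values}")
    return lines


if __name__ == "__main__":
    train = [[1.0, 0.0], [0.0, 1.0]]
    for line in precomputed_lines(train, [1, -1], train, linear_kernel):
        print(line)  # "1 0:1 1:1.0 2:0.0" then "-1 0:2 1:0.0 2:1.0"
```

Writing these lines to a file and running svm-train -t 4 on it trains an SVM on the supplied kernel values; test files use the same layout, with kernel values computed against the training series.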

Real-world applications: ozone concentration in atmosphere and electricity demand classification

Finally, we present results in two real-world applications: atmospheric pollutants (ozone) and electricity demand. Pattern recognition in ozone time-series data is an important task, as it allows governments to activate alert protocols and implement good environmental policies when high ozone concentration levels are predicted. Similarly, electricity-producing companies are interested in predicting demand spikes in order to schedule energy production and maximize their profit.

Conclusions

In this paper we have presented the MUSS kernel for time-series data and its associated distance metric. Initial experiments show promise in detecting similarity between time-series. The MUSS kernel has been compared to the Euclidean distance as a reference distance and to the DTW distance as one of the most competitive distances in the literature. Also, the MUSS kernel has been used in conjunction with the SVM classifier for time-series classification and compared with the GA-DTW kernel

Acknowledgments

This work is partially supported by the Spanish Ministry of Science and Technology contracts TIN2011-27479-C04-03 and TIN2011-28956-C02, by the Generalitat de Catalunya 2009-SGR-1428 (LARCA), by the Junta de Andalucía P12-TIC-1728, by the Pablo de Olavide University APPB813097 and by the EU PASCAL2 Network of Excellence (FP7-ICT-216886). Part of this work was done while A.T. and M.A. were at Columbia University in New York and University of California in San Diego, supported in part by grant

Alicia Troncoso received the Ph.D. degree in Computer Science from the University of Seville, Spain, in 2005. From 2002 to 2005, she was with the Department of Computer Science, University of Seville. Presently, she is an Associate Professor at the Pablo de Olavide University of Seville. Her primary areas of interest are time series, data mining and evolutionary computation.

References (36)

  • C. Bahlmann, B. Haasdonk, H. Burkhardt, On-line handwriting recognition with support vector machines—a kernel approach,...
  • G.E.A.P.A. Batista, X. Wang, E.J. Keogh, A complexity-invariant distance measure for time series, in: SIAM...
  • C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011):...
  • H. Chen, F. Teng, P. Tino, X. Yao, Model-based kernel for efficient time series analysis, in: Proceedings of the 19th...
  • Y. Chen, B. Yu, E. Keogh, G.E.A.P.A. Batista, DTW-D: time series semi-supervised learning from a single example, in:...
  • M. Cuturi, Fast global alignment kernels, in: International Conference on Machine Learning,...
  • M. Cuturi, A. Doucet, Autoregressive kernels for time series, 2011,...
  • M. Cuturi, J.P. Vert, O. Birkenes, T. Matsui, A kernel for time series based on global alignments, in: IEEE...


    Jose C. Riquelme received the M.Sc. degree in Mathematics and the Ph.D. degree in Computer Science from the University of Seville, Spain. Since 1987 he has been with the Department of Computer Science, University of Seville, where he is currently Full Professor. His primary areas of interest are data mining, machine learning techniques, and evolutionary computation.

    Marta Arias received a Ph.D. in Computer Science from Tufts University (Boston, USA) in 2004. From 2004 to 2007 she was a Research Scientist at the Center for Computational Learning Systems from Columbia University (New York City, USA). In 2007 she moved to the Technical University of Catalonia (Barcelona, Spain) where she currently is an Associate Professor. Her main interests are data mining and machine learning.
