A multi-scale smoothing kernel for measuring time-series similarity
Introduction
Time-series analysis is an important problem with applications in domains as diverse as engineering, medicine, astronomy and finance [11], [29]. In particular, the problems of time-series classification and prediction are attracting a lot of attention among researchers. Among the most successful and popular methods for classification and prediction are kernel-based methods such as support vector machines (SVMs) [26], [12], [35], [25]. Despite their popularity, only a handful of kernels have been designed for time-series. This paper tries to fill this gap and proposes a kernel designed specifically for time-series. Moreover, using a standard construction, we are able to convert our kernel into a distance for time-series, which also allows us to use it in distance-based algorithms.
A crucial aspect when dealing with time-series is to find a good measure, either a kernel similarity or a distance, that captures the essence of the time-series according to the domain of application.
For example, the Euclidean distance between time-series is commonly used due to its computational efficiency; however, it is very brittle: small shifts in one time-series can result in huge changes in the Euclidean distance. Therefore, more sophisticated distances have been devised that are more robust to small fluctuations of the input time-series. Notably, Dynamic Time Warping (DTW) [30] is held as the state-of-the-art method for comparing time-series. DTW is very powerful in the sense that it deals optimally with contractions, expansions and shifts in time-series, and it can handle time-series of different lengths. Unfortunately, computing the DTW distance is prohibitively costly for many practical applications [33]. Moreover, it cannot be used to define a positive definite kernel since it violates the triangle inequality [3].
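The brittleness of the Euclidean distance under small shifts is easy to reproduce. The snippet below is an illustration (not taken from the paper): it compares a sine wave against a slightly time-shifted copy of itself.

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 200)
x = np.sin(5.0 * t)           # original time-series
y = np.sin(5.0 * (t + 0.1))   # same shape, shifted slightly in time

# Identical series have distance 0, but a tiny shift already yields a
# distance that is a large fraction of the signal's own norm.
d_shift = np.linalg.norm(x - y)
signal_norm = np.linalg.norm(x)
```

Here `d_shift` comes out at roughly half of `signal_norm`, even though the two curves are visually almost indistinguishable.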
Therefore, researchers have devised distances for time-series that approximate DTW at lower computational cost, either by adding global path constraints [30], [36] or by reducing the number of instances, e.g. in nearest neighbor classification [35].
In this paper we introduce a new kernel, called the MUlti-Scale Smoothing kernel (MUSS). The basic idea behind our kernel is to consider many smoothed versions of the time-series and to compute the similarity of two time-series as the aggregation of the similarities of their multiple smoothed versions. The underlying intuition is that smoothing the original time-series removes slight perturbations, so the basic trends become apparent and more easily detected. The main strength of this kernel is the integration of multiple time-scales: at a high level, the MUSS kernel is a combination of linear kernels obtained from smoothed versions of the original time-series over different scales. In a sense, the kernel-derived distance proposed here tries to fix the brittleness of the Euclidean distance without incurring the high computational cost of DTW. Moreover, our kernel can easily be adapted to multidimensional time-series by considering multi-variate versions of the point-wise distance between time-series. In addition, we can derive a distance metric from the kernel definition that satisfies the triangle inequality.
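The multi-scale idea can be sketched as follows. This is only an illustration of the general scheme, not the paper's exact formulation: the moving-average smoother, the choice of window sizes and the plain sum over scales are all illustrative assumptions. The kernel-derived distance uses the standard construction d(x, y) = sqrt(k(x,x) - 2 k(x,y) + k(y,y)).

```python
import numpy as np

def smooth(x, w):
    """Moving-average smoothing with window w (output has the input's length)."""
    return np.convolve(x, np.ones(w) / w, mode="same")

def muss_kernel(x, y, scales=(1, 2, 4, 8)):
    """Sum of linear kernels over progressively smoothed copies of the inputs.
    The scales and the plain sum are illustrative choices."""
    return sum(np.dot(smooth(x, w), smooth(y, w)) for w in scales)

def muss_distance(x, y, scales=(1, 2, 4, 8)):
    """Distance derived from the kernel: sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    kxx = muss_kernel(x, x, scales)
    kyy = muss_kernel(y, y, scales)
    kxy = muss_kernel(x, y, scales)
    return np.sqrt(max(kxx - 2.0 * kxy + kyy, 0.0))
```

Since each per-scale term is a plain dot product of smoothed signals, the combined kernel is a sum of linear kernels and therefore positive semi-definite, which is what makes the derived distance well-behaved.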
The main goal of the proposed kernel is to recognize time-series that are slight perturbations of one another as similar. That is, it focuses on the shape of the time-series rather than on the details. Small errors in the measurement or delivery of data may result in slight shifts or misalignments of time-series. Consequently, any data that passes through complicated machinery, such as astronomical data, can suffer from this type of misalignment and could benefit from our kernel.
In this work, two ways of computing the kernel are presented: a recursive formulation and an equivalent matrix-based formulation. To evaluate the proposed kernel, three experiments have been carried out. As an initial step, several synthetic datasets have been generated from the UCR time-series repository [20] and the 2007 KDD challenge [19] to validate our kernel-derived distance on shifted time-series. In particular, a comparison with the DTW and Euclidean distances shows that our kernel-derived distance outperforms the Euclidean distance and is competitive with the DTW distance at a much lower computational cost. The DTW distance is designed to deal with misalignments and shifts optimally; therefore, our objective is not to beat DTW, but to approach its performance without incurring its high computational cost. The Euclidean distance serves as a baseline. In the second experiment, the proposed kernel has been applied to the original UCR time-series [20] to analyze its potential in time-series classification using an SVM. In this case, the proposed kernel shows remarkable performance when compared with a kernel based on DTW [10] and a linear kernel. Finally, two real-world applications, related to ozone concentration in the atmosphere and electricity demand, have been considered to show the performance of the MUSS kernel on very specific datasets. In this case, accuracies ranging from 97% to 99% have been obtained.
The paper is structured as follows. Section 2 presents the most relevant related work found in the literature. Section 3 describes our time-series kernel and its corresponding derived distance. The experimental results are presented in Sections 4 (results on shifted time-series), 5 (time-series classification) and 6 (real-world applications: ozone concentration in the atmosphere and electricity demand classification). Finally, Section 7 concludes with a summary of our main contributions and possible directions for future work.
Related work
Similarity and distance measures for time-series are a crucial ingredient in solving time-series classification and forecasting problems [29], [15]. For this reason, many distances have been proposed. For example, [1] defines a distance between two time-series representing the convexities/concavities of two shape contours. In [4] the authors modify the Euclidean distance with a correction factor based on the complexity of the input time-series.
The success and popularity of Support Vector
Kernel description
This section presents the notation used in this paper and also provides the definitions underlying the proposed kernel.
Results
This section presents the results obtained by the application of the MUSS kernel to forty datasets for measuring the similarity in shifted time-series. Section 4.1 provides a detailed description of all datasets used in the experiments. In Section 4.2 the kernel has been applied to forty time-series to validate its potential for separating classes in shifted time series. Finally, a statistical analysis of the results is reported in Section 4.3.
Time-series classification
In our second experiment, we use Support Vector Machines to perform classification using three different kernels: our MUSS kernel, the GA-DTW kernel based on global alignments in [8], and a linear kernel. We compare the prediction accuracy achieved by these three kernels over the 20 datasets of the UCR repository [20]. The results in this section show that our kernel is able to achieve good results for general time-series (not necessarily shifted). We have used the implementation LIBSVM [5] of
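Using a custom kernel such as MUSS with an SVM amounts to passing a precomputed Gram matrix to the solver. The sketch below uses scikit-learn's SVC (which wraps LIBSVM) with kernel="precomputed"; a plain dot product stands in for the MUSS kernel, and all data and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def gram_matrix(A, B, kernel_fn):
    """Pairwise kernel matrix between rows of A and rows of B."""
    return np.array([[kernel_fn(a, b) for b in B] for a in A])

# Synthetic stand-in data; kernel_fn would be the MUSS kernel in practice.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 50))
y_train = np.repeat([0, 1], 10)
X_test = rng.normal(size=(5, 50))

# fit() expects the (n_train, n_train) Gram matrix; predict() expects
# kernel values between test and training samples, shape (n_test, n_train).
K_train = gram_matrix(X_train, X_train, np.dot)
K_test = gram_matrix(X_test, X_train, np.dot)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
preds = clf.predict(K_test)
```

The same precomputed-kernel interface exists in LIBSVM itself (kernel type 4), so the Gram matrices computed here carry over directly.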
Real-world applications: ozone concentration in atmosphere and electricity demand classification
Finally, we present results in two real-world applications: atmospheric pollutants (ozone) and electricity demand. Pattern recognition in ozone time-series data is an important task, as it allows governments to activate alert protocols and implement sound environmental policies when high ozone concentration levels are predicted. Electricity-producing companies, in turn, are interested in predicting spikes in demand in order to schedule energy production to maximize profit.
Conclusions
In this paper we have presented the MUSS kernel for time-series data and its associated distance metric. Initial experiments show promise in detecting similarity between time-series. The MUSS kernel has been compared to the Euclidean distance as a reference distance and to the DTW distance as one of the most competitive distances in the literature. Also, the MUSS kernel has been used in conjunction with the SVM classifier for time-series classification and compared with the GA-DTW kernel
Acknowledgments
This work is partially supported by the Spanish Ministry of Science and Technology contracts TIN2011-27479-C04-03 and TIN2011-28956-C02, by the Generalitat de Catalunya 2009-SGR-1428 (LARCA), by the Junta de Andalucía P12-TIC-1728, by the Pablo de Olavide University APPB813097 and by the EU PASCAL2 Network of Excellence (FP7-ICT-216886). Part of this work was done while A.T. and M.A. were at Columbia University in New York and University of California in San Diego, supported in part by grant
Alicia Troncoso received the Ph.D. degree in Computer Science from the University of Seville, Spain, in 2005. From 2002 to 2005, she was with the Department of Computer Science, University of Seville. Presently, she is an Associate Professor at the Pablo de Olavide University of Seville. Her primary areas of interest are time series, data mining and evolutionary computation.
References (36)
- et al., A sparse kernel algorithm for online time series data prediction, Expert Syst. Appl. (2013)
- et al., Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci. (2010)
- et al., GDTW-P-SVMs: variable-length time series analysis using support vector machines, Neurocomputing (2013)
- et al., Weighted dynamic time warping for time series classification, Pattern Recognit. (2011)
- Semi-supervised learning of hidden conditional random fields for time-series classification, Neurocomputing (2013)
- et al., Reservoir computing approaches to recurrent neural network training, Comput. Sci. Rev. (2009)
- et al., Design of specific-to-problem kernels and use of kernel weighted k-nearest neighbors for time series modeling, Neurocomputing (2010)
- et al., On time series features and kernels for machine olfaction, Sensors Actuators B: Chem. (2012)
- et al., A multiscale representation method for nonrigid shapes with a single closed contour, IEEE Trans. Circuits Syst. Video Technol. (2004)
- et al., KEEL: a software tool to assess evolutionary algorithms for data mining problems, Soft Comput. (2009)
Jose C. Riquelme received the M.Sc. degree in Mathematics and the Ph.D. degree in Computer Science from the University of Seville, Spain. Since 1987 he has been with the Department of Computer Science, University of Seville, where he is currently Full Professor. His primary areas of interest are data mining, machine learning techniques, and evolutionary computation.
Marta Arias received a Ph.D. in Computer Science from Tufts University (Boston, USA) in 2004. From 2004 to 2007 she was a Research Scientist at the Center for Computational Learning Systems from Columbia University (New York City, USA). In 2007 she moved to the Technical University of Catalonia (Barcelona, Spain) where she currently is an Associate Professor. Her main interests are data mining and machine learning.