Abstract
Subsequence matching has appeared to be an ideal approach for solving many problems related to the fields of data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is or can be represented by some kind of time series or string of symbols, which can be seen as an input for various subsequence matching approaches. The variety of data types, specific tasks and their solutions is so wide that their proper comparison and combination suitable for a particular task might be very complicated and time-consuming. In this work, we present a new generic Subsequence Matching Framework (SMF) that tries to overcome the aforementioned problem by a uniform frame that simplifies and speeds up the design, development and evaluation of subsequence matching related systems. We identify several relatively separate subtasks solved differently over the literature and SMF enables to combine them in a straightforward manner achieving new quality and efficiency. The strictly modular architecture and openness of SMF enables also involvement of efficient solutions from different fields, for instance advanced metric-based indexes.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Keogh, E., Kasetty, S.: On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. In: Proceedings of ACM SIGKDD 2002, pp. 102–111. ACM Press (2002)
Keogh, E., Zhu, Q., Hu, B., Hay, Y., Xi, X., Wei, L., Ratanamahatana, C.A.: The UCR Time Series Classification/Clustering Homepage (2011)
Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Springer (2006)
Novak, D., Batko, M., Zezula, P.: Metric Index: An Efficient and Scalable Solution for Precise and Approximate Similarity Search. Information Systems 36(4), 721–733 (2011)
Novak, D., Volny, P., Zezula, P.: Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency. Technical report, arXiv:1206.2510v1 (2012)
Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast Subsequence Matching in Time-Series Databases. ACM SIGMOD Record 23(2), 419–429 (1994)
Guttman, A.: R-Trees: A Dynamic Index Structure for Spacial Searching. ACM SIGMOD Record 14(2), 47–57 (1984)
Moon, Y.S., Whang, K.Y., Loh, W.K.: Duality-Based Subsequence Matching in Time-Series Databases. In: Proceedings of the 17th International Conference on Data Engineering, p. 263 (2001)
Moon, Y.S., Whang, K.Y., Han, W.S.: General Match: A Subsequence Matching Method in Time-series Databases Based on Generalized Windows. In: International Conference on Management of Data, p. 382 (2002)
Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowledge and Information Systems 7(3), 358–386 (2004)
Han, W.S., Lee, J., Moon, Y.S., Jiang, H.: Ranked Subsequence Matching in Time-series Databases. In: Proceedings VLDB 2007, pp. 423–434. ACM (2007)
Chan, K.P., Fu, A.W.C.: Efficient Time Series Matching by Wavelets. In: Proceedings ICDE 1999, pp. 126–133 (1999)
Korn, F., Jagadish, H.V., Faloutsos, C.: Efficiently supporting ad hoc queries in large datasets of time sequences. ACM SIGMOD Record 26(2), 289–300 (1997)
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowledge and Information Systems 3(3), 263–286 (2001)
Perng, C.S., Wang, H., Zhang, S.R., Parker, D.S.: Landmarks: A New Model for Similarity-based Pattern Querying in Time Series Databases. In: Proceedings of ICDE 2000, pp. 33–42. IEEE Computer Society, Washington, DC (2000)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics Speech and Signal Processing 26(1), 43–49 (1978)
Chen, L., Ng, R.: On the Marriage of Lp-norms and Edit Distance. In: Proceedings of VLDB 2004, pp. 792–803 (2004)
Shieh, J., Keogh, E.: i SAX. In: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, p. 623. ACM Press, New York (2008)
Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: Indexing and Mining One Billion Time Series. In: 2010 IEEE International Conference on Data Mining, pp. 58–67. IEEE (2010)
Batko, M., Novak, D., Zezula, P.: MESSIF: Metric Similarity Search Implementation Framework. In: Thanos, C., Borri, F., Candela, L. (eds.) Digital Libraries: R&D. LNCS, vol. 4877, pp. 1–10. Springer, Heidelberg (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Novak, D., Volny, P., Zezula, P. (2012). Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32597-7_23
Download citation
DOI: https://doi.org/10.1007/978-3-642-32597-7_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32596-0
Online ISBN: 978-3-642-32597-7
eBook Packages: Computer ScienceComputer Science (R0)