Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7447)

Abstract

Subsequence matching has emerged as an ideal approach to many problems in data mining and similarity retrieval. It has been shown that almost any data class (audio, image, biometrics, signals) is, or can be, represented by some kind of time series or string of symbols, which can serve as input to various subsequence matching approaches. The variety of data types, specific tasks, and their solutions is so wide that properly comparing them and combining them for a particular task can be very complicated and time-consuming. In this work, we present a new generic Subsequence Matching Framework (SMF) that tries to overcome this problem with a uniform frame that simplifies and speeds up the design, development, and evaluation of subsequence-matching systems. We identify several relatively separate subtasks that are solved differently across the literature; SMF makes it possible to combine them in a straightforward manner, achieving new quality and efficiency. The strictly modular architecture and openness of SMF also enable the involvement of efficient solutions from different fields, for instance advanced metric-based indexes.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Novak, D., Volny, P., Zezula, P. (2012). Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32597-7_23

  • DOI: https://doi.org/10.1007/978-3-642-32597-7_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32596-0

  • Online ISBN: 978-3-642-32597-7

  • eBook Packages: Computer Science, Computer Science (R0)
