StreamXM: An Adaptive Partitional Clustering Solution for Evolving Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9263)

Abstract

Continuously arriving data streams pose a challenge: they must be analyzed, and the models that explain them updated, as new data arrives. We propose StreamXM, a stream clustering technique that requires no arbitrary choice of the number of clusters, no repeated and expensive heuristics, and no in-depth prior knowledge of the data to produce an informed clustering. It yields a clustering that can adapt its number of classes to those present in the underlying distribution. In this paper, we propose two variants of StreamXM and compare them against a current state-of-the-art technique, StreamKM. We evaluate the proposed techniques on both synthetic and real-world datasets. Our results show that StreamXM and StreamKM run in similar time and with similar accuracy when using similar numbers of clusters. We also show that our algorithms can provide superior stream clustering when the true clusters are not known or when concepts emerge or disappear within the data stream.

Author information

Corresponding author

Correspondence to Yun Sing Koh.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Anderson, R., Koh, Y.S. (2015). StreamXM: An Adaptive Partitional Clustering Solution for Evolving Data Streams. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_21

  • DOI: https://doi.org/10.1007/978-3-319-22729-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22728-3

  • Online ISBN: 978-3-319-22729-0

  • eBook Packages: Computer Science, Computer Science (R0)
