StreamXM: An Adaptive Partitional Clustering Solution for Evolving Data Streams

Anderson, Robert; Koh, Yun Sing

doi:10.1007/978-3-319-22729-0_21

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9263)

Abstract

A challenge posed by continuously arriving data streams is to analyze the data and to update the models that explain it as new data arrives. We propose StreamXM, a stream clustering technique that produces an informed clustering of the data without requiring an arbitrary choice of the number of clusters, repeated and expensive heuristics, or in-depth prior knowledge of the data. The number of clusters it produces adapts to the number present in the underlying distribution. In this paper, we propose two variants of StreamXM and compare them against a current state-of-the-art technique, StreamKM. We evaluate the proposed techniques on both synthetic and real-world datasets. Our results show that StreamXM and StreamKM have similar running times and similar accuracy when run with similar numbers of clusters. We also show that our algorithms provide superior stream clustering when the true clusters are not known, or when concepts emerge or disappear within the data stream.
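The abstract does not say how StreamXM settles on its number of clusters. Its name suggests a link to X-means (Pelleg and Moore, 2000). X-means starts from a small k and splits a centroid in two only when the Bayesian Information Criterion (BIC) improves.

The Python sketch below is a hypothetical, batch-only illustration of that X-means idea. It is not the authors' StreamXM, and it rests on these assumptions:

- Clusters are modelled as spherical Gaussians that share one variance.
- Each trial split is seeded with a farthest-point pair rather than X-means' own splitting heuristic.
- All names (`xmeans`, `bic`, `farthest_pair`, `lloyd`, `k_max`) are illustrative.

The streaming side, which summarizes arriving points before clustering, is omitted entirely.

```python
import math

# Hypothetical sketch of X-means-style selection of k; NOT the authors' StreamXM.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def lloyd(points, centers, iters=50):
    """Plain Lloyd's k-means, started from the given centres."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its old centre.
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, groups


def bic(groups):
    """Hard-assignment BIC of a spherical Gaussian mixture with one pooled variance.

    Higher is better. Empty groups are ignored.
    """
    groups = [g for g in groups if g]
    r, k, dims = sum(len(g) for g in groups), len(groups), len(groups[0][0])
    if r <= k:
        return -math.inf
    sse = sum(sq_dist(p, mean(g)) for g in groups for p in g)
    var = max(sse / (dims * (r - k)), 1e-12)  # per-dimension variance estimate
    log_l = sum(len(g) * math.log(len(g) / r) for g in groups)  # mixing weights
    log_l -= r * dims / 2 * math.log(2 * math.pi * var)
    log_l -= sse / (2 * var)
    n_params = (k - 1) + dims * k + 1  # weights, centres, shared variance
    return log_l - n_params / 2 * math.log(r)


def farthest_pair(points):
    """Deterministic 2-means seeds: the point farthest from the mean, then the point farthest from it."""
    centre = mean(points)
    a = max(points, key=lambda p: sq_dist(p, centre))
    b = max(points, key=lambda p: sq_dist(p, a))
    return [a, b]


def xmeans(points, k_max=10):
    """Start with k = 1 and split a centre whenever BIC prefers its two children."""
    centers, groups = lloyd(points, [mean(points)])
    while len(centers) < k_max:
        proposed = []
        for c, g in zip(centers, groups):
            if len(g) >= 4:
                kids, kid_groups = lloyd(g, farthest_pair(g))
                if bic(kid_groups) > bic([g]):
                    proposed.extend(kids)
                    continue
            proposed.append(c)
        if len(proposed) == len(centers):
            break  # no cluster gained from a split, so k has settled
        # Refine all centres together after the splits.
        centers, groups = lloyd(points, proposed[:k_max])
    return centers, groups
```

Given three well-separated groups of points, `xmeans` should settle on three centres without being told k. Re-running it on each new window of a stream would be one crude way (again an assumption, not the paper's method) to let k grow when a concept emerges and shrink when one disappears.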

Author information

Authors and Affiliations

Department of Computer Science, University of Auckland, Auckland, New Zealand
Robert Anderson & Yun Sing Koh

Corresponding author

Correspondence to Yun Sing Koh.

Editor information

Editors and Affiliations

Missouri University of Science and Technology, Rolla, Missouri, USA
Sanjay Madria
Osaka University, Osaka, Japan
Takahiro Hara

About this paper

Cite this paper

Anderson, R., Koh, Y.S. (2015). StreamXM: An Adaptive Partitional Clustering Solution for Evolving Data Streams. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_21

DOI: https://doi.org/10.1007/978-3-319-22729-0_21
Published: 05 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer Science, Computer Science (R0)