Abstract
A growing number of real-world applications deal with multiple evolving data streams. In this paper, a framework for clustering over evolving data streams is proposed that takes advantage of recent-biased approximation. In recent-biased approximation, more detail is preserved for recent data while fewer coefficients are kept for the data stream as a whole, which greatly improves clustering efficiency and space usage. Our framework consists of two phases. The first is an online phase, which approximates the data streams and maintains summary statistics incrementally. The second is an offline clustering phase, which can perform dynamic clustering over the data streams on all possible time horizons. As shown by complexity analyses and validated by our empirical studies, the framework performs efficiently in the data stream environment while producing clustering results of very high quality.
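To make the idea of recent-biased approximation concrete, the following is a minimal sketch, not the method of the paper. The abstract does not say which transform or coefficients are used, so this example assumes the simplest possible scheme: segment means over windows whose length doubles going back in time. The function name `recent_biased_summary` is hypothetical.

```python
def recent_biased_summary(values):
    """Summarize a sequence with segment means, newest segment first.

    Assumption (not from the paper): window widths double going back in
    time (1, 2, 4, ...), so recent data keeps fine detail and old data is
    heavily compressed. Returns a list of (segment_length, segment_mean).
    """
    summary = []
    end = len(values)   # exclusive end index of the next segment
    width = 1           # width of the most recent segment
    while end > 0:
        start = max(0, end - width)
        segment = values[start:end]
        summary.append((len(segment), sum(segment) / len(segment)))
        end = start
        width *= 2      # older segments are twice as coarse
    return summary


if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5, 6, 7]
    print(recent_biased_summary(stream))
```

Under this assumption, a stream of n values is reduced to roughly log2(n) coefficients, which illustrates why a clustering step that runs on the summaries can be cheaper in both time and space than one that runs on the raw streams.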
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fan, W., Koyanagi, Y., Asakura, K., Watanabe, T. (2009). Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation. In: Richards, D., Kang, BH. (eds) Knowledge Acquisition: Approaches, Algorithms and Applications. PKAW 2008. Lecture Notes in Computer Science, vol 5465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01715-5_2
DOI: https://doi.org/10.1007/978-3-642-01715-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01714-8
Online ISBN: 978-3-642-01715-5
eBook Packages: Computer Science (R0)