Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation

  • Conference paper
Knowledge Acquisition: Approaches, Algorithms and Applications (PKAW 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5465)

Abstract

A growing number of real-world applications deal with multiple evolving data streams. This paper proposes a framework for clustering over evolving data streams that takes advantage of recent-biased approximation. In recent-biased approximation, more detail is preserved for recent data and fewer coefficients are kept for the data stream as a whole, which greatly improves clustering efficiency and space usage. The framework consists of two phases. The online phase approximates the data streams and incrementally maintains summary statistics. The offline phase performs dynamic clustering over the data streams on all possible time horizons. As shown by complexity analyses and validated by empirical studies, the framework performs efficiently in the data stream environment while producing clustering results of very high quality.
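
The abstract does not say which approximation scheme the paper uses. As a rough illustration of the general idea only, the Python sketch below assumes a piecewise-mean summary whose segments double in length with age, so recent values are kept at fine resolution and older values are progressively averaged. The names `RecentBiasedSummary`, `per_level`, and `approx_distance` are hypothetical and do not come from the paper.

```python
import math
from collections import deque


class RecentBiasedSummary:
    """Illustrative piecewise-mean summary; older segments are coarser.

    Assumption: this is a generic exponential-bucket scheme, not the
    approximation defined in the paper.
    """

    def __init__(self, per_level=2):
        self.per_level = per_level  # max buckets kept at each resolution
        self.levels = []            # levels[i]: deque of (sum, count), newest on the left

    def append(self, value):
        """Online step: absorb one new value and merge old buckets as needed."""
        carry = (float(value), 1)
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append(deque())
            buckets = self.levels[level]
            buckets.appendleft(carry)
            carry = None
            if len(buckets) > self.per_level:
                # Merge the two oldest buckets and push the result one level up.
                s1, c1 = buckets.pop()
                s2, c2 = buckets.pop()
                carry = (s1 + s2, c1 + c2)
            level += 1

    def segments(self):
        """Return (mean, length) pairs, newest segment first."""
        return [(s / c, c) for buckets in self.levels for (s, c) in buckets]


def approx_distance(a, b, horizon=None):
    """Offline step: approximate Euclidean distance between two summaries.

    Only the most recent `horizon` points are compared (all points if None).
    Both summaries must have absorbed the same number of values, so that
    their segment layouts line up.
    """
    seg_a, seg_b = a.segments(), b.segments()
    if [n for _, n in seg_a] != [n for _, n in seg_b]:
        raise ValueError("summaries cover different numbers of points")
    remaining = horizon
    total = 0.0
    for (mean_a, length), (mean_b, _) in zip(seg_a, seg_b):
        n = length if remaining is None else min(length, remaining)
        total += n * (mean_a - mean_b) ** 2
        if remaining is not None:
            remaining -= n
            if remaining == 0:
                break
    return math.sqrt(total)
```

In this sketch the summary grows only logarithmically with the stream length, and the pairwise distances returned by `approx_distance` could be passed to any standard clustering routine for a chosen time horizon.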

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, W., Koyanagi, Y., Asakura, K., Watanabe, T. (2009). Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation. In: Richards, D., Kang, BH. (eds) Knowledge Acquisition: Approaches, Algorithms and Applications. PKAW 2008. Lecture Notes in Computer Science, vol 5465. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01715-5_2

  • DOI: https://doi.org/10.1007/978-3-642-01715-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01714-8

  • Online ISBN: 978-3-642-01715-5

  • eBook Packages: Computer Science, Computer Science (R0)
