Data & Knowledge Engineering
Volume 58, Issue 2, August 2006, Pages 180-204
 
doi:10.1016/j.datak.2005.05.009
Copyright © 2005 Elsevier B.V. All rights reserved.

Online clustering of parallel data streams

Jürgen Beringer (a) and Eyke Hüllermeier (a, corresponding author)

(a) Fakultät für Informatik, Otto-von-Guericke-Universität, Magdeburg, Germany

Received 10 March 2005; accepted 25 May 2005. Available online 24 June 2005.

Abstract

In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously growing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel streams of real-valued data, that is, continuously evolving time series. In other words, we are interested in grouping data streams whose evolution over time is similar in a specific sense. To maintain an up-to-date clustering structure, the incoming data must be analyzed online, with no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means clustering algorithm. Our method's efficiency is mainly due to a scalable online transformation of the original data, which allows approximate distances between streams to be computed quickly.

Keywords: Data mining; Clustering; Data streams; Fuzzy sets
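
The idea summarized in the abstract can be illustrated with a minimal sketch. It assumes, based only on the abstract and the section titles in the outline below, that each stream's current sliding window is normalized and then compressed to its first few discrete Fourier transform (DFT) coefficients, and that K-means runs on these compressed vectors. All function names, the window length, and the number of retained coefficients are hypothetical choices for illustration. The sketch recomputes the transform from scratch for each window, so it does not reproduce the incremental coefficient updates or the smoothing that the full article may describe.

```python
import numpy as np

def normalize(window):
    """Shift a window to zero mean and scale it to unit standard deviation."""
    w = np.asarray(window, dtype=float)
    centered = w - w.mean()
    std = w.std()
    return centered / std if std > 0 else centered

def dft_sketch(window, n_coeffs):
    """Keep the first n_coeffs DFT coefficients of the normalized window.

    norm="ortho" makes the transform unitary, so the full coefficient vector
    preserves Euclidean distances; a truncated one can only underestimate them.
    """
    return np.fft.fft(normalize(window), norm="ortho")[:n_coeffs]

def approx_distance(sketch_a, sketch_b):
    """Approximate (lower-bound) distance between two streams from their sketches."""
    return float(np.linalg.norm(sketch_a - sketch_b))

def kmeans_step(sketches, centers):
    """One K-means iteration on the sketches: assign, then recompute centers."""
    dists = np.linalg.norm(sketches[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = sketches[labels == j]
        if len(members) > 0:  # keep the old center if a cluster is empty
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers

# Hypothetical demo data: three slow and three fast noisy sinusoids.
rng = np.random.default_rng(0)
t = np.arange(256)
streams = [np.sin(2 * np.pi * t / 128 + p) + 0.05 * rng.standard_normal(t.size)
           for p in (0.0, 0.1, 0.2)]
streams += [np.sin(2 * np.pi * t / 32 + p) + 0.05 * rng.standard_normal(t.size)
            for p in (0.0, 0.1, 0.2)]

sketches = np.array([dft_sketch(s, 10) for s in streams])
# Start from one stream of each kind; in an online setting the centers from
# the previous window would presumably serve as the starting point instead.
labels, centers = kmeans_step(sketches, sketches[[0, 3]])
```

Because each sketch holds only 10 complex numbers instead of 256 samples, every distance evaluation inside the clustering step touches far less data, which is the kind of saving the abstract attributes to the transformation.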

Article Outline

1. Introduction
2. Background
2.1. The data stream model
2.2. Clustering
2.3. Related work
3. Preprocessing and maintaining data streams
3.1. Data streams and sliding windows
3.2. Normalization
3.3. Discrete Fourier transform
3.4. Computation of DFT coefficients
3.5. Distance approximation and smoothing
4. Clustering data streams
5. Fuzzy clustering
6. Implementation
7. Experimental validation
7.1. Synthetic data
7.1.1. Efficiency
7.1.2. Quality
7.2. Real-world data: stock rates
8. Summary
Acknowledgements
References
Vitae