ABSTRACT
In ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce is an important problem that gives insight into the phenomenon being monitored by such networks. However, if these techniques require data to be gathered centrally, communication and storage requirements are often unbounded. The goal of this paper is to assess the feasibility of computing a local clustering at each node, using only the centroids of its neighbors, as an approximation of the global clustering computed by a centralized process. A local algorithm is proposed to cluster sensors based on the moving average of each node's data over time: the moving average of each node is approximated using a memory-less fading average, and clustering is based on the furthest point algorithm applied to the centroids computed by the node's direct neighbors. The algorithm was evaluated on a state-of-the-art sensor network simulator, measuring the agreement between local and global clustering. Experiments on synthetic data with spherical Gaussian clusters are analyzed consistently across different network sizes, numbers of clusters, and degrees of cluster overlap. Results show a high level of agreement between each node's clustering definitions and the global clustering definition, particularly in terms of separability agreement. Overall, local approaches are able to keep a good approximation of the global clustering, improving privacy among nodes and decreasing communication and computation load in the network. Hence, the basic requirements for distributed clustering of streaming data sensors indicate that clustering in these settings should be performed locally.
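The two building blocks named in the abstract can be illustrated with a short sketch. It rests on several assumptions not stated in the abstract: the fading average is taken to follow the common form S_i = x_i + alpha * S_{i-1}, N_i = 1 + alpha * N_{i-1}, M_i = S_i / N_i (so only S and N are stored, which is what makes it memory-less); each node's reading is treated as a d-dimensional vector; and a node is assumed to cluster the set formed by its own fading average plus the centroids received from its direct neighbors. The furthest point step is the greedy Gonzalez-style heuristic (start from one point, repeatedly add the point furthest from all chosen centers). The names `FadingAverage`, `furthest_point`, and `local_clustering` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class FadingAverage:
    """Memory-less fading average (assumed form, see lead-in).

    Only the fading sum S and fading count N are kept, so memory use is
    constant no matter how long the stream runs.
    """

    def __init__(self, alpha: float = 0.99) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.s: Optional[List[float]] = None  # fading sum S_i
        self.n = 0.0                          # fading count N_i

    def update(self, x: Sequence[float]) -> Vector:
        if self.s is None:
            self.s = [0.0] * len(x)
        # S_i = x_i + alpha * S_{i-1}; N_i = 1 + alpha * N_{i-1}
        self.s = [xi + self.alpha * si for xi, si in zip(x, self.s)]
        self.n = 1.0 + self.alpha * self.n
        return self.mean()

    def mean(self) -> Vector:
        if self.s is None:
            raise ValueError("no observations yet")
        return tuple(si / self.n for si in self.s)  # M_i = S_i / N_i


def furthest_point(points: Sequence[Vector], k: int) -> Tuple[List[Vector], List[int]]:
    """Greedy furthest-point selection of up to k centers, then nearest-center labels."""
    if not points or k < 1:
        return [], []
    centers: List[Vector] = [points[0]]  # arbitrary first center
    while len(centers) < min(k, len(points)):
        # Point whose distance to its nearest chosen center is largest.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        if min(math.dist(far, c) for c in centers) == 0.0:
            break  # every remaining point coincides with a center
        centers.append(far)
    labels = [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]
    return centers, labels


def local_clustering(own_mean: Vector, neighbour_centroids: Sequence[Vector], k: int) -> List[Vector]:
    """One hypothetical local step: cluster own mean plus neighbors' centroids."""
    centers, _ = furthest_point([own_mean, *neighbour_centroids], k)
    return centers
```

For example, with `alpha=0.5`, feeding `[1.0, 1.0]` and then `[3.0, 3.0]` gives S = 3.5 and N = 1.5 per dimension, so the fading mean is about 2.33 rather than the plain mean of 2.0, weighting the newer reading more heavily. Setting `alpha=1.0` recovers the ordinary cumulative mean.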
L2GClust: local-to-global clustering of stream sources