DOI: 10.1145/1982185.1982405
research-article

L2GClust: local-to-global clustering of stream sources

Published: 21 March 2011

ABSTRACT

In settings with ubiquitous streaming data sources, such as sensor networks, clustering nodes by the data they produce is an important problem that gives insight into the phenomenon being monitored by such networks. However, if these techniques require data to be gathered centrally, communication and storage requirements are often unbounded. The goal of this paper is to assess the feasibility of computing a local clustering at each node, using only its neighbors' centroids, as an approximation of the global clustering computed by a centralized process. A local algorithm is proposed to cluster sensors based on the moving average of each node's data over time: the moving average of each node is approximated with a memory-less fading average, and clustering is based on the furthest point algorithm applied to the centroids computed by the node's direct neighbors. The algorithm was evaluated on a state-of-the-art sensor network simulator by measuring the agreement between local and global clustering. Experiments on synthetic data with spherical Gaussian clusters are analyzed consistently across different network sizes, numbers of clusters and degrees of cluster overlap. Results show a high level of agreement between each node's clustering definition and the global clustering definition, especially for separability agreement. Overall, local approaches are able to keep a good approximation of the global clustering while improving privacy among nodes and decreasing the communication and computation load in the network. Hence, the basic requirements for distributed clustering of streaming sensors suggest that clustering in these settings should be performed locally.
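
The abstract compresses the local algorithm into a single sentence. The following Python sketch unpacks it under stated assumptions and is not the authors' implementation. It assumes the memory-less fading average uses the common two-accumulator form S ← x + αS, N ← 1 + αN with estimate S/N (α is a hypothetical decay factor). It also assumes that a node pools its own estimate with the centroids received from its direct neighbors before running Gonzalez's furthest-point k-center heuristic. Finally, it measures local-versus-global agreement with a simple pairwise (Rand-style) index chosen only for illustration. The names FadingAverage, furthest_point, local_cluster and pair_agreement are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class FadingAverage:
    """Memory-less fading average of a (possibly multi-dimensional) stream.

    Assumed update rule: S <- x + alpha*S, N <- 1 + alpha*N, mean = S/N.
    Recent readings weigh more, and only two accumulators are stored.
    """

    def __init__(self, dim: int, alpha: float = 0.95) -> None:
        self.alpha = alpha
        self.s = [0.0] * dim  # faded sum of readings, per dimension
        self.n = 0.0          # faded count of readings

    def update(self, x: Sequence[float]) -> Point:
        self.s = [xi + self.alpha * si for xi, si in zip(x, self.s)]
        self.n = 1.0 + self.alpha * self.n
        return self.mean()

    def mean(self) -> Point:
        return tuple(si / self.n for si in self.s)


def furthest_point(points: Sequence[Point], k: int) -> List[Point]:
    """Gonzalez's furthest-point heuristic: greedy k-center selection."""
    centers = [points[0]]  # arbitrary first center
    # Distance from every point to its closest chosen center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)  # furthest point
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def local_cluster(own: Point, neighbor_centroids: Sequence[Point],
                  k: int) -> Tuple[List[Point], int]:
    """Hypothetical local step: pool this node's estimate with its neighbors'
    centroids, pick k centers by furthest point, label the node by nearest center."""
    centers = furthest_point([own, *neighbor_centroids], k)
    return centers, nearest(own, centers)


def pair_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Rand-style agreement: fraction of node pairs that both clusterings
    treat alike (both in the same cluster, or both in different clusters)."""
    pairs = list(combinations(range(len(a)), 2))
    same = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return same / len(pairs)
```

For example, with α = 0.5, feeding the readings 2.0 and then 4.0 gives a fading mean of 5/1.5 ≈ 3.33, and pair_agreement([0, 0, 1, 1], [1, 1, 0, 0]) returns 1.0 because the two labelings differ only in cluster names. Furthest-point selection costs O(k·m) distance evaluations for m pooled points, which keeps per-node computation small.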

        Published in

        SAC '11: Proceedings of the 2011 ACM Symposium on Applied Computing
        March 2011
        1868 pages
        ISBN: 9781450301138
        DOI: 10.1145/1982185

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%