research-article

Experiments on Density-Constrained Graph Clustering

Published: 07 January 2015

Abstract

Clustering a graph means identifying internally dense subgraphs that are only sparsely interconnected. Formalizations of this notion lead to measures that quantify the quality of a clustering and to algorithms that actually find clusterings. Since, most generally, corresponding optimization problems are hard, heuristic clustering algorithms are used in practice, or other approaches that are not based on an objective function. In this work, we conduct a comprehensive experimental evaluation of the qualitative behavior of greedy bottom-up heuristics driven by cut-based objectives and constrained by intracluster density, using both real-world data and artificial instances. Our study documents that a greedy strategy based on local movement is superior to one based on merging. We further reveal that the former approach generally outperforms alternative setups and reference algorithms from the literature in terms of its own objective, while a modularity-based algorithm competes surprisingly well. Finally, we exhibit which combinations of cut-based inter- and intracluster measures are suitable for identifying a hidden reference clustering in synthetic random graphs and discuss the skewness of the resulting cluster size distributions. Our results serve as a guideline to the usage of bicriterial, cut-based measures for graph clusterings.
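To make the two kinds of measures in the abstract concrete, the following is a minimal sketch of one intracluster measure (edge density inside a cluster) and one cut-based intercluster measure (conductance of a cluster's cut). The abstract does not give definitions, so the formulas, the singleton convention, the function names, and the toy graph here are illustrative assumptions rather than the definitions used in the article.

```python
from itertools import combinations

def intracluster_density(adj, cluster):
    """Fraction of possible edges inside `cluster` that are present."""
    n = len(cluster)
    if n < 2:
        return 1.0  # assumed convention: a singleton counts as fully dense
    inside = sum(1 for u, v in combinations(cluster, 2) if v in adj[u])
    return inside / (n * (n - 1) / 2)

def cut_size(adj, cluster):
    """Number of edges with exactly one endpoint in `cluster`."""
    members = set(cluster)
    return sum(1 for u in members for v in adj[u] if v not in members)

def conductance(adj, cluster):
    """Cut size divided by the smaller of the two sides' degree sums."""
    members = set(cluster)
    vol_in = sum(len(adj[u]) for u in members)
    vol_out = sum(len(adj[u]) for u in adj if u not in members)
    denom = min(vol_in, vol_out)
    return cut_size(adj, members) / denom if denom else 0.0

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

left = [0, 1, 2]
print(intracluster_density(adj, left))  # all three possible edges are present
print(cut_size(adj, left))              # only edge 2-3 leaves the cluster
print(conductance(adj, left))           # cut of 1 over a degree sum of 7
```

A bicriterial setup in this spirit would optimize a measure like `conductance` while requiring every cluster to keep `intracluster_density` above a chosen threshold.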


            Reviews

            Hui Liu

            Graph clustering is the task of identifying dense subgraphs of a given graph such that these subgraphs are only sparsely interconnected. In this paper, the authors conduct an experimental evaluation of greedy graph clustering algorithms. First, the problem (density-constrained clustering) and its background are briefly introduced. Second, two greedy algorithms are presented. The first is the greedy merge (GM) algorithm, which greedily merges two clusters. The second is the greedy vertex moving (GVM) algorithm, which allows a vertex to move to another cluster when doing so improves the objective function. Several measures are then applied to compare the GM and GVM algorithms, including intercluster density, balancedness, cluster size distribution, and the effectiveness of different objective functions. The GVM algorithm is also compared with several reference algorithms. Finally, the GVM and multilevel modularity (ML-MOD) algorithms are used to compare different objective functions. Graph clustering is a popular research area with many applications on the modern Internet, such as online shopping, social network analysis, and link prediction. Through various experiments, the authors show the advantages and disadvantages of different clustering algorithms. The results are interesting and practical, which makes them useful to software developers and algorithm researchers; they can serve as a guideline.

            Online Computing Reviews Service
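The vertex-moving idea described in the review can be sketched as follows. This is only an illustration under stated assumptions, not the authors' GVM implementation: the objective (number of intercluster edges), the constraint (every cluster keeps an edge density of at least `min_density`), the singleton start, the fixed vertex order, and all names are hypothetical choices made for the example.

```python
from itertools import combinations

def density(adj, members):
    """Edge density inside `members`; sets with fewer than 2 vertices count as 1.0."""
    n = len(members)
    if n < 2:
        return 1.0
    inside = sum(1 for u, v in combinations(members, 2) if v in adj[u])
    return inside / (n * (n - 1) / 2)

def greedy_vertex_moving(adj, min_density, max_rounds=100):
    """Move single vertices between clusters while the cut shrinks (assumed objective)."""
    label = {v: v for v in adj}        # start from singleton clusters
    clusters = {v: {v} for v in adj}
    for _ in range(max_rounds):
        moved = False
        for v in sorted(adj):
            src = label[v]
            links_src = sum(1 for w in adj[v] if label[w] == src)
            best_gain, best_dst = 0, None
            for dst in sorted({label[w] for w in adj[v]} - {src}):
                links_dst = sum(1 for w in adj[v] if label[w] == dst)
                gain = links_dst - links_src  # reduction in intercluster edges
                if gain <= best_gain:
                    continue
                # Density constraint on both clusters affected by the move.
                if density(adj, clusters[dst] | {v}) < min_density:
                    continue
                if density(adj, clusters[src] - {v}) < min_density:
                    continue
                best_gain, best_dst = gain, dst
            if best_dst is not None:
                clusters[src].remove(v)
                if not clusters[src]:
                    del clusters[src]
                clusters[best_dst].add(v)
                label[v] = best_dst
                moved = True
        if not moved:
            break  # local optimum: no single move improves the objective
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(greedy_vertex_moving(adj, min_density=1.0))
```

In this sketch every accepted move strictly reduces the number of intercluster edges, so the loop terminates; `max_rounds` is only a safety bound. A merge-based variant would instead consider joining whole clusters at each step, which is the coarser operation the review contrasts with vertex moving.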

            • Published in

              ACM Journal of Experimental Algorithmics  Volume 19, Issue
              2014
              402 pages
              ISSN:1084-6654
              EISSN:1084-6654
              DOI:10.1145/2627368
              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 7 January 2015
              • Accepted: 1 June 2014
              • Revised: 1 March 2014
              • Received: 1 April 2012

              Qualifiers

              • research-article
              • Research
              • Refereed
