Abstract
Clustering a graph means identifying internally dense subgraphs that are only sparsely interconnected. Formalizations of this notion lead both to measures that quantify the quality of a clustering and to algorithms that actually find clusterings. Since the corresponding optimization problems are, in general, hard, practitioners resort to heuristic clustering algorithms or to approaches that are not based on an objective function at all. In this work, we conduct a comprehensive experimental evaluation of the qualitative behavior of greedy bottom-up heuristics that are driven by cut-based objectives and constrained by intracluster density. We use both real-world data and artificial instances. Our study shows that a greedy strategy based on local movement is superior to one based on merging. We further find that the local-movement approach generally outperforms alternative setups and reference algorithms from the literature in terms of its own objective, while a modularity-based algorithm competes surprisingly well. Finally, we identify which combinations of cut-based inter- and intracluster measures are suitable for recovering a hidden reference clustering in synthetic random graphs, and we discuss the skewness of the resulting cluster size distributions. Our results serve as a guideline for the use of bicriterial, cut-based measures for graph clustering.
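To make the idea of a density-constrained greedy local-movement heuristic concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The intercluster objective is assumed to be the plain number of cut edges, and the intracluster constraint is assumed to be a minimum pairwise edge density `alpha`, with singletons counted as fully dense by convention. The paper itself studies several cut-based inter- and intracluster measures. The function names `greedy_local_move`, `density`, and `cut_size` are hypothetical. Starting from singletons, each vertex moves to the neighbouring cluster that most reduces the cut, and a move is allowed only if both the source and the target cluster still satisfy the density bound.

```python
from itertools import combinations


def density(members, adj):
    """Pairwise edge density of a vertex set (assumed intracluster measure).

    Sets with fewer than two vertices count as fully dense (assumed convention).
    """
    k = len(members)
    if k < 2:
        return 1.0
    inner = sum(1 for u, v in combinations(members, 2) if v in adj[u])
    return inner / (k * (k - 1) / 2)


def cut_size(clusters, adj):
    """Number of edges whose endpoints lie in different clusters (assumed objective)."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u in adj for v in adj[u] if u < v and label[u] != label[v])


def greedy_local_move(adj, alpha, max_rounds=100):
    """Hypothetical greedy local movement: minimize the cut subject to density >= alpha."""
    label = {v: v for v in adj}          # cluster id of each vertex
    clusters = {v: {v} for v in adj}     # start from singletons
    for _ in range(max_rounds):
        moved = False
        for v in sorted(adj):
            src = label[v]
            # Count the edges from v into each adjacent cluster.
            links = {}
            for u in adj[v]:
                links[label[u]] = links.get(label[u], 0) + 1
            best, best_gain = None, 0
            for dst, w in links.items():
                if dst == src:
                    continue
                # Moving v turns w cut edges internal and links[src] internal edges into cut edges.
                gain = w - links.get(src, 0)
                if gain <= best_gain:
                    continue
                # Both affected clusters must respect the density constraint.
                if density(clusters[dst] | {v}, adj) < alpha:
                    continue
                if density(clusters[src] - {v}, adj) < alpha:
                    continue
                best, best_gain = dst, gain
            if best is not None:
                clusters[src].discard(v)
                if not clusters[src]:
                    del clusters[src]
                clusters[best].add(v)
                label[v] = best
                moved = True
        if not moved:                    # local optimum: no improving feasible move
            break
    return list(clusters.values())


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    result = greedy_local_move(adj, alpha=1.0)
    print(result)                  # expected: [{0, 1, 2}, {3, 4, 5}]
    print(cut_size(result, adj))   # expected: 1
```

Every accepted move strictly decreases the cut, so the loop terminates at a local optimum. A merge-based counterpart would instead repeatedly join whole clusters, which is the alternative strategy that the abstract reports as inferior.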