Abstract
In this chapter, we will provide a survey of clustering algorithms for graph data. We will discuss the different categories of clustering algorithms and recent efforts to design clustering methods for various kinds of graphical data. Clustering algorithms are typically of two types. The first type consists of node clustering algorithms in which we attempt to determine dense regions of the graph based on edge behavior. The second type consists of structural clustering algorithms, in which we attempt to cluster the different graphs based on overall structural behavior. We will also discuss the applicability of the approach to other kinds of data such as semi-structured data, and the utility of graph mining algorithms to such representations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
J. Abello, M. G. Resende, S. Sudarsky, Massive quasi-clique detection. Proceedings of the 5th Latin American Symposium on Theoretical Informatics (LATIN), pp. 598–612, 2002.
C. Aggarwal, N. Ta, J. Feng, J. Wang, M. J. Zaki. XProj: A Framework for Projected Structural Clustering of XML Documents, KDD Conference, 2007.
R. Agrawal, A. Borgida, H.V. Jagadish. Efficient Maintenance of transitive relationships in large data and knowledge bases, ACM SIGMOD Conference, 1989.
R. Ahuja, J. Orlin, T. Magnanti. Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ, 1992.
A. Z. Broder, M. Charikar, A. Frieze, and M. Mitzenmacher, Syntactic clustering of the web, WWW Conference, Computer Networks, 29(8–13):1157–1166, 1997.
D. Chakrabarti, Y. Zhan, C. Faloutsos R-MAT: A Recursive Model for Graph Mining. SDM Conference, 2004.
S.S. Chawathe. Comparing Hierachical data in external memory. Very Large Data Bases Conference, 1999.
J. Cheriyan, T. Hagerup, K. Melhorn An O(n 3)-time maximum-flow algorithm, SIAM Journal on Computing, Volume 25, Issue 6, pp. 1144–1170, 1996.
F. Chung,. Spectral graph theory. Washington: Conference Board of the Mathematical Sciences, 1997.
T. Dalamagas, T. Cheng, K. Winkel, T. Sellis. Clustering XML Documents Using Structural Summaries. Information Systems, Elsevier, January 2005.
J. Cheng, J. Xu Yu, X. Lin, H. Wang, and P. S. Yu, Fast Computing Reachability Labelings for Large Graphs with High Compression Rate, EDBT Conference, 2008.
J. Cheng, J. Xu Yu, X. Lin, H. Wang, and P. S. Yu, Fast Computation of Reachability Labelings in Large Graphs, EDBT Conference, 2006.
E. Cohen. Size-estimation framework with applications to transitive closure and reachability, Journal of Computer and System Sciences, v.55 n.3, p.441–453, Dec. 1997.
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick, Reachability and distance queries via 2-hop labels, ACM Symposium on Discrete Algorithms, 2002.
D. Cook, L. Holder, Mining Graph Data, John Wiley & Sons Inc, 2007.
E. W. Dijkstra, A note on two problems in connection with graphs. Numerische Mathematik, 1 (1959), S. 269–271.
M. Faloutsos, P. Faloutsos, C. Faloutsos, On Power Law Relationships of the Internet Topology. SIGCOMM Conference, 1999.
P.-O. Fjallstrom, Algorithms for Graph Partitioning: A Survey, Linkoping Electronic Articles in Computer and Information Science Vol 3, no 10, 1998.
G. Flake, R. Tarjan, M. Tsioutsiouliklis. Graph Clustering and Minimum Cut Trees, Internet Mathematics, 1(4), 385–408, 2003.
I. Freeman. Centrality in Social Networks, Social Networks, 1, 215–239, 1979.
M. S. Garey, D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness, W. H. Freeman, 1979.
D. Gibson, R. Kumar, A. Tomkins, Discovering Large Dense Subgraphs in Massive Graphs, VLDB Conference, 2005.
M. Girvan, M. Newman. Community Structure in Social and Biological Networks, Proceedings of the National Academy of Science, 99, 7821–7826, 2002.
A. Jain and R. Dubes, Algorithms for Clustering Data, Prentice Hall, New Jersey, 1998.
H. Kashima, K. Tsuda, A. Inokuchi. Marginalized Kernels between Labeled Graphs, ICML, 2003.
B.W. Kernighan, S. Lin. An efficient heuristic procedure for partitioning graphs, Bell System Tech. Journal, vol. 49, Feb. 1970, pp. 291–307.
T. Kudo, E. Maeda, Y. Matsumoto. An Application of Boosting to Graph Classification, NIPS Conf. 2004.
M. Lee, W. Hsu, L. Yang, X. Yang. XClust: Clustering XML Schemas for Effective Integration. ACM Conference on Information and Knowledge Management, 2002.
W. Lian, D.W. Cheung, N. Mamoulis, S. Yiu. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, IEEE Transactions on Knowledge and Data Engineering, Vol 16, No. 1, 2004.
R. Kumar, P Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, E. Upfal. The Web as a Graph. ACM PODS Conference, 2000.
M. Matsuda et al. Classifying molecular sequences using a linkage graph with their pairwise similarities. Theoretical Computer Science, 210(2):305–325, 1999.
J. Pei, D. Jiang, A. Zhang. On Mining Cross-Graph Quasi-Cliques, ACM KDD Conference, 2005.
J. Pei, D. Jiang, A. Zhang. Mining Cross-Graph Quasi-Cliques in Gene Expression and Protein Interaction Data, ICDE Conference, 2005.
S. Raghavan, H. Garcia-Molina. Representing web graphs. ICDE Conference, pages 405–416, 2003.
M. Rattigan, M. Maier, D. Jensen: Graph Clustering with Network Sructure Indices. ICML, 2007.
M. Rattigan, M. Maier, D. Jensen: Using structure indices for approximation of network properties. ACM KDD Conference, 2006.
A. A. Tsay, W. S. Lovejoy, David R. Karger, Random Sampling in Cut, Flow, and Network Design Problems, Mathematics of Operations Research, 24(2):383–413, 1999.
H. Wang, H. He, J. Yang, J. Xu-Yu, P. Yu. Dual Labeling: Answering Graph Reachability Queries in Constant Time. ICDE Conference, 2006.
X. Yan, J. Han. CloseGraph: Mining Closed Frequent Graph Patterns, ACM KDD Conference, 2003.
X. Yan, H. Cheng, J. Han, and P. S. Yu, Mining Significant Graph Patterns by Scalable Leap Search, SIGMOD Conference, 2008.
X. Yan, P. S. Yu, and J. Han, Graph Indexing: A Frequent Structure-based Approach, SIGMOD Conference, 2004.
M. J. Zaki, C. C. Aggarwal. XRules: An Effective Structural Classifier for XML Data, KDD Conference, 2003.
Z. Zeng, J. Wang, L. Zhou, G. Karypis, Out-of-core Coherent Closed Quasi-Clique Mining from Large Dense Graph Databases, ACM Transactions on Database Systems, Vol 31(2), 2007.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag US
About this chapter
Cite this chapter
Aggarwal, C.C., Wang, H. (2010). A Survey of Clustering Algorithms for Graph Data. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_9
Download citation
DOI: https://doi.org/10.1007/978-1-4419-6045-0_9
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer ScienceComputer Science (R0)