A Survey of Clustering Algorithms for Graph Data

Aggarwal, Charu C.; Wang, Haixun

doi:10.1007/978-1-4419-6045-0_9

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

In this chapter, we will provide a survey of clustering algorithms for graph data. We will discuss the different categories of clustering algorithms and recent efforts to design clustering methods for various kinds of graphical data. Clustering algorithms are typically of two types. The first type consists of node clustering algorithms in which we attempt to determine dense regions of the graph based on edge behavior. The second type consists of structural clustering algorithms, in which we attempt to cluster the different graphs based on overall structural behavior. We will also discuss the applicability of the approach to other kinds of data such as semi-structured data, and the utility of graph mining algorithms to such representations.
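The two problem types named in the abstract can be contrasted with a small sketch. It is only an illustration under stated assumptions, not a method taken from the chapter: node clustering is approximated by dropping edges whose endpoints share no neighbor and taking connected components of what remains, and structural clustering is approximated by grouping whole graphs that have the same sorted degree sequence. The function names (`node_clusters`, `structural_clusters`, `degree_signature`) and both heuristics are hypothetical choices made for this example.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node_clusters(edges):
    """Node clustering within ONE graph (illustrative heuristic only).

    Assumption: an edge is treated as 'dense' if its endpoints share at
    least one neighbor. Clusters are the connected components over the
    dense edges.
    """
    adj = adjacency(edges)
    strong = defaultdict(set)
    for u, v in edges:
        if adj[u] & adj[v]:  # endpoints have a common neighbor
            strong[u].add(v)
            strong[v].add(u)

    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(strong[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


def degree_signature(edges):
    """Crude structural summary of a graph: its sorted degree sequence."""
    adj = adjacency(edges)
    return tuple(sorted(len(neighbors) for neighbors in adj.values()))


def structural_clusters(graphs):
    """Structural clustering ACROSS graphs (illustrative heuristic only).

    Assumption: graphs with identical degree signatures are grouped.
    """
    groups = defaultdict(list)
    for name, edges in graphs.items():
        groups[degree_signature(edges)].append(name)
    return sorted(sorted(names) for names in groups.values())


# Two triangles joined by a single bridge edge (3, 4).
two_triangles = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(node_clusters(two_triangles))

# A small collection of graphs: two paths and one triangle.
collection = {
    "path_a": [(1, 2), (2, 3)],
    "path_b": [("x", "y"), ("y", "z")],
    "triangle": [(1, 2), (2, 3), (1, 3)],
}
print(structural_clusters(collection))
```

The first call works on the nodes of a single graph, while the second treats each graph as one object to be grouped, which is the distinction the abstract draws. A degree sequence is a very weak structural signature, so it stands in here only for whatever structural similarity measure a real method would use.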

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Hawthorne, NY, 10532, USA
Charu C. Aggarwal
Microsoft Research Asia, Beijing, China, 100190
Haixun Wang

Corresponding author

Correspondence to Charu C. Aggarwal.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, China, People's Republic
Haixun Wang

About this chapter

Cite this chapter

Aggarwal, C.C., Wang, H. (2010). A Survey of Clustering Algorithms for Graph Data. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_9

DOI: https://doi.org/10.1007/978-1-4419-6045-0_9
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science, Computer Science (R0)
