A Survey of Clustering Algorithms for Graph Data

Managing and Mining Graph Data

Part of the book series: Advances in Database Systems (ADBS, volume 40)

Abstract

This chapter surveys clustering algorithms for graph data. We discuss the different categories of clustering algorithms and recent efforts to design clustering methods for various kinds of graph data. Clustering algorithms are typically of two types. The first type consists of node clustering algorithms, which attempt to determine dense regions of the graph based on edge behavior. The second type consists of structural clustering algorithms, which attempt to cluster the different graphs based on their overall structural behavior. We also discuss the applicability of these approaches to other kinds of data, such as semi-structured data, and the utility of graph mining algorithms for such representations.

Author information

Corresponding author

Correspondence to Charu C. Aggarwal.

Copyright information

© 2010 Springer-Verlag US

About this chapter

Cite this chapter

Aggarwal, C.C., Wang, H. (2010). A Survey of Clustering Algorithms for Graph Data. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_9

  • DOI: https://doi.org/10.1007/978-1-4419-6045-0_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6044-3

  • Online ISBN: 978-1-4419-6045-0

  • eBook Packages: Computer Science, Computer Science (R0)
