ABSTRACT
Network clustering (or graph partitioning) is an important task for discovering the underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play special roles: vertices that bridge clusters (hubs) and vertices that are marginally connected to clusters (outliers). Identifying hubs is useful for applications such as viral marketing and epidemiology, since hubs are responsible for spreading ideas or disease. In contrast, outliers have little or no influence and may be isolated as noise in the data. In this paper, we propose a novel algorithm called SCAN (Structural Clustering Algorithm for Networks), which detects clusters, hubs, and outliers in networks. It clusters vertices based on a structural similarity measure. The algorithm is fast and efficient, visiting each vertex only once. An empirical evaluation on both synthetic and real datasets shows that the method outperforms other approaches, such as modularity-based algorithms.
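The abstract does not define the structural similarity measure. As a rough illustration only, the sketch below assumes a cosine-style overlap of closed neighborhoods (each vertex's neighbors plus the vertex itself), normalized by the geometric mean of the two neighborhood sizes. The function names and the toy graph are hypothetical, and this is not the full clustering procedure.

```python
from math import sqrt


def closed_neighborhood(adj, v):
    """Return the neighbors of v together with v itself."""
    return set(adj[v]) | {v}


def structural_similarity(adj, v, w):
    """Size of the shared closed neighborhood of v and w, divided by the
    geometric mean of the two neighborhood sizes (an assumed definition)."""
    nv = closed_neighborhood(adj, v)
    nw = closed_neighborhood(adj, w)
    return len(nv & nw) / sqrt(len(nv) * len(nw))


# Hypothetical toy graph: triangles (a, b, c) and (d, e, f) joined through h.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "h"],
    "h": ["c", "d"],
    "d": ["h", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

inside_cluster = structural_similarity(adj, "a", "b")  # a and b share all neighbors
across_bridge = structural_similarity(adj, "c", "h")   # c and h share only themselves
```

Under this assumed definition, two vertices inside the same triangle score higher than a triangle vertex paired with the bridging vertex, which is the kind of contrast a similarity threshold could use to separate cluster members from hubs.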