Abstract
Given a graph, SimRank is one of the most popular measures of the similarity between two vertices. We focus on efficiently calculating SimRank, a problem that has been studied intensively over the last decade and for which researchers have proposed many algorithms that compute or approximate SimRank efficiently. Despite these abundant research efforts, there is no systematic comparison of these algorithms. In this paper, we conduct a study that compares them to understand their pros and cons.
We first introduce a taxonomy of algorithms that calculate SimRank and classify each algorithm into one of three classes: iterative, non-iterative, and random walk-based methods. We implement ten algorithms published from 2002 to 2015 and compare them using synthetic and real-world graphs. To ensure the fairness of our study, our implementations use the same data structure and execution framework, and we try our best to optimize each algorithm. Our study reveals that none of these algorithms dominates the others: algorithms based on iterative methods often have higher accuracy, while algorithms based on random walks can be more scalable. One non-iterative algorithm has good effectiveness and efficiency on medium-sized graphs. Thus, the optimal choice of algorithm depends on the requirements of the application. This paper provides an empirical guideline for making such choices.
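To make the iterative and random walk-based classes concrete, the following is a minimal Python sketch of one textbook representative of each: the naive fixed-point iteration from the original SimRank definition, and a single-pair Monte Carlo estimate based on paired reverse random walks. It is not any of the ten implementations evaluated in the paper. The function names, the `in_neighbors` dictionary layout, the decay factor `C = 0.8`, and the iteration and walk limits are all assumptions chosen for illustration.

```python
import random
from itertools import product


def simrank_iterative(in_neighbors, C=0.8, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors maps every vertex to a list of its in-neighbors
    (assumed layout; every vertex must appear as a key).
    """
    nodes = list(in_neighbors)
    # Initial scores: 1 on the diagonal, 0 elsewhere.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A vertex without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def simrank_monte_carlo(in_neighbors, a, b, C=0.8, walks=1000,
                        max_steps=10, seed=0):
    """Single-pair estimate: average C**t over paired reverse walks,
    where t is the first step at which the two walks meet.
    Walks are truncated at max_steps, so the estimate can be low.
    """
    if a == b:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x, y = a, b
        for step in range(1, max_steps + 1):
            if not in_neighbors[x] or not in_neighbors[y]:
                break  # a walk is stuck, so this pair never meets
            x = rng.choice(in_neighbors[x])
            y = rng.choice(in_neighbors[y])
            if x == y:
                total += C ** step
                break
    return total / walks


# Hypothetical toy graph: c -> a, c -> b, a -> d, b -> d.
graph = {"c": [], "a": ["c"], "b": ["c"], "d": ["a", "b"]}
scores = simrank_iterative(graph)
print(scores[("a", "b")], simrank_monte_carlo(graph, "a", "b"))
```

The iterative version touches every vertex pair in every round, which is why it is accurate but hard to scale. The Monte Carlo version answers one pair at a time from sampled walks, trading accuracy for a much smaller working set.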
Index Terms
- An experimental evaluation of simrank-based similarity search algorithms