research-article

An experimental evaluation of SimRank-based similarity search algorithms

Published: 01 January 2017

Abstract

SimRank is one of the most popular measures of the similarity between two vertices in a graph. We focus on the efficient computation of SimRank, which has been studied intensively over the last decade, and researchers have proposed many algorithms that compute or approximate it efficiently. Despite these abundant research efforts, there has been no systematic comparison of these algorithms. In this paper, we conduct an experimental study that compares them to understand their pros and cons.
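For readers unfamiliar with the measure, the following is a minimal sketch of the textbook fixed-point iteration for SimRank, in which a vertex is maximally similar to itself and two distinct vertices are similar to the extent that their in-neighbors are similar. It is not one of the algorithms evaluated in the paper. The function name `simrank_iterative`, the adjacency format, and the defaults `c=0.6` and `iterations=10` are assumptions made for illustration.

```python
from itertools import product


def simrank_iterative(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank by repeated application of the recurrence.

    in_neighbors: dict mapping every vertex to a list of its in-neighbors
                  (hypothetical input format chosen for this sketch).
    c:            decay factor in (0, 1); 0.6 is an assumed default.
    iterations:   number of update rounds; an assumed default.
    """
    nodes = list(in_neighbors)
    # Start from the identity: s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A vertex without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by c.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


if __name__ == "__main__":
    # Toy graph: u points to both a and b.
    toy = {"u": [], "a": ["u"], "b": ["u"]}
    scores = simrank_iterative(toy)
    print(scores[("a", "b")])
```

Each round of this sketch touches every vertex pair and every pair of their in-neighbors, which is why it is only practical on small graphs and why more efficient algorithms are of interest.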

We first introduce a taxonomy of algorithms that calculate SimRank and classify each algorithm into one of three classes: iterative, non-iterative, and random walk-based methods. We implement ten algorithms published from 2002 to 2015 and compare them using synthetic and real-world graphs. To ensure the fairness of our study, our implementations use the same data structure and execution framework, and we try our best to optimize each of these algorithms. Our study reveals that none of these algorithms dominates the others: algorithms based on iterative methods often have higher accuracy, while algorithms based on random walks can be more scalable. One non-iterative algorithm has good effectiveness and efficiency on medium-sized graphs. Thus, the optimal choice of algorithm depends on the requirements of the application. This paper provides an empirical guideline for making such choices.
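To illustrate the random walk-based class named above, here is a minimal Monte Carlo sketch. It relies on the standard interpretation of SimRank as the expected value of c raised to the first meeting time of two walks that start at the query vertices and step backwards along in-edges. This is a generic estimator, not a reproduction of any specific algorithm from the study. The function name `simrank_monte_carlo` and the defaults for `c`, `walks`, `max_steps`, and `seed` are assumptions.

```python
import random


def simrank_monte_carlo(in_neighbors, a, b, c=0.6, walks=2000,
                        max_steps=10, seed=0):
    """Estimate s(a, b) by sampling pairs of truncated reverse random walks.

    in_neighbors: dict mapping every vertex to a list of its in-neighbors
                  (hypothetical input format chosen for this sketch).
    c, walks, max_steps, seed: assumed defaults, not values from the paper.
    """
    if a == b:
        return 1.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x, y = a, b
        for step in range(1, max_steps + 1):
            ix, iy = in_neighbors[x], in_neighbors[y]
            if not ix or not iy:
                # One walk is stuck, so this pair can never meet.
                break
            # Both walks move one step backwards along a random in-edge.
            x, y = rng.choice(ix), rng.choice(iy)
            if x == y:
                # First meeting at this step contributes c ** step.
                total += c ** step
                break
    return total / walks


if __name__ == "__main__":
    toy = {"u": [], "a": ["u"], "b": ["u"]}
    print(simrank_monte_carlo(toy, "a", "b"))
```

The cost of this estimator depends on the number and length of the sampled walks rather than on the number of vertex pairs, at the price of sampling error and of truncating walks after `max_steps` steps.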

Published in: Proceedings of the VLDB Endowment, Volume 10, Issue 5, January 2017, 168 pages, ISSN: 2150-8097

Publisher: VLDB Endowment
