research-article

An experimental evaluation of SimRank-based similarity search algorithms

Authors:
Zhipeng Zhang (Peking University)
Yingxia Shao (Peking University)
Bin Cui (Peking University)
Ce Zhang (ETH Zurich)

Proceedings of the VLDB Endowment, Volume 10, Issue 5, pp. 601–612
https://doi.org/10.14778/3055540.3055552

Published: 01 January 2017


Abstract

Given a graph, SimRank is one of the most popular measures of similarity between two vertices. Computing SimRank efficiently has been studied intensively over the last decade, and researchers have proposed many algorithms that calculate or approximate it. Despite this abundant research effort, no systematic comparison of these algorithms exists. In this paper, we conduct an experimental study that compares these algorithms to understand their pros and cons.

We first introduce a taxonomy of algorithms that calculate SimRank and classify each algorithm into one of three classes: iterative, non-iterative, and random walk-based methods. We implement ten algorithms published from 2002 to 2015 and compare them on synthetic and real-world graphs. To keep the comparison fair, our implementations use the same data structure and execution framework, and we try our best to optimize each algorithm. Our study reveals that no algorithm dominates the others: iterative algorithms often have higher accuracy, while random walk-based algorithms can be more scalable. One non-iterative algorithm is both effective and efficient on medium-sized graphs. The best choice of algorithm therefore depends on the requirements of the application, and this paper provides an empirical guideline for making that choice.
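To make two of the three classes concrete, here is a minimal Python sketch. It is not taken from the paper's implementations. It assumes the standard SimRank recurrence of Jeh and Widom, where s(a, a) = 1 and, for a ≠ b, s(a, b) is the decay factor c times the average similarity over all pairs of in-neighbors of a and b (0 if either vertex has no in-neighbors). The function names, the decay factor c = 0.6, and the iteration and walk counts are illustrative assumptions. The first function is the naive all-pairs iteration. The second estimates a single pair by sampling pairs of reverse random walks and averaging c^t over the step t at which they first meet, truncated at a fixed number of steps.

```python
import random
from collections import defaultdict


def in_neighbors(edges):
    """Return (sorted vertex list, map from vertex to its in-neighbors)."""
    preds = defaultdict(list)
    nodes = set()
    for u, v in edges:  # directed edge u -> v
        preds[v].append(u)
        nodes.update((u, v))
    return sorted(nodes), preds


def simrank_iterative(edges, c=0.6, iterations=10):
    """Naive all-pairs SimRank iteration (illustrative, O(n^2) memory)."""
    nodes, preds = in_neighbors(edges)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif not preds[a] or not preds[b]:
                    # A vertex without in-neighbors is similar only to itself.
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(i, j)] for i in preds[a] for j in preds[b])
                    new[(a, b)] = c * total / (len(preds[a]) * len(preds[b]))
        sim = new
    return sim


def simrank_monte_carlo(edges, a, b, c=0.6, walks=20000, max_steps=10, seed=0):
    """Estimate s(a, b) from pairs of reverse random walks (truncated)."""
    if a == b:
        return 1.0
    _, preds = in_neighbors(edges)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(walks):
        x, y = a, b
        for step in range(1, max_steps + 1):
            if not preds[x] or not preds[y]:
                break  # a walk is stuck, so this pair of walks never meets
            x, y = rng.choice(preds[x]), rng.choice(preds[y])
            if x == y:
                total += c ** step  # first meeting after `step` steps
                break
    return total / walks
```

On the tiny graph with edges u → a and u → b, both functions give 0.6 for the pair (a, b) with c = 0.6 (up to floating-point rounding), since the two vertices share their only in-neighbor. The sketch also hints at the trade-off the abstract describes: the iterative version touches every vertex pair in every round, while the random-walk version only does work for the queried pair, at the cost of sampling error.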


Index Terms

1. Information systems
  1. Information retrieval



Published in
Proceedings of the VLDB Endowment, Volume 10, Issue 5
January 2017
168 pages
ISSN: 2150-8097
Editor: Divesh Srivastava, AT&T Labs
Publisher: VLDB Endowment

