research-article

Open Access

Efficient Interaction-based Neural Ranking with Locality Sensitive Hashing

Authors:
Shiyu Ji

University of California at Santa Barbara, USA

University of California at Santa Barbara, USA
View Profile

,
Jinjin Shao

University of California at Santa Barbara, USA

University of California at Santa Barbara, USA
View Profile

,
Tao Yang

University of California at Santa Barbara, USA

University of California at Santa Barbara, USA
View Profile

Authors Info & Claims

WWW '19: The World Wide Web ConferenceMay 2019Pages 2858–2864https://doi.org/10.1145/3308558.3313576

Published:13 May 2019Publication History

WWW '19: The World Wide Web Conference

Pages 2858–2864

ABSTRACT

Interaction-based neural ranking has been shown to be effective for document search using distributed word representations. However the time or space required is very expensive for online query processing with neural ranking. This paper investigates fast approximation of three interaction-based neural ranking algorithms using Locality Sensitive Hashing (LSH). It accelerates query-document interaction computation by using a runtime cache with precomputed term vectors, and speeds up kernel calculation by taking advantages of limited integer similarity values. This paper presents the design choices with cost analysis, and an evaluation that assesses efficiency benefits and relevance tradeoffs for the tested datasets.

References

Alexandr Andoni and Piotr Indyk. 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on. IEEE, 459-468. Google ScholarDigital Library
Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig Schmidt. 2015. Practical and optimal LSH for angular distance. In Advances in Neural Information Processing Systems. 1225-1233. Google ScholarDigital Library
Alexandr Andoni and Ilya Razenshteyn. 2015. Optimal data-dependent hashing for approximate near neighbors. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing. ACM, 793-801. Google ScholarDigital Library
Moses S Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the thiry-fourth annual ACM symposium on Theory of computing. ACM, 380-388. Google ScholarDigital Library
Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 126-134. Google ScholarDigital Library
Abhinandan S Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th international conference on World Wide Web. ACM, 271-280. Google ScholarDigital Library
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry. ACM, 253-262. Google ScholarDigital Library
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 1999. Similarity Search in High Dimensions via Hashing. In Proceedings of the 25th International Conference on Very Large Data Bases(VLDB '99). 518-529. Google ScholarDigital Library
Jiafeng Guo, Yixing Fan, Qingyao Ai, and W Bruce Croft. 2016. A deep relevance matching model for ad-hoc retrieval. In Proceedings of CIKM'16. ACM, 55-64. Google ScholarDigital Library
Monika Henzinger. 2006. Finding near-duplicate web pages: a large-scale evaluation of algorithms. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 284-291. Google ScholarDigital Library
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2333-2338. Google ScholarDigital Library
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. In Advances in neural information processing systems. 4107-4115. Google ScholarDigital Library
Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing. ACM, 604-613. Google ScholarDigital Library
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20, 4 (Oct. 2002), 422-446. Google ScholarDigital Library
Brian Kulis and Trevor Darrell. 2009. Learning to hash with binary reconstructive embeddings. In Advances in neural information processing systems. 1042-1050. Google ScholarDigital Library
Lemur. {n. d.}. http://www.lemurproject.org/.Google Scholar
Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of massive datasets. Cambridge university press. Google ScholarDigital Library
Gurmeet Singh Manku, Arvind Jain, and Anish Das Sarma. 2007. Detecting near-duplicates for web crawling. In Proceedings of the 16th international conference on World Wide Web. ACM, 141-150. Google ScholarDigital Library
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111-3119. Google ScholarDigital Library
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to match using local and distributed representations of text for web search. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1291-1299. Google ScholarDigital Library
Bhaskar Mitra, Eric Nalisnick, Nick Craswell, and Rich Caruana. 2016. A dual embedding space model for document ranking. arXiv preprint arXiv:1602.01137(2016).Google Scholar
Gregory B Newby, Chris Fallen, and Kylie McCormick. 2009. Lucene for n-grams using the ClueWeb Collection. Technical Report. ALASKA UNIV ANCHORAGE ARTIC REGION SUPERCOMPUTING CENTER.Google Scholar
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, and Xueqi Cheng. 2017. A deep investigation of deep IR models. In SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17).Google Scholar
Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Jingfang Xu, and Xueqi Cheng. 2017. Deeprank: A new deep architecture for relevance ranking in information retrieval. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 257-266. Google ScholarDigital Library
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532-1543.Google ScholarCross Ref
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision. Springer, 525-542.Google ScholarCross Ref
Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Gre´goire Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web. ACM, 373-374. Google ScholarDigital Library
Wei Tang, Gang Hua, and Liang Wang. 2017. How to train a compact binary neural network with high accuracy?. In AAAI. 2625-2631. Google ScholarDigital Library
J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. R. Scott, and N. Wilkins-Diehr. 2014. XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering 16, 5 (Sept.-Oct. 2014), 62-74.Google ScholarCross Ref
Ferhan Ture, Tamer Elsayed, and Jimmy Lin. 2011. No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 943-952. Google ScholarDigital Library
Dennis Wackerly, William Mendenhall, and Richard L Scheaffer. 2014. Mathematical statistics with applications. Cengage Learning.Google Scholar
Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, and Alex Smola. 2009. Feature hashing for large scale multitask learning. arXiv preprint arXiv:0902.2206(2009).Google Scholar
Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2017. Word-entity duet representations for document ranking. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 763-772. Google ScholarDigital Library
Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 55-64. Google ScholarDigital Library
Chenyan Xiong, Zhengzhong Liu, Jamie Callan, and Tie-Yan Liu. 2018. Towards Better Text Understanding and Retrieval Through Kernel Entity Salience Modeling. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval(SIGIR '18). ACM, 575-584. Google ScholarDigital Library
Chenyan Xiong, Russell Power, and Jamie Callan. 2017. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th international conference on world wide web. International World Wide Web Conferences Steering Committee, 1271-1279. Google ScholarDigital Library
Han Zhu, Mingsheng Long, Jianmin Wang, and Yue Cao. 2016. Deep Hashing Network for Efficient Similarity Retrieval.. In AAAI. 2415-2421. Google ScholarDigital Library

Recommendations

A posteriori multi-probe locality sensitive hashing
MM '08: Proceedings of the 16th ACM international conference on Multimedia

Efficient high-dimensional similarity search structures are essential for building scalable content-based search systems on feature-rich multimedia data. In the last decade, Locality Sensitive Hashing (LSH) has been proposed as indexing technique for ...
Read More
Data-Dependent Locality Sensitive Hashing
Proceedings of the 15th Pacific-Rim Conference on Advances in Multimedia Information Processing --- PCM 2014 - Volume 8879

Locality sensitive hashing LSH is the most popular algorithm for approximate nearest neighbor ANN search. As LSH partitions vector space uniformly and the distribution of vectors is usually non-uniform, it poorly fits real dataset and has limited ...
Read More
Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Editors:
Ling Liu
Georgia Tech, USA
,
Ryen White
Microsoft Research, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 May 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 6
  Total Citations
  View Citations
- 701
  Total Downloads
- Downloads (Last 12 months)112
- Downloads (Last 6 weeks)13
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Efficient Interaction-based Neural Ranking with Locality Sensitive Hashing

WWW '19: The World Wide Web Conference

ABSTRACT

References

Cited By

Recommendations

A posteriori multi-probe locality sensitive hashing

Data-Dependent Locality Sensitive Hashing

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

HTML Format

Caption

Efficient Interaction-based Neural Ranking with Locality Sensitive Hashing

WWW '19: The World Wide Web Conference

ABSTRACT

References

Cited By

Recommendations

A posteriori multi-probe locality sensitive hashing

Data-Dependent Locality Sensitive Hashing

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

HTML Format

Share this Publication link

Share on Social Media