DOI: 10.1145/3308558.3313576
research-article
Open Access

Efficient Interaction-based Neural Ranking with Locality Sensitive Hashing

Published: 13 May 2019

ABSTRACT

Interaction-based neural ranking has been shown to be effective for document search using distributed word representations. However, the time or space required makes online query processing with neural ranking very expensive. This paper investigates fast approximation of three interaction-based neural ranking algorithms using Locality Sensitive Hashing (LSH). The approach accelerates query-document interaction computation by using a runtime cache of precomputed term vectors, and speeds up kernel calculation by taking advantage of the limited number of integer similarity values. The paper presents the design choices with cost analysis, and an evaluation that assesses the efficiency benefits and relevance tradeoffs on the tested datasets.
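
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on assumptions the abstract does not state: random-hyperplane (sign) LSH for the term signatures, a cosine estimate derived from the number of matching bits, and K-NRM-style RBF kernel pooling as the ranking feature. All function names and parameters (`make_hyperplanes`, `num_bits`, `mus`, `sigma`, and so on) are hypothetical. The point is only that with b-bit signatures the similarity of two terms is an integer in 0..b, so every kernel value can be precomputed in a small table and looked up at query time.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Pack the sign bits of the random projections into one integer."""
    sig = 0
    for i, plane in enumerate(hyperplanes):
        if sum(p * v for p, v in zip(plane, vector)) >= 0.0:
            sig |= 1 << i
    return sig


def matching_bits(sig_a, sig_b, num_bits):
    """Integer similarity: number of signature bits that agree (0..num_bits)."""
    return num_bits - bin(sig_a ^ sig_b).count("1")


def build_kernel_table(num_bits, mus, sigma):
    """table[m][k] is kernel k evaluated for m matching bits.

    The angle between two vectors is estimated as pi * (1 - m / num_bits),
    which is the standard estimate for sign random projections.
    """
    table = []
    for m in range(num_bits + 1):
        cos_est = math.cos(math.pi * (1.0 - m / num_bits))
        table.append(
            [math.exp(-((cos_est - mu) ** 2) / (2.0 * sigma ** 2)) for mu in mus]
        )
    return table


def kernel_features(query_sigs, doc_sigs, num_bits, table):
    """K-NRM-style pooling (assumed): per kernel, sum over query terms of
    the log of the summed kernel values over document terms."""
    num_kernels = len(table[0])
    features = [0.0] * num_kernels
    for q in query_sigs:
        pooled = [0.0] * num_kernels
        for d in doc_sigs:
            row = table[matching_bits(q, d, num_bits)]  # lookup, no exp() here
            for k in range(num_kernels):
                pooled[k] += row[k]
        for k in range(num_kernels):
            features[k] += math.log(max(pooled[k], 1e-10))
    return features


def build_signature_cache(term_vectors, hyperplanes):
    """Runtime cache: term -> precomputed signature (hypothetical layout)."""
    return {t: lsh_signature(v, hyperplanes) for t, v in term_vectors.items()}
```

In this sketch the cache would be filled once from the word embeddings, after which scoring a query-document pair needs only XOR, a bit count, and table lookups. How closely the features track the exact cosine-based ones depends on `num_bits`, which is the efficiency and relevance tradeoff the abstract refers to.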


  • Published in

    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
