ABSTRACT
Nearest Neighbor Search (NNS) has attracted rapidly growing interest because of its core role in high-dimensional vector data management for data science and AI applications. This interest is fueled by the success of neural embeddings, where deep learning models transform unstructured data into semantically correlated feature vectors for data analysis, e.g., recommending popular items. Among the several categories of methods for fast NNS, graph-based approximate nearest neighbor search algorithms have delivered best-in-class search performance on a wide range of real-world datasets. Prior work improves the search efficiency of graph-based NNS mainly by exploiting the structure of the graph with sophisticated heuristic rules. In this work, we show that the frequency distribution of edge visits in graph-based NNS can be highly skewed. This finding motivates pruning unnecessary edges, based on the query distribution, to avoid redundant computation during graph traversal; the query distribution is an important yet under-explored aspect of graph-based NNS. In particular, we formulate graph pruning as a discrete optimization problem and introduce GraSP, a graph optimization algorithm that improves the search efficiency of similarity graphs by learning to prune redundant edges. GraSP augments an existing similarity graph with a probabilistic model. It then performs a novel subgraph sampling and iterative refinement optimization that explicitly maximizes search efficiency, in expectation over a large set of training queries, when a subset of edges is removed from the graph. The evaluation shows that GraSP consistently improves search efficiency on real-world datasets, providing up to 2.24X faster search than state-of-the-art methods without losing accuracy.
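The observation about skewed edge-visit frequencies can be illustrated with a small sketch. The code below is not GraSP: it replaces the paper's probabilistic model, subgraph sampling, and iterative refinement with a plain frequency heuristic that keeps, for each node, the out-edges most often traversed by a set of training queries. The function names (`knn_graph`, `beam_search`, `prune_by_frequency`), the brute-force kNN graph, and the parameters `ef` and `keep` are all assumptions made for illustration.

```python
import heapq
from collections import Counter

import numpy as np


def knn_graph(data, k):
    """Brute-force k-nearest-neighbor graph (illustrative stand-in for a real index)."""
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a node is not its own neighbor
    return {i: [int(j) for j in np.argsort(d[i])[:k]] for i in range(len(data))}


def beam_search(graph, data, query, entry, ef, edge_visits=None):
    """Best-first graph search; returns up to ef (distance, node) pairs, nearest first.

    If edge_visits (a Counter) is given, every edge that triggers a distance
    computation is counted.
    """
    def dist(i):
        return float(np.linalg.norm(data[i] - query))

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap of nodes still to expand
    results = [(-dist(entry), entry)]    # max-heap (negated) of the best ef nodes
    while candidates:
        d, u = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # closest unexpanded node is worse than the worst result
        for v in graph[u]:
            if v in visited:
                continue
            visited.add(v)
            if edge_visits is not None:
                edge_visits[(u, v)] += 1
            dv = dist(v)
            if len(results) < ef or dv < -results[0][0]:
                heapq.heappush(candidates, (dv, v))
                heapq.heappush(results, (-dv, v))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-d, v) for d, v in results)


def prune_by_frequency(graph, edge_visits, keep):
    """Keep the `keep` most frequently visited out-edges of every node (assumed heuristic)."""
    return {
        u: sorted(nbrs, key=lambda v: -edge_visits[(u, v)])[:keep]
        for u, nbrs in graph.items()
    }


# Toy usage: collect edge-visit counts from training queries, then prune.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 8))
graph = knn_graph(data, k=10)
visits = Counter()
for q in rng.normal(size=(100, 8)):
    beam_search(graph, data, q, entry=0, ef=10, edge_visits=visits)
pruned = prune_by_frequency(graph, visits, keep=5)
```

A heuristic like this gives no guarantee that accuracy is preserved, so recall on held-out queries should be checked before and after pruning. The abstract describes GraSP as optimizing search efficiency explicitly rather than thresholding raw counts.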
Index Terms
- GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning