research-article

GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning

Authors:
Minjia Zhang, Microsoft, Bellevue, WA, USA
Wenhan Wang, Microsoft, Bellevue, WA, USA
Yuxiong He, Microsoft, Bellevue, WA, USA

WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, February 2022, Pages 1395–1405
https://doi.org/10.1145/3488560.3498425

Published: 15 February 2022

ABSTRACT

Nearest Neighbor Search (NNS) has recently attracted rapidly growing interest because of its core role in managing high-dimensional vector data in data science and AI applications. The interest is fueled by the success of neural embedding, where deep learning models transform unstructured data into semantically correlated feature vectors for data analysis, e.g., recommending popular items. Among several categories of methods for fast NNS, graph-based approximate nearest neighbor search algorithms have delivered best-in-class search performance on a wide range of real-world datasets. Prior work improves the search efficiency of graph-based NNS mainly by exploiting the structure of the graph with sophisticated heuristic rules. In this work, we show that the frequency distributions of edge visits for graph-based NNS can be highly skewed. This finding motivates pruning unnecessary edges, guided by the query distribution, to avoid redundant computation during graph traversal; the query distribution is an important yet under-explored aspect of graph-based NNS. In particular, we formulate graph pruning as a discrete optimization problem and introduce GraSP, a graph optimization algorithm that improves the search efficiency of similarity graphs by learning to prune redundant edges. GraSP augments an existing similarity graph with a probabilistic model. It then performs a novel subgraph sampling and iterative refinement optimization that explicitly maximizes expected search efficiency over a large set of training queries when a subset of edges is removed from the graph. The evaluation shows that GraSP consistently improves search efficiency on real-world datasets, providing up to 2.24X faster search than state-of-the-art methods without losing accuracy.
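The observation about skewed edge visits can be illustrated with a small sketch. The code below is not GraSP: it replaces the paper's probabilistic model, subgraph sampling, and iterative refinement with a naive rule that keeps each node's most frequently visited out-edges. The brute-force kNN graph, the synthetic uniform data, and all names (`build_knn_graph`, `greedy_search`, `prune_by_frequency`, `keep_ratio`, `ef`) are assumptions made for illustration only.

```python
import heapq
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, k):
    """Brute-force kNN graph (assumed stand-in for a real similarity graph)."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: sq_dist(p, points[j]))
        graph[i] = others[:k]
    return graph


def greedy_search(graph, points, query, entry, ef, visits=None):
    """Best-first graph search; returns (nearest id, distance computations).

    If `visits` is a Counter, every examined edge (u, v) is counted in it.
    """
    d0 = sq_dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap of nodes still to expand
    best = [(-d0, entry)]        # max-heap holding the ef closest nodes so far
    n_dist = 1
    while candidates:
        d, u = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                # no remaining candidate can improve the result
        for v in graph[u]:
            if visits is not None:
                visits[(u, v)] += 1
            if v in visited:
                continue
            visited.add(v)
            dv = sq_dist(query, points[v])
            n_dist += 1
            if len(best) < ef or dv < -best[0][0]:
                heapq.heappush(candidates, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > ef:
                    heapq.heappop(best)
    nearest = max(best)[1]       # largest -distance is the closest node
    return nearest, n_dist


def prune_by_frequency(graph, visits, keep_ratio):
    """Keep each node's most-visited out-edges (naive heuristic, not GraSP)."""
    pruned = {}
    for u, nbrs in graph.items():
        ranked = sorted(nbrs, key=lambda v: -visits[(u, v)])
        keep = max(1, int(round(len(nbrs) * keep_ratio)))
        pruned[u] = ranked[:keep]
    return pruned


# Synthetic data and training queries (assumed; not from the paper).
random.seed(0)
points = [[random.random() for _ in range(8)] for _ in range(200)]
queries = [[random.random() for _ in range(8)] for _ in range(100)]
graph = build_knn_graph(points, k=8)

# Record how often each edge is examined while answering the training queries.
visits = Counter()
for q in queries:
    greedy_search(graph, points, q, entry=0, ef=8, visits=visits)

pruned = prune_by_frequency(graph, visits, keep_ratio=0.5)
```

To see the trade-off, one could compare the `n_dist` value returned by `greedy_search` on `graph` and on `pruned` for held-out queries, alongside recall against a brute-force scan. A frequency cutoff like this may reduce recall, which is the kind of loss the optimization described in the abstract is designed to avoid.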

Supplemental Material

WSDM22-fp304.mp4 (mp4, 116 MB)

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval

Published in
WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
February 2022, 1690 pages
ISBN: 9781450391320
DOI: 10.1145/3488560
General Chairs: K. Selcuk Candan (Arizona State University, USA), Huan Liu (Arizona State University, USA)
Program Chairs: Leman Akoglu (Carnegie Mellon University, USA), Xin Luna Dong (Meta Platforms, Inc. (former Facebook), USA), Jiliang Tang (Michigan State University, USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph sampling
search efficiency
vector management and search
Acceptance Rates
Overall Acceptance Rate: 498 of 2,863 submissions, 17%