Large-scale high-dimensional indexing by sparse hashing with l₀ approximation

Borges, Pedro; Mourão, André; Magalhães, João

doi:10.1007/s11042-016-4152-1

Published: 02 December 2016

Volume 76, pages 24389–24412 (2017)

Multimedia Tools and Applications

Abstract

In this paper we propose a large-scale high-dimensional indexing algorithm based on sparse approximation and inverted indexing. Our goal was to devise a method that scales smoothly to databases of over 100 million descriptors on a single machine. To meet this goal, we implemented an inverted index based on a sparsifying dictionary with l₀ regression to assign documents to buckets. The sparsifying dictionary is optimized to reduce the data dimensionality by concentrating the energy of the original vector on a few coefficients of a higher-dimensional representation. These sparse descriptors are added to an inverted index that exploits the locality of the coefficients of sparse representations to enable efficient pruned search. Evaluation on four large-scale datasets with multiple types of features showed that our method compares favorably to state-of-the-art techniques. On a dataset of 100 million SIFT descriptors, our method achieved 47.6% precision at 50 while inspecting only 1% of the full dataset and using only 1/20 of the time of a linear search.
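The pipeline described in the abstract (l₀ sparse coding against a dictionary, one bucket per non-zero coefficient, pruned candidate retrieval, then exact re-ranking) can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not name its l₀ solver, so Orthogonal Matching Pursuit is used here; the dictionary is random with unit-norm columns rather than optimized; and "pruning" is modelled as visiting only the buckets of the query's `probe` largest-magnitude coefficients. The names `omp`, `SparseInvertedIndex`, `k` and `probe` are hypothetical.

```python
import numpy as np
from collections import defaultdict


def omp(D, x, k):
    """Orthogonal Matching Pursuit: greedy l0 approximation of x with at most k atoms of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the (unit-norm) atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # residual is numerically zero; nothing left to explain
            break
        support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef


class SparseInvertedIndex:
    """Hypothetical index: one bucket (posting list) per dictionary atom."""

    def __init__(self, D, k):
        self.D, self.k = D, k
        self.postings = defaultdict(list)  # atom id -> ids of vectors using that atom
        self.vectors = []

    def add(self, x):
        doc_id = len(self.vectors)
        self.vectors.append(x)
        support, _ = omp(self.D, x, self.k)
        for atom in support:  # a vector lands in one bucket per non-zero coefficient
            self.postings[atom].append(doc_id)
        return doc_id

    def search(self, q, topn, probe):
        support, coef = omp(self.D, q, self.k)
        # Assumed pruning rule: visit only the buckets of the `probe` largest |coefficients|.
        strongest = [support[i] for i in np.argsort(-np.abs(coef))[:probe]]
        candidates = sorted({d for atom in strongest for d in self.postings[atom]})
        # Re-rank the small candidate set by exact Euclidean distance.
        dists = [np.linalg.norm(self.vectors[d] - q) for d in candidates]
        best = [candidates[i] for i in np.argsort(dists)[:topn]]
        return best, len(candidates)


# Usage on synthetic data with a random (not learned) overcomplete dictionary.
rng = np.random.default_rng(0)
dim, n_atoms = 32, 256
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)  # unit-norm columns
data = rng.standard_normal((2000, dim))

index = SparseInvertedIndex(D, k=4)
for x in data:
    index.add(x)

ids, inspected = index.search(data[7], topn=5, probe=2)
print(ids, f"inspected {inspected} of {len(data)} vectors")
```

Querying with a stored vector returns that vector first, because it was posted to every bucket of its own support. The `inspected` count plays the role of the "fraction of the dataset inspected" in the abstract: raising `probe` or `k` widens the candidate set, trading speed for recall.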

Similar content being viewed by others

Approximate Search with Quantized Sparse Representations

An Efficient Indexing Scheme Based on Linked-Node m-Ary Tree Structure

Scalability of the NV-tree: Three Experiments

Acknowledgments

We would like to thank Microsoft Research for providing us with a Microsoft Azure Research Award sponsorship, which enabled us to run larger-scale indexing experiments. This work was partially funded by the Portuguese National Foundation for Science and Technology (FCT) through the projects PTDC/EIA-EIA/111518/2009, UTA-Est/MAI/0010/2009 and NOVA LINCS UID/CEC/04516/2013.

Author information

Authors and Affiliations

NOVA LINCS, Department of Computer Science, Faculty of Science and Technology, Universidade Nova de Lisboa, Lisboa, Portugal
Pedro Borges, André Mourão & João Magalhães

Corresponding author

Correspondence to André Mourão.

About this article

Cite this article

Borges, P., Mourão, A. & Magalhães, J. Large-scale high-dimensional indexing by sparse hashing with l₀ approximation. Multimed Tools Appl 76, 24389–24412 (2017). https://doi.org/10.1007/s11042-016-4152-1

Received: 28 February 2015
Revised: 28 September 2016
Accepted: 14 November 2016
Published: 02 December 2016
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11042-016-4152-1
