Abstract
In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.
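The abstract does not spell out the search mechanics. The sketch below shows one way an inverted index over signature slices can support approximate nearest-neighbour search in Hamming space. It assumes 64-bit integer signatures cut into four 16-bit slices, a probe radius of 1 bit per slice, and a simple additive scoring rule. These are illustrative assumptions, not the authors' published algorithm. All names (`SliceIndex`, `split_slices`, `probe_values`, `hamming`, `radius`, `shortlist`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def split_slices(sig: int, n_bits: int, slice_bits: int) -> list[int]:
    """Cut an n_bits signature into consecutive slice_bits-wide integers (lowest bits first)."""
    mask = (1 << slice_bits) - 1
    return [(sig >> (i * slice_bits)) & mask for i in range(n_bits // slice_bits)]


def probe_values(value: int, width: int, radius: int):
    """Yield every width-bit value within Hamming distance `radius` of `value`."""
    for r in range(radius + 1):
        for positions in combinations(range(width), r):
            flipped = value
            for p in positions:
                flipped ^= 1 << p
            yield flipped


class SliceIndex:
    """Hypothetical inverted index: one posting list per (slice position, slice value)."""

    def __init__(self, n_bits: int = 64, slice_bits: int = 16):
        if n_bits % slice_bits:
            raise ValueError("n_bits must be a multiple of slice_bits")
        self.n_bits = n_bits
        self.slice_bits = slice_bits
        self.sigs: list[int] = []
        # postings[i][v] -> ids of signatures whose i-th slice equals v
        self.postings = [defaultdict(list) for _ in range(n_bits // slice_bits)]

    def add(self, sig: int) -> int:
        doc_id = len(self.sigs)
        self.sigs.append(sig)
        for i, v in enumerate(split_slices(sig, self.n_bits, self.slice_bits)):
            self.postings[i][v].append(doc_id)
        return doc_id

    def search(self, query: int, k: int = 10, radius: int = 1, shortlist: int = 100):
        """Return up to k (doc_id, hamming_distance) pairs, nearest first."""
        scores: dict[int, int] = defaultdict(int)
        for i, qv in enumerate(split_slices(query, self.n_bits, self.slice_bits)):
            for v in probe_values(qv, self.slice_bits, radius):
                # Slice values closer to the query's slice add more to a candidate's score.
                weight = radius + 1 - hamming(v, qv)
                for doc_id in self.postings[i].get(v, ()):
                    scores[doc_id] += weight
        # Keep the best-scoring candidates, then rerank them by exact Hamming distance.
        candidates = sorted(scores, key=lambda d: (-scores[d], d))[:shortlist]
        ranked = sorted((hamming(self.sigs[d], query), d) for d in candidates)
        return [(d, dist) for dist, d in ranked[:k]]


if __name__ == "__main__":
    index = SliceIndex()
    for sig in (0x0123456789ABCDEF, 0xFFFF0000FFFF0000, 0x0F0F0F0F0F0F0F0F):
        index.add(sig)
    print(index.search(0x0123456789ABCDEF ^ 0b101, k=1))  # [(0, 2)]
```

Only the posting lists for the probed slice values are touched, so the cost depends on how many lists are probed rather than on the collection size. The approximation is explicit. By the pigeonhole principle, any 64-bit signature within 7 bits of the query has at least one 16-bit slice within 1 bit of the query's corresponding slice, so it enters the candidate set (although the `shortlist` cut can still drop it). Signatures further away may be missed. Widening `radius` or using narrower slices raises recall but probes more lists.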
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chappell, T., Geva, S., Zuccon, G. (2015). Approximate Nearest-Neighbour Search with Inverted Signature Slice Lists. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_16
DOI: https://doi.org/10.1007/978-3-319-16354-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science, Computer Science (R0)