
Approximate Nearest-Neighbour Search with Inverted Signature Slice Lists

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9022)

Abstract

In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.
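The abstract does not describe the mechanism itself. The sketch below is a rough, hypothetical illustration of what an inverted index over signature slices can look like. It is not the authors' algorithm, whose candidate generation and scoring may differ. It makes several assumptions: signatures are 64-bit integers, each is cut into four 16-bit slices, and each slice position has its own inverted lists mapping a slice value to the ids of signatures that contain it. At query time, the lists for the query's slice values are unioned into a candidate set, which is then reranked by exact Hamming distance. The names `SliceIndex` and `split_slices` are invented for this example.

```python
from collections import defaultdict


def split_slices(sig, n_bits, slice_bits):
    """Cut an n_bits-wide signature into consecutive slice_bits-wide slices (lowest bits first)."""
    mask = (1 << slice_bits) - 1
    return [(sig >> shift) & mask for shift in range(0, n_bits, slice_bits)]


class SliceIndex:
    """Hypothetical inverted index over fixed-width signature slices (illustrative only)."""

    def __init__(self, n_bits=64, slice_bits=16):
        if n_bits % slice_bits != 0:
            raise ValueError("n_bits must be a multiple of slice_bits")
        self.n_bits = n_bits
        self.slice_bits = slice_bits
        n_slices = n_bits // slice_bits
        # One table per slice position: slice value -> list of signature ids.
        self.lists = [defaultdict(list) for _ in range(n_slices)]
        self.sigs = []

    def add(self, sig):
        """Store a signature and append its id to the inverted list of each of its slices."""
        sig_id = len(self.sigs)
        self.sigs.append(sig)
        for pos, val in enumerate(split_slices(sig, self.n_bits, self.slice_bits)):
            self.lists[pos][val].append(sig_id)
        return sig_id

    def search(self, query, k=1):
        """Return up to k (hamming_distance, id) pairs, nearest first.

        Only signatures that share at least one identical slice with the query
        become candidates, so the result is approximate: a close signature whose
        differing bits touch every slice is never examined.
        """
        candidates = set()
        for pos, val in enumerate(split_slices(query, self.n_bits, self.slice_bits)):
            candidates.update(self.lists[pos].get(val, ()))  # .get does not insert keys
        # Rerank the candidates by exact Hamming distance (popcount of the XOR).
        scored = sorted((bin(self.sigs[i] ^ query).count("1"), i) for i in candidates)
        return scored[:k]
```

Under these assumptions the approximation follows from the pigeonhole principle. With four slices, any stored signature within Hamming distance 3 of the query must agree with it exactly on at least one slice, so it is guaranteed to be retrieved. More distant neighbours can be missed: in the test below, a query differing from a stored signature in only four bits, one per slice, finds nothing. The trade-off is that each query reads only a few short lists instead of scanning every signature.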




Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chappell, T., Geva, S., Zuccon, G. (2015). Approximate Nearest-Neighbour Search with Inverted Signature Slice Lists. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_16


  • DOI: https://doi.org/10.1007/978-3-319-16354-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16353-6

  • Online ISBN: 978-3-319-16354-3

  • eBook Packages: Computer Science, Computer Science (R0)
