Fast Single-Pass Construction of a Half-Inverted Index

  • Conference paper
String Processing and Information Retrieval (SPIRE 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5721)

Abstract

We show how a half-inverted index can be constructed twice as fast as an ordinary inverted index. As shown in a series of recent works, the half-inverted index enables very fast prefix search, which in turn is the basis for very fast processing of many other types of advanced queries. Our construction algorithm is truly single-pass: by avoiding an expensive merge of the index, it touches (reads and writes) every posting (word occurrence) only once during the whole construction. The algorithm has been carefully engineered, with special attention paid to cache efficiency and disk cost. We compare our algorithm against the state-of-the-art index construction of Zettair.
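To make the abstract's terms concrete, here is a minimal in-memory Python sketch of a block-based half-inverted index. It is not the authors' engineered algorithm. It assumes (hypothetically) that the whole vocabulary is known before indexing, that word IDs follow lexicographic order, and that each block covers a fixed number `words_per_block` of consecutive word IDs; the class and method names are illustrative only. It shows why no merge is needed when documents are parsed in increasing ID order: each posting is appended exactly once to the block of its word range, so every block ends up sorted by document ID. It also shows why prefix search is cheap: the words sharing a prefix form one contiguous ID range, hence a few consecutive blocks.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# (doc_id, word_id, position); a real index would also store e.g. scores
Posting = Tuple[int, int, int]


class HalfInvertedIndex:
    """Toy in-memory half-inverted index: one posting list per word-range block.

    Hypothetical simplifications (not from the paper): the vocabulary is known
    up front, word ids follow lexicographic order, and every block covers
    `words_per_block` consecutive word ids. Nothing is compressed or written
    to disk.
    """

    def __init__(self, vocabulary: List[str], words_per_block: int) -> None:
        if words_per_block < 1:
            raise ValueError("words_per_block must be positive")
        self.words: List[str] = sorted(set(vocabulary))
        self.word_id: Dict[str, int] = {w: i for i, w in enumerate(self.words)}
        self.words_per_block = words_per_block
        num_blocks = -(-len(self.words) // words_per_block)  # ceiling division
        self.blocks: List[List[Posting]] = [[] for _ in range(num_blocks)]
        self.last_doc_id = -1

    def add_document(self, doc_id: int, text: str) -> None:
        # Documents must arrive in increasing id order. Appending each posting
        # once to its block then leaves every block sorted by doc_id, so no
        # merge step is required afterwards.
        if doc_id <= self.last_doc_id:
            raise ValueError("documents must be added in increasing doc_id order")
        self.last_doc_id = doc_id
        for pos, token in enumerate(text.lower().split()):
            wid = self.word_id.get(token)
            if wid is None:
                continue  # out-of-vocabulary token, ignored in this sketch
            self.blocks[wid // self.words_per_block].append((doc_id, wid, pos))

    def prefix_search(self, prefix: str) -> List[int]:
        """Return the sorted ids of documents containing a word with `prefix`."""
        # In sorted order, the words starting with `prefix` form one
        # contiguous id range [lo, hi).
        lo = bisect_left(self.words, prefix)
        hi = lo
        while hi < len(self.words) and self.words[hi].startswith(prefix):
            hi += 1
        if lo == hi:
            return []
        docs = set()
        # Scan only the blocks overlapping [lo, hi) and filter by word id.
        first_block = lo // self.words_per_block
        last_block = (hi - 1) // self.words_per_block
        for b in range(first_block, last_block + 1):
            for doc_id, wid, _pos in self.blocks[b]:
                if lo <= wid < hi:
                    docs.add(doc_id)
        return sorted(docs)


vocab = ["index", "inverted", "information", "search", "prefix", "single", "pass"]
idx = HalfInvertedIndex(vocab, words_per_block=2)
idx.add_document(0, "inverted index construction")  # "construction" is OOV
idx.add_document(1, "prefix search")
idx.add_document(2, "single pass index")
print(idx.prefix_search("in"))  # → [0, 2]
```

The sketch deliberately leaves out what the abstract names as the engineering effort (cache-efficient buffering and low disk cost), as well as handling a vocabulary that is not known in advance.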

This work was partially supported by DFG-SPP 1307, project Efficient Search in Very Large Text Collections, Databases, and Ontologies.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Celikik, M., Bast, H. (2009). Fast Single-Pass Construction of a Half-Inverted Index. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_19

  • DOI: https://doi.org/10.1007/978-3-642-03784-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03783-2

  • Online ISBN: 978-3-642-03784-9

  • eBook Packages: Computer Science, Computer Science (R0)
