Phrase Queries with Inverted + Direct Indexes

  • Conference paper
Web Information Systems Engineering – WISE 2014 (WISE 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8786)

Abstract

Phrase queries play an important role in web search and other applications. Traditionally, phrase queries have been processed using a positional inverted index, potentially augmented by selected multi-word sequences (e.g., n-grams or frequent noun phrases). In this work, instead of augmenting the inverted index, we take a radically different approach and leverage the direct index, which provides efficient access to compact representations of documents. Modern retrieval systems maintain such a direct index, for instance, to generate snippets or compute proximity features. We present extensions of the established term-at-a-time and document-at-a-time query-processing methods that make effective combined use of the inverted index and the direct index. Our experiments on two real-world document collections using diverse query workloads demonstrate that our methods improve response time substantially without requiring additional index space.
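To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm (the abstract does not give its details). It assumes a toy in-memory setting: `build_indexes`, `phrase_query_inverted`, and `phrase_query_combined` are hypothetical names, the direct index is simply a per-document token list, and no compression or ranking is modeled. The baseline intersects positional postings for every phrase term; the combined variant reads postings for only the rarest term and verifies each candidate position against the direct index.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a positional inverted index and a direct index.

    docs: dict mapping doc_id -> raw text (whitespace-tokenized here,
    which is an assumption made for brevity).
    """
    inverted = defaultdict(dict)  # term -> {doc_id: [positions]}
    direct = {}                   # doc_id -> list of terms in document order
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        direct[doc_id] = tokens
        for pos, term in enumerate(tokens):
            inverted[term].setdefault(doc_id, []).append(pos)
    return inverted, direct


def phrase_query_inverted(phrase, inverted):
    """Baseline: use positional postings of all phrase terms."""
    terms = phrase.lower().split()
    if not terms or any(t not in inverted for t in terms):
        return {}
    # Documents that contain every term of the phrase.
    candidates = set(inverted[terms[0]])
    for t in terms[1:]:
        candidates &= set(inverted[t])
    results = {}
    for doc_id in candidates:
        # A start position s matches if term i occurs at position s + i.
        starts = set(inverted[terms[0]][doc_id])
        for offset, t in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in inverted[t][doc_id]}
        if starts:
            results[doc_id] = sorted(starts)
    return results


def phrase_query_combined(phrase, inverted, direct):
    """Sketch: postings of the rarest term + verification in the direct index."""
    terms = phrase.lower().split()
    if not terms or any(t not in inverted for t in terms):
        return {}
    # Pick the term with the smallest document frequency as the anchor.
    anchor_idx = min(range(len(terms)), key=lambda i: len(inverted[terms[i]]))
    results = {}
    for doc_id, positions in inverted[terms[anchor_idx]].items():
        tokens = direct[doc_id]
        starts = []
        for p in positions:
            s = p - anchor_idx  # where the phrase would have to start
            if s >= 0 and tokens[s:s + len(terms)] == terms:
                starts.append(s)
        if starts:
            results[doc_id] = starts
    return results
```

Both functions return a mapping from document identifier to the start positions of the phrase. In this sketch the combined variant touches one posting list instead of one per term, at the cost of one direct-index lookup per candidate document; whether that trade-off pays off in a real system depends on factors the sketch leaves out, such as compression and storage layout.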

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Panev, K., Berberich, K. (2014). Phrase Queries with Inverted + Direct Indexes. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_13

  • DOI: https://doi.org/10.1007/978-3-319-11749-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11748-5

  • Online ISBN: 978-3-319-11749-2

  • eBook Packages: Computer Science, Computer Science (R0)
