Abstract
Phrase queries play an important role in web search and other applications. Traditionally, phrase queries have been processed using a positional inverted index, potentially augmented by selected multi-word sequences (e.g., n-grams or frequent noun phrases). In this work, instead of augmenting the inverted index, we take a radically different approach and leverage the direct index, which provides efficient access to compact representations of documents. Modern retrieval systems maintain such a direct index, for instance, to generate snippets or compute proximity features. We present extensions of the established term-at-a-time and document-at-a-time query-processing methods that make effective combined use of the inverted index and the direct index. Our experiments on two real-world document collections using diverse query workloads demonstrate that our methods improve response time substantially without requiring additional index space.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
The ClueWeb09 Dataset, http://lemurproject.org/clueweb09/
The New York Times Annotated Corpus, http://corpus.nytimes.com/
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Computer Networks 30(1-7), 107–117 (1998)
Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J.Y.: Efficient query evaluation using a two-level retrieval process. In: CIKM 2003 (2003)
Büttcher, S., Clarke, C., Cormack, G.V.: Information Retrieval: Implementing and Evaluating Search Engines. MIT Press (2010)
Chang, M., Poon, C.K.: Efficient phrase querying with common phrase index. Inf. Process. Manage. 44(2), 756–769 (2008)
Culpepper, J.S., Petri, M., Scholer, F.: Efficient in-memory top-k document retrieval. In: SIGIR 2012 (2012)
Fagan, J.: Automatic phrase indexing for document retrieval. In: SIGIR 1987 (1987)
Ferragina, P., Venturini, R.: The compressed permuterm index. ACM Transactions on Algorithms 7(1) (2010)
Gog, S., Moffat, A., Culpepper, J.S., Turpin, A., Wirth, A.: Large-scale pattern search using reduced-space on-disk suffix arrays. CoRR abs/1303.6481 (2013)
Gutwin, C., Paynter, G., Witten, I., Nevill-Manning, C., Frank, E.: Improving browsing in digital libraries with keyphrase indexes. Decision Support Systems 27(1-2), 81–104 (1999)
Hagen, M., Potthast, M., Beyer, A., Stein, B.: Towards optimum query segmentation: in doubt without. In: CIKM 2012 (2012)
He, J., Suel, T.: Optimizing positional index structures for versioned document collections. In: SIGIR 2012 (2012)
Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: Yago2: A spatially and temporally enhanced knowledge base from wikipedia. Artif. Intell. 194, 28–61 (2013)
Knuth, D., Morris, J. J., Pratt, V.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)
Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Moffat, A., Zobel, J.: Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst. 14(4), 349–379 (1996)
Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., Johnson, D.: Terrier information retrieval platform. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, pp. 517–519. Springer, Heidelberg (2005)
Salton, G., Yang, C.S., Yu, C.T.: A theory of term importance in automatic text analysis. Journal of the American Society for Information Science 26(1), 33–44 (1975)
Shan, D., Zhao, W.X., He, J., Yan, R., Yan, H., Li, X.: Efficient phrase querying with flat position index. In: CIKM 2011 (2011)
Transier, F., Sanders, P.: Out of the box phrase indexing. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 200–211. Springer, Heidelberg (2008)
Vigna, S.: Quasi-succinct indices. In: WSDM 2013 (2013)
Wang, J., Lo, E., Yiu, M.L., Tong, J., Wang, G., Liu, X.: The impact of solid state drive on search engine cache management. In: SIGIR 2013 (2013)
Williams, H.E., Zobel, J., Bahle, D.: Fast phrase querying with combined indexes. ACM Trans. Inf. Syst. 22(4), 573–594 (2004)
Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. 38(2) (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Panev, K., Berberich, K. (2014). Phrase Queries with Inverted + Direct Indexes. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_13
Download citation
DOI: https://doi.org/10.1007/978-3-319-11749-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer ScienceComputer Science (R0)