Abstract
Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this claim. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. Instead, we define a new PR model based on logistic regression whose score function is similar to that of the query likelihood model. The performance of the two models is strongly correlated, which bridges the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens up many new research directions, which we propose as future work.
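The ranking-level bridge described above can be illustrated with a small toy sketch. It assumes Dirichlet-smoothed query likelihood as the language-model score and a hypothetical logistic regression whose linear predictor is a positively weighted sum of the same per-term log-probabilities. Neither is claimed to be the exact model defined in the paper. The names `mu`, `beta`, `bias`, `term_features` and `lr_relevance` are illustrative assumptions. Because the logistic function is strictly increasing, ranking documents by decreasing P(R=1 | q, d), as the PRP prescribes, yields the same order as ranking by log P(q | d) whenever `beta > 0`.

```python
import math
from collections import Counter

def collection_model(docs):
    """Maximum-likelihood collection language model P(t | C)."""
    counts = Counter(t for doc in docs.values() for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def term_features(query, doc, p_c, mu):
    """Per-term log P(t | d) with Dirichlet smoothing (assumed smoothing choice)."""
    tf = Counter(doc)
    return [math.log((tf[t] + mu * p_c[t]) / (len(doc) + mu)) for t in query]

def query_likelihood(query, doc, p_c, mu):
    """Query likelihood score log P(q | d): the sum of the per-term features."""
    return sum(term_features(query, doc, p_c, mu))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def lr_relevance(query, doc, p_c, mu, beta, bias):
    """Hypothetical LR model: P(R=1 | q, d) = sigmoid(bias + beta * sum of features)."""
    return sigmoid(bias + beta * sum(term_features(query, doc, p_c, mu)))

def rank(docs, score):
    """PRP-style ranking: document ids sorted by decreasing score."""
    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)

docs = {
    "d1": "logistic regression model for relevance".split(),
    "d2": "language model query likelihood smoothing".split(),
    "d3": "probability ranking principle relevance".split(),
}
p_c = collection_model(docs)
query = ["relevance", "model"]
mu = 10.0  # toy value, kept small because the documents are tiny

ql_ranking = rank(docs, lambda d: query_likelihood(query, d, p_c, mu))
lr_ranking = rank(docs, lambda d: lr_relevance(query, d, p_c, mu, beta=0.5, bias=-1.0))
print(ql_ranking)  # ['d1', 'd3', 'd2']
print(lr_ranking)  # ['d1', 'd3', 'd2']
```

With a single shared weight the two orderings coincide exactly. If separate weights were learned per feature from relevance data, the orderings would no longer be guaranteed identical, only related, which is closer in spirit to the correlation the abstract reports.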
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aly, R., Demeester, T. (2011). Towards a Better Understanding of the Relationship between Probabilistic Models in IR. In: Amati, G., Crestani, F. (eds) Advances in Information Retrieval Theory. ICTIR 2011. Lecture Notes in Computer Science, vol 6931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23318-0_16
DOI: https://doi.org/10.1007/978-3-642-23318-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23317-3
Online ISBN: 978-3-642-23318-0
eBook Packages: Computer Science, Computer Science (R0)