Towards a Better Understanding of the Relationship between Probabilistic Models in IR

  • Conference paper
Advances in Information Retrieval Theory (ICTIR 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6931)

Abstract

Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this claim. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that, given the differences between PR models and language models, the second gap cannot be bridged at the level of the probabilistic model. We instead define a new PR model based on logistic regression whose score function is similar to that of the query likelihood model. The performance of the two models is strongly correlated, which provides a bridge for the second gap at the functional and ranking level. Understanding language models in relation to logistic regression models opens ample new research directions, which we propose as future work.
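To make the idea of a bridge at the functional and ranking level more concrete, the following is a minimal sketch and not the model defined in the paper. It assumes a query likelihood score with Jelinek-Mercer smoothing and a hypothetical logistic regression model whose features are the logarithms of the smoothed term probabilities. The function names, the single shared `weight`, the `bias`, and the smoothing parameter `lam` are all illustrative assumptions. Under these assumptions the logit is an affine function of the query likelihood score, so both produce the same ranking whenever `weight` is positive.

```python
import math
from collections import Counter


def smoothed_log_probs(query, doc, collection, lam=0.5):
    """Log of Jelinek-Mercer smoothed term probabilities, one per query term.

    Assumes every query term occurs at least once in the collection,
    otherwise the logarithm is undefined.
    """
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    doc_len, coll_len = len(doc), len(collection)
    log_probs = []
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len
        log_probs.append(math.log((1 - lam) * p_doc + lam * p_coll))
    return log_probs


def query_likelihood_score(query, doc, collection, lam=0.5):
    """Log query likelihood: sum of the smoothed log term probabilities."""
    return sum(smoothed_log_probs(query, doc, collection, lam))


def logistic_logit(query, doc, collection, weight=1.0, bias=0.0, lam=0.5):
    """Logit of a hypothetical logistic regression relevance model.

    Illustrative assumption: one feature per query term (its smoothed
    log probability) and a single weight shared by all features.
    """
    features = smoothed_log_probs(query, doc, collection, lam)
    return bias + weight * sum(features)


def relevance_probability(logit):
    """Logistic (sigmoid) function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    d1 = ["language", "model", "retrieval"]
    d2 = ["probability", "ranking", "principle"]
    collection = d1 + d2
    query = ["language", "model"]
    for name, doc in (("d1", d1), ("d2", d2)):
        ql = query_likelihood_score(query, doc, collection)
        logit = logistic_logit(query, doc, collection, weight=2.0, bias=1.0)
        print(name, ql, logit, relevance_probability(logit))
```

Because the logistic function is strictly increasing, ranking by the estimated probability of relevance and ranking by the logit give the same order. In this sketch any difference between the two models would have to come from weights that vary per feature, which is left out here for brevity.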

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aly, R., Demeester, T. (2011). Towards a Better Understanding of the Relationship between Probabilistic Models in IR. In: Amati, G., Crestani, F. (eds) Advances in Information Retrieval Theory. ICTIR 2011. Lecture Notes in Computer Science, vol 6931. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23318-0_16

  • DOI: https://doi.org/10.1007/978-3-642-23318-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23317-3

  • Online ISBN: 978-3-642-23318-0

  • eBook Packages: Computer Science, Computer Science (R0)
