
Effective Query Model Estimation Using Parsimonious Translation Model in Language Modeling Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

The KL divergence framework, an extension of the language modeling approach, has a critical problem: estimating the query model, the probabilistic model that encodes the user's information need. For initial retrieval, estimating the query model with a translation model built from term co-occurrence statistics has been proposed. However, the translation model is difficult to apply, because the term co-occurrence statistics must be constructed offline. In large collections especially, constructing such a large matrix of term co-occurrence statistics incurs prohibitive time and space costs. More seriously, because the translation model includes noisy non-topical terms from documents, reliable retrieval performance cannot be guaranteed. This paper proposes an effective method to construct co-occurrence statistics and eliminate noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of the translation model that drastically reduces the number of terms with non-zero probabilities by eliminating non-topical terms in documents. In experiments, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language model but also the non-parsimonious model.
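The abstract does not give the estimation procedure, so the following is only a rough sketch of the general parsimonious idea: an EM-style re-estimation against a collection background model that zeroes out terms the background already explains well. The function name `parsimonious`, the mixture weight `lam`, the pruning `threshold`, and the toy counts are all assumptions made for illustration and are not taken from the paper.

```python
def parsimonious(term_counts, background, lam=0.1, threshold=1e-3, iterations=50):
    """Prune a term distribution by EM against a background model (illustrative sketch).

    term_counts: raw counts, e.g. co-occurrence counts for one source term
    background:  collection probabilities p(w | C), assumed to cover every counted term
    lam:         assumed weight of the topical model in the two-component mixture
    threshold:   probabilities below this value are dropped after each M-step
    """
    total = sum(term_counts.values())
    probs = {w: c / total for w, c in term_counts.items()}
    for _ in range(iterations):
        # E-step: share of each count attributed to the topical model
        expected = {}
        for w, p in probs.items():
            topical = lam * p
            expected[w] = term_counts[w] * topical / (topical + (1 - lam) * background[w])
        # M-step: renormalize the expected counts
        norm = sum(expected.values())
        probs = {w: e / norm for w, e in expected.items()}
        # Pruning: drop low-probability terms, then renormalize the survivors
        kept = {w: p for w, p in probs.items() if p >= threshold}
        if not kept:
            break  # keep the last non-empty distribution
        norm = sum(kept.values())
        probs = {w: p / norm for w, p in kept.items()}
    return probs


# Toy data (hypothetical): two topical terms and two frequent function words.
counts = {"retrieval": 5, "query": 4, "the": 20, "of": 15}
collection = {"retrieval": 0.001, "query": 0.001, "the": 0.6, "of": 0.398}
compact = parsimonious(counts, collection)
print(sorted(compact))
```

On this toy input the function words fall below the threshold within a few iterations, so only the topical terms keep non-zero probability, which mirrors the compactness the abstract describes.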




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Na, SH., Kang, IS., Roh, JE., Lee, JH. (2005). Effective Query Model Estimation Using Parsimonious Translation Model in Language Modeling Approach. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_22


  • DOI: https://doi.org/10.1007/11562382_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science, Computer Science (R0)
