Information Processing & Management
Volume 43, Issue 1, January 2007, Pages 121-145
 
Special issue
doi:10.1016/j.ipm.2006.04.005
Copyright © 2006 Elsevier Ltd All rights reserved.

Parsimonious translation models for information retrieval

Seung-Hoon Na (corresponding author), In-Su Kang and Jong-Hyeok Lee

Division of Electrical and Computer Engineering, Pohang University of Science and Technology, POSTECH, AITrc, San 31, Hyojadong, Namgu, Pohang, Kyeongbook 790784, Republic of Korea

Received 19 July 2005; 
revised 19 April 2006; 
accepted 19 April 2006. 
Available online 12 June 2006.

Abstract

In the KL divergence framework, a critical problem for the extended language modeling approach is estimating the query model, the probabilistic model that encodes the user’s information need. For query expansion in initial retrieval, the translation model has been proposed as a way to incorporate term co-occurrence statistics. However, the translation model was difficult to apply, because the term co-occurrence statistics must be constructed offline. In a large collection especially, constructing such a large matrix of term co-occurrence statistics prohibitively increases time and space complexity. In addition, reliable retrieval performance cannot be guaranteed, because the translation model may include noisy, non-topical terms from documents. To resolve these problems, this paper investigates an effective method for constructing co-occurrence statistics and eliminating noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of the translation model that reduces the number of terms with non-zero probabilities by eliminating non-topical terms in documents. Through experiments on seven different test collections, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language modeling approach but also the non-parsimonious models.
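The abstract does not give the estimators, so the following Python sketch only illustrates the general idea under stated assumptions. The parsimonious document model is estimated with the EM procedure from Hiemstra et al.'s parsimonious language models (swapped in here; the paper's own estimator may differ). The translation probability t(w|u) is taken as a single word-to-document-to-word step with a uniform document prior, and the query model mixes translation rows weighted by query-term frequency. The function names, the mixture weight lam=0.1, the pruning threshold and the toy documents are all hypothetical.

```python
# Hypothetical sketch: names, parameters and data are illustrative assumptions.
from collections import Counter


def parsimonious_model(tf, p_coll, lam=0.1, threshold=1e-3, iters=50):
    """EM estimate of a parsimonious document model.

    tf:        Counter of term frequencies for one document.
    p_coll:    dict term -> P(t|C), the background collection model.
    lam:       mixture weight of the document model (assumed value).
    threshold: terms whose P(t|D) falls below this are dropped; keeping it
               at most 1/len(tf) guarantees at least one term survives.
    """
    total = sum(tf.values())
    p_doc = {t: c / total for t, c in tf.items()}  # maximum-likelihood start
    for _ in range(iters):
        # E-step: expected count of each term explained by the document model
        e = {t: tf[t] * lam * p / ((1 - lam) * p_coll[t] + lam * p)
             for t, p in p_doc.items()}
        norm = sum(e.values())
        # M-step: normalise, then prune low-probability (non-topical) terms
        kept = {t: v / norm for t, v in e.items() if v / norm >= threshold}
        z = sum(kept.values())
        p_doc = {t: p / z for t, p in kept.items()}
    return p_doc


def translation_model(doc_models):
    """One word -> document -> word step: t(w|u) = sum_d P(w|d) P(d|u).

    P(d|u) comes from Bayes' rule with a uniform prior over documents.
    """
    vocab = {u for m in doc_models for u in m}
    trans = {}
    for u in vocab:
        weights = [m.get(u, 0.0) for m in doc_models]  # P(u|d) for every d
        z = sum(weights)
        row = Counter()
        for m, p_u in zip(doc_models, weights):
            if p_u == 0.0:
                continue
            for w, p_w in m.items():
                row[w] += p_w * p_u / z  # P(w|d) * P(d|u)
        trans[u] = dict(row)
    return trans


def query_model(query, trans):
    """P(w|theta_q) = sum_u t(w|u) P(u|q), with P(u|q) = c(u, q) / |q|."""
    model = Counter()
    for u, c in Counter(query).items():
        for w, p in trans.get(u, {}).items():  # unseen query terms add no mass
            model[w] += p * c / len(query)
    return dict(model)


def nonzeros(trans):
    """Number of non-zero entries stored in a translation table."""
    return sum(len(row) for row in trans.values())


docs = [
    "language model retrieval query model the of",
    "translation model term co-occurrence the of",
    "the of a retrieval evaluation test collection",
]
tfs = [Counter(d.split()) for d in docs]
coll = sum(tfs, Counter())
n = sum(coll.values())
p_coll = {t: c / n for t, c in coll.items()}

ml = [{t: c / sum(tf.values()) for t, c in tf.items()} for tf in tfs]
pars = [parsimonious_model(tf, p_coll, threshold=0.05) for tf in tfs]
trans_full = translation_model(ml)
trans_pars = translation_model(pars)
print(nonzeros(trans_full), nonzeros(trans_pars))
print(sorted(query_model(["model"], trans_pars).items()))
```

Comparing `nonzeros(trans_full)` with `nonzeros(trans_pars)` shows the storage effect the abstract describes: because each parsimonious document model keeps only a subset of the document's terms, pruning can only remove entries from the translation table, never add them.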

Keywords: Information retrieval; Language model; Parsimonious translation model; Query expansion

Article Outline

1. Introduction
2. Background
2.1. Language modeling approach to information retrieval
2.2. Markov chain translation model
2.2.1. Translation model
2.2.2. Query model estimation
2.3. Related works
3. Analysis of Markov chain translation model
3.1. Computational complexity
3.2. Retrieval risk
4. Parsimonious translation model
4.1. Motivation
4.2. Parsimonious document model
5. Experimentation
5.1. Experimental setting
5.2. Effects of parsimonious translation model
5.2.1. Comparison of term selection methods
5.2.2. Effects of reduction of storage overhead
5.2.3. Influence of selected terms weighting
5.2.4. Forward query model vs. backward query model
5.2.5. Example of estimated query model using translation model
5.3. Application: pseudo relevance feedback on parsimonious translation model
6. Conclusion
Acknowledgements
References