Abstract
Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
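The contrast drawn above, treating the top P documents as relevant versus skipping some of them in search of more "novel" pseudo-relevant documents, can be sketched as follows. This is a minimal illustration only, not the authors' actual algorithm: the term-overlap novelty criterion (`max_overlap`) and the document representation as whitespace-separated term strings are assumptions made for the sake of a runnable example.

```python
def traditional_prf(ranked_docs, p):
    """Traditional PRF: simply treat the top-P ranked documents as relevant."""
    return ranked_docs[:p]

def selective_sampling(ranked_docs, p, max_overlap=0.5):
    """Selective Sampling sketch: walk down the initial ranked output,
    skipping documents too similar to those already selected, so that the
    pseudo-relevant set is more 'novel'.

    Novelty is approximated here by Jaccard term overlap, which is an
    illustrative assumption, not the paper's actual skipping criterion.
    """
    selected = []
    for doc in ranked_docs:
        terms = set(doc.split())
        redundant = any(
            len(terms & set(s.split())) / max(len(terms | set(s.split())), 1) > max_overlap
            for s in selected
        )
        if redundant:
            continue  # skip this document and keep scanning down the ranking
        selected.append(doc)
        if len(selected) == p:
            break
    return selected
```

Both functions return P pseudo-relevant documents for term expansion; the difference is only in which documents are admitted to the set. On a near-duplicate-heavy ranking, the selective variant reaches deeper into the list.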