Abstract
Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
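The contrast drawn above, treating the top P documents as relevant versus skipping some of them in search of more "novel" pseudo-relevant documents, can be sketched as follows. This is a minimal illustration only, not the authors' actual algorithm: the term-overlap novelty criterion (`max_overlap`) and the document representation as whitespace-separated term strings are assumptions made for the sake of a runnable example.

```python
def traditional_prf(ranked_docs, p):
    """Traditional PRF: simply treat the top-P ranked documents as relevant."""
    return ranked_docs[:p]

def selective_sampling(ranked_docs, p, max_overlap=0.5):
    """Selective Sampling sketch: walk down the initial ranked output,
    skipping documents too similar to those already selected, so that the
    pseudo-relevant set is more 'novel'.

    Novelty is approximated here by Jaccard term overlap, which is an
    illustrative assumption, not the paper's actual skipping criterion.
    """
    selected = []
    for doc in ranked_docs:
        terms = set(doc.split())
        redundant = any(
            len(terms & set(s.split())) / max(len(terms | set(s.split())), 1) > max_overlap
            for s in selected
        )
        if redundant:
            continue  # skip this document and keep scanning down the ranking
        selected.append(doc)
        if len(selected) == p:
            break
    return selected
```

Both functions return P pseudo-relevant documents for term expansion; the difference is only in which documents are admitted to the set. On a near-duplicate-heavy ranking, the selective variant reaches deeper into the list.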