
Flexible pseudo-relevance feedback via selective sampling

Published: 01 June 2005

Abstract

Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
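
To make the baseline concrete, the following is a minimal Python sketch of the Traditional PRF strategy the abstract refers to: treat the top P retrieved documents as relevant and add the K best terms drawn from them to the query. Scoring candidate terms by document frequency within the pseudo-relevant set is an assumption made for illustration, and the function name and parameters are likewise invented here; real systems typically use weighted term-selection criteria rather than raw counts.

    from collections import Counter

    def traditional_prf(ranked_docs, query_terms, p=10, k=20):
        """Traditional PRF sketch: assume the top P ranked documents are
        relevant and add the K most common of their terms to the query.

        ranked_docs is a list of token lists, best-ranked first;
        query_terms is a set. Document-frequency scoring is a
        simplification for illustration only."""
        pseudo_relevant = ranked_docs[:p]        # "top P documents as relevant"
        counts = Counter(
            term
            for doc in pseudo_relevant
            for term in set(doc)                 # document frequency, not raw tf
            if term not in query_terms
        )
        expansion_terms = [term for term, _ in counts.most_common(k)]
        return list(query_terms) + expansion_terms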




    Reviews

    Donald Harris Kraft

This paper presents a modification to a previously proposed mechanism for improving information retrieval performance via relevance feedback. The authors consider pseudo-relevance feedback (PRF), which takes the top P documents retrieved in response to a query and assumes they are relevant, so that K new terms from those documents can be added to the original query; they note that this is a form of unsupervised learning. They also consider Flexible PRF, in which these parameters can be optimized for each search. The real contribution of the paper is to extend this methodology with selective sampling, so that some of the top documents can be skipped, an idea related to document clustering. The authors add the notion of memory resetting, which involves taking a few documents, discarding the next few, taking the next few, and so on, based on the number of search terms in the document at each rank. The algorithm is tested on standard Japanese and Japanese/English test collections used in cross-language retrieval work, using the authors' previously presented bi-directional retriever/information distiller for Japanese and English (BRIDJE) system, with effectiveness measured by mean average precision over documents judged highly relevant, relevant, or partially relevant. The authors are interested in whether improvement is achieved across all topics. They find that PRF is an improvement, but that their algorithm does not always provide the best improvement; per-topic analysis, however, shows that the new algorithm can be promising.
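
As a rough illustration of the take-some, skip-some selection the review describes, the following Python fragment collects P pseudo-relevant documents while skipping documents that contain too few query terms. The min_hits threshold and this exact skip rule are assumptions for illustration, not the paper's actual criterion, which derives its skip decisions from per-rank clues such as query-term counts.

    def selective_sampling(ranked_docs, query_terms, p=10, min_hits=2):
        """Illustrative sketch of skip-based pseudo-relevant document
        selection: walk down the ranking, keeping documents that contain
        at least min_hits distinct query terms, until P documents have
        been collected. ranked_docs is a list of token lists, best-ranked
        first; query_terms is a set. min_hits is a hypothetical threshold."""
        sampled = []
        for doc in ranked_docs:
            if len(sampled) == p:
                break
            if len(query_terms & set(doc)) >= min_hits:
                sampled.append(doc)   # take this document
            # otherwise skip it and keep scanning down the ranking
        return sampled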
