ABSTRACT
Several researchers have proposed to reduce the number of topics used in TREC-like evaluation initiatives. One research direction that has been pursued is to find, for a given cardinality, the topic subset that evaluates systems/runs most accurately. So far this direction has remained mainly theoretical, with almost no indication of how to select the few good topics in practice. We propose such a practical criterion for topic selection: we rely on methods for automatic system evaluation without relevance judgments, and by running experiments on several TREC collections we show that the topics selected on the basis of those evaluations are indeed more informative than random topics.
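To make the criterion concrete, the sketch below (a minimal illustration, not the paper's actual implementation) combines the two ingredients the abstract names: effectiveness scores estimated without relevance judgments, and topic-subset selection. It assumes a hypothetical topics-by-systems matrix `est_scores` of judgment-free effectiveness estimates (e.g., produced with pseudo-qrels in the style of Soboroff et al.) and greedily grows a topic subset whose induced system ranking maximizes Kendall's tau against the ranking induced by all topics, in the spirit of the best-subset experiments of Guiver et al. All function names are illustrative.

```python
import numpy as np
from scipy.stats import kendalltau

def system_means(scores: np.ndarray) -> np.ndarray:
    """Mean effectiveness per system over the given topic rows."""
    return scores.mean(axis=0)

def greedy_topic_subset(est_scores: np.ndarray, k: int) -> list:
    """Greedily select k topics (rows of est_scores) whose induced
    system ranking best matches the ranking induced by ALL topics.
    est_scores holds judgment-free effectiveness estimates, one row
    per topic and one column per system (hypothetical input format)."""
    full = system_means(est_scores)          # reference ranking signal
    selected, remaining = [], list(range(est_scores.shape[0]))
    for _ in range(k):
        best_topic, best_tau = None, -2.0    # tau lies in [-1, 1]
        for t in remaining:
            subset = system_means(est_scores[selected + [t]])
            tau, _ = kendalltau(subset, full)
            if tau > best_tau:
                best_topic, best_tau = t, tau
        selected.append(best_topic)
        remaining.remove(best_topic)
    return selected

# Toy usage: 50 topics x 20 systems of estimated scores.
rng = np.random.default_rng(0)
est_scores = rng.random((50, 20))
print(greedy_topic_subset(est_scores, k=5))
```

In the setting described by the abstract, the selected subset would then be validated with real relevance judgments, checking that it predicts the full-set system ranking better than random subsets of the same size.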