ABSTRACT
Several researchers have proposed to reduce the number of topics used in TREC-like evaluation initiatives. One research direction that has been pursued is to find, for a given cardinality, the topic subset that evaluates systems/runs most accurately. So far this direction has remained mainly theoretical, with almost no indication of how to select the few good topics in practice. We propose such a practical criterion for topic selection: we rely on methods for automatic system evaluation without relevance judgments, and by running experiments on several TREC collections we show that the topics selected on the basis of those evaluations are indeed more informative than random topics.
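To make the criterion concrete, the sketch below (a minimal illustration, not the paper's actual implementation) combines the two ingredients the abstract names: effectiveness scores estimated without relevance judgments, and topic-subset selection. It assumes a hypothetical topics-by-systems matrix `est_scores` of judgment-free effectiveness estimates (e.g., produced with pseudo-qrels in the style of Soboroff et al.) and greedily grows a topic subset whose induced system ranking maximizes Kendall's tau against the ranking induced by all topics, in the spirit of the best-subset experiments of Guiver et al. All function names are illustrative.

```python
import numpy as np
from scipy.stats import kendalltau

def system_means(scores: np.ndarray) -> np.ndarray:
    """Mean effectiveness per system over the given topic rows."""
    return scores.mean(axis=0)

def greedy_topic_subset(est_scores: np.ndarray, k: int) -> list:
    """Greedily select k topics (rows of est_scores) whose induced
    system ranking best matches the ranking induced by ALL topics.
    est_scores holds judgment-free effectiveness estimates, one row
    per topic and one column per system (hypothetical input format)."""
    full = system_means(est_scores)          # reference ranking signal
    selected, remaining = [], list(range(est_scores.shape[0]))
    for _ in range(k):
        best_topic, best_tau = None, -2.0    # tau lies in [-1, 1]
        for t in remaining:
            subset = system_means(est_scores[selected + [t]])
            tau, _ = kendalltau(subset, full)
            if tau > best_tau:
                best_topic, best_tau = t, tau
        selected.append(best_topic)
        remaining.remove(best_topic)
    return selected

# Toy usage: 50 topics x 20 systems of estimated scores.
rng = np.random.default_rng(0)
est_scores = rng.random((50, 20))
print(greedy_topic_subset(est_scores, k=5))
```

In the setting described by the abstract, the selected subset would then be validated with real relevance judgments, checking that it predicts the full-set system ranking better than random subsets of the same size.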