ABSTRACT
Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus, a set of topics, and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. Instead, an approach called pooling is typically used, in which the documents to be judged are determined by, for example, taking the union of all documents appearing in the top positions of the answer lists generated by a range of systems. Conventionally, pooling uses system variations to provide diverse documents to be judged for a topic; different user queries are not considered. We explore the ramifications of user query variability for pooling, and demonstrate that conventional test collections do not cover this source of variation. The effect of user query variation on the size of the judging pool is just as strong as the effect of retrieval system variation. We conclude that user query variation should be incorporated early in test collection construction, and cannot be considered effectively post hoc.
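The depth-k pooling process described above reduces to a small set-union computation. The following Python sketch shows one way it might be implemented, assuming runs are stored as nested dictionaries keyed by run name and topic, with document lists in rank order; the names `runs`, `pool_for_topic`, and `POOL_DEPTH` are illustrative, not from the paper.

```python
# Minimal sketch of depth-k pooling (illustrative, not the paper's code).
# Assumed layout: {run_name: {topic_id: [doc_id, ...]}}, lists in rank order.

POOL_DEPTH = 100  # judging depth; TREC pools have commonly used depths like 100

def pool_for_topic(runs, topic_id, depth=POOL_DEPTH):
    """Union of the top-`depth` documents returned for `topic_id`
    across all contributing runs."""
    pool = set()
    for ranking_by_topic in runs.values():
        pool.update(ranking_by_topic.get(topic_id, [])[:depth])
    return pool

if __name__ == "__main__":
    # Two toy runs for one topic; with depth=2 the pool is {d1, d2, d4}.
    system_runs = {
        "bm25":  {"t1": ["d1", "d2", "d3"]},
        "lmdir": {"t1": ["d2", "d4", "d5"]},
    }
    print(len(pool_for_topic(system_runs, "t1", depth=2)))  # -> 3
```

Because the function is agnostic about what constitutes a "run", the same computation quantifies pool growth under either source of diversity: pass one run per retrieval system for conventional pooling, or one run per user query variant for the same topic, and compare the resulting pool sizes, which is the comparison the abstract draws.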