A Case for Automatic System Evaluation

Hauff, Claudia; Hiemstra, Djoerd; Azzopardi, Leif; de Jong, Franciska

doi:10.1007/978-3-642-12275-0_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Ranking a set of retrieval systems according to their retrieval effectiveness without relying on relevance judgments was first explored by Soboroff et al. [13]. Over the years, a number of alternative approaches have been proposed, all of which have been evaluated on early TREC test collections. In this work, we perform a wider analysis of system ranking estimation methods on sixteen TREC data sets, which cover more tasks and corpora than previous studies. Our analysis reveals that the performance of system ranking estimation approaches varies across topics. This observation motivates the hypothesis that the performance of such methods can be improved by selecting the “right” subset of topics from a topic set. We show that using topic subsets improves the performance of automatic system ranking methods by 26% on average, with a maximum of 60%. We also observe that the commonly experienced problem of underestimating the performance of the best systems is data set dependent and not inherent to system ranking estimation. These findings support the case for automatic system evaluation and motivate further research.
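To make the topic-subset idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the quality of an estimated system ranking is measured as the rank correlation (here Kendall's tau, without tie correction) between estimated and true mean scores; the abstract does not name the measure. All scores, system labels, and function names are hypothetical toy values chosen for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def mean_over_topics(per_topic_scores, topics):
    """Average each system's per-topic scores over the chosen topic indices."""
    return [sum(scores[t] for t in topics) / len(topics)
            for scores in per_topic_scores]


# Hypothetical toy data: rows are systems A..D, columns are topics 0..2.
# true_scores stands in for effectiveness computed from relevance judgments.
true_scores = [
    [0.5, 0.4, 0.6],  # A
    [0.4, 0.3, 0.5],  # B
    [0.3, 0.2, 0.4],  # C
    [0.2, 0.1, 0.3],  # D
]
# est_scores stands in for judgment-free estimates; topic 2 is estimated badly.
est_scores = [
    [0.8, 0.7, 0.0],  # A
    [0.6, 0.5, 0.3],  # B
    [0.5, 0.4, 0.8],  # C
    [0.3, 0.2, 0.7],  # D
]

all_topics = [0, 1, 2]
subset = [0, 1]  # assumed "right" subset; how to find it is the hard part

# The reference ranking always uses the full topic set.
truth = mean_over_topics(true_scores, all_topics)
tau_all = kendall_tau(truth, mean_over_topics(est_scores, all_topics))
tau_subset = kendall_tau(truth, mean_over_topics(est_scores, subset))

print(f"tau with all topics:   {tau_all:.3f}")
print(f"tau with topic subset: {tau_subset:.3f}")
```

On this toy data the correlation rises from about 0.33 to 1.0 once the poorly estimated topic is dropped. The subset here was picked by hand with knowledge of the true scores, so the sketch only shows why per-topic variation in estimation quality makes subset selection attractive, not how to select a subset without relevance judgments.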

References

1. Kendall, M.: Rank Correlation Methods. Hafner Publishing Co., New York (1955)
2. Amitay, E., Carmel, D., Lempel, R., Soffer, A.: Scaling IR-system evaluation using term relevance sets. In: SIGIR 2004, pp. 10–17 (2004)
3. Aslam, J.A., Pavlu, V.: Query hardness estimation using Jensen-Shannon divergence among multiple scoring functions. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425, pp. 198–209. Springer, Heidelberg (2007)
4. Aslam, J.A., Pavlu, V., Yilmaz, E.: A statistical method for system evaluation using incomplete judgments. In: SIGIR 2006, pp. 541–548 (2006)
5. Aslam, J.A., Savell, R.: On the effectiveness of evaluating retrieval systems in the absence of relevance judgments. In: SIGIR 2003, pp. 361–362 (2003)
6. Carterette, B., Allan, J.: Incremental test collections. In: CIKM 2005, pp. 680–687 (2005)
7. Diaz, F.: Performance prediction using spatial autocorrelation. In: SIGIR 2007, pp. 583–590 (2007)
8. Efron, M.: Using multiple query aspects to build test collections without human relevance judgments. In: ECIR 2009, pp. 276–287 (2009)
9. Guiver, J., Mizzaro, S., Robertson, S.: A few good topics: Experiments in topic set reduction for retrieval evaluation. To appear in TOIS
10. Krovetz, R.: Viewing morphology as an inference process. In: SIGIR 1993, pp. 191–202 (1993)
11. Mizzaro, S., Robertson, S.: HITS hits TREC: exploring IR evaluation results with network analysis. In: SIGIR 2007, pp. 479–486 (2007)
12. Nuray, R., Can, F.: Automatic ranking of information retrieval systems using data fusion. Information Processing and Management 42(3), 595–614 (2006)
13. Soboroff, I., Nicholas, C., Cahan, P.: Ranking retrieval systems without relevance judgments. In: SIGIR 2001, pp. 66–73 (2001)
14. Spoerri, A.: Using the structure of overlap between search results to rank retrieval systems without relevance judgments. Information Processing and Management 43(4), 1059–1070 (2007)
15. Voorhees, E.M.: Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management 36, 697–716 (2000)
16. Wu, S., Crestani, F.: Methods for ranking information retrieval systems without relevance judgments. In: Matsui, M., Zuccherato, R.J. (eds.) SAC 2003. LNCS, vol. 3006, pp. 811–816. Springer, Heidelberg (2004)

Author information

Authors and Affiliations

University of Twente, Enschede, The Netherlands
Claudia Hauff, Djoerd Hiemstra & Franciska de Jong
University of Glasgow, Glasgow, UK
Leif Azzopardi

Editor information

Editors and Affiliations

Adaptive Information Cluster, Dublin City University, Dublin, 9, Ireland
Cathal Gurrin
The Open University, Walton Hall, MK7 6HF, Milton Keynes, UK
Yulan He
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, UK
Gabriella Kazai
Department of Computer Science, University of Essex, Wivenhoe Park, CO4 3SQ, Colchester, UK
Udo Kruschwitz
The Open University, Walton Hall, Milton Keynes, UK
Suzanne Little
University of London, London, UK
Thomas Roelleke
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Department of Computing Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Keith van Rijsbergen

About this paper

Cite this paper

Hauff, C., Hiemstra, D., Azzopardi, L., de Jong, F. (2010). A Case for Automatic System Evaluation. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_16

DOI: https://doi.org/10.1007/978-3-642-12275-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science, Computer Science (R0)
