If I Had a Million Queries

Carterette, Ben; Pavlu, Virgil; Kanoulas, Evangelos; Aslam, Javed A.; Allan, James

doi:10.1007/978-3-642-00958-7_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Included in the following conference series:

European Conference on Information Retrieval

3243 Accesses
18 Citations

Abstract

As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results. In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems. We also analyze the appropriate division of labor between developing topics and making relevance judgments, and show that only a small, biased sample of queries with sparse judgments is needed to produce the same results as a much larger sample of queries.
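The abstract's central claim is that a small, sparsely judged query sample can produce the same results as a much larger one. One common way to make "same results" concrete in IR evaluation is to rank the systems by mean effectiveness under each query sample and measure how well the two rankings agree with Kendall's τ. The sketch below illustrates that comparison under this assumption. It is not the paper's own procedure, and the function names `rank_systems` and `kendall_tau`, as well as the toy scores, are hypothetical.

```python
from itertools import combinations


def rank_systems(scores):
    """Order systems best-first by mean per-query effectiveness.

    scores: dict mapping a system name to its list of per-query scores
    (hypothetically, estimated average precision on each query).
    """
    means = {system: sum(vals) / len(vals) for system, vals in scores.items()}
    return sorted(means, key=means.get, reverse=True)


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two best-first rankings of the same systems.

    Assumes no ties: 1.0 means identical order, -1.0 means fully reversed.
    """
    if sorted(ranking_a) != sorted(ranking_b) or len(set(ranking_a)) != len(ranking_a):
        raise ValueError("rankings must be permutations of the same distinct systems")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two systems")
    position_b = {system: i for i, system in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() preserves ranking_a order, so x is ranked above y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1  # ranking_b agrees on this pair
        else:
            discordant += 1  # ranking_b swaps this pair
    return (concordant - discordant) / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Toy, made-up scores: three systems evaluated on four queries.
    full = {"sysA": [0.30, 0.25, 0.40, 0.35],
            "sysB": [0.20, 0.22, 0.30, 0.28],
            "sysC": [0.10, 0.15, 0.12, 0.18]}
    subset = {s: v[:2] for s, v in full.items()}  # first two queries only
    print(kendall_tau(rank_systems(full), rank_systems(subset)))  # 1.0
```

The simple tau-a variant is used here because the toy rankings contain no ties; when mean scores can tie, a tie-aware variant such as tau-b (the default in SciPy's `scipy.stats.kendalltau`) is a safer choice.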

Author information

Authors and Affiliations

Dept. of Computer and Info. Sciences, University of Delaware, Newark, DE, USA
Ben Carterette
College of Computer and Info. Science, Northeastern University, Boston, MA, USA
Virgil Pavlu, Evangelos Kanoulas & Javed A. Aslam
Dept. of Computer Science, University of Massachusetts Amherst, Amherst, MA, USA
James Allan

Editor information

Editors and Affiliations

Université de Toulouse - IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France
Mohand Boughanem
Laboratoire d’Informatique de Grenoble, BP 53, Université Joseph Fourier, 38041 Grenoble Cedex 9, France
Catherine Berrut
Université de Toulouse - IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France
Josiane Mothe & Chantal Soule-Dupuy

About this paper

Cite this paper

Carterette, B., Pavlu, V., Kanoulas, E., Aslam, J.A., Allan, J. (2009). If I Had a Million Queries. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_27

DOI: https://doi.org/10.1007/978-3-642-00958-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)