If I Had a Million Queries

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

As document collections grow larger, the information needs and relevance judgments in a test collection must be well-chosen within a limited budget to give the most reliable and robust evaluation results. In this work we analyze a sample of queries categorized by length and corpus-appropriateness to determine the right proportion needed to distinguish between systems. We also analyze the appropriate division of labor between developing topics and making relevance judgments, and show that only a small, biased sample of queries with sparse judgments is needed to produce the same results as a much larger sample of queries.
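The claim that a smaller query sample can reproduce the system ranking of a much larger one can be illustrated with a minimal sketch on synthetic data. This is not the authors' method: the paper concerns biased query samples with sparse relevance judgments, whereas the sketch below draws a uniform random subset of fully scored queries and compares the resulting system ordering with the full-set ordering using Kendall's tau. All names (`simulate`, `mean_scores`, `subset_agreement`), the score model, and the parameter values are assumptions made for illustration only.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_scores(scores, query_ids):
    """Mean score of each system over the given queries.

    scores[s][q] is the (hypothetical) effectiveness of system s on query q.
    """
    return [sum(row[q] for q in query_ids) / len(query_ids) for row in scores]


def simulate(num_systems=10, num_queries=1000, seed=0):
    """Synthetic per-query scores: system quality + query difficulty + noise.

    This score model is an assumption for illustration, not real TREC data.
    """
    rng = random.Random(seed)
    quality = [rng.uniform(0.1, 0.5) for _ in range(num_systems)]
    difficulty = [rng.uniform(-0.1, 0.1) for _ in range(num_queries)]
    return [
        [min(1.0, max(0.0, q + d + rng.gauss(0, 0.1))) for d in difficulty]
        for q in quality
    ]


def subset_agreement(scores, subset_size, seed=1):
    """Kendall's tau between the full-set ranking and a random-subset ranking."""
    rng = random.Random(seed)
    all_queries = list(range(len(scores[0])))
    subset = rng.sample(all_queries, subset_size)
    return kendall_tau(
        mean_scores(scores, all_queries), mean_scores(scores, subset)
    )


if __name__ == "__main__":
    scores = simulate()
    for size in (10, 50, 200, 1000):
        print(size, round(subset_agreement(scores, size), 3))
```

Running the script prints the rank correlation for a few subset sizes; under this toy model, agreement with the full ranking would be expected to rise as more queries are included, which is the trade-off the abstract describes in terms of budget.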

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carterette, B., Pavlu, V., Kanoulas, E., Aslam, J.A., Allan, J. (2009). If I Had a Million Queries. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_27

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science (R0)
