
Streamlining Evaluation with ir-measures

  • Conference paper
  • Advances in Information Retrieval (ECIR 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13186)

Abstract

We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval. Rather than implementing its own measure calculations, ir-measures provides a common interface to a handful of evaluation tools. The necessary tools are automatically invoked (potentially multiple times) to calculate all the desired metrics, simplifying the evaluation process for the user. The tool also makes it easier for researchers to use recently-proposed measures (such as those from the C/W/L framework) alongside traditional measures, potentially encouraging their adoption.
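
To illustrate the workflow described above, here is a minimal sketch using the tool's Python interface (documented at https://ir-measur.es/). The qrels and run file paths are placeholders, and the values shown in the final comment are purely illustrative:

    import ir_measures
    from ir_measures import nDCG, AP, RR

    # Read TREC-formatted relevance judgments and a system run.
    qrels = ir_measures.read_trec_qrels('path/to/qrels.txt')
    run = ir_measures.read_trec_run('path/to/run.txt')

    # A single call computes all requested measures; ir-measures dispatches
    # each measure to an appropriate backend tool behind a common interface.
    results = ir_measures.calc_aggregate([nDCG@10, AP, RR@10], qrels, run)
    print(results)  # e.g. {nDCG@10: 0.68, AP: 0.34, RR@10: 0.75} (illustrative)

The package also provides a command-line entry point with equivalent behaviour; see the documentation linked in the notes below.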

Notes

  1. For instance, the MSMARCO MRR evaluation script: https://git.io/JKG1S.
  2. Docs: https://ir-measur.es/, Source: https://github.com/terrierteam/ir_measures.
  3. https://ir-measur.es/en/latest/measures.html.
  4. https://git.io/JKG94, https://git.io/JKCTo.
  5. https://git.io/JKCT1.
  6. https://git.io/JKG9O.
  7. https://git.io/JKG1S.
  8. https://git.io/JKCT5.



Acknowledgements

We thank the contributors to the ir-measures repository. We acknowledge EPSRC grant EP/R018634/1: Closed-Loop Data Science for Complex, Computationally- & Data-Intensive Analytics.

Author information

Corresponding author

Correspondence to Sean MacAvaney.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

MacAvaney, S., Macdonald, C., Ounis, I. (2022). Streamlining Evaluation with ir-measures. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13186. Springer, Cham. https://doi.org/10.1007/978-3-030-99739-7_38

  • DOI: https://doi.org/10.1007/978-3-030-99739-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99738-0

  • Online ISBN: 978-3-030-99739-7

  • eBook Packages: Computer Science, Computer Science (R0)
