Abstract
We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval. Rather than implementing its own measure calculations, ir-measures provides a common interface to a handful of evaluation tools. The necessary tools are automatically invoked (potentially multiple times) to calculate all the desired metrics, simplifying the evaluation process for the user. The tool also makes it easier for researchers to use recently-proposed measures (such as those from the C/W/L framework) alongside traditional measures, potentially encouraging their adoption.
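To illustrate the common interface described above, the sketch below shows how the library's Python API can compute several measures over a TREC-style run in a single call. The file names are placeholders, and the exact set of measures is only an example; see the documentation linked in the notes below for the full details.

```python
# Minimal sketch: aggregate several evaluation measures with ir_measures.
# 'qrels.trec' and 'run.trec' are placeholder file names.
import ir_measures
from ir_measures import nDCG, AP, RR

qrels = ir_measures.read_trec_qrels('qrels.trec')  # relevance judgments
run = ir_measures.read_trec_run('run.trec')        # system rankings to evaluate

# The library invokes the appropriate backend tool(s) for each measure
# and returns one aggregate value per measure.
results = ir_measures.calc_aggregate([nDCG@10, AP, RR@10], qrels, run)
print(results)
```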
Notes
1. For instance, the MSMARCO MRR evaluation script: https://git.io/JKG1S.
2. Docs: https://ir-measur.es/, Source: https://github.com/terrierteam/ir_measures.
Acknowledgements
We thank the contributors to the ir-measures repository. We acknowledge EPSRC grant EP/R018634/1: Closed-Loop Data Science for Complex, Computationally- & Data-Intensive Analytics.