DOI: 10.1145/3331184.3331339

Short paper

The Impact of Score Ties on Repeatability in Document Ranking

Published: 18 July 2019

ABSTRACT

Document ranking experiments should be repeatable. However, the interaction between multi-threaded indexing and score ties during retrieval may yield non-deterministic rankings, making repeatability less trivial than one might imagine. In the context of the open-source Lucene search engine, score ties are broken by internal document ids, which are assigned at index time. Due to multi-threaded indexing, which makes experimentation with large modern document collections practical, internal document ids are not assigned consistently across different index instances of the same collection, and thus score ties are broken unpredictably. This short paper examines the effectiveness impact of such score ties, quantifying the variability that can be attributed to this phenomenon. The obvious way to eliminate this non-determinism and ensure repeatable document rankings is to break score ties using external collection document ids. This approach, however, comes with measurable efficiency costs due to the necessity of consulting external identifiers during query evaluation.
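To make the proposed fix concrete, the following is a minimal, illustrative Lucene sketch, not the authors' actual implementation. It assumes each document is indexed with its external collection id in a field named "id" (stored for retrieval and indexed as doc values for sorting), and that query terms live in a field named "contents"; both field names and the surrounding setup are assumptions for illustration. Results are sorted by score with the external id as a secondary key, so ties no longer fall back to internal docids.

// Illustrative sketch: breaking score ties by the external collection
// document id instead of Lucene's internal (index-time) docid.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import java.nio.file.Paths;

public class TieBreakingSearch {
  // At index time: record the external collection id both as a stored field
  // (so it can be printed in run files) and as doc values (so it can serve
  // as a secondary sort key at query time). Field name "id" is an assumption.
  static void addIdFields(Document doc, String collectionId) {
    doc.add(new StringField("id", collectionId, Field.Store.YES));
    doc.add(new SortedDocValuesField("id", new BytesRef(collectionId)));
  }

  public static void main(String[] args) throws Exception {
    // args[0]: path to an existing index; args[1]: the query string.
    DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(args[0])));
    IndexSearcher searcher = new IndexSearcher(reader);
    Query query = new QueryParser("contents", new StandardAnalyzer()).parse(args[1]);

    // Primary key: retrieval score; secondary key: external collection id.
    // Without the second key, tied documents are ordered by internal docids,
    // which differ across index instances built with multi-threaded indexing.
    Sort breakTiesByExternalId = new Sort(SortField.FIELD_SCORE,
                                          new SortField("id", SortField.Type.STRING));
    TopDocs hits = searcher.search(query, 1000, breakTiesByExternalId);

    for (ScoreDoc sd : hits.scoreDocs) {
      System.out.println(searcher.doc(sd.doc).get("id"));
    }
    reader.close();
  }
}

Resolving the secondary key requires reading a per-document value (here, the doc values for "id") during query evaluation, which is consistent with the measurable efficiency cost the abstract attributes to consulting external identifiers.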

Published in

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019, 1512 pages
ISBN: 9781450361729
DOI: 10.1145/3331184
Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
