short-paper

The Impact of Score Ties on Repeatability in Document Ranking

Authors:
Jimmy Lin

University of Waterloo, Waterloo, ON, Canada

University of Waterloo, Waterloo, ON, Canada
View Profile

,
Peilin Yang

No affiliation, San Francisco, CA, USA

No affiliation, San Francisco, CA, USA
View Profile

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information RetrievalJuly 2019Pages 1125–1128https://doi.org/10.1145/3331184.3331339

Published:18 July 2019Publication History

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1125–1128

ABSTRACT

Document ranking experiments should be repeatable. However, the interaction between multi-threaded indexing and score ties during retrieval may yield non-deterministic rankings, making repeatability not as trivial as one might imagine. In the context of the open-source Lucene search engine, score ties are broken by internal document ids, which are assigned at index time. Due to multi-threaded indexing, which makes experimentation with large modern document collections practical, internal document ids are not assigned consistently between different index instances of the same collection, and thus score ties are broken unpredictably. This short paper examines the effectiveness impact of such score ties, quantifying the variability that can be attributed to this phenomenon. The obvious solution to this non-determinism and to ensure repeatable document ranking is to break score ties using external collection document ids. This approach, however, comes with measurable efficiency costs due to the necessity of consulting external identifiers during query evaluation.

References

V. Anh, O. de Kretser, and A. Moffat. 2001. Vector-Space Ranking with Effective Early Termination. In SIGIR. 35--42. Google ScholarDigital Library
J. Arguello, M. Crane, F. Diaz, J. Lin, and A. Trotman. 2015. Report on the SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). SIGIR Forum, Vol. 49, 2 (2015), 107--116. Google ScholarDigital Library
G. Cabanac, G. Hubert, M. Boughanem, and C. Chrisment. 2010. Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation. In CLEF. 112--123. Google ScholarDigital Library
N. Ferro and G. Silvello. 2015. Rank-Biased Precision Reloaded: Reproducibility and Generalization. In ECIR. 768--780.Google Scholar
J. Lin, M. Crane, A. Trotman, J. Callan, I. Chattopadhyaya, J. Foley, G. Ingersoll, C. Macdonald, and S. Vigna. 2016. Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge. In ECIR. 408--420.Google Scholar
J. Lin and A. Trotman. 2015. Anytime Ranking for Impact-Ordered Indexes. In ICTIR. 301--304. Google ScholarDigital Library
F. McSherry and M. Najork. 2008. Computing Information Retrieval Performance Measures Efficiently in the Presence of Tied Scores. In ECIR. 414--421. Google ScholarDigital Library
I. Ounis, C. Macdonald, J. Lin, and I. Soboroff. 2011. Overview of the TREC-2011 Microblog Track. In TREC.Google Scholar
H. Wu and H. Fang. 2013. Tie Breaker: A Novel Way of Combining Retrieval Signal. In ICTIR. 72--75. Google ScholarDigital Library
P. Yang, H. Fang, and J. Lin. 2017. Anserini: Enabling the Use of Lucene for Information Retrieval Research. In SIGIR. 1253--1256. Google ScholarDigital Library
P. Yang, H. Fang, and J. Lin. 2018. Anserini: Reproducible Ranking Baselines Using Lucene. Journal of Data and Information Quality, Vol. 10, 4 (2018), Article 16. Google ScholarDigital Library
Z. Yang, A. Moffat, and A. Turpin. 2016. How Precise Does Document Scoring Need to Be? In AIRS. 279--291.Google Scholar

Index Terms

The Impact of Score Ties on Repeatability in Document Ranking
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Retrieval efficiency
    2. Information retrieval query processing

Recommendations

Efficient passage ranking for document databases

Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can ...
Read More
Context-sensitive document ranking
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Ranking is a main research issue in IR-styled keyword search over a set of documents. In this paper, we study a new keyword search problem, called context-sensitive document ranking, which is to rank documents with an additional context that provides ...
Read More
An Algorithm for Ranking Pages Based on Theme Seach Engines
CICN '12: Proceedings of the 2012 Fourth International Conference on Computational Intelligence and Communication Networks

With the development of information technology, it is very difficult for users to find the information they need because more and more information is on the Internet. The search engine is a tool for users to find information, however, customers are not ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN:9781450361729
DOI:10.1145/3331184
General Chairs:
Benjamin Piwowarski
CNRS - Sorbonne Universite, France
,
Max Chevalier
Universite de Toulouse, CNRS, France
,
Eric Gaussier
Universite Grenoble Alpes, CNRS, France
,
Program Chairs:
Yoelle Maarek
Amazon Research, Israel
,
Jian-Yun Nie
University of Montreal, Canada
,
Falk Scholer
RMIT University, Australia
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 July 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Lucene
query evaluation
Qualifiers
- short-paper
Conference

Acceptance Rates
SIGIR'19 Paper Acceptance Rate84of426submissions,20%Overall Acceptance Rate792of3,983submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 11
  Total Citations
  View Citations
- 178
  Total Downloads
- Downloads (Last 12 months)10
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

The Impact of Score Ties on Repeatability in Document Ranking

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Efficient passage ranking for document databases

Context-sensitive document ranking

An Algorithm for Ranking Pages Based on Theme Seach Engines