short-paper

Analyzing the Influence of Bigrams on Retrieval Bias and Effectiveness

Authors:
Abdulaziz AlQatan, University of Strathclyde, Glasgow, United Kingdom
Leif Azzopardi, University of Strathclyde, Glasgow, United Kingdom
Yashar Moshfeghi, University of Strathclyde, Glasgow, United Kingdom

ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, September 2020, Pages 157–160
https://doi.org/10.1145/3409256.3409831

Published: 14 September 2020

ABSTRACT

Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid the foundations for investigating the relationship between retrieval effectiveness and retrieval bias. While various factors influencing bias have been examined, no work has examined the impact of using bigrams within the index on retrieval bias. Intuitively, how documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the bias of a system changes depending on whether documents are represented using unigrams, bigrams, or both. Our analysis of three retrieval models on three TREC collections shows that a bigram-only representation results in lower bias than a unigram-only representation, but at the expense of retrieval effectiveness. However, combining both representations reduces the overall bias while also increasing effectiveness. These findings suggest that, when configuring and indexing a collection, the bag-of-words approach (unigrams) should be augmented with bigrams to create better and fairer retrieval systems.
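As a rough illustration of the three document representations compared in the abstract, the sketch below builds unigram, bigram, and combined term vectors for a document. It also computes a Gini coefficient over per-document retrievability scores, which is one common way to summarise retrieval bias. The abstract does not state the tokenisation or the bias measure used, so the whitespace tokeniser, the underscore-joined bigram terms, and the names `represent` and `gini` are all assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from typing import List


def unigrams(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenisation stands in for a real analyser.
    return text.lower().split()


def bigrams(text: str) -> List[str]:
    # Adjacent token pairs, joined with "_" so each pair is a single index term.
    tokens = unigrams(text)
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def represent(text: str, mode: str = "both") -> Counter:
    # Term-frequency vector under one of the three representations.
    if mode == "unigram":
        terms = unigrams(text)
    elif mode == "bigram":
        terms = bigrams(text)
    elif mode == "both":
        terms = unigrams(text) + bigrams(text)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return Counter(terms)


def gini(scores: List[float]) -> float:
    # Gini coefficient of a list of non-negative retrievability scores.
    # 0 means every document is equally retrievable; values near 1 mean
    # retrievability is concentrated in a few documents (higher bias).
    n = len(scores)
    total = sum(scores)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(scores)
    weighted = sum((2 * i - n - 1) * r for i, r in enumerate(ordered, start=1))
    return weighted / (n * total)
```

In this sketch, the retrievability score of a document would be supplied by the caller, for example as the number of queries in a large query set for which the document appears in the top results. Comparing `gini` across the three `mode` settings would then mirror the kind of comparison the abstract describes.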

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation


Published in
ICTIR '20: Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval
September 2020, 207 pages
ISBN: 9781450380676
DOI: 10.1145/3409256
General Chairs: Krisztian Balog (University of Stavanger, Norway), Vinay Setty (University of Stavanger, Norway)
Program Chairs: Christina Lioma (University of Copenhagen, Denmark), Yiqun Liu (Tsinghua University, China), Min Zhang (Tsinghua University, China), Klaus Berberich (HTW Saar & MPI for Informatics, Germany)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bias
evaluation
fairness
retrievability
Acceptance Rates
Overall Acceptance Rate: 209 of 482 submissions, 43%