ABSTRACT
Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argument retrieval systems is typically evaluated based on topical relevance and argument quality, without taking into account the often differing number of documents shown for the argument stances (PRO or CON). Therefore, systems may retrieve relevant passages, but with a biased exposure of arguments. In this work, we analyze a range of non-stochastic fairness-aware ranking and diversity metrics to evaluate the extent to which argument stances are fairly exposed in argument retrieval systems.
Using the official runs of the argument retrieval task Touché at CLEF 2020, as well as synthetic data to control the amount and order of argument stances in the rankings, we show that systems with the best effectiveness in terms of topical relevance are not necessarily the most fair or the most diverse in terms of argument stance. The relationships we found between (un)fairness and diversity metrics shed light on how to evaluate group fairness -- in addition to topical relevance -- in argument retrieval settings.
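To make the notion of stance exposure concrete, the following is a minimal sketch of one way such a fairness-aware metric can be computed over a non-stochastic ranking of PRO/CON passages. The logarithmic position discount and the normalized disparity score are illustrative assumptions here, not the specific metrics analyzed in the paper; the function names are hypothetical.

```python
import math

def stance_exposure(ranking):
    """Sum position-discounted exposure (1 / log2(rank + 1)) per stance group."""
    exposure = {"PRO": 0.0, "CON": 0.0}
    for rank, stance in enumerate(ranking, start=1):
        exposure[stance] += 1.0 / math.log2(rank + 1)
    return exposure

def exposure_disparity(ranking):
    """Absolute difference in total exposure between PRO and CON,
    normalized by total exposure; 0.0 means perfectly balanced."""
    exp = stance_exposure(ranking)
    total = exp["PRO"] + exp["CON"]
    return abs(exp["PRO"] - exp["CON"]) / total if total else 0.0

# Two rankings with identical stance counts (and thus identical topical
# relevance, if all passages are relevant) can differ in fairness:
balanced = ["PRO", "CON", "PRO", "CON"]
blocked = ["PRO", "PRO", "CON", "CON"]
print(exposure_disparity(balanced) < exposure_disparity(blocked))  # True
```

The example illustrates the abstract's central point: because higher ranks receive disproportionately more exposure, two rankings with the same stance counts can be equally relevant yet unequally fair, so relevance-only evaluation misses this dimension.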