Short paper, CIKM conference proceedings. DOI: 10.1145/3459637.3482099

Evaluating Fairness in Argument Retrieval

Published: 30 October 2021

ABSTRACT

Existing commercial search engines often struggle to represent the different perspectives on a search query. Argument retrieval systems address this limitation by providing both positive (PRO) and negative (CON) perspectives on a user's information need about a controversial topic (e.g., climate change). Such systems are typically evaluated on topical relevance and argument quality, without taking into account that the number of documents shown for each argument stance (PRO or CON) often differs. As a result, a system may retrieve relevant passages while still exposing one stance more than the other. In this work, we analyze a range of non-stochastic fairness-aware ranking metrics and diversity metrics to evaluate the extent to which argument stances are fairly exposed in argument retrieval systems.
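
To make "biased exposure of arguments" concrete, the following is a minimal Python sketch of one fairness-aware ranking metric and one diversity metric computed over stance labels. It assumes each retrieved passage carries exactly one label, "PRO" or "CON", and uses a DCG-style position discount of 1/log2(rank + 1). The exposure gap (the difference between the two stances' shares of top-k exposure) and alpha-nDCG with the two stances treated as subtopics are common choices in the literature, used here as illustrative assumptions; they are not necessarily the exact metrics analyzed in the paper, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Assumption: every retrieved passage carries exactly one of these stance labels.
STANCES = ("PRO", "CON")


def position_weight(rank: int) -> float:
    """DCG-style position discount 1 / log2(rank + 1), for ranks starting at 1."""
    return 1.0 / math.log2(rank + 1)


def stance_exposure(ranking: Sequence[str], k: int) -> Dict[str, float]:
    """Share of the total top-k exposure that each stance receives."""
    totals = {s: 0.0 for s in STANCES}
    for rank, stance in enumerate(ranking[:k], start=1):
        totals[stance] += position_weight(rank)
    z = sum(totals.values())
    return {s: (v / z if z > 0 else 0.0) for s, v in totals.items()}


def exposure_gap(ranking: Sequence[str], k: int) -> float:
    """Unfairness as the absolute difference in exposure share (0 = equal, 1 = one stance only)."""
    shares = stance_exposure(ranking, k)
    return abs(shares["PRO"] - shares["CON"])


def alpha_dcg(ranking: Sequence[str], k: int, alpha: float) -> float:
    """alpha-DCG with each stance treated as one subtopic; repeats decay by (1 - alpha)."""
    seen = {s: 0 for s in STANCES}
    score = 0.0
    for rank, stance in enumerate(ranking[:k], start=1):
        score += (1 - alpha) ** seen[stance] * position_weight(rank)
        seen[stance] += 1
    return score


def alpha_ndcg(ranking: Sequence[str], k: int, alpha: float = 0.5) -> float:
    """alpha-nDCG@k against a greedily built ideal ranking.

    Simplification: all passages in `ranking` are assumed relevant, and the ideal
    ranking is built from these same passages rather than from a judged pool.
    """
    remaining = Counter(ranking)
    seen = {s: 0 for s in STANCES}
    ideal: List[str] = []
    while len(ideal) < k and sum(remaining.values()) > 0:
        # Greedy step: the available stance seen least often so far has the highest gain.
        stance = min((s for s in STANCES if remaining[s] > 0), key=lambda s: seen[s])
        ideal.append(stance)
        remaining[stance] -= 1
        seen[stance] += 1
    best = alpha_dcg(ideal, k, alpha)
    return alpha_dcg(ranking, k, alpha) / best if best > 0 else 0.0


blocked = ["PRO", "PRO", "CON", "CON"]
interleaved = ["PRO", "CON", "PRO", "CON"]
for name, run in [("blocked", blocked), ("interleaved", interleaved)]:
    print(name, round(exposure_gap(run, 4), 3), round(alpha_ndcg(run, 4), 3))
# blocked 0.273 0.969
# interleaved 0.171 1.0
```

Even the perfectly alternating ranking has a non-zero exposure gap, because whichever stance occupies rank 1 receives the largest single weight, while alpha-nDCG gives that same ranking its maximum score of 1. Disagreements of this kind are why fairness and diversity metrics are worth comparing side by side.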

Using the official runs submitted to the Touché argument retrieval task at CLEF 2020, as well as synthetic data that controls the amount and order of argument stances in the rankings, we show that the systems with the best effectiveness in terms of topical relevance are not necessarily the fairest or the most diverse in terms of argument stance. The relationships we found between (un)fairness and diversity metrics shed light on how to evaluate group fairness, in addition to topical relevance, in argument retrieval settings.
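
The abstract does not spell out how the synthetic rankings are built or how relationships between metrics are measured. One hypothetical setup is sketched below in Python: `synthetic_ranking` fixes the number of PRO passages and their order, two toy imbalance scores (one order-insensitive, one rank-discounted) are computed for each synthetic run, and Kendall's tau-b, a standard rank correlation with a correction for ties, measures how similarly the two scores order the runs. All helper names, the order strategies, and the choice of tau-b are assumptions for illustration, not the paper's code.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence


def synthetic_ranking(n: int, n_pro: int, order: str, seed: int = 0) -> List[str]:
    """Hypothetical generator: n passages, n_pro labelled PRO and the rest CON."""
    if not 0 <= n_pro <= n:
        raise ValueError("n_pro must lie between 0 and n")
    pro, con = ["PRO"] * n_pro, ["CON"] * (n - n_pro)
    if order == "pro_first":
        return pro + con
    if order == "con_first":
        return con + pro
    if order == "interleaved":
        # Alternate PRO, CON, ... and append the leftover stance at the end.
        out: List[str] = []
        while pro or con:
            if pro:
                out.append(pro.pop())
            if con:
                out.append(con.pop())
        return out
    if order == "shuffled":
        mixed = pro + con
        random.Random(seed).shuffle(mixed)
        return mixed
    raise ValueError(f"unknown order: {order!r}")


def count_imbalance(ranking: Sequence[str], k: int) -> float:
    """Order-insensitive: |#PRO - #CON| divided by the number of top-k passages."""
    top = list(ranking[:k])
    if not top:
        return 0.0
    return abs(top.count("PRO") - top.count("CON")) / len(top)


def discounted_imbalance(ranking: Sequence[str], k: int) -> float:
    """Order-sensitive: the same difference, weighting rank r by 1 / log2(r + 1)."""
    diff = total = 0.0
    for rank, stance in enumerate(ranking[:k], start=1):
        weight = 1.0 / math.log2(rank + 1)
        diff += weight if stance == "PRO" else -weight
        total += weight
    return abs(diff) / total if total > 0 else 0.0


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two score lists, with the standard tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two items")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    # Undefined when one list is entirely tied; return NaN in that case.
    return (concordant - discordant) / denom if denom > 0 else float("nan")


# One synthetic "run" per combination of stance amount and stance order.
runs = [synthetic_ranking(10, p, o) for p in (5, 7, 10)
        for o in ("pro_first", "interleaved", "con_first")]
counts = [count_imbalance(r, 10) for r in runs]
discounted = [discounted_imbalance(r, 10) for r in runs]
print(kendall_tau_b(counts, discounted))
```

Runs with the same number of PRO passages have identical stance counts, so `count_imbalance` cannot distinguish `pro_first` from `con_first`, whereas `discounted_imbalance` can. In this setup the two scores also disagree on some pairs of runs (for example, a 5/5 `pro_first` run against a 7/3 `con_first` run), so tau-b falls below 1. SciPy's `scipy.stats.kendalltau`, whose default variant is tau-b, could replace the hand-written version.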

Published in
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021, 4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: overall acceptance rate 1,861 of 8,427 submissions (22%)
