ABSTRACT
Argument search engines identify, extract, and rank the most important arguments for and against a given controversial topic. Several such systems have been developed recently, most of them relying on classic information retrieval ranking methods that are based on frequency information. An important aspect that these search engines have so far ignored is the quality of arguments. We present a quality-aware ranking framework for arguments that have already been extracted from texts and represented as argument graphs; the framework considers multiple established quality measures. An extensive evaluation on a standard benchmark collection demonstrates that taking argument quality into account significantly improves retrieval quality for argument search. We also publish a dataset in which human annotators painstakingly labeled arguments, with respect to topics, along three widely accepted argument quality dimensions.