A Neural Model to Jointly Predict and Explain Truthfulness of Statements

Published: 28 December 2022

Abstract

Automated fact-checking (AFC) systems exist to combat disinformation; however, their complexity usually makes them opaque to the end user, making it difficult to foster trust in the system. In this article, we introduce the E-BART model with the hope of making progress on this front. E-BART is able to provide a veracity prediction for a claim and jointly generate a human-readable explanation for this decision. We show that E-BART is competitive with the state of the art on the e-FEVER and e-SNLI tasks. In addition, we validate the joint-prediction architecture by showing (1) that generating explanations does not significantly impede the model from performing well in its main task of veracity prediction, and (2) that predicted veracity and explanations are more internally coherent when generated jointly than separately. We also calibrate the E-BART model, allowing the output of the final model to be correctly interpreted as the confidence of correctness. Finally, we conduct an extensive human evaluation of the impact of generated explanations and observe that explanations increase human ability to spot misinformation and make people more skeptical about claims, and that explanations generated by E-BART are competitive with ground-truth explanations.
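To make the joint predict-and-explain idea concrete, here is a minimal sketch, assuming a HuggingFace-style BART backbone. This is not the authors' implementation: the class name, the mean-pooling choice, and the loss combination are illustrative assumptions. One shared encoder pass feeds both a veracity classification head and the explanation decoder, and a temperature-scaling helper shows the kind of post-hoc calibration the abstract mentions (in the style of Guo et al., 2017).

```python
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration


class JointVeracityExplainer(nn.Module):
    """Sketch: one BART backbone that classifies veracity and generates an explanation."""

    def __init__(self, model_name="facebook/bart-base", num_labels=3):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(model_name)
        # Linear veracity head over mean-pooled encoder states
        # (an assumption, not necessarily the paper's pooling choice).
        self.classifier = nn.Linear(self.bart.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask, explanation_ids=None):
        # A single encoder pass is shared by both tasks; `explanation_ids`
        # are the tokenized gold explanation used as seq2seq targets
        # during training (out.loss is None at inference time).
        out = self.bart(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=explanation_ids)
        enc = out.encoder_last_hidden_state                 # (B, T, H)
        mask = attention_mask.unsqueeze(-1).type_as(enc)
        pooled = (enc * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pool
        logits = self.classifier(pooled)                    # veracity logits
        return logits, out.loss                             # class logits + LM loss


def calibrated_confidence(logits, temperature):
    # Post-hoc temperature scaling (Guo et al., 2017): divide logits by a
    # scalar T fitted on held-out data so that softmax probabilities can be
    # read as confidence of correctness.
    return torch.softmax(logits / temperature, dim=-1)
```

Under these assumptions, training would minimize a combined objective such as `F.cross_entropy(logits, veracity_labels) + lm_loss`, so that the prediction and the explanation are driven by the same shared representation; the temperature `T` is a single scalar fitted on validation data after training.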


Published in

Journal of Data and Information Quality, Volume 15, Issue 1 (March 2023), 197 pages.
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3578367

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 22 November 2021
• Revised: 31 March 2022
• Accepted: 19 May 2022
• Online AM: 9 July 2022
• Published: 28 December 2022
