Combining Human and Machine Confidence in Truthfulness Assessment

Published: 28 December 2022

Abstract

Automatically detecting online misinformation at scale is a challenging and interdisciplinary problem. Deciding what is to be considered truthful information is sometimes controversial and difficult even for educated experts. As the scale of the problem increases, human-in-the-loop approaches to truthfulness assessment, which combine the scalability of machine learning (ML) with the accuracy of human contributions, have been considered.

In this work, we look at the potential to automatically combine machine-based and human-based systems. The former exploit supervised ML approaches; the latter involve either crowd workers (i.e., human non-experts) or human experts. Since both ML and crowdsourcing approaches can produce a score indicating their level of confidence in a truthfulness judgment (algorithmic and self-reported, respectively), we address the question of whether it is feasible to use such confidence scores to effectively and efficiently combine three approaches: (i) machine-based methods, (ii) crowd workers, and (iii) human experts. The three approaches differ significantly, ranging from readily available, cheap, fast, and scalable, but less accurate, to scarce, expensive, slow, and not scalable, but highly accurate.
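The abstract does not spell out a specific combination strategy, but the core idea of routing each statement through the three assessor types in order of increasing cost and accuracy, escalating whenever confidence is low, can be sketched in a few lines. The following Python sketch is a hypothetical illustration only: the thresholds, the Judgment type, and the three judge callables are assumptions for exposition, not the paper's actual method.

    from dataclasses import dataclass
    from typing import Callable

    # Hypothetical confidence thresholds, not from the paper;
    # in practice these would be tuned per task.
    ML_THRESHOLD = 0.90
    CROWD_THRESHOLD = 0.75

    @dataclass
    class Judgment:
        label: str         # e.g., "true" or "false"
        confidence: float  # in [0, 1]; algorithmic (ML) or self-reported (crowd)
        source: str        # "ml", "crowd", or "expert"

    def assess(statement: str,
               ml_judge: Callable[[str], Judgment],
               crowd_judge: Callable[[str], Judgment],
               expert_judge: Callable[[str], Judgment]) -> Judgment:
        # Start with the cheap, fast, scalable ML judge.
        ml = ml_judge(statement)
        if ml.confidence >= ML_THRESHOLD:
            return ml
        # Low ML confidence: escalate to crowd workers.
        crowd = crowd_judge(statement)
        if crowd.confidence >= CROWD_THRESHOLD:
            return crowd
        # Still uncertain: fall back to a scarce but highly accurate expert.
        return expert_judge(statement)

Under this design, raising a threshold trades cost for accuracy: more statements are escalated to the crowd and, ultimately, to experts.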


Published in

Journal of Data and Information Quality, Volume 15, Issue 1
March 2023, 197 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3578367


        Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 28 December 2022
        • Online AM: 11 July 2022
        • Accepted: 25 May 2022
        • Revised: 31 March 2022
        • Received: 23 November 2021
