Combining Human and Machine Confidence in Truthfulness Assessment

Published: 28 December 2022

Abstract

Automatically detecting online misinformation at scale is a challenging and interdisciplinary problem. Deciding what is to be considered truthful information is sometimes controversial and difficult even for educated experts. As the scale of the problem increases, human-in-the-loop approaches to truthfulness assessment, which combine the scalability of machine learning (ML) with the accuracy of human contributions, have been considered.

In this work, we look at the potential to automatically combine machine-based and human-based systems. The former exploit supervised ML approaches; the latter involve either crowd workers (i.e., human non-experts) or human experts. Since both ML and crowdsourcing approaches can produce a score indicating their level of confidence in a truthfulness judgment (algorithmic and self-reported, respectively), we address the question of whether it is feasible to use such confidence scores to effectively and efficiently combine three approaches: (i) machine-based methods, (ii) crowd workers, and (iii) human experts. The three approaches differ significantly, ranging from readily available, cheap, fast, and scalable, but less accurate, to scarce, expensive, slow, and not scalable, but highly accurate.
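The abstract does not spell out a specific combination strategy, but the core idea of routing each statement through the three assessor types in order of increasing cost and accuracy, escalating whenever confidence is low, can be sketched in a few lines. The following Python sketch is a hypothetical illustration only: the thresholds, the Judgment type, and the three judge callables are assumptions for exposition, not the paper's actual method.

    from dataclasses import dataclass
    from typing import Callable

    # Hypothetical confidence thresholds, not from the paper;
    # in practice these would be tuned per task.
    ML_THRESHOLD = 0.90
    CROWD_THRESHOLD = 0.75

    @dataclass
    class Judgment:
        label: str         # e.g., "true" or "false"
        confidence: float  # in [0, 1]; algorithmic (ML) or self-reported (crowd)
        source: str        # "ml", "crowd", or "expert"

    def assess(statement: str,
               ml_judge: Callable[[str], Judgment],
               crowd_judge: Callable[[str], Judgment],
               expert_judge: Callable[[str], Judgment]) -> Judgment:
        # Start with the cheap, fast, scalable ML judge.
        ml = ml_judge(statement)
        if ml.confidence >= ML_THRESHOLD:
            return ml
        # Low ML confidence: escalate to crowd workers.
        crowd = crowd_judge(statement)
        if crowd.confidence >= CROWD_THRESHOLD:
            return crowd
        # Still uncertain: fall back to a scarce but highly accurate expert.
        return expert_judge(statement)

Under this design, raising a threshold trades cost for accuracy: more statements are escalated to the crowd and, ultimately, to experts.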


Published in

Journal of Data and Information Quality, Volume 15, Issue 1
March 2023, 197 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3578367


        Publisher

Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 28 December 2022
        • Online AM: 11 July 2022
        • Accepted: 25 May 2022
        • Revised: 31 March 2022
        • Received: 23 November 2021
