Abstract
Automatically detecting online misinformation at scale is a challenging and interdisciplinary problem. Deciding what counts as truthful information is sometimes controversial and difficult even for educated experts. As the scale of the problem increases, human-in-the-loop approaches to truthfulness assessment that combine the scalability of machine learning (ML) with the accuracy of human contributions have been considered.
In this work, we look at the potential to automatically combine machine-based systems with human-based systems. The former exploit supervised ML approaches; the latter involve either crowd workers (i.e., human non-experts) or human experts. Since both ML and crowdsourcing approaches can produce a score indicating their level of confidence in their truthfulness judgments (algorithmic in the former case, self-reported in the latter), we address the question of whether such confidence scores can be used to effectively and efficiently combine three approaches: (i) machine-based methods, (ii) crowd workers, and (iii) human experts. The three approaches differ significantly: they range from readily available, cheap, fast, and scalable but less accurate (machine-based methods) to scarce, expensive, slow, and hard to scale but highly accurate (human experts).
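To make the combination concrete, the sketch below illustrates one way such confidence scores could route a statement across the three tiers, escalating to a more expensive tier only when the cheaper one is not confident enough. This is a minimal illustration, not the method evaluated in the paper: the threshold values, function names, and the simple cascading policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical confidence thresholds; the paper studies how such
# cut-offs trade accuracy against cost, it does not fix these values.
ML_THRESHOLD = 0.90      # trust the ML model above this score
CROWD_THRESHOLD = 0.75   # trust the aggregated crowd above this score

@dataclass
class Judgment:
    label: str         # e.g., "true" / "false"
    confidence: float  # algorithmic (ML) or self-reported (crowd) score
    source: str        # which tier produced the judgment

def assess(statement: str,
           ml_judge: Callable[[str], Tuple[str, float]],
           crowd_judge: Callable[[str], Tuple[str, float]],
           expert_judge: Callable[[str], str]) -> Judgment:
    """Cascade a statement through ML -> crowd -> expert,
    escalating whenever the cheaper tier is not confident enough."""
    label, conf = ml_judge(statement)        # cheap, fast, scalable
    if conf >= ML_THRESHOLD:
        return Judgment(label, conf, "machine")

    label, conf = crowd_judge(statement)     # slower, more accurate
    if conf >= CROWD_THRESHOLD:
        return Judgment(label, conf, "crowd")

    # Scarce and expensive but highly accurate: the last resort.
    return Judgment(expert_judge(statement), 1.0, "expert")
```

Under these assumptions, passing, say, a classifier's softmax confidence as `ml_judge` and an agreement-weighted aggregate of workers' self-reported confidence as `crowd_judge` yields the cost-ordered cascade described above, with experts reserved for the items neither cheaper tier can resolve confidently.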