ABSTRACT
With the ever-growing volume of information shared online every day, the need for sound and reliable ways of distinguishing trustworthy from untrustworthy information is as pressing as ever. One technique for performing fact-checking at scale is to employ human intelligence in the form of crowd workers. Although earlier work has suggested that crowd workers can reliably identify misinformation, their cognitive biases may reduce the quality of the resulting truthfulness judgments. We performed a systematic exploratory analysis of publicly available crowdsourced data to identify a set of potential systematic biases that may occur when crowd workers perform fact-checking tasks. Following this exploratory study, we collected a novel data set of crowdsourced truthfulness judgments to validate our hypotheses. Our findings suggest that workers generally overestimate the truthfulness of statements and that individual characteristics (e.g., belief in science) and cognitive biases (e.g., the affect heuristic and overconfidence) can affect their annotations. Interestingly, we find that, depending on workers' general judgment tendencies, their biases may sometimes lead to more accurate judgments.
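To make the reported overestimation tendency concrete, the following minimal Python sketch shows one way such a bias could be quantified: the mean signed difference between crowd judgments and expert ground-truth labels on a shared truthfulness scale. This is an illustrative assumption, not the paper's actual analysis code; all data, identifiers, and function names here are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' code): quantifying an
# overestimation tendency as the mean signed difference between crowd
# judgments and expert ground-truth labels, both assumed to lie on the
# same ordinal truthfulness scale.

from statistics import mean

# Hypothetical data: per-statement expert labels and crowd judgments
# on a 0 (false) .. 5 (true) scale, keyed by statement id.
expert_labels = {"s1": 1, "s2": 3, "s3": 0}
crowd_judgments = {
    "s1": [2, 3, 1],   # three workers judged statement s1
    "s2": [3, 4, 3],
    "s3": [1, 2, 0],
}

def signed_bias(expert: dict, crowd: dict) -> float:
    """Mean of (crowd judgment - expert label) over all judgments.

    A positive value indicates that workers, on average, rate
    statements as more truthful than the experts do.
    """
    diffs = [
        judgment - expert[sid]
        for sid, judgments in crowd.items()
        for judgment in judgments
    ]
    return mean(diffs)

print(f"signed bias: {signed_bias(expert_labels, crowd_judgments):+.2f}")
# A value greater than zero would be consistent with the general
# overestimation tendency reported in the abstract.
```

A signed measure is used rather than absolute error because the abstract's claim is directional: workers skew toward "true," which absolute error alone would not reveal.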