DOI: 10.1145/3292522.3326027
research-article

Explainable Machine Learning for Fake News Detection

Published: 26 June 2019

ABSTRACT

Recently, there have been many research efforts aiming to understand fake news phenomena and to identify typical patterns and features of fake news. Yet, the real discriminating power of these features is still unknown: some are more general, but others perform well only with specific data. In this work, we conduct a highly exploratory investigation that produced hundreds of thousands of models from a large and diverse set of features. These models are unbiased in the sense that their features are randomly chosen from the pool of available features. While the vast majority of models are ineffective, we were able to produce a number of models that yield highly accurate decisions, thus effectively separating fake news from actual stories. Specifically, we focused our analysis on models that rank a randomly chosen fake news story higher than a randomly chosen fact with more than 0.85 probability. For these models we found a strong link between features and model predictions, showing that some features are clearly tailored for detecting certain types of fake news, thus evidencing that different combinations of features cover a specific region of the fake news space. Finally, we present an explanation of factors contributing to model decisions, thus promoting civic reasoning by complementing our ability to evaluate digital content and reach warranted conclusions.
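The selection criterion in the abstract (a randomly chosen fake story is ranked above a randomly chosen fact with probability greater than 0.85) is the pairwise-ranking reading of the area under the ROC curve. The following is a minimal sketch of that criterion and of drawing a random feature subset, not the authors' implementation. The function names, the feature names, and the scores are hypothetical and chosen only for illustration.

```python
import random
from itertools import product


def pairwise_auc(fake_scores, real_scores):
    """Estimate the probability that a randomly chosen fake story receives
    a higher score than a randomly chosen real story (ties count as half).
    This equals the area under the ROC curve for those scores."""
    wins = 0.0
    for fake, real in product(fake_scores, real_scores):
        if fake > real:
            wins += 1.0
        elif fake == real:
            wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def sample_feature_subset(feature_pool, k, rng):
    """Draw k distinct features uniformly at random from the pool, so the
    resulting model is not biased toward any hand-picked feature."""
    return rng.sample(feature_pool, k)


def keep_effective(models, threshold=0.85):
    """Keep (feature_subset, auc) pairs whose AUC exceeds the threshold.
    The 0.85 default mirrors the cutoff stated in the abstract."""
    return [(features, auc) for features, auc in models if auc > threshold]


# Hypothetical feature names, for illustration only.
pool = ["ngram_tfidf", "liwc_affect", "source_domain", "share_count"]
rng = random.Random(0)
subset = sample_feature_subset(pool, 2, rng)

# Hypothetical model scores (higher means "more likely fake").
auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3])
selected = keep_effective([(subset, auc), (pool[:1], 0.91)])
```

In this toy example five of the six fake/real pairs are ordered correctly, so the estimate is 5/6, which falls just below the 0.85 cutoff and the first model is discarded.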

      Published in

        WebSci '19: Proceedings of the 10th ACM Conference on Web Science
        June 2019
        395 pages
        ISBN: 9781450362023
        DOI: 10.1145/3292522

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        WebSci '19 Paper Acceptance Rate: 41 of 130 submissions, 32%
        Overall Acceptance Rate: 218 of 875 submissions, 25%
