ABSTRACT
Recently, many research efforts have aimed to understand fake news phenomena and to identify typical patterns and features of fake news. Yet the real discriminating power of these features is still unknown: some are more general, while others perform well only with specific data. In this work, we conduct a highly exploratory investigation that produced hundreds of thousands of models from a large and diverse set of features. These models are unbiased in the sense that their features are randomly chosen from the pool of available features. While the vast majority of models are ineffective, we were able to produce a number of models that yield highly accurate decisions, effectively separating fake news from actual stories. Specifically, we focused our analysis on models that rank a randomly chosen fake news story higher than a randomly chosen fact with more than 0.85 probability. For these models we found a strong link between features and model predictions, showing that some features are clearly tailored for detecting certain types of fake news, which is evidence that different combinations of features cover specific regions of the fake news space. Finally, we present an explanation of factors contributing to model decisions, thus promoting civic reasoning by complementing our ability to evaluate digital content and reach warranted conclusions.
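The selection criterion in the abstract, the probability that a randomly chosen fake story is ranked above a randomly chosen fact, corresponds to the area under the ROC curve (AUC). The following is a minimal sketch of that criterion and of drawing a random feature subset, not the authors' implementation. The function names, the feature names, the example scores, and the choice to count ties as half a win are all assumptions made for illustration; only the 0.85 threshold comes from the abstract.

```python
import random
from itertools import product


def pairwise_rank_probability(fake_scores, fact_scores):
    """Fraction of (fake, fact) pairs in which the fake story scores higher.

    Ties count as half a win (an assumption; the abstract does not say
    how ties are handled). This quantity equals the AUC of the scores.
    """
    if not fake_scores or not fact_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for fake, fact in product(fake_scores, fact_scores):
        if fake > fact:
            wins += 1.0
        elif fake == fact:
            wins += 0.5
    return wins / (len(fake_scores) * len(fact_scores))


def sample_feature_subset(feature_pool, subset_size, rng):
    """Draw a random subset of features without replacement."""
    return sorted(rng.sample(feature_pool, subset_size))


def passes_threshold(fake_scores, fact_scores, threshold=0.85):
    """Keep a model only if its ranking probability exceeds the threshold."""
    return pairwise_rank_probability(fake_scores, fact_scores) > threshold


if __name__ == "__main__":
    # Hypothetical feature names, for illustration only.
    pool = ["text_length", "source_domain", "share_count", "sentiment", "readability"]
    subset = sample_feature_subset(pool, 3, random.Random(0))
    print("sampled features:", subset)

    # Hypothetical scores that some model assigned to fake stories and facts.
    fake = [0.9, 0.8]
    fact = [0.1, 0.85]
    print("ranking probability:", pairwise_rank_probability(fake, fact))
    print("kept for analysis:", passes_threshold(fake, fact))
```

The pairwise loop is quadratic in the number of stories, which is acceptable for a sketch; a rank-based AUC computation would be the usual choice at scale.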
Index Terms
- Explainable Machine Learning for Fake News Detection