Augmentation-Based Ensemble Learning for Stance and Fake News Detection

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2022)

Abstract

Data augmentation is an unsupervised technique used to generate additional training data by slightly modifying existing data. Besides mitigating data scarcity, one of the main interests of data augmentation is that it increases the diversity of the training data, and hence improves models’ ability to generalize to unseen data. In this work, we investigate the use of text data augmentation for the tasks of stance and fake news detection.
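As a rough illustration of what "slightly modifying existing data" can mean for text, the following minimal Python sketch applies three simple word-level operations (synonym replacement, random swap, random deletion). It is not the paper's implementation: the function names, the toy `SYNONYMS` table, and the deletion probability are all assumptions made for this example, and a real system would more likely draw synonyms from a lexical resource such as WordNet.

```python
import random

# Toy synonym table (hypothetical); stands in for a real lexical resource.
SYNONYMS = {"fake": ["false", "bogus"], "news": ["report", "story"]}

def synonym_replace(tokens, rng):
    """Replace every token that has a known synonym with a random synonym."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]

def random_swap(tokens, rng):
    """Swap two randomly chosen positions; the multiset of words is unchanged."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, rng, p=0.2):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]

def augment(dataset, technique, seed=0):
    """Return the original (text, label) pairs plus one modified copy of each.

    Labels are copied unchanged: the modification is assumed to be small
    enough to preserve the class of the example.
    """
    rng = random.Random(seed)
    extra = [(" ".join(technique(text.split(), rng)), label)
             for text, label in dataset]
    return list(dataset) + extra
```

Calling `augment(data, random_swap)` on a list of `(text, label)` pairs doubles its size while leaving the original pairs in place. Which technique helps depends on the classifier it is paired with.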

In the first part of our work, we explore the effect of various text augmentation techniques on the performance of common classification algorithms. Besides identifying the best-performing (classification algorithm, augmentation technique) pairs, our study reveals that the motto “the more, the better” does not hold for text augmentation and that there is no one-size-fits-all text augmentation technique.

The second part of our work leverages the results of our study to propose a novel augmentation-based ensemble learning approach that can be seen as a mixture of stacking and bagging. The proposed approach leverages text augmentation to enhance the diversity and accuracy of the base learners, and thereby the predictive performance of the ensemble. Experiments conducted on two real-world datasets show that our ensemble learning approach achieves very promising predictive performance.
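The general idea of an augmentation-based ensemble can be sketched as follows, under strong simplifying assumptions. The abstract does not specify the base learners, the augmentation techniques, or the combiner, so everything here is hypothetical: each base learner is a small bag-of-words naive Bayes classifier trained on the original data plus a copy augmented with a different technique (the bagging-like part), and predictions are combined by a plain majority vote, which stands in for the trained meta-learner that a stacking scheme would use.

```python
import math
import random
from collections import Counter, defaultdict

# Toy synonym table (hypothetical), used by one of the augmentation techniques.
SYNONYMS = {"false": ["untrue"], "hoax": ["scam"], "confirm": ["verify"]}

def synonym_replace(tokens, rng):
    """Replace tokens that have a known synonym with a random synonym."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]

def random_delete(tokens, rng, p=0.2):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]

class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())

        def score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.class_counts[label] / total)
            for w in text.split():
                if w in self.vocab:  # words never seen in training are ignored
                    s += math.log((counts[w] + 1) / denom)
            return s

        return max(sorted(self.class_counts), key=score)

def train_ensemble(texts, labels, techniques, seed=0):
    """Train one base learner per augmentation technique."""
    texts, labels = list(texts), list(labels)
    learners = []
    for k, technique in enumerate(techniques):
        rng = random.Random(seed + k)
        aug_texts = [" ".join(technique(t.split(), rng)) for t in texts]
        learners.append(NaiveBayes().fit(texts + aug_texts, labels + labels))
    return learners

def predict_ensemble(learners, text):
    """Combine base predictions by majority vote (stand-in for a meta-learner)."""
    votes = Counter(m.predict(text) for m in learners)
    return votes.most_common(1)[0][0]
```

Because each learner sees a differently perturbed copy of the training set, the learners can disagree on borderline inputs, which is the diversity the ensemble relies on. Replacing the vote in `predict_ensemble` with a classifier trained on the base learners' outputs would move the sketch closer to stacking.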

Corresponding author

Correspondence to Khaled Jouini.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Salah, I., Jouini, K., Korbaa, O. (2022). Augmentation-Based Ensemble Learning for Stance and Fake News Detection. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_3

  • DOI: https://doi.org/10.1007/978-3-031-16210-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16209-1

  • Online ISBN: 978-3-031-16210-7

  • eBook Packages: Computer Science, Computer Science (R0)
