Abstract
Data augmentation is an unsupervised technique used to generate additional training data by slightly modifying existing data. Besides alleviating data scarcity, one of the main interests of data augmentation is that it increases training data diversity, and hence improves models’ ability to generalize to unseen data. In this work, we investigate the use of text data augmentation for the tasks of stance detection and fake news detection.
In the first part of our work, we explore the effect of various text augmentation techniques on the performance of common classification algorithms. Besides identifying the best performing (classification algorithm, augmentation technique) pairs, our study reveals that the motto “the more, the better” is the wrong approach regarding text augmentation and that there is no one-size-fits-all text augmentation technique.
The second part of our work leverages the results of our study to propose a novel augmentation-based ensemble learning approach that can be seen as a mixture of stacking and bagging. The proposed approach uses text augmentation to enhance the diversity and accuracy of the base learners, and hence the predictive performance of the ensemble. Experiments conducted on two real-world datasets show that our ensemble learning approach achieves very promising predictive performance.
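To make the idea concrete, the following is a minimal sketch of an augmentation-based ensemble, not the approach evaluated in this chapter. It assumes two simple word-level augmenters (random swap and random deletion), a toy bag-of-words base learner, and a plain majority vote in place of a stacking meta-learner. All function and class names, the sample texts, and the choice to train each base learner on the original data plus one augmented copy are hypothetical choices made for illustration only.

```python
import random
from collections import Counter


def identity(tokens, rng):
    """No augmentation: return the tokens unchanged."""
    return list(tokens)


def random_swap(tokens, rng):
    """Swap two randomly chosen token positions."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.2):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]


class WordProfileClassifier:
    """Toy base learner (assumption): one word-count profile per class."""

    def fit(self, docs, labels):
        self.profiles = {}
        for tokens, label in zip(docs, labels):
            self.profiles.setdefault(label, Counter()).update(tokens)
        return self

    def predict_one(self, tokens):
        def score(label):
            profile = self.profiles[label]
            total = sum(profile.values()) or 1
            return sum(profile[t] for t in tokens) / total

        # Ties are broken by label order so that the result is deterministic.
        return max(sorted(self.profiles), key=score)


def train_ensemble(docs, labels, augmenters, seed=0):
    """Train one base learner per augmenter on original + augmented data."""
    rng = random.Random(seed)
    learners = []
    for augment in augmenters:
        augmented = [augment(tokens, rng) for tokens in docs]
        learner = WordProfileClassifier().fit(docs + augmented, labels + labels)
        learners.append(learner)
    return learners


def predict(learners, tokens):
    """Combine base learners by majority vote (stand-in for a meta-learner)."""
    votes = Counter(learner.predict_one(tokens) for learner in learners)
    return votes.most_common(1)[0][0]


# Tiny made-up dataset, for demonstration only.
docs = [s.split() for s in [
    "shocking hoax miracle cure doctors hate",
    "shocking secret hoax exposed",
    "officials confirm new vaccine trial results",
    "study officials confirm trial safe",
]]
labels = ["fake", "fake", "real", "real"]

learners = train_ensemble(docs, labels, [identity, random_swap, random_deletion])
print(predict(learners, "shocking hoax".split()))
```

In this sketch the diversity among base learners comes only from the augmenter each one is paired with. A closer analogue of stacking would replace the majority vote with a meta-learner trained on the base learners' predictions.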
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Salah, I., Jouini, K., Korbaa, O. (2022). Augmentation-Based Ensemble Learning for Stance and Fake News Detection. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_3
DOI: https://doi.org/10.1007/978-3-031-16210-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16209-1
Online ISBN: 978-3-031-16210-7
eBook Packages: Computer Science, Computer Science (R0)