Abstract
Text classification is an important task in natural language processing, and numerous studies aim to improve the accuracy and efficiency of text classification models. In this study, we propose an effective and efficient text classification model based solely on self-attention. The recently proposed multi-dimensional self-attention significantly improves the performance of self-attention, but existing models suffer from two major limitations: (1) previous multi-dimensional self-attention models are quite time-consuming; (2) dependencies of elements along the feature axis are not taken into account. To overcome these problems, this paper proposes a much more computationally efficient multi-dimensional self-attention model and applies two parallel self-attention modules, called dual-axial self-attention, to capture rich dependencies along the feature axis as well as the text axis. A text classification model is then derived. Experimental results on eight representative datasets show that the proposed model achieves state-of-the-art results and that the proposed self-attention outperforms conventional self-attention models.
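To make the dual-axial idea concrete, the following is a minimal PyTorch sketch of two parallel multi-dimensional self-attention branches, one normalizing over the text (token) axis and one over the feature axis, fused into a single representation. The class name DualAxialSelfAttention, the gating-style formulation, and all parameter names are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAxialSelfAttention(nn.Module):
    """Sketch of dual-axial self-attention: two parallel multi-dimensional
    attention branches over the text axis and the feature axis."""

    def __init__(self, d_model: int):
        super().__init__()
        # Multi-dimensional attention assigns one score per (token, feature)
        # pair instead of a single scalar per token.
        self.text_score = nn.Linear(d_model, d_model)  # scores along tokens
        self.feat_score = nn.Linear(d_model, d_model)  # scores along features
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Text-axis branch: softmax over the sequence dimension, so each
        # feature channel is re-weighted across all tokens.
        a_text = F.softmax(self.text_score(x), dim=1)   # (B, L, D)
        text_ctx = a_text * x                           # element-wise gating

        # Feature-axis branch: softmax over the feature dimension, so each
        # token re-weights its own feature channels against one another.
        a_feat = F.softmax(self.feat_score(x), dim=2)   # (B, L, D)
        feat_ctx = a_feat * x

        # Fuse the two parallel branches into one representation.
        return self.fuse(torch.cat([text_ctx, feat_ctx], dim=-1))

# Usage: encode a batch of 8 sentences of length 20 with 64-d embeddings.
layer = DualAxialSelfAttention(d_model=64)
h = layer(torch.randn(8, 20, 64))  # -> (8, 20, 64)
```

Because each branch uses only element-wise linear scoring and a softmax, the sketch avoids the pairwise token-token score matrix of standard self-attention, which is the kind of computational saving the abstract refers to.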
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61802435, 61802433).
Cite this article
Zhang, X., Qiu, X., Pang, J. et al. Dual-axial self-attention network for text classification. Sci. China Inf. Sci. 64, 222102 (2021). https://doi.org/10.1007/s11432-019-2744-2