Abstract
Text classification is an important task in natural language processing, and numerous studies aim to improve the accuracy and efficiency of text classification models. In this study, we propose an effective and efficient text classification model based solely on self-attention. The recently proposed multi-dimensional self-attention significantly improves the performance of self-attention, but existing models suffer from two major limitations: (1) previous multi-dimensional self-attention models are quite time-consuming; (2) dependencies of elements along the feature axis are not taken into account. To overcome these problems, this paper proposes a much more computationally efficient multi-dimensional self-attention model and applies two parallel self-attention modules, called dual-axial self-attention, to capture rich dependencies along the feature axis as well as the text axis. A text classification model is then derived. Experimental results on eight representative datasets show that the proposed model achieves state-of-the-art results and that the proposed self-attention outperforms conventional self-attention models.
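To make the dual-axial idea concrete, the following is a minimal PyTorch sketch of two parallel multi-dimensional self-attention branches, one normalizing over the text (token) axis and one over the feature axis, fused into a single representation. The class name DualAxialSelfAttention, the gating-style formulation, and all parameter names are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAxialSelfAttention(nn.Module):
    """Sketch of dual-axial self-attention: two parallel multi-dimensional
    attention branches over the text axis and the feature axis."""

    def __init__(self, d_model: int):
        super().__init__()
        # Multi-dimensional attention assigns one score per (token, feature)
        # pair instead of a single scalar per token.
        self.text_score = nn.Linear(d_model, d_model)  # scores along tokens
        self.feat_score = nn.Linear(d_model, d_model)  # scores along features
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Text-axis branch: softmax over the sequence dimension, so each
        # feature channel is re-weighted across all tokens.
        a_text = F.softmax(self.text_score(x), dim=1)   # (B, L, D)
        text_ctx = a_text * x                           # element-wise gating

        # Feature-axis branch: softmax over the feature dimension, so each
        # token re-weights its own feature channels against one another.
        a_feat = F.softmax(self.feat_score(x), dim=2)   # (B, L, D)
        feat_ctx = a_feat * x

        # Fuse the two parallel branches into one representation.
        return self.fuse(torch.cat([text_ctx, feat_ctx], dim=-1))

# Usage: encode a batch of 8 sentences of length 20 with 64-d embeddings.
layer = DualAxialSelfAttention(d_model=64)
h = layer(torch.randn(8, 20, 64))  # -> (8, 20, 64)
```

Because each branch uses only element-wise linear scoring and a softmax, the sketch avoids the pairwise token-token score matrix of standard self-attention, which is the kind of computational saving the abstract refers to.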
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61802435, 61802433).
Cite this article
Zhang, X., Qiu, X., Pang, J. et al. Dual-axial self-attention network for text classification. Sci. China Inf. Sci. 64, 222102 (2021). https://doi.org/10.1007/s11432-019-2744-2