
Dual-axial self-attention network for text classification

  • Research Paper
  • Published in Science China Information Sciences

Abstract

Text classification is an important task in natural language processing, and numerous studies aim to improve the accuracy and efficiency of text classification models. In this study, we propose an effective and efficient text classification model based solely on self-attention. The recently proposed multi-dimensional self-attention significantly improves the performance of self-attention, but existing models suffer from two major limitations: (1) previous multi-dimensional self-attention models are quite time-consuming; (2) the dependencies of elements along the feature axis are not taken into account. To overcome these problems, this paper proposes a much more computationally efficient multi-dimensional self-attention model and applies two parallel self-attention modules, called dual-axial self-attention, to capture rich dependencies along the feature axis as well as the text axis. A text classification model is then derived. Experimental results on eight representative datasets show that the proposed text classification model obtains state-of-the-art results and that the proposed self-attention outperforms conventional self-attention models.
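The dual-axial idea described above can be summarized, roughly, as two self-attention modules running in parallel: one attends over the text (token) axis and the other over the feature axis, and their outputs are fused. The following PyTorch sketch is a minimal illustration of that idea only; the class names, the additive fusion, and the fixed sequence length assumed by the feature-axis module are choices made for this example and are not taken from the paper.

```python
# Illustrative sketch of dual-axial self-attention: one attention module over
# the text (token) axis and one over the feature axis, run in parallel and
# fused by addition. Names and design details are assumptions for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AxialSelfAttention(nn.Module):
    """Scaled dot-product self-attention over the second-to-last axis."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim); attention weights are computed over `length`.
        q, k, v = self.query(x), self.key(x), self.value(x)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v


class DualAxialSelfAttention(nn.Module):
    """Two parallel attention modules: over the text axis and the feature axis."""

    def __init__(self, seq_len: int, dim: int):
        super().__init__()
        self.text_attn = AxialSelfAttention(dim)          # attends over tokens
        self.feature_attn = AxialSelfAttention(seq_len)   # attends over features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        along_text = self.text_attn(x)
        # Transpose so the feature axis plays the role of the sequence axis.
        along_feature = self.feature_attn(x.transpose(1, 2)).transpose(1, 2)
        return along_text + along_feature  # simple additive fusion for illustration


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, tokens, embedding dim)
    module = DualAxialSelfAttention(seq_len=16, dim=64)
    print(module(x).shape)  # torch.Size([2, 16, 64])
```

In this sketch the feature-axis module assumes a fixed sequence length; a practical implementation would need to handle variable-length text, and the actual model in the paper may fuse or parameterize the two axes differently.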



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61802435, 61802433).

Author information

Corresponding author

Correspondence to Xipeng Qiu.


About this article


Cite this article

Zhang, X., Qiu, X., Pang, J. et al. Dual-axial self-attention network for text classification. Sci. China Inf. Sci. 64, 222102 (2021). https://doi.org/10.1007/s11432-019-2744-2

