
Wavelet-Attention CNN for image classification

  • Regular Paper
  • Published in Multimedia Systems

Abstract

Feature learning methods based on convolutional neural networks (CNNs) have produced tremendous achievements in image classification tasks. However, inherent noise and other factors may weaken the effectiveness of convolutional feature statistics. In this paper, we investigate the Discrete Wavelet Transform (DWT) in the frequency domain and design a new Wavelet-Attention (WA) block that applies attention only in the high-frequency domain. Based on this, we propose a Wavelet-Attention convolutional neural network (WA-CNN) for image classification. Specifically, WA-CNN decomposes feature maps into low-frequency and high-frequency components, which store the basic object structures and the detailed information plus noise, respectively. The WA block then captures the detailed information in the high-frequency domain with different attention factors while preserving the basic object structures in the low-frequency domain. Experimental results on the CIFAR-10 and CIFAR-100 datasets show that WA-CNN achieves significant improvements in classification accuracy over related networks. In particular, with a MobileNetV2 backbone, WA-CNN improves Top-1 accuracy by 1.26% on CIFAR-10 and by 1.54% on CIFAR-100.
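To make the decomposition concrete, the sketch below shows a single-level 2D Haar DWT that splits a feature map into one low-frequency sub-band (LL) and three high-frequency sub-bands (LH, HL, HH), with attention applied only to the high-frequency parts. This is a minimal NumPy illustration of the idea described in the abstract, not the paper's implementation: the averaging normalization and the per-pixel sigmoid attention factor are assumptions for the sketch.

```python
import numpy as np

def haar_dwt2d(x):
    """Single-level 2D Haar DWT (averaging normalization): split an
    H x W feature map into four H/2 x W/2 sub-bands."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0  # low frequency: basic object structure
    lh = (a + b - c - d) / 4.0  # horizontal detail
    hl = (a - b + c - d) / 4.0  # vertical detail
    hh = (a - b - c + d) / 4.0  # diagonal detail (and most noise)
    return ll, (lh, hl, hh)

def wavelet_attention(x):
    """WA-style block sketch: keep the LL sub-band untouched and
    reweight each high-frequency sub-band with an attention factor
    (here a hypothetical per-pixel sigmoid gate)."""
    ll, highs = haar_dwt2d(x)
    attended = []
    for h in highs:
        attn = 1.0 / (1.0 + np.exp(-h))  # attention factor in (0, 1)
        attended.append(h * attn)
    return ll, attended
```

In this formulation the low-frequency component passes through unchanged, which matches the abstract's goal of reserving the basic object structures while selectively emphasizing or suppressing high-frequency detail.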



Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2018AAA0102001), the National Natural Science Foundation of China (Grant Nos. 62072245 and 61932020), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20211520).

Author information


Corresponding author

Correspondence to Xiangbo Shu.

Additional information

Communicated by Y. Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, X., Huang, P. & Shu, X. Wavelet-Attention CNN for image classification. Multimedia Systems 28, 915–924 (2022). https://doi.org/10.1007/s00530-022-00889-8

