
Wavelet-Attention CNN for image classification

  • Regular Paper
  • Published in Multimedia Systems

Abstract

Feature learning methods based on convolutional neural networks (CNNs) have produced tremendous achievements in image classification tasks. However, inherent noise and other factors may weaken the effectiveness of convolutional feature statistics. In this paper, we investigate the Discrete Wavelet Transform (DWT) in the frequency domain and design a new Wavelet-Attention (WA) block that applies attention only in the high-frequency domain. Based on this, we propose a Wavelet-Attention convolutional neural network (WA-CNN) for image classification. Specifically, WA-CNN decomposes feature maps into low-frequency and high-frequency components, which store the basic object structures and the detailed information plus noise, respectively. The WA block then captures the detailed information in the high-frequency domain with different attention factors while preserving the basic object structures in the low-frequency domain. Experimental results on the CIFAR-10 and CIFAR-100 datasets show that WA-CNN achieves significant improvements in classification accuracy over related networks. In particular, with a MobileNetV2 backbone, WA-CNN improves Top-1 accuracy by 1.26% on CIFAR-10 and by 1.54% on CIFAR-100.
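To make the decomposition concrete, the sketch below shows a single-level 2D Haar DWT that splits a feature map into one low-frequency sub-band (LL) and three high-frequency sub-bands (LH, HL, HH), with attention applied only to the high-frequency parts. This is a minimal NumPy illustration of the idea described in the abstract, not the paper's implementation: the averaging normalization and the per-pixel sigmoid attention factor are assumptions for the sketch.

```python
import numpy as np

def haar_dwt2d(x):
    """Single-level 2D Haar DWT (averaging normalization): split an
    H x W feature map into four H/2 x W/2 sub-bands."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0  # low frequency: basic object structure
    lh = (a + b - c - d) / 4.0  # horizontal detail
    hl = (a - b + c - d) / 4.0  # vertical detail
    hh = (a - b - c + d) / 4.0  # diagonal detail (and most noise)
    return ll, (lh, hl, hh)

def wavelet_attention(x):
    """WA-style block sketch: keep the LL sub-band untouched and
    reweight each high-frequency sub-band with an attention factor
    (here a hypothetical per-pixel sigmoid gate)."""
    ll, highs = haar_dwt2d(x)
    attended = []
    for h in highs:
        attn = 1.0 / (1.0 + np.exp(-h))  # attention factor in (0, 1)
        attended.append(h * attn)
    return ll, attended
```

In this formulation the low-frequency component passes through unchanged, which matches the abstract's goal of reserving the basic object structures while selectively emphasizing or suppressing high-frequency detail.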



Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2018AAA0102001), the National Natural Science Foundation of China (Grant Nos. 62072245 and 61932020), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20211520).

Author information


Corresponding author

Correspondence to Xiangbo Shu.

Additional information

Communicated by Y. Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhao, X., Huang, P. & Shu, X. Wavelet-Attention CNN for image classification. Multimedia Systems 28, 915–924 (2022). https://doi.org/10.1007/s00530-022-00889-8

