
Joint weighted knowledge distillation and multi-scale feature distillation for long-tailed recognition

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Data in the natural open world tends to follow a long-tailed class distribution, so deep models trained on such datasets frequently exhibit inferior performance on the tail classes. Although existing approaches improve a model’s performance on tail categories through strategies such as class rebalancing, they often sacrifice the deep features the model has already learned. In this paper, we propose a new joint distillation framework, JWAFD (Joint weighted knowledge distillation and multi-scale feature distillation), which addresses the long-tailed recognition problem from the perspective of knowledge distillation. The framework comprises two effective modules. First, a weighted knowledge distillation module uses a category prior to adjust the weight of each category, making the training process more balanced across all categories. Second, a multi-scale feature distillation module further optimizes the feature representation, addressing the under-learning of features encountered in previous studies. Compared with previous studies, the proposed framework significantly improves performance on rare classes while maintaining recognition performance on head classes. Extensive experiments on three benchmark datasets (CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018) demonstrate that the proposed distillation framework achieves performance comparable to state-of-the-art long-tailed recognition methods. Our code is available at: https://github.com/xiaohe6/JWAFD.
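The abstract describes the two modules only at a high level; the exact losses appear in the full article. For intuition, below is a minimal PyTorch sketch of one plausible reading: a KL distillation term re-weighted by an inverse-class-frequency prior, and an MSE match between teacher and student feature maps at several network stages. The function names, the prior construction, the temperature `T`, and the MSE matching are illustrative assumptions, not the authors' published equations.

```python
import torch
import torch.nn.functional as F


def weighted_kd_loss(student_logits, teacher_logits, labels, class_counts, T=2.0):
    """Per-sample KL distillation loss re-weighted by a class prior, so
    tail classes contribute more to the gradient.

    `class_counts` is a 1-D tensor of per-class training sample counts.
    The inverse-frequency prior and temperature are assumed choices for
    illustration, not the paper's published formulation.
    """
    # Inverse-frequency prior, normalized so the mean weight is 1.
    inv_freq = 1.0 / class_counts.float()
    prior = inv_freq * (len(class_counts) / inv_freq.sum())
    # Teacher soft targets and student log-probabilities at temperature T.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # Per-sample KL divergence, weighted by each sample's class prior.
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    return (prior[labels] * kl).mean() * T * T


def multiscale_feature_loss(student_feats, teacher_feats):
    """MSE matching of intermediate feature maps at several stages (one
    assumed reading of "multi-scale feature distillation"). Channel
    dimensions are assumed to match; a real implementation would likely
    add projection layers when teacher and student widths differ."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        # Align spatial sizes if the two backbones downsample differently.
        if fs.shape[-2:] != ft.shape[-2:]:
            fs = F.adaptive_avg_pool2d(fs, ft.shape[-2:])
        loss = loss + F.mse_loss(fs, ft.detach())
    return loss / len(student_feats)


# Example usage with random tensors (hypothetical shapes):
# s_logits, t_logits = torch.randn(8, 100), torch.randn(8, 100)
# labels = torch.randint(0, 100, (8,))
# counts = torch.randint(5, 500, (100,))
# loss = weighted_kd_loss(s_logits, t_logits, labels, counts)
```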


Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.


Acknowledgements

This work was supported by the Key Science and Technology Project of Henan Province (Grant No. 201300210400) and the Henan Province Science and Technology Research Project (Grant No. 232102210031).

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by JY, SW, CL, XH, HL, and YH. The first draft of the manuscript was written by YH, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Junyang Yu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

He, Y., Wang, S., Yu, J. et al. Joint weighted knowledge distillation and multi-scale feature distillation for long-tailed recognition. Int. J. Mach. Learn. & Cyber. 15, 1647–1661 (2024). https://doi.org/10.1007/s13042-023-01988-2

