
Balanced-MixUp for Highly Imbalanced Medical Image Classification

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (MICCAI 2021)

Abstract

Highly imbalanced datasets are ubiquitous in medical image classification. Rare classes associated with less prevalent diseases are often severely under-represented in labeled databases, which typically leads to poor performance of machine learning algorithms because of overfitting during learning. In this paper, we propose Balanced-MixUp, a novel mechanism for sampling training data based on the popular MixUp regularization technique. In short, Balanced-MixUp simultaneously performs regular (i.e., instance-based) and balanced (i.e., class-based) sampling of the training data. The two resulting sets of samples are then mixed up to create a more balanced training distribution, from which a neural network can learn effectively without heavily under-fitting the minority classes. We experiment with a highly imbalanced dataset of retinal images (55K samples, 5 classes) and a long-tailed dataset of gastrointestinal video frames (10K images, 23 classes), using two CNNs of varying representational capacity. Experimental results show that Balanced-MixUp outperforms conventional sampling schemes and loss functions specifically designed for imbalanced data. Code is released at https://github.com/agaldran/balanced_mixup
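The sampling-and-mixing idea summarized in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation (the released repository is the reference for that). It uses the common formulation in which class j is drawn with probability p_j = n_j^q / Σ_i n_i^q, with q = 1 for instance-based sampling and q = 0 for class-based sampling. Each instance-based example (x_I, y_I) is paired with a class-balanced example (x_C, y_C) and mixed as x̃ = λ x_I + (1 − λ) x_C, ỹ = λ y_I + (1 − λ) y_C. Drawing λ from Beta(α, 1) with α = 0.2, using one λ per batch, and letting λ weight the instance-based stream are all assumptions of this sketch. The helper names `sampling_probs`, `one_hot` and `balanced_mixup_batch` are hypothetical, and images are flattened into plain lists of floats for brevity.

```python
import random
from collections import Counter


def sampling_probs(labels, q):
    """Per-example weights so that class j is drawn with probability n_j**q / sum_i n_i**q.

    q = 1.0 gives instance-based sampling (every example equally likely);
    q = 0.0 gives class-based sampling (every class equally likely).
    """
    counts = Counter(labels)
    class_mass = {c: n ** q for c, n in counts.items()}
    total = sum(class_mass.values())
    # Spread each class's probability mass uniformly over its members.
    return [class_mass[y] / total / counts[y] for y in labels]


def one_hot(label, num_classes):
    """Hypothetical helper: one-hot target vector as a list of floats."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def balanced_mixup_batch(images, labels, num_classes, batch_size, alpha=0.2, rng=random):
    """Hypothetical helper: mix an instance-based batch with a class-balanced batch.

    Returns (mixed_images, soft_targets, lam). ASSUMPTION: lam ~ Beta(alpha, 1)
    weights the instance-based stream; check the released code for the exact convention.
    """
    indices = range(len(labels))
    inst = rng.choices(indices, weights=sampling_probs(labels, 1.0), k=batch_size)
    bal = rng.choices(indices, weights=sampling_probs(labels, 0.0), k=batch_size)
    lam = rng.betavariate(alpha, 1.0)  # one mixing coefficient per batch (assumption)

    mixed_x, mixed_y = [], []
    for i, j in zip(inst, bal):
        # x~ = lam * x_I + (1 - lam) * x_C, applied element-wise
        mixed_x.append([lam * a + (1 - lam) * b for a, b in zip(images[i], images[j])])
        # y~ = lam * y_I + (1 - lam) * y_C, giving soft targets that sum to 1
        y_i = one_hot(labels[i], num_classes)
        y_c = one_hot(labels[j], num_classes)
        mixed_y.append([lam * a + (1 - lam) * b for a, b in zip(y_i, y_c)])
    return mixed_x, mixed_y, lam


# Usage: a toy 2-class problem with a 3:1 imbalance and 2-pixel "images".
images = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]
xs, ys, lam = balanced_mixup_batch(images, labels, num_classes=2, batch_size=8,
                                   alpha=0.2, rng=random.Random(0))
```

In a PyTorch pipeline, the two index streams could instead come from two data loaders driven by `torch.utils.data.WeightedRandomSampler`, one with the instance-based weights and one with the class-balanced weights above. Because the mixed targets ỹ are soft, they call for a soft-label cross-entropy rather than a loss that expects integer class labels.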


Notes

  1. https://www.kaggle.com/c/diabetic-retinopathy-detection
  2. https://www.adcis.net/en/third-party/messidor2/
  3. https://www.kaggle.com/google-brain/messidor2-dr-grades
  4. https://endotect.com/


Acknowledgments

This work was partially supported by a Marie Skłodowska-Curie Global Fellowship (No. 892297) and by Australian Research Council grants (DP180103232 and FT190100525).

Author information


Correspondence to Adrian Galdran.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Galdran, A., Carneiro, G., González Ballester, M.A. (2021). Balanced-MixUp for Highly Imbalanced Medical Image Classification. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12905. Springer, Cham. https://doi.org/10.1007/978-3-030-87240-3_31


  • DOI: https://doi.org/10.1007/978-3-030-87240-3_31


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87239-7

  • Online ISBN: 978-3-030-87240-3

  • eBook Packages: Computer Science, Computer Science (R0)
