Abstract
Highly imbalanced datasets are ubiquitous in medical image classification problems. In such problems, rare classes associated with less prevalent diseases are often severely under-represented in labeled databases, which typically results in poor performance of machine learning algorithms due to overfitting during learning. In this paper, we propose a novel mechanism for sampling training data based on the popular MixUp regularization technique, which we refer to as Balanced-MixUp. In short, Balanced-MixUp simultaneously performs regular (i.e., instance-based) and balanced (i.e., class-based) sampling of the training data. The resulting two sets of samples are then mixed up to create a more balanced training distribution from which a neural network can effectively learn without heavily under-fitting the minority classes. We experiment with a highly imbalanced dataset of retinal images (55K samples, 5 classes) and a long-tailed dataset of gastrointestinal video frames (10K images, 23 classes), using two CNNs of varying representation capabilities. Experimental results demonstrate that Balanced-MixUp outperforms other conventional sampling schemes and loss functions specifically designed to deal with imbalanced data. Code is released at https://github.com/agaldran/balanced_mixup
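The two sampling schemes that the abstract combines can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' released code. The function names (`instance_based_indices`, `class_based_indices`, `balanced_mixup_batch`) and the toy feature vectors standing in for images are assumptions. The same holds for drawing a single mixing coefficient λ per batch from a symmetric Beta(α, α), as in standard MixUp; the abstract does not state which distribution Balanced-MixUp uses.

```python
import random
from collections import Counter, defaultdict


def instance_based_indices(labels, batch_size, rng):
    """Regular sampling: every example is equally likely, so classes
    appear roughly in proportion to their prevalence in the dataset."""
    return [rng.randrange(len(labels)) for _ in range(batch_size)]


def class_based_indices(labels, batch_size, rng):
    """Balanced sampling: pick a class uniformly at random, then pick an
    example of that class uniformly, so every class is equally likely."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(batch_size)]


def one_hot(y, num_classes):
    vec = [0.0] * num_classes
    vec[y] = 1.0
    return vec


def balanced_mixup_batch(features, labels, num_classes, batch_size, alpha, rng):
    """Hypothetical sketch: mix an instance-based batch with a class-based batch.

    ASSUMPTION: one mixing coefficient per batch drawn from a symmetric
    Beta(alpha, alpha), as in standard MixUp; Balanced-MixUp may differ.
    """
    inst = instance_based_indices(labels, batch_size, rng)
    bal = class_based_indices(labels, batch_size, rng)
    lam = rng.betavariate(alpha, alpha)
    mixed_x, mixed_y = [], []
    for i, j in zip(inst, bal):
        # Inputs and one-hot targets are mixed with the same coefficient.
        mixed_x.append([lam * a + (1 - lam) * b for a, b in zip(features[i], features[j])])
        y_i, y_j = one_hot(labels[i], num_classes), one_hot(labels[j], num_classes)
        mixed_y.append([lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)])
    return mixed_x, mixed_y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy imbalanced dataset: 950 examples of class 0, 50 of class 1.
    labels = [0] * 950 + [1] * 50
    features = [[float(y), rng.random()] for y in labels]

    inst_counts = Counter(labels[i] for i in instance_based_indices(labels, 10_000, rng))
    bal_counts = Counter(labels[i] for i in class_based_indices(labels, 10_000, rng))
    print("instance-based:", dict(inst_counts))  # expect roughly 95% class 0
    print("class-based:   ", dict(bal_counts))   # expect roughly 50% / 50%

    x, y, lam = balanced_mixup_batch(features, labels, 2, 4, alpha=0.2, rng=rng)
    print("lambda =", round(lam, 3), "soft labels:", y)
```

With 95% of the toy examples in class 0, instance-based sampling reproduces that skew, while class-based sampling draws both classes about equally often. Mixing one batch of each yields soft targets that keep some weight on minority classes without training exclusively on repeated copies of them.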
Acknowledgments
This work was partially supported by a Marie Skłodowska-Curie Global Fellowship (No. 892297) and by Australian Research Council grants (DP180103232 and FT190100525).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Galdran, A., Carneiro, G., González Ballester, M.A. (2021). Balanced-MixUp for Highly Imbalanced Medical Image Classification. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12905. Springer, Cham. https://doi.org/10.1007/978-3-030-87240-3_31
DOI: https://doi.org/10.1007/978-3-030-87240-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87239-7
Online ISBN: 978-3-030-87240-3
eBook Packages: Computer Science, Computer Science (R0)