Distribution-Sensitive Unbalanced Data Oversampling Method for Medical Diagnosis

  • Systems-Level Quality Improvement
Journal of Medical Systems

Abstract

Severe class imbalance in medical diagnostic data sets lowers the accuracy of classification learning algorithms. To address this problem, this paper proposes a distribution-sensitive oversampling algorithm for imbalanced data. The algorithm divides the minority samples into noise samples, unstable samples, boundary samples, and stable samples according to their location. Each category is processed differently so that the most suitable samples are selected for synthesizing new samples. For synthesis, a distribution-sensitive method is adopted: the synthesis method is chosen according to each sample's distance from the surrounding minority samples, so that the newly synthesized samples share the characteristics of the original minority samples. Tests on real medical diagnostic data show that the algorithm improves the accuracy of classification learning algorithms compared with existing sampling algorithms, especially the accuracy and recall for the minority class.
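
The abstract does not give the exact rules for dividing the minority samples or for synthesizing new ones, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that a sample's "location" can be approximated by the share of majority-class points among its k nearest neighbors. The function names, the thresholds, and the mapping of thresholds to the four category names are hypothetical. The synthesis step is plain linear interpolation between two minority samples, the approach used by SMOTE, standing in for the paper's distribution-sensitive method.

```python
import numpy as np


def categorize_minority(X, y, minority_label, k=5):
    """Assign each minority sample a category from its k nearest neighbors.

    Assumption: the thresholds below are illustrative only; the paper's
    actual definitions of noise/unstable/boundary/stable are not given
    in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    categories = {}
    for i in np.flatnonzero(y == minority_label):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the sample itself
        neighbors = np.argsort(dist)[:k]
        majority_share = np.mean(y[neighbors] != minority_label)
        if majority_share == 1.0:
            categories[int(i)] = "noise"      # surrounded by majority points
        elif majority_share >= 0.5:
            categories[int(i)] = "boundary"   # hypothetical threshold
        elif majority_share > 0.0:
            categories[int(i)] = "unstable"   # hypothetical threshold
        else:
            categories[int(i)] = "stable"     # all neighbors are minority
    return categories


def interpolate(x, neighbor, rng):
    """SMOTE-style synthesis: a random point on the segment from x to neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return x + gap * (neighbor - x)


# Toy data: a tight minority cluster plus one isolated minority point
# that sits among majority points.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                   [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [9.9, 10.0]])
y_demo = np.array([1, 1, 1, 1, 1, 0, 0, 0])

demo_categories = categorize_minority(X_demo, y_demo, minority_label=1, k=3)
stable_idx = [i for i, c in demo_categories.items() if c == "stable"]

# Synthesize one new sample from two stable minority samples.
rng = np.random.default_rng(0)
new_sample = interpolate(X_demo[stable_idx[0]], X_demo[stable_idx[1]], rng)
print(demo_categories, new_sample)
```

In this sketch only samples labeled "stable" are used as seeds for synthesis. That is one possible reading of "selecting the most suitable sample"; the paper may treat the other categories differently.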

Funding

Funded by NSFC (No. 61672020) and the National Key Research and Development Program (2016YFB0800303), and supported by the DongGuan Innovative Research Team Program.

Author information

Corresponding author

Correspondence to Weihong Han.

Ethics declarations

Declaration of Conflict of Interest

Weihong Han, Zizhong Huang, Shudong Li and Yan Jia declare no conflict of interest directly related to the submitted work.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

About this article

Cite this article

Han, W., Huang, Z., Li, S. et al. Distribution-Sensitive Unbalanced Data Oversampling Method for Medical Diagnosis. J Med Syst 43, 39 (2019). https://doi.org/10.1007/s10916-018-1154-8

  • DOI: https://doi.org/10.1007/s10916-018-1154-8
