Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets

  • Conference paper
Computing in Engineering and Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1025)

Abstract

Handling very large imbalanced data volumes poses considerable challenges and has opened a new line of work on processing them efficiently. The opportunities these huge imbalanced data sets offer have made them a priority in recent research. Several applications that handle imbalanced Big Data sets depend on precise classification when predicting unknown values from these data sets. Traditional classifiers are not able to address the imbalance of class distribution among the data samples. A class with fewer samples is harder to learn, which leads to a notable drop in performance. Recent studies show that classifier-independent oversampling techniques handle the issues raised by imbalanced data sets more efficiently. This paper discusses in detail an enhanced oversampling technique, the Minority–Majority Mix mean Oversampling Technique (MMMmOT), that improves classification performance. It generates synthetic samples by taking both majority and minority samples into account. The proposed technique is evaluated on data sets drawn mainly from the UCI repository, running on Apache Hadoop. Furthermore, the effect of maintaining the imbalance ratio while selecting better oversampling instances from the generated pool is analyzed. Classification performance is measured using standard metrics such as F-Measure and area under the curve (AUC). The experimental results show that the presented technique outperforms traditional techniques.
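
The abstract names three ingredients without defining them: the imbalance ratio, synthetic oversampling, and the F-Measure. The sketch below illustrates each in plain Python under stated assumptions. It is not MMMmOT, whose mixing of majority and minority samples is not specified in this preview; the oversampling step is a simplified SMOTE-style baseline that interpolates between random pairs of minority samples only. All function names are hypothetical.

```python
import random


def imbalance_ratio(labels, minority_label):
    """Majority-to-minority count ratio for a binary label list."""
    n_min = sum(1 for y in labels if y == minority_label)
    n_maj = len(labels) - n_min
    return n_maj / n_min


def interpolate(a, b, gap):
    """Point on the segment from a to b, with gap in [0, 1]."""
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]


def oversample_minority(minority, n_new, rng):
    """SMOTE-style baseline (assumption: NOT the paper's MMMmOT).

    Simplification: pairs are drawn at random, whereas SMOTE proper
    interpolates towards one of the k nearest minority neighbours.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        synthetic.append(interpolate(a, b, rng.random()))
    return synthetic


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = [0] * 9 + [1] * 3
    print("imbalance ratio:", imbalance_ratio(labels, minority_label=1))

    minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    new_points = oversample_minority(minority, n_new=6, rng=random.Random(0))
    print("synthetic samples generated:", len(new_points))

    print("F-measure:", f_measure(tp=8, fp=2, fn=2))
```

Because each synthetic point lies on a segment between two existing minority samples, it stays inside the region the minority class already occupies. A technique that also consults majority samples, as the abstract describes, would need a different generation rule.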

Author information

Corresponding author

Correspondence to Sachin Patil.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Patil, S., Sonavane, S. (2020). Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets. In: Iyer, B., Deshpande, P., Sharma, S., Shiurkar, U. (eds) Computing in Engineering and Technology. Advances in Intelligent Systems and Computing, vol 1025. Springer, Singapore. https://doi.org/10.1007/978-981-32-9515-5_48