Abstract
Handling very large imbalanced data sets poses substantial challenges and has opened a new line of work on processing them efficiently. The opportunities these data sets offer have made them a priority in recent research. Many applications that handle imbalanced Big Data sets depend on precise classification when predicting unknown values from the data. Traditional classifiers are not able to address the imbalance of class distribution among the data samples. A class with few samples is difficult to learn, which leads to a notable drop in classification performance. Recent studies demonstrate that classifier-independent oversampling techniques handle the issues raised by imbalanced data sets more effectively. This paper discusses in detail an enhanced oversampling technique, the Minority–Majority Mix mean Oversampling Technique (MMMmOT), that improves classification performance. The technique takes both majority and minority samples into account when generating synthetic samples. It is evaluated on data sets drawn mainly from the UCI repository, running on Apache Hadoop. Furthermore, the effect of maintaining the imbalance ratio while using better oversampling instances from the generated pool is analyzed. Classification performance is measured using standard metrics such as F-Measure and area under the curve (AUC). The experimental results show that the presented technique outperforms the traditional techniques.
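For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how the imbalance ratio, F-Measure, and area under the curve are commonly computed for a binary problem. It is not taken from the paper: the function names, the toy labels, and the scores are all assumptions made for illustration, and the minority class is assumed to be labelled 1.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 majority samples (0) and 2 minority samples (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.3, 0.6, 0.9, 0.5]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(imbalance_ratio(y_true))    # 4.0
print(accuracy)                   # 0.8
print(f_measure(y_true, y_pred))  # 0.5
print(auc(y_true, scores))        # 0.9375
```

On this toy data the accuracy is 0.8 while the minority-class F-Measure is only 0.5, which illustrates why accuracy alone can hide poor minority-class performance on imbalanced data.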
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Patil, S., Sonavane, S. (2020). Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets. In: Iyer, B., Deshpande, P., Sharma, S., Shiurkar, U. (eds) Computing in Engineering and Technology. Advances in Intelligent Systems and Computing, vol 1025. Springer, Singapore. https://doi.org/10.1007/978-981-32-9515-5_48
DOI: https://doi.org/10.1007/978-981-32-9515-5_48
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9514-8
Online ISBN: 978-981-32-9515-5
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)