Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets

Patil, Sachin; Sonavane, Shefali

doi:10.1007/978-981-32-9515-5_48

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1025)

Abstract

Handling very large imbalanced data volumes poses considerable challenges and has opened a new line of work on processing them efficiently. The opportunities contained in these huge imbalanced data sets have made them a priority in recent research. Many applications that handle imbalanced Big Data sets depend on precise classification when predicting unknown values from these data sets. Traditional classifiers are not able to address the imbalance of class distribution among the data samples. A class with fewer samples is difficult to learn, which leads to a notable drop in performance. Recent studies demonstrate that classifier-independent oversampling techniques are better able to handle the issues raised by imbalanced data sets efficiently. This paper discusses in detail an enhanced oversampling technique, the Minority–Majority Mix mean Oversampling Technique (MMMmOT), which improves classification performance. Synthetic samples are generated with appropriate consideration of majority as well as minority samples. The proposed technique is evaluated on data sets mainly from the UCI repository over Apache Hadoop. Furthermore, the effect of maintaining the imbalance ratio by selecting better oversampled instances from the generated pool is analyzed. Classification performance is measured using standard parameters such as F-Measure and area under the curve. The experimental outcomes clearly show the superiority of the presented technique over the traditional techniques.
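The abstract names three quantities without defining them: the imbalance ratio, the F-Measure, and the area under the curve (AUC). The following is a minimal sketch of their standard textbook definitions, not the authors' implementation. The function names, the toy labels, and the scores are assumptions made for illustration, and the imbalance ratio is assumed to be the majority count divided by the minority count.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 majority samples (label 0), 2 minority samples (label 1).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.7, 0.9, 0.35]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(imbalance_ratio(y_true), accuracy, f_measure(y_true, y_pred), auc(y_true, scores))
```

On this toy data, accuracy is 0.8 while the minority-class F-Measure is only 0.5. Accuracy hides errors on the minority class in this way, which is why imbalanced-learning studies commonly report F-Measure and AUC.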

Author information

Authors and Affiliations

Rajarambapu Institute of Technology, Rajaramnagar and Research Scholar, Walchand College of Engineering, Sangli, Maharashtra, India
Sachin Patil
Walchand College of Engineering, Sangli, 416415, Maharashtra, India
Shefali Sonavane

Corresponding author

Correspondence to Sachin Patil.

Editor information

Editors and Affiliations

Department of Electronics and Telecommunication Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India
Brijesh Iyer
Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India
P. S. Deshpande
Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
S. C. Sharma
Deogiri Institute of Engineering and Management Studies, Aurangabad, Maharashtra, India
Ulhas Shiurkar

About this paper

Cite this paper

Patil, S., Sonavane, S. (2020). Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets. In: Iyer, B., Deshpande, P., Sharma, S., Shiurkar, U. (eds) Computing in Engineering and Technology. Advances in Intelligent Systems and Computing, vol 1025. Springer, Singapore. https://doi.org/10.1007/978-981-32-9515-5_48

DOI: https://doi.org/10.1007/978-981-32-9515-5_48
Published: 17 October 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9514-8
Online ISBN: 978-981-32-9515-5
eBook Packages: Intelligent Technologies and Robotics (R0)
