Combination Approach of SVM Ensembles and Resampling Method for Imbalanced Datasets

Chen, Xin; Zhang, Yuqing; Wu, Kexian

doi:10.1007/978-1-4614-7010-6_72

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 236)

Abstract

Real-world datasets are often composed predominantly of normal examples, with only a small percentage of interesting or abnormal examples. This paper addresses the resulting imbalance problem with a new approach that combines SVM ensembles and a resampling method. Through empirical analysis, the authors cluster the majority-class examples into subclasses with the k-means algorithm, which decreases the imbalance ratio. In addition, they use a resampling method that includes both oversampling and undersampling techniques to deal with the long training time and low training efficiency of SVM ensembles. Experimental results show that SVM ensembles with the resampling method outperform individual SVM classifiers. The proposed combination approach can effectively address the imbalance problem.
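The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the choice of three clusters, the farthest-first seeding, and all function names are assumptions made here for illustration, and the later resampling and SVM training steps are not shown. It only shows how splitting the majority class with k-means lowers the imbalance ratio that each ensemble member would face.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Hypothetical toy data: 90 majority examples in three well-separated groups
# and 10 minority examples.
rng = random.Random(0)

def blob(cx, cy, n):
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)) for _ in range(n)]

majority = blob(0, 0, 30) + blob(10, 0, 30) + blob(0, 10, 30)
minority = blob(5, 5, 10)

labels = kmeans(majority, k=3)
subclass_sizes = sorted(labels.count(c) for c in range(3))

# Imbalance ratio = number of majority examples per minority example.
ratio_before = len(majority) / len(minority)
ratios_after = [n / len(minority) for n in subclass_sizes]

print(ratio_before)   # 9.0
print(ratios_after)   # [3.0, 3.0, 3.0]
```

In this toy setting each majority subclass is paired with the full minority class, so a classifier trained on one pair sees a 3:1 ratio instead of 9:1. On real data the subclasses would generally be unequal in size, which is where oversampling or undersampling within each pair could come in.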

Acknowledgments

Supported by Beijing Key Laboratory of Network Systems and Network Culture (Beijing University of Posts and Telecommunications) and the Innovative Experiment Plan for College Students of China University of Geosciences, Beijing.

Author information

Authors and Affiliations

China University of Geosciences, Beijing, China
Xin Chen, Yuqing Zhang & Kexian Wu

Corresponding author

Correspondence to Yuqing Zhang.

Editor information

Editors and Affiliations

Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA
W. Eric Wong
College of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, People’s Republic of China
Tinghuai Ma

About this paper

Cite this paper

Chen, X., Zhang, Y., Wu, K. (2013). Combination Approach of SVM Ensembles and Resampling Method for Imbalanced Datasets. In: Wong, W.E., Ma, T. (eds) Emerging Technologies for Information Systems, Computing, and Management. Lecture Notes in Electrical Engineering, vol 236. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7010-6_72

DOI: https://doi.org/10.1007/978-1-4614-7010-6_72
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7009-0
Online ISBN: 978-1-4614-7010-6
eBook Packages: Engineering, Engineering (R0)
