Abstract
Real-world datasets are often predominantly composed of normal examples, with only a small percentage of interesting or abnormal examples. This paper applies a new approach to the imbalance problem that combines SVM ensembles with a resampling method. Guided by empirical analysis, the majority classes are clustered into subclasses with the k-means algorithm, which decreases the imbalance ratio. In addition, a resampling method that includes both oversampling and undersampling techniques is used to address the long training time and low training efficiency of SVM ensembles. Experimental results show that the SVM ensembles with the resampling method outperform individual SVM classifiers. The proposed combination approach can effectively solve the imbalance problem.
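The abstract names three ingredients (k-means decomposition of the majority class, resampling, and an SVM ensemble) without fixing their details, so the following Python sketch only illustrates one way they can fit together; it is not the authors' implementation. Assumptions: the SVMs are linear soft-margin SVMs trained with Pegasos-style subgradient steps rather than kernel SVMs from a library; resampling is plain random undersampling plus random oversampling by duplication (not SMOTE); ensemble members are combined by one-vs-one voting over the minority class and the majority subclasses; and every function name, parameter (`target`, `lam`, `epochs`) and the toy data in `demo` are hypothetical. The imbalance ratio is taken here as the size of the largest class divided by the size of the smallest.

```python
import itertools
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means; farthest-first initialization keeps the sketch deterministic."""
    centers = [points[0]]
    while len(centers) < k:  # next center: the point farthest from every chosen center
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):  # update step: move each center to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def resample(X, y, target, seed=0):
    """Random undersampling above `target`, random oversampling (duplication) below it."""
    rng = random.Random(seed)
    out_X, out_y = [], []
    for c in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == c]
        if len(members) >= target:
            chosen = rng.sample(members, target)
        else:
            chosen = members + rng.choices(members, k=target - len(members))
        out_X += chosen
        out_y += [c] * target
    return out_X, out_y


def train_linear_svm(X, y, lam=0.1, epochs=30, seed=0):
    """Linear soft-margin SVM trained with Pegasos subgradient steps; y holds -1/+1 labels."""
    rng = random.Random(seed)
    data = [tuple(x) + (1.0,) for x in X]  # constant feature: bias learned (and regularized) as a weight
    w, order, t = [0.0] * len(data[0]), list(range(len(data))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            active = y[i] * sum(a * b for a, b in zip(w, data[i])) < 1.0  # hinge loss > 0
            w = [(1.0 - eta * lam) * a for a in w]  # shrink step from the L2 regularizer
            if active:
                w = [a + eta * y[i] * b for a, b in zip(w, data[i])]
    return w


def decision(w, x):
    return sum(a * b for a, b in zip(w, tuple(x) + (1.0,)))


def fit_ovo_ensemble(X, y):
    """One binary SVM per pair of (sub)classes; one-vs-one voting is an assumed combination rule."""
    models = {}
    for a, b in itertools.combinations(sorted(set(y)), 2):
        pair = [(x, 1 if lab == a else -1) for x, lab in zip(X, y) if lab in (a, b)]
        models[(a, b)] = train_linear_svm([p for p, _ in pair], [s for _, s in pair])
    return models


def predict(models, x):
    """Vote over all pairwise SVMs, then map every majority subclass back to 'majority'."""
    votes = Counter(a if decision(w, x) > 0 else b for (a, b), w in models.items())
    return "minority" if votes.most_common(1)[0][0] == "minority" else "majority"


def demo(seed=0):
    rng = random.Random(seed)
    blobs = [(6.0, 0.0), (-3.0, 5.2), (-3.0, -5.2)]  # hypothetical toy data: 3 majority blobs
    majority = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
                for cx, cy in blobs for _ in range(200)]
    minority = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
    ir_before = imbalance_ratio(["majority"] * len(majority) + ["minority"] * len(minority))
    y = [f"sub{j}" for j in kmeans(majority, k=3)] + ["minority"] * len(minority)
    X_bal, y_bal = resample(majority + minority, y, target=100, seed=seed)
    return ir_before, imbalance_ratio(y), fit_ovo_ensemble(X_bal, y_bal)


if __name__ == "__main__":
    ir_before, ir_after, models = demo()
    print(ir_before, ir_after)  # 30.0 10.0
    print(predict(models, (0.0, 0.0)), predict(models, (6.0, 0.0)))
```

With three well-separated majority blobs, k-means recovers subclasses of 200 points each, so the imbalance ratio falls from 600/20 = 30 to 200/20 = 10. Resampling then gives every (sub)class `target` examples, which also keeps each pairwise training set small. Folding the bias into the weight vector means it is regularized too, a simplification chosen here only for brevity.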
Acknowledgments
Supported by Beijing Key Laboratory of Network Systems and Network Culture (Beijing University of Posts and Telecommunications) and the Innovative Experiment Plan for College Students of China University of Geosciences, Beijing.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Chen, X., Zhang, Y., Wu, K. (2013). Combination Approach of SVM Ensembles and Resampling Method for Imbalanced Datasets. In: Wong, W.E., Ma, T. (eds) Emerging Technologies for Information Systems, Computing, and Management. Lecture Notes in Electrical Engineering, vol 236. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7010-6_72
DOI: https://doi.org/10.1007/978-1-4614-7010-6_72
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7009-0
Online ISBN: 978-1-4614-7010-6
eBook Packages: Engineering, Engineering (R0)