Combination Approach of SVM Ensembles and Resampling Method for Imbalanced Datasets

  • Conference paper
Emerging Technologies for Information Systems, Computing, and Management

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 236)

Abstract

Real-world datasets are often composed predominantly of normal examples, with only a small percentage of interesting or abnormal examples. This paper addresses the imbalance problem with a new approach that combines SVM ensembles and a resampling method. Based on empirical analysis, the majority class is clustered into subclasses with the k-means algorithm, which decreases the imbalance ratio. In addition, a resampling method that includes both oversampling and undersampling techniques is used to deal with the long training time and low training efficiency of SVM ensembles. Experimental results show that SVM ensembles with the resampling method outperform individual SVM classifiers. The proposed combination approach can effectively address the imbalance problem.
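The clustering and resampling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameters, so the toy data, the choice of k = 4, the target set size of 40, and the helper names `kmeans`, `split_majority`, and `resample` are all assumptions made for illustration. SVM training is omitted; under this reading, each balanced set would train one member of the ensemble.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def split_majority(majority, k, seed=0):
    """Partition the majority class into k subclasses using k-means."""
    labels = kmeans(majority, k, seed=seed)
    return [[p for p, l in zip(majority, labels) if l == c] for c in range(k)]

def resample(points, target, seed=0):
    """Random undersampling if there are too many points, random oversampling if too few."""
    rng = random.Random(seed)
    if len(points) >= target:
        return rng.sample(points, target)  # without replacement
    return points + [rng.choice(points) for _ in range(target - len(points))]

# Hypothetical toy data: 400 majority points in four blobs, 20 minority points.
rng = random.Random(42)
blob_centers = [(0, 0), (5, 5), (0, 5), (5, 0)]
majority = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in blob_centers for _ in range(100)]
minority = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(20)]

# Clustering the majority class yields smaller subclasses, so the ratio of
# each subclass to the minority class is no larger than the original ratio.
subclasses = [s for s in split_majority(majority, k=4) if s]
ratio_before = len(majority) / len(minority)
ratio_after = max(len(s) for s in subclasses) / len(minority)

# One balanced training set per subclass: the subclass is brought to the
# target size (here by undersampling) and the minority class is oversampled.
TARGET = 40  # assumed size, not taken from the paper
training_sets = [
    (resample(sub, TARGET, seed=i), resample(minority, TARGET, seed=i))
    for i, sub in enumerate(subclasses)
]

print(f"imbalance ratio before: {ratio_before:.1f}, largest after clustering: {ratio_after:.1f}")
print(f"{len(training_sets)} balanced training sets of {TARGET}+{TARGET} examples")
```

Each ensemble member then sees a small, balanced training set instead of the full imbalanced dataset, which is the mechanism the abstract credits for shorter training time.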

Acknowledgments

Supported by Beijing Key Laboratory of Network Systems and Network Culture (Beijing University of Posts and Telecommunications) and the Innovative Experiment Plan for College Students of China University of Geosciences, Beijing.

Author information

Corresponding author

Correspondence to Yuqing Zhang.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Chen, X., Zhang, Y., Wu, K. (2013). Combination Approach of SVM Ensembles and Resampling Method for Imbalanced Datasets. In: Wong, W.E., Ma, T. (eds) Emerging Technologies for Information Systems, Computing, and Management. Lecture Notes in Electrical Engineering, vol 236. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7010-6_72

  • DOI: https://doi.org/10.1007/978-1-4614-7010-6_72

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-7009-0

  • Online ISBN: 978-1-4614-7010-6

  • eBook Packages: Engineering, Engineering (R0)
