Abstract
Under-sampling generalizations of bagging ensembles improve classification of imbalanced data more than other ensembles, and Roughly Balanced Bagging is the most accurate among them. In this paper, we experimentally study the properties that may explain its good performance. The experiments show that it can be constructed with a small number of component classifiers; however, these components are less diverse than those of standard bagging. Moreover, its good performance comes from its ability to recognize unsafe types of minority examples better than other ensembles. We also show how to improve its performance by integrating bootstrap sampling with the random selection of attributes.
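The sampling step behind Roughly Balanced Bagging can be illustrated with a minimal sketch. It assumes the commonly described scheme of Hido and Kashima: each component classifier receives a bootstrap sample of the minority class, while the number of majority examples is drawn from a negative binomial distribution so that the classes are balanced only on average. The optional attribute subset is one possible reading of "integrating bootstrap sampling with the random selection of attributes" and is not necessarily the procedure used in the paper. All function and parameter names here are hypothetical.

```python
import random


def negative_binomial(n_successes, p, rng):
    """Count failures observed before reaching n_successes successes,
    where each independent trial succeeds with probability p."""
    failures = 0
    successes = 0
    while successes < n_successes:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_sample(labels, minority_label, rng,
                            feature_count=None, subspace_size=None):
    """Draw row indices (and optionally attribute indices) for one
    component classifier. Hypothetical helper, not the authors' code."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_min = len(minority)

    # Majority sample size varies per bag; its expected value equals n_min.
    n_maj = negative_binomial(n_min, 0.5, rng)

    # Both classes are sampled with replacement in this sketch.
    rows = [rng.choice(minority) for _ in range(n_min)]
    rows += [rng.choice(majority) for _ in range(n_maj)]

    # Assumed extension: a random attribute subset for this bag.
    features = None
    if feature_count is not None and subspace_size is not None:
        features = sorted(rng.sample(range(feature_count), subspace_size))
    return rows, features


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 10 + [0] * 90  # 10 minority, 90 majority examples
    rows, features = roughly_balanced_sample(
        labels, minority_label=1, rng=rng, feature_count=20, subspace_size=5)
    print(len(rows), features)
```

Each call yields the training indices for one component; a full ensemble would repeat the draw once per component, fit a base classifier on the selected rows and attributes, and aggregate the predictions.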
Acknowledgements
The research was supported by NCN grant DEC-2013/11/B/ST6/00963.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lango, M., Stefanowski, J. (2016). The Usefulness of Roughly Balanced Bagging for Complex and High-Dimensional Imbalanced Data. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2015. Lecture Notes in Computer Science, vol 9607. Springer, Cham. https://doi.org/10.1007/978-3-319-39315-5_7
DOI: https://doi.org/10.1007/978-3-319-39315-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39314-8
Online ISBN: 978-3-319-39315-5
eBook Packages: Computer Science, Computer Science (R0)