DOI: 10.1145/3019612.3019667

MILES: Multiclass Imbalanced Learning in Ensembles through Selective Sampling

Published: 03 April 2017

ABSTRACT

Imbalanced learning is the problem of learning from datasets in which the class proportions are highly skewed. Imbalanced datasets are increasingly common in many domains and pose a challenge to traditional classification techniques. Learning from imbalanced multiclass data (three or more classes) creates additional complexities. Studies suggest that ensemble learners can be trained to emphasize different segments of the data pertaining to different classes and thereby produce more accurate results than standard imbalance learning techniques. We therefore propose a new approach to building ensembles of classifiers for multiclass imbalanced datasets, called Multiclass Imbalanced Learning in Ensembles through Selective Sampling (MILES). Each member of MILES is trained on data selectively sampled from bands around cluster centroids, in a way that aggressively encourages diversity within the ensemble. Resampling techniques are used to balance the class distribution of the data drawn from each cluster. We performed several experiments applying our approach to different datasets, demonstrating improved performance in recognizing minority class examples and in balancing the G-mean and the Mean Area Under the Curve (MAUC). We further applied MILES to classify prolonged emergency department (ED) stays, achieving consistently higher performance than existing methods.
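The abstract describes the method only at a high level, so the following is a minimal sketch of a MILES-style ensemble, not the authors' algorithm. The choice of k-means for clustering, distance quantiles as the "bands" around centroids, random oversampling as the balancing step, decision trees as base learners, and majority voting, as well as the helper names train_miles_like_ensemble and predict_majority, are all illustrative assumptions rather than details taken from the paper.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def train_miles_like_ensemble(X, y, n_members=10, n_clusters=5, random_state=0):
    """Train base learners on different distance bands around k-means centroids."""
    rng = np.random.RandomState(random_state)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    # Distance of every sample to the centroid of its own cluster.
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

    members = []
    for m in range(n_members):
        # Assumption: each member samples from a different quantile slice ("band")
        # of centroid distances, so the members see different regions of the data.
        lo, hi = np.quantile(dist, [m / n_members, (m + 1) / n_members])
        band = np.where((dist >= lo) & (dist <= hi))[0]
        if band.size == 0:                 # degenerate band: fall back to all samples
            band = np.arange(len(y))

        # Assumption: simple random oversampling balances the classes within the band.
        classes, counts = np.unique(y[band], return_counts=True)
        balanced_idx = []
        for c in classes:
            c_idx = band[y[band] == c]
            balanced_idx.extend(rng.choice(c_idx, size=counts.max(), replace=True))
        balanced_idx = np.asarray(balanced_idx)

        members.append(
            DecisionTreeClassifier(random_state=m).fit(X[balanced_idx], y[balanced_idx]))
    return members

def predict_majority(members, X):
    """Combine the members by plain majority vote (assumes integer-coded class labels)."""
    votes = np.stack([member.predict(X) for member in members])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

Under these assumptions, the resulting ensemble would be scored with the multiclass measures named in the abstract, G-mean and MAUC, rather than plain accuracy.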


    • Published in

      SAC '17: Proceedings of the Symposium on Applied Computing
      April 2017, 2004 pages
      ISBN: 9781450344869
      DOI: 10.1145/3019612

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
