ABSTRACT
Imbalanced learning addresses datasets in which the class proportions are highly skewed. Such datasets appear increasingly across many domains and pose a challenge to traditional classification techniques. Learning from imbalanced multiclass data (three or more classes) adds further complexity. Studies suggest that ensemble learners can be trained to emphasize different segments of the data pertaining to different classes, thereby producing more accurate results than standard imbalance learning techniques. We therefore propose a new approach to building ensembles of classifiers for multiclass imbalanced datasets, called Multiclass Imbalance Learning in Ensembles through Selective sampling (MILES). Each member of MILES is trained on data selectively sampled from bands around cluster centroids, in a way that aggressively encourages diversity within the ensemble. Resampling techniques balance the class distribution of the data drawn from each cluster. Experiments applying our approach to several datasets demonstrate improved recognition of minority-class examples and a better balance between the G-mean and the Mean Area Under the Curve (MAUC). We further applied MILES to classifying prolonged emergency department (ED) stays, where it consistently outperformed existing methods.
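The core idea above — sampling each ensemble member's training data from a distance band around cluster centroids, then resampling to balance the classes drawn from each cluster — can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the function name `band_sample`, the quantile-based band, and the duplication-based oversampling are all assumptions.

```python
# Illustrative sketch of MILES-style selective band sampling.
# Assumptions (not from the paper): quantile-defined bands and
# random-duplication oversampling stand in for the actual procedure.
import numpy as np
from sklearn.cluster import KMeans

def band_sample(X, y, n_clusters=3, band=(0.2, 0.8), rng=None):
    """Keep samples whose distance to their cluster centroid falls in a
    quantile band, then oversample minority classes to the majority count."""
    rng = np.random.default_rng(rng)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Distance of every sample to the centroid of its own cluster.
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

    keep = np.zeros(len(X), dtype=bool)
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        lo, hi = np.quantile(d[idx], band)  # band around this centroid
        keep[idx[(d[idx] >= lo) & (d[idx] <= hi)]] = True

    Xs, ys = X[keep], y[keep]
    # Resample so every class in the selection matches the majority count.
    counts = {c: int(np.sum(ys == c)) for c in np.unique(ys)}
    target = max(counts.values())
    parts_X, parts_y = [Xs], [ys]
    for c, n in counts.items():
        if n < target:
            extra = rng.choice(np.where(ys == c)[0], size=target - n,
                               replace=True)
            parts_X.append(Xs[extra])
            parts_y.append(ys[extra])
    return np.vstack(parts_X), np.concatenate(parts_y)
```

Drawing a different band (or a different random seed) for each ensemble member would give each base classifier a distinct view of the data, which is one plausible way the diversity described in the abstract could be encouraged.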
Index Terms
- MILES: multiclass imbalanced learning in ensembles through selective sampling