ABSTRACT
Imbalanced learning addresses datasets in which the class proportions are highly skewed. Such datasets appear increasingly across many domains and pose a challenge to traditional classification techniques. Learning from imbalanced multiclass data (three or more classes) adds further complexity. Studies suggest that ensemble learners can be trained to emphasize different segments of the data pertaining to different classes, thereby producing more accurate results than standard imbalance learning techniques. We therefore propose a new approach to building ensembles of classifiers for multiclass imbalanced datasets, called Multiclass Imbalance Learning in Ensembles through Selective sampling (MILES). Each member of MILES is trained on data selectively sampled from bands around cluster centroids, in a way that aggressively encourages diversity within the ensemble. Resampling techniques balance the class distribution of the data drawn from each cluster. Experiments applying our approach to several datasets demonstrate improved recognition of minority-class examples and a better balance between the G-mean and the Mean Area Under the Curve (MAUC). We further applied MILES to classifying prolonged emergency department (ED) stays, where it consistently outperformed existing methods.
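The core idea above — sampling each ensemble member's training data from a distance band around cluster centroids, then resampling to balance the classes drawn from each cluster — can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the function name `band_sample`, the quantile-based band, and the duplication-based oversampling are all assumptions.

```python
# Illustrative sketch of MILES-style selective band sampling.
# Assumptions (not from the paper): quantile-defined bands and
# random-duplication oversampling stand in for the actual procedure.
import numpy as np
from sklearn.cluster import KMeans

def band_sample(X, y, n_clusters=3, band=(0.2, 0.8), rng=None):
    """Keep samples whose distance to their cluster centroid falls in a
    quantile band, then oversample minority classes to the majority count."""
    rng = np.random.default_rng(rng)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Distance of every sample to the centroid of its own cluster.
    d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

    keep = np.zeros(len(X), dtype=bool)
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        lo, hi = np.quantile(d[idx], band)  # band around this centroid
        keep[idx[(d[idx] >= lo) & (d[idx] <= hi)]] = True

    Xs, ys = X[keep], y[keep]
    # Resample so every class in the selection matches the majority count.
    counts = {c: int(np.sum(ys == c)) for c in np.unique(ys)}
    target = max(counts.values())
    parts_X, parts_y = [Xs], [ys]
    for c, n in counts.items():
        if n < target:
            extra = rng.choice(np.where(ys == c)[0], size=target - n,
                               replace=True)
            parts_X.append(Xs[extra])
            parts_y.append(ys[extra])
    return np.vstack(parts_X), np.concatenate(parts_y)
```

Drawing a different band (or a different random seed) for each ensemble member would give each base classifier a distinct view of the data, which is one plausible way the diversity described in the abstract could be encouraged.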
Index Terms
- MILES: multiclass imbalanced learning in ensembles through selective sampling