Abstract
This paper analyzes the application of a particular class of Bregman divergences to the design of cost-sensitive classifiers for multiclass problems. We show that these divergence measures can be used to estimate posterior probabilities with maximal accuracy for probability values close to the decision boundaries. Asymptotically, the proposed divergence measures provide classifiers that minimize the sum of decision costs in non-separable problems and maximize a margin in separable MAP problems.
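As background for the terms used in the abstract, the following is a minimal sketch of the generic Bregman divergence, D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, together with a minimum-expected-cost decision rule applied to a posterior estimate. It is not the particular class of divergences proposed in the paper. The generator functions, the cost matrix, and all function names are illustrative assumptions.

```python
import math

def bregman_divergence(phi, grad_phi, p, q):
    """Generic Bregman divergence D_phi(p, q) for a convex generator phi."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

# Two standard generators (textbook examples, not the paper's proposal).
def neg_entropy(p):
    # Negative Shannon entropy; yields the KL divergence on the simplex.
    return sum(x * math.log(x) for x in p if x > 0)

def neg_entropy_grad(p):
    # Gradient of the negative entropy; requires strictly positive entries.
    return [math.log(x) + 1.0 for x in p]

def sq_norm(p):
    # Squared Euclidean norm; yields the squared Euclidean distance.
    return sum(x * x for x in p)

def sq_norm_grad(p):
    return [2.0 * x for x in p]

def min_cost_decision(cost, posterior):
    """Pick the class with the lowest expected cost.

    Assumed convention: cost[i][j] is the cost of deciding class i
    when the true class is j.
    """
    expected = [sum(c * pj for c, pj in zip(row, posterior)) for row in cost]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]   # hypothetical true posterior
    q = [0.4, 0.4, 0.2]   # hypothetical estimate
    print("KL-type divergence:", bregman_divergence(neg_entropy, neg_entropy_grad, p, q))
    print("Squared distance:  ", bregman_divergence(sq_norm, sq_norm_grad, p, q))

    # Hypothetical cost matrix: missing class 2 is expensive.
    cost = [[0, 1, 10],
            [1, 0, 10],
            [1, 1, 0]]
    decision, expected = min_cost_decision(cost, p)
    print("Expected costs:", expected, "decision:", decision)
```

With this hypothetical cost matrix the minimum-cost decision differs from the MAP class, which illustrates why the accuracy of posterior estimates near the cost-dependent decision boundaries matters more than their accuracy elsewhere.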
Additional information
Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.
About this article
Cite this article
Santos-Rodríguez, R., Guerrero-Curieses, A., Alaiz-Rodríguez, R. et al. Cost-sensitive learning based on Bregman divergences. Mach Learn 76, 271–285 (2009). https://doi.org/10.1007/s10994-009-5132-8
DOI: https://doi.org/10.1007/s10994-009-5132-8