Cost-sensitive learning based on Bregman divergences

Santos-Rodríguez, Raúl; Guerrero-Curieses, Alicia; Alaiz-Rodríguez, Rocío; Cid-Sueiro, Jesús

doi:10.1007/s10994-009-5132-8

Published: 23 July 2009

Machine Learning, Volume 76, pages 271–285 (2009)

Raúl Santos-Rodríguez¹, Alicia Guerrero-Curieses², Rocío Alaiz-Rodríguez³ & Jesús Cid-Sueiro¹

Abstract

This paper analyzes the application of a particular class of Bregman divergences to design cost-sensitive classifiers for multiclass problems. We show that these divergence measures can be used to estimate posterior probabilities with maximal accuracy for the probability values that are close to the decision boundaries. Asymptotically, the proposed divergence measures provide classifiers minimizing the sum of decision costs in non-separable problems, and maximizing a margin in separable MAP problems.
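The abstract combines two standard ingredients that can be illustrated independently of the paper's specific divergence class: a Bregman divergence D_φ(y, q) = φ(y) − φ(q) − ⟨∇φ(q), y − q⟩ generated by a strictly convex function φ, and the minimum-expected-cost decision rule applied to estimated posteriors. The Python sketch below is not the authors' method. It uses two textbook generators (the squared norm and the negative entropy) rather than the particular class studied in the paper, and the helper names (`bregman`, `best_on_grid`, `bayes_decision`) and the cost-matrix convention C[d, i] (cost of deciding d when the true class is i) are assumptions made for illustration. It checks numerically that, for one-hot labels, the expected divergence is minimized at the true posterior, which is the property that makes such divergences usable for posterior estimation.

```python
import itertools

import numpy as np


def bregman(phi, grad_phi, y, q):
    """D_phi(y, q) = phi(y) - phi(q) - <grad_phi(q), y - q>, for strictly convex phi."""
    return phi(y) - phi(q) - float(np.dot(grad_phi(q), y - q))


# Generator 1: phi(p) = ||p||^2 gives the squared Euclidean distance ||y - q||^2.
def sq_phi(p):
    return float(np.dot(p, p))


def sq_grad(p):
    return 2.0 * p


# Generator 2: phi(p) = sum_i p_i log p_i gives the KL divergence on the simplex;
# for a one-hot y it reduces to the log loss -log q_y. Clipping avoids log(0).
EPS = 1e-12


def negent_phi(p):
    p = np.clip(p, EPS, 1.0)
    return float(np.sum(p * np.log(p)))


def negent_grad(p):
    return np.log(np.clip(p, EPS, 1.0)) + 1.0


def expected_divergence(phi, grad_phi, eta, q):
    """E_Y[D_phi(Y, q)] when the one-hot label Y is drawn with class probabilities eta."""
    labels = np.eye(len(eta))
    return sum(eta[c] * bregman(phi, grad_phi, labels[c], q) for c in range(len(eta)))


def best_on_grid(phi, grad_phi, eta, steps=20):
    """Brute-force minimiser of the expected divergence over a 3-class simplex grid."""
    best_q, best_val = None, np.inf
    for i, j in itertools.product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # point lies outside the probability simplex
        q = np.array([i, j, steps - i - j]) / steps
        val = expected_divergence(phi, grad_phi, eta, q)
        if val < best_val:
            best_q, best_val = q, val
    return best_q


def bayes_decision(q, C):
    """Decision minimising the expected cost sum_i C[d, i] * q[i].

    Assumed convention: C[d, i] is the cost of deciding d when the true class is i.
    """
    return int(np.argmin(C @ q))


if __name__ == "__main__":
    eta = np.array([0.5, 0.3, 0.2])  # hypothetical true posterior at some input x
    print(best_on_grid(sq_phi, sq_grad, eta))
    print(best_on_grid(negent_phi, negent_grad, eta))
    # Hypothetical costs: deciding class 0 when the truth is class 1 costs 10.
    C = np.array([[0.0, 10.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
    print(bayes_decision(eta, C))  # -> 1
```

Both generators recover the same minimiser, so the choice of φ does not change where the optimum lies; it changes how estimation errors are weighted across the simplex. The abstract's claim is that a suitably chosen class of divergences concentrates that accuracy near the decision boundaries induced by the costs, which is where an error in the posterior estimate can flip the decision returned by a rule such as `bayes_decision`.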

Author information

Authors and Affiliations

Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés (Madrid), Spain
Raúl Santos-Rodríguez & Jesús Cid-Sueiro
Department of Signal Theory and Communications, Universidad Rey Juan Carlos, Fuenlabrada (Madrid), Spain
Alicia Guerrero-Curieses
Department of Electrical and Electronic Engineering, Universidad de León, León, Spain
Rocío Alaiz-Rodríguez

Corresponding author

Correspondence to Jesús Cid-Sueiro.

Additional information

Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.

About this article

Cite this article

Santos-Rodríguez, R., Guerrero-Curieses, A., Alaiz-Rodríguez, R. et al. Cost-sensitive learning based on Bregman divergences. Mach Learn 76, 271–285 (2009). https://doi.org/10.1007/s10994-009-5132-8

Received: 17 June 2009
Revised: 17 June 2009
Accepted: 21 June 2009
Published: 23 July 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s10994-009-5132-8
