Abstract
Under squared error loss, bias and variance and their decomposition of prediction error are well-understood and widely used concepts. For other loss functions, however, there is no universally accepted definition. Numerous attempts have been made to extend these concepts beyond squared error loss, but most have focused solely on 0-1 loss and have produced significantly different definitions. These differences stem from disagreement over the essential characteristics that variance and bias should exhibit. This paper proposes an explicit list of rules that we feel any "reasonable" set of definitions should satisfy. Using this framework, we produce bias and variance definitions that generalize to any symmetric loss function. We illustrate these statistics on several loss functions, with particular emphasis on 0-1 loss, and conclude with a discussion of previously proposed definitions as well as a method for estimating these quantities on real data sets.
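As context for the generalization the abstract describes, the classical squared-error decomposition can be checked by simulation. The sketch below is illustrative only (it does not implement the paper's generalized definitions): it uses a deliberately biased estimator, the sample mean shifted by an arbitrary offset of 0.5, and verifies numerically that expected squared prediction error splits into irreducible noise, squared bias, and variance.

```python
import random
import statistics

# Classical decomposition under squared error loss:
#   E[(Y - f_hat)^2] = noise + bias^2 + variance
# Target: Y = mu + eps, eps ~ N(0, sigma^2).
# Estimator: sample mean of n draws, shifted by 0.5 to induce bias
# (an illustrative choice, not from the paper).

random.seed(0)
mu, sigma, n, trials = 2.0, 1.0, 10, 20000

preds = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    preds.append(statistics.mean(sample) + 0.5)  # biased estimator

mean_pred = statistics.mean(preds)
bias_sq = (mean_pred - mu) ** 2          # squared bias of the estimator
variance = statistics.pvariance(preds)   # variance of the estimator

# Monte Carlo estimate of expected squared error against fresh targets
errors = [(random.gauss(mu, sigma) - p) ** 2 for p in preds]
mse = statistics.mean(errors)
noise = sigma ** 2                       # irreducible error

print(f"bias^2={bias_sq:.3f} var={variance:.3f} noise={noise:.3f}")
print(f"mse={mse:.3f} vs sum={noise + bias_sq + variance:.3f}")
```

Here the three components should sum to the observed mean squared error to within Monte Carlo error; it is precisely this additive structure that fails to carry over directly to 0-1 and other losses, motivating the definitions developed in the paper.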
James, G.M. Variance and Bias for General Loss Functions. Machine Learning 51, 115–135 (2003). https://doi.org/10.1023/A:1022899518027