Abstract
Under squared error loss, bias and variance and their decomposition of prediction error are well-understood and widely used concepts. For other loss functions, however, there is no universally accepted definition. Numerous attempts have been made to extend these concepts beyond squared error loss, but most have focused solely on 0-1 loss and have produced significantly different definitions. These differences stem from disagreement over the essential characteristics that variance and bias should exhibit. This paper proposes an explicit list of rules that we feel any "reasonable" set of definitions should satisfy. Using this framework, we produce bias and variance definitions that generalize to any symmetric loss function. We illustrate these statistics on several loss functions, with particular emphasis on 0-1 loss, and conclude with a discussion of previously proposed definitions as well as a method for estimating these quantities on real data sets.
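As context for the generalization the abstract describes, the classical squared-error decomposition can be checked by simulation. The sketch below is illustrative only (it does not implement the paper's generalized definitions): it uses a deliberately biased estimator, the sample mean shifted by an arbitrary offset of 0.5, and verifies numerically that expected squared prediction error splits into irreducible noise, squared bias, and variance.

```python
import random
import statistics

# Classical decomposition under squared error loss:
#   E[(Y - f_hat)^2] = noise + bias^2 + variance
# Target: Y = mu + eps, eps ~ N(0, sigma^2).
# Estimator: sample mean of n draws, shifted by 0.5 to induce bias
# (an illustrative choice, not from the paper).

random.seed(0)
mu, sigma, n, trials = 2.0, 1.0, 10, 20000

preds = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    preds.append(statistics.mean(sample) + 0.5)  # biased estimator

mean_pred = statistics.mean(preds)
bias_sq = (mean_pred - mu) ** 2          # squared bias of the estimator
variance = statistics.pvariance(preds)   # variance of the estimator

# Monte Carlo estimate of expected squared error against fresh targets
errors = [(random.gauss(mu, sigma) - p) ** 2 for p in preds]
mse = statistics.mean(errors)
noise = sigma ** 2                       # irreducible error

print(f"bias^2={bias_sq:.3f} var={variance:.3f} noise={noise:.3f}")
print(f"mse={mse:.3f} vs sum={noise + bias_sq + variance:.3f}")
```

Here the three components should sum to the observed mean squared error to within Monte Carlo error; it is precisely this additive structure that fails to carry over directly to 0-1 and other losses, motivating the definitions developed in the paper.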
James, G.M. Variance and Bias for General Loss Functions. Machine Learning 51, 115–135 (2003). https://doi.org/10.1023/A:1022899518027