
Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization


Abstract

Solutions of learning problems obtained by Empirical Risk Minimization (ERM), or by almost-ERM when the minimizer does not exist, need to be consistent, so that they are predictive. They also need to be well-posed in the sense of being stable, so that they can be used robustly. We propose a statistical form of stability, defined as leave-one-out (LOO) stability. We prove that, for bounded loss classes, LOO stability is (a) sufficient for generalization, that is, convergence in probability of the empirical error to the expected error, for any algorithm satisfying it, and (b) necessary and sufficient for consistency of ERM. Thus LOO stability is a weak form of stability that provides a sufficient condition for generalization for symmetric learning algorithms while subsuming the classical conditions for consistency of ERM. In particular, we conclude that for ERM a certain form of well-posedness is equivalent to consistency.
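
As a reading aid only, the display below sketches the two notions the abstract juxtaposes: a leave-one-out stability condition (removing one training point barely changes the loss at that point, with high probability) and generalization (the empirical error of the learned function converges in probability to its expected error). The notation here (S, S^i, f_S, V, μ, β^(n), δ^(n)) is chosen for illustration rather than taken from the paper, and the paper's precise LOO-stability condition carries additional technical requirements beyond this sketch.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Expected and empirical error of a function f, for a bounded loss V,
% a measure \mu, and a training set S = (z_1, ..., z_n) drawn i.i.d. from \mu:
\[
  I[f] = \mathbb{E}_{z \sim \mu}\, V(f, z),
  \qquad
  I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).
\]
% Leave-one-out stability (illustrative sketch): with S^i denoting S with
% z_i removed and f_S the algorithm's output on S, removing one point
% barely changes the loss at that point, with high probability, for all i:
\[
  \mathbb{P}_S\left\{ \bigl| V(f_{S^i}, z_i) - V(f_S, z_i) \bigr|
    \ge \beta^{(n)} \right\} \le \delta^{(n)},
  \qquad \beta^{(n)} \to 0, \quad \delta^{(n)} \to 0.
\]
% Generalization: the empirical error of the learned function converges
% in probability to its expected error:
\[
  \lim_{n \to \infty} \mathbb{P}_S\left\{ \bigl| I[f_S] - I_S[f_S] \bigr|
    \ge \varepsilon \right\} = 0
  \qquad \text{for every } \varepsilon > 0.
\]
\end{document}
```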



Author information


Correspondence to Sayan Mukherjee.

Additional information

Communicated by Y. Xu

Dedicated to Charles A. Micchelli on his 60th birthday

Mathematics subject classifications (2000)

68T05, 68T10, 68Q32, 62M20.

Tomaso Poggio: Corresponding author.



Cite this article

Mukherjee, S., Niyogi, P., Poggio, T. et al. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25, 161–193 (2006). https://doi.org/10.1007/s10444-004-7634-z

