
Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization


Abstract

Solutions of learning problems obtained by Empirical Risk Minimization (ERM), or by almost-ERM when the minimizer does not exist, need to be consistent, so that they are predictive. They also need to be well-posed in the sense of being stable, so that they can be used robustly. We propose a statistical form of stability, defined as leave-one-out (LOO) stability. We prove that, for bounded loss classes, LOO stability is (a) sufficient for generalization, that is, convergence in probability of the empirical error to the expected error, for any algorithm satisfying it, and (b) necessary and sufficient for consistency of ERM. Thus LOO stability is a weak form of stability that provides a sufficient condition for generalization for symmetric learning algorithms while subsuming the classical conditions for consistency of ERM. In particular, we conclude that for ERM a certain form of well-posedness is equivalent to consistency.
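
As a reading aid only, the display below sketches the two notions the abstract juxtaposes: a leave-one-out stability condition (removing one training point barely changes the loss at that point, with high probability) and generalization (the empirical error of the learned function converges in probability to its expected error). The notation here (S, S^i, f_S, V, μ, β^(n), δ^(n)) is chosen for illustration rather than taken from the paper, and the paper's precise LOO-stability condition carries additional technical requirements beyond this sketch.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Expected and empirical error of a function f, for a bounded loss V,
% a measure \mu, and a training set S = (z_1, ..., z_n) drawn i.i.d. from \mu:
\[
  I[f] = \mathbb{E}_{z \sim \mu}\, V(f, z),
  \qquad
  I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f, z_i).
\]
% Leave-one-out stability (illustrative sketch): with S^i denoting S with
% z_i removed and f_S the algorithm's output on S, removing one point
% barely changes the loss at that point, with high probability, for all i:
\[
  \mathbb{P}_S\left\{ \bigl| V(f_{S^i}, z_i) - V(f_S, z_i) \bigr|
    \ge \beta^{(n)} \right\} \le \delta^{(n)},
  \qquad \beta^{(n)} \to 0, \quad \delta^{(n)} \to 0.
\]
% Generalization: the empirical error of the learned function converges
% in probability to its expected error:
\[
  \lim_{n \to \infty} \mathbb{P}_S\left\{ \bigl| I[f_S] - I_S[f_S] \bigr|
    \ge \varepsilon \right\} = 0
  \qquad \text{for every } \varepsilon > 0.
\]
\end{document}
```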



Author information


Correspondence to Sayan Mukherjee.

Additional information

Communicated by Y. Xu

Dedicated to Charles A. Micchelli on his 60th birthday

Mathematics subject classifications (2000)

68T05, 68T10, 68Q32, 62M20.

Tomaso Poggio: Corresponding author.



Cite this article

Mukherjee, S., Niyogi, P., Poggio, T. et al. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv Comput Math 25, 161–193 (2006). https://doi.org/10.1007/s10444-004-7634-z

