Risk bounds for statistical learning



The Annals of Statistics

Pascal Massart and Élodie Nédélec

Source: Ann. Statist. Volume 34, Number 5 (2006), 2326-2366.

Abstract

We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with ways of measuring the “size” of a class of classifiers other than entropy with bracketing as in Tsybakov’s work. In particular, we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions and discuss the optimality of these bounds in a minimax sense.
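As an informal illustration of the objects named in the abstract, the following minimal sketch runs empirical risk minimization over one-dimensional threshold classifiers, a simple VC-class of dimension 1. It is not taken from the article: the function names, the simulated data, and the parameters `true_threshold` and `flip_prob` are all assumptions made for the example. Labels are flipped with a constant probability below 1/2, so the regression function stays bounded away from 1/2, which is one simple instance of a margin type condition.

```python
import random

def empirical_risk(threshold, sample):
    """Fraction of sample points misclassified by the rule
    x -> 1 if x >= threshold else 0."""
    errors = sum(1 for x, y in sample if (1 if x >= threshold else 0) != y)
    return errors / len(sample)

def erm_threshold(sample):
    """Return a threshold minimizing the empirical risk over the class of
    threshold classifiers (hypothetical helper, for illustration only)."""
    # On the sample, any threshold behaves like one of the observed points
    # or like a value larger than all of them, so these candidates suffice.
    candidates = sorted({x for x, _ in sample})
    candidates.append(candidates[-1] + 1.0)
    return min(candidates, key=lambda t: empirical_risk(t, sample))

def draw_sample(n, true_threshold=0.5, flip_prob=0.1, seed=0):
    """Simulate n labelled points on [0, 1]; each label is flipped with
    probability flip_prob (assumed noise model, not from the article)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= true_threshold else 0
        if rng.random() < flip_prob:
            y = 1 - y
        sample.append((x, y))
    return sample

sample = draw_sample(500)
t_hat = erm_threshold(sample)
print("ERM threshold:", t_hat)
print("empirical risk of ERM:", empirical_risk(t_hat, sample))
print("empirical risk of true threshold:", empirical_risk(0.5, sample))
```

By construction, the empirical risk of the selected threshold is never larger than that of the true threshold; the risk bounds discussed in the abstract concern how far the true risk of such a minimizer can be from the best achievable risk.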

Primary Subjects: 60E15
Secondary Subjects: 60F10, 94A17
Keywords: Classification; concentration inequalities; empirical processes; entropy with bracketing; minimax estimation; model selection; pattern recognition; regression estimation; statistical learning; structural minimization of the risk; VC-class; VC-dimension

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1169571799
Digital Object Identifier: doi:10.1214/009053606000000786

References

Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301--413.
Mathematical Reviews (MathSciNet): MR1679028
Digital Object Identifier: doi:10.1007/s004400050210
Birgé, L. (2005). A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 1611--1615.
Mathematical Reviews (MathSciNet): MR2241522
Digital Object Identifier: doi:10.1109/TIT.2005.844101
Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 329--375.
Mathematical Reviews (MathSciNet): MR1653272
Digital Object Identifier: doi:10.2307/3318720
Project Euclid: euclid.bj/1174324984
Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495--500.
Mathematical Reviews (MathSciNet): MR1890640
Digital Object Identifier: doi:10.1016/S1631-073X(02)02292-6
Devroye, L. and Lugosi, G. (1995). Lower bounds in pattern recognition and learning. Pattern Recognition 28 1011--1018.
Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press.
Mathematical Reviews (MathSciNet): MR1720712
Zentralblatt MATH: 0951.60033
Edelsbrunner, H. (1987). Algorithms in Combinatorial Geometry. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR0904271
Zentralblatt MATH: 0634.52001
Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik--Chervonenkis dimension. J. Combin. Theory Ser. A 69 217--232.
Mathematical Reviews (MathSciNet): MR1313896
Digital Object Identifier: doi:10.1016/0097-3165(95)90052-7
Haussler, D., Littlestone, N. and Warmuth, M. (1994). Predicting $\{0,1\}$-functions on randomly drawn points. Inform. and Comput. 115 248--292.
Mathematical Reviews (MathSciNet): MR1304811
Digital Object Identifier: doi:10.1006/inco.1994.1097
Koltchinskii, V. I. (1981). On the central limit theorem for empirical measures. Theor. Probab. Math. Statist. 24 71--82.
Mathematical Reviews (MathSciNet): MR0628431
Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statist. 82. Springer, New York.
Mathematical Reviews (MathSciNet): MR1226450
Zentralblatt MATH: 0833.62039
Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Isoperimetry and Processes. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1102015
Zentralblatt MATH: 0748.60004
Lugosi, G. (2002). Pattern classification and learning theory. In Principles of Nonparametric Learning (L. Györfi, ed.) 1--56. Springer, Vienna.
Mathematical Reviews (MathSciNet): MR1987656
Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808--1829.
Mathematical Reviews (MathSciNet): MR1765618
Digital Object Identifier: doi:10.1214/aos/1017939240
Project Euclid: euclid.aos/1017939240
Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245--303.
Mathematical Reviews (MathSciNet): MR1813803
Massart, P. (2006). Concentration inequalities and model selection. Lectures on Probability Theory and Statistics. École d'Été de Probabilités de Saint-Flour XXXIII. Lecture Notes in Math. 1896. Springer, Berlin. To appear.
Mathematical Reviews (MathSciNet): MR2319879
Massart, P. and Rio, E. (1998). A uniform Marcinkiewicz--Zygmund strong law of large numbers for empirical processes. In Festschrift for Miklós Csörgő: Asymptotic Methods in Probability and Statistics (B. Szyszkowicz, ed.) 199--211. North-Holland, Amsterdam.
Mathematical Reviews (MathSciNet): MR1661481
Zentralblatt MATH: 0933.60015
Digital Object Identifier: doi:10.1016/B978-044450083-0/50013-7
McDiarmid, C. (1989). On the method of bounded differences. In Surveys in Combinatorics 1989 (J. Siemons, ed.) 148--188. Cambridge Univ. Press.
Mathematical Reviews (MathSciNet): MR1036755
Zentralblatt MATH: 0712.05012
Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A 33 235--248.
Mathematical Reviews (MathSciNet): MR0668445
Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103--153.
Mathematical Reviews (MathSciNet): MR1981635
Digital Object Identifier: doi:10.1007/s00440-003-0259-1
Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505--563.
Mathematical Reviews (MathSciNet): MR1419006
Digital Object Identifier: doi:10.1007/s002220050108
Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135--166.
Mathematical Reviews (MathSciNet): MR2051002
Digital Object Identifier: doi:10.1214/aos/1079120131
Project Euclid: euclid.aos/1079120131
Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer, New York.
Mathematical Reviews (MathSciNet): MR0672244
Zentralblatt MATH: 0499.62005
Vapnik, V. N. and Chervonenkis, A. Ya. (1974). Theory of Pattern Recognition. Nauka, Moscow. (In Russian.)
Mathematical Reviews (MathSciNet): MR0474638
Yang, Y. and Barron, A. R. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564--1599.
Mathematical Reviews (MathSciNet): MR1742500
Digital Object Identifier: doi:10.1214/aos/1017939142
Project Euclid: euclid.aos/1017939142
Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423--435. Springer, New York.
Mathematical Reviews (MathSciNet): MR1462963
Zentralblatt MATH: 0896.62032

2008 © Institute of Mathematical Statistics