We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We focus mainly on the binary classification framework. We extend Tsybakov's analysis of the risk of an ERM under margin-type conditions by using concentration inequalities for suitably weighted empirical processes. This allows us to measure the "size" of a class of classifiers in ways other than entropy with bracketing, which is the measure used in Tsybakov's work. In particular, we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions, and we discuss the optimality of these bounds in a minimax sense.
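To make the object of study concrete, here is a minimal sketch of an ERM over a simple VC-class in binary classification. It is an illustration only, not the construction used in the paper: the class of threshold rules on the real line, the helper names (`empirical_risk`, `erm_threshold`, `make_sample`) and the noise model are all assumptions chosen for brevity. Labels are flipped with a fixed probability below 1/2, which is one simple setting in which a margin-type condition holds.

```python
import random


def empirical_risk(threshold, sample):
    """Fraction of sample points misclassified by the rule x -> 1 if x >= threshold else 0."""
    errors = sum(1 for x, y in sample if (1 if x >= threshold else 0) != y)
    return errors / len(sample)


def erm_threshold(sample):
    """Return a threshold minimizing the empirical risk over all threshold rules.

    Only the sample points, plus one value beyond the largest point, need to be
    tried: together they realize every labeling of the sample that a threshold
    rule can produce.
    """
    xs = sorted(x for x, _ in sample)
    candidates = xs + [xs[-1] + 1.0]  # the extra candidate labels every point 0
    return min(candidates, key=lambda t: empirical_risk(t, sample))


def make_sample(n, true_threshold=0.5, noise=0.1, seed=0):
    """Hypothetical data: x uniform on [0, 1], label flipped with probability `noise`."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= true_threshold else 0
        if rng.random() < noise:
            y = 1 - y  # label noise, bounded away from 1/2
        sample.append((x, y))
    return sample


if __name__ == "__main__":
    data = make_sample(200)
    t_hat = erm_threshold(data)
    print("ERM threshold:", t_hat)
    print("empirical risk of ERM:", empirical_risk(t_hat, data))
```

By construction the ERM's empirical risk is never larger than that of the rule with the true threshold; the risk bounds discussed in the abstract concern how far its true risk can be from the best risk achievable within the class.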
References
Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301--413.
Birgé, L. (2005). A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 1611--1615.
Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 329--375.
Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495--500.
Devroye, L. and Lugosi, G. (1995). Lower bounds in pattern recognition and learning. Pattern Recognition 28 1011--1018.
Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press.
Edelsbrunner, H. (1987). Algorithms in Combinatorial Geometry. Springer, Berlin.
Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik--Chervonenkis dimension. J. Combin. Theory Ser. A 69 217--232.
Haussler, D., Littlestone, N. and Warmuth, M. (1994). Predicting $\{0,1\}$-functions on randomly drawn points. Inform. and Comput. 115 248--292.
Koltchinskii, V. I. (1981). On the central limit theorem for empirical measures. Theor. Probab. Math. Statist. 24 71--82.
Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statist. 82. Springer, New York.
Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Isoperimetry and Processes. Springer, Berlin.
Lugosi, G. (2002). Pattern classification and learning theory. In Principles of Nonparametric Learning (L. Györfi, ed.) 1--56. Springer, Vienna.
Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808--1829.
Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245--303.
Massart, P. (2006). Concentration inequalities and model selection. Lectures on Probability Theory and Statistics. Ecole d'Eté de Probabilités de Saint Flour XXXIII. Lecture Notes in Math. 1896. Springer, Berlin. To appear.
Massart, P. and Rio, E. (1998). A uniform Marcinkiewicz--Zygmund strong law of large numbers for empirical processes. In Festschrift for Miklós Csörgő: Asymptotic Methods in Probability and Statistics (B. Szyszkowicz, ed.) 199--211. North-Holland, Amsterdam.
McDiarmid, C. (1989). On the method of bounded differences. In Surveys in Combinatorics 1989 (J. Siemons, ed.) 148--188. Cambridge Univ. Press.
Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A 33 235--248.
Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103--153.
Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505--563.
Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135--166.
Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data. Springer, New York.
Vapnik, V. N. and Chervonenkis, A. Ya. (1974). Theory of Pattern Recognition. Nauka, Moscow. (In Russian.)
Yang, Y. and Barron, A. R. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564--1599.
Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423--435. Springer, New York.