On semiparametric M-estimation in single-index regression

https://doi.org/10.1016/j.jspi.2004.09.006

Abstract

In this paper we analyze a large class of semiparametric M-estimators for single-index models, including semiparametric quasi-likelihood and semiparametric maximum likelihood estimators. Some possible applications to robustness are also mentioned. The definition of these estimators involves a kernel regression estimator for which a bandwidth rule is necessary. Given the semiparametric M-estimation problem, we propose a natural bandwidth choice by joint maximization of the M-estimation criterion with respect to the parameter of interest and the bandwidth. In this way we extend a methodology first introduced by Härdle et al. (Ann. Statist. 21 (1993) 157) for semiparametric least-squares. We prove asymptotic normality for our semiparametric estimator. We derive the asymptotic equivalence between our bandwidth and the optimal bandwidth obtained through weighted cross-validation. Empirical evidence obtained from simulations suggests that our bandwidth improves the higher order asymptotics of the semiparametric M-estimator when it replaces the usual bandwidth chosen by cross-validation.

Introduction

Consider the problem of estimating a regression function $m(x)=E(Y\mid X=x)$ from independent copies $(Y_1,X_1^T)^T,\dots,(Y_n,X_n^T)^T$ of a random vector $(Y,X^T)^T\in\mathbb{R}^{d+1}$. In generalized linear models (GLM; e.g., McCullagh and Nelder, 1989) it is assumed that $m(x)=r_0(x\theta_0)$ with $r_0$ known. Hereafter, $x\theta$ denotes $x^T\theta$ for $x,\theta\in\mathbb{R}^d$. The function $r_0$ is the inverse of the so-called link function. Moreover, the conditional density $f_{Y\mid X=x}$ of $Y$ given $X=x$ belongs to the linear exponential family, that is
$$f_{Y\mid X=x}(y)=\exp\left[B(r_0(x\theta_0))+C(r_0(x\theta_0))\,y+D(y)\right],$$
where $B$, $C$ and $D$ are known functions.
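As a concrete instance of the linear exponential family form above, consider a Poisson response. This is an illustration chosen here, not an example taken from the paper; the symbol $\mu$ for the conditional mean $r_0(x\theta_0)$ is introduced only for this sketch. Writing the Poisson probability mass function in exponential form gives

$$f_{Y\mid X=x}(y)=\frac{e^{-\mu}\mu^{y}}{y!}=\exp\left[-\mu+y\log\mu-\log(y!)\right],\qquad \mu=r_0(x\theta_0),$$

so that $B(r)=-r$, $C(r)=\log r$ and $D(y)=-\log(y!)$. With the logarithmic link, the inverse link is $r_0(t)=e^{t}$ and the regression function is $m(x)=\exp(x\theta_0)$.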

A natural extension of GLM is provided by the semiparametric single-index models (SIM), where one only assumes the existence of some $\theta_0\in\mathbb{R}^d$ (unique up to a scale normalization factor) such that
$$E(Y\mid X)=E(Y\mid X\theta_0),$$
that is $m(x)=r_0(x\theta_0)$, with unknown $r_0$. Since the regression $r_0(t)=E(Y\mid X\theta_0=t)$ depends on $\theta_0$, hereafter we shall write $r_{\theta_0}$ instead of $r_0$. In the SIM framework, both $\theta_0$ and $r_{\theta_0}$ are to be estimated. Numerous semiparametric approaches for root-n consistent estimation of $\theta_0$ have been proposed: M-estimation (e.g., Ichimura, 1993; Sherman, 1994b; Delecroix and Hristache, 1999; Xia and Li, 1999; Xia et al., 1999), direct (average derivative based) estimation (e.g., Powell et al., 1989; Härdle and Stoker, 1989; Hristache et al., 2001a, 2001b), iterative methods (e.g., Weisberg and Welsh, 1994; Chiou and Müller, 1998; Bonneu and Gba, 1998; Xia and Härdle, 2002).

Typically, the semiparametric M-estimators mentioned above can be written as
$$\hat\theta=\arg\max_{\theta}\ \frac{1}{n}\sum_{i=1}^{n}\psi\big(Y_i,\hat r^{\,i}_{\theta,h}(X_i\theta)\big)\,\tau_n(X_i),\qquad (1.2)$$
where $\hat r^{\,i}_{\theta,h}(t)$ is, for instance, the leave-one-out Nadaraya–Watson estimator (with bandwidth $h$) of $r_\theta(t)=E(Y\mid X\theta=t)$, $-\psi$ is a contrast function and $\tau_n(\cdot)$ is a so-called trimming function introduced to guard against small values of the denominators appearing in $\hat r^{\,i}_{\theta,h}(t)$. Finally, the regression function $m(x)$ is estimated by $\hat r_{\hat\theta,h}(x\hat\theta)$. Other smoothers, such as local polynomials and splines, can replace the Nadaraya–Watson estimator.
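The criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`quartic_kernel`, `loo_nadaraya_watson`, `m_criterion`), the user-supplied `trim` function standing in for the trimming function, and the convention 0/0 := 0 for an empty kernel window are all choices made here.

```python
def quartic_kernel(u):
    """Quartic kernel supported on [-1, 1] (one possible kernel choice)."""
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def loo_nadaraya_watson(i, t, y, h):
    """Leave-one-out Nadaraya-Watson estimate at t[i], built without observation i."""
    num = den = 0.0
    for j in range(len(t)):
        if j == i:
            continue  # leave observation i out
        w = quartic_kernel((t[i] - t[j]) / h)
        num += w * y[j]
        den += w
    # Assumption of this sketch: 0/0 is set to 0 when no neighbour falls in the window.
    return num / den if den > 0.0 else 0.0


def m_criterion(theta, h, X, y, psi, trim):
    """(1/n) * sum_i psi(Y_i, rhat_i(X_i theta)) * trim(X_i)."""
    t = [sum(xj * tj for xj, tj in zip(x, theta)) for x in X]  # index values X_i theta
    n = len(y)
    return sum(
        psi(y[i], loo_nadaraya_watson(i, t, y, h)) * trim(X[i]) for i in range(n)
    ) / n


# Usage with the least-squares choice psi(y, r) = -(y - r)^2 and no trimming.
X = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
y = [1.0, 2.0, 3.0]
least_squares = lambda yy, r: -(yy - r) ** 2
print(m_criterion((1.0, 0.0), 1.0, X, y, least_squares, lambda x: 1.0))
```

Maximizing `m_criterion` over `theta` for a fixed `h` corresponds to the estimator in the display above; any other `psi` with two arguments can be passed in the same way.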

In order to estimate $\theta_0$ and $r_{\theta_0}(\cdot\,\theta_0)$, two smoothing parameters seem to be necessary. First, after choosing a primary bandwidth $h$, the estimator $\hat\theta$ is computed as in (1.2). Afterwards, $r_{\theta_0}(x\theta_0)$ is estimated by $\hat r_{\hat\theta,h^*}(x\hat\theta)$, a kernel estimator, with bandwidth $h^*$, of the expectation of $Y$ given $x\hat\theta$. The rates of decay of the two bandwidths should satisfy some conditions. When $\psi(y,r)=-(y-r)^2$, Härdle et al. (1993) defined more directly
$$(\hat\theta,\hat h)=\arg\max_{\theta,h}\ \frac{1}{n}\sum_{i=1}^{n}\psi\big(Y_i,\hat r^{\,i}_{\theta,h}(X_i\theta)\big)\,I_A(X_i).\qquad (1.3)$$
Here, the trimming function is $I_A(\cdot)$, the indicator function of the set $A$, and $A$ is fixed, bounded and strictly included in the support of $X$. The regression $r_{\theta_0}(\cdot\,\theta_0)$ can then be estimated by $\hat r_{\hat\theta,\hat h}(\cdot\,\hat\theta)$.
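To make the joint maximization in the parameter and the bandwidth concrete, the following sketch runs it by brute-force grid search for the least-squares contrast on simulated data. Everything specific here is an assumption made for illustration: the two-dimensional design with index x1 + theta * x2, the sine regression function, the grids, the box used as the set A, and the use of a grid search in place of a numerical optimizer.

```python
import itertools
import math
import random


def quartic_kernel(u):
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def least_squares_criterion(theta, h, X, y, in_A):
    """(1/n) * sum_i -(Y_i - rhat_i)^2 * I_A(X_i), with index x1 + theta * x2."""
    t = [x[0] + theta * x[1] for x in X]
    n = len(y)
    total = 0.0
    for i in range(n):
        if not in_A(X[i]):
            continue  # trimming by the indicator of the fixed set A
        num = den = 0.0
        for j in range(n):
            if j != i:  # leave-one-out
                w = quartic_kernel((t[i] - t[j]) / h)
                num += w * y[j]
                den += w
        r_hat = num / den if den > 0.0 else 0.0  # assumed convention 0/0 := 0
        total -= (y[i] - r_hat) ** 2
    return total / n


def joint_argmax(X, y, theta_grid, h_grid, in_A):
    """Return (theta, h, value) maximizing the criterion over the product grid."""
    best = max(
        itertools.product(theta_grid, h_grid),
        key=lambda p: least_squares_criterion(p[0], p[1], X, y, in_A),
    )
    return best[0], best[1], least_squares_criterion(best[0], best[1], X, y, in_A)


# Hypothetical toy data: Y = sin(X1 + 0.5 * X2) + small Gaussian noise.
rng = random.Random(0)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(60)]
y = [math.sin(x[0] + 0.5 * x[1]) + 0.1 * rng.gauss(0.0, 1.0) for x in X]

theta_grid = [k / 4.0 for k in range(-2, 7)]  # -0.5, -0.25, ..., 1.5
h_grid = [0.2, 0.3, 0.45, 0.7, 1.0]
in_A = lambda x: abs(x[0]) <= 1.5 and abs(x[1]) <= 1.5  # fixed bounded set A

theta_hat, h_hat, value = joint_argmax(X, y, theta_grid, h_grid, in_A)
print(theta_hat, h_hat, value)
```

Because the set A does not depend on the parameter or the bandwidth, shrinking the bandwidth cannot improve the criterion simply by dropping observations; an empty kernel window is penalized through the 0/0 := 0 convention.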

In this paper we consider a class of semiparametric M-estimators defined by a general function $\psi$. Moreover, we provide an automatic and natural choice of the smoothing parameter $h$ used to define the estimator $\hat\theta$. This bandwidth also has some optimal properties for the nonparametric regression. In particular, it is of order $n^{-1/5}$. To achieve these goals we extend the idea of Härdle, Hall and Ichimura, that is, given a function $\psi$, we maximize the semiparametric M-estimation criterion (1.2) simultaneously in $\theta$ and $h$. For simplicity we use a leave-one-out Nadaraya–Watson estimator of the regression function, although this approach could be applied to other smoothers such as local polynomials. Our proofs allow for discrete covariates and do not require a preliminary (pilot) estimator of $\theta_0$ having a suitable rate of convergence in probability $O_P(n^{-\delta})$, $\delta>0$.

The methodology we propose makes it possible to build efficient estimators of $\theta_0$ under suitable additional model assumptions. Moreover, it can be extended and applied to a multi-index framework, that is, when there exist $\theta_{01},\dots,\theta_{0p}\in\mathbb{R}^d$, $p<d$, such that $E(Y\mid X)=E(Y\mid X\theta_{01},\dots,X\theta_{0p})$ (see Ichimura and Lee, 1991; Picone and Butler, 2000). Finally, if the probabilistic results on U-processes we use in the proofs could be extended to non-i.i.d. data, our theoretical results could easily be adapted to such a case.

The paper is organized as follows. Existing results on semiparametric M-estimation are reviewed in Section 2. Moreover, the gaps our paper aims to fill are clearly described there. The methodology we use for the theoretical results is described in Section 3. As in Härdle et al. (1993), the basic idea is to show that joint maximization in $\theta$ and $h$ is asymptotically equivalent to separate maximization of a purely parametric term with respect to $\theta$ and of a purely nonparametric term with respect to $h$. In this way we derive the asymptotic normality of $\hat\theta$, while for $\hat h$ we obtain an asymptotic equivalence with a theoretical “optimal” bandwidth maximizing the quantity
$$\frac{1}{n}\sum_{i=1}^{n}\psi\big(Y_i,\hat r^{\,i}_{\theta_0,h}(X_i\theta_0)\big)\,I_{\{x:\,f_{\theta_0}(x\theta_0)\ge c\}}(X_i),$$
where $f_{\theta_0}$ is the density of $X\theta_0$ and $c$ is some positive constant. We call this quantity a $\psi$-CV (cross-validation) function. When $\psi(y,r)=-(y-r)^2$, the usual cross-validation function from nonparametric smoothing is recovered up to a change of sign (Clark, 1975). In general, we show that maximizing the $\psi$-CV function is asymptotically equivalent to minimizing a weighted (mean-squared) cross-validation function. Chiou and Müller (1998, 1999) provide empirical evidence supporting the idea of choosing the bandwidth using criteria other than the usual cross-validation function. Their nonparametric quasi-likelihood criterion is closely related to a $\psi$-CV. Our theoretical results are stated in Section 4. Section 5 contains some empirical evidence. It is shown that functions $\psi$ other than the usual $\psi(y,r)=-(y-r)^2$ may provide M-estimators $\hat\theta$ with better performance. The choice of $\psi$ affects the performance of $\hat\theta$ in two ways: through the asymptotic variance and through the optimal choice of $h$ based on the $\psi$-CV function. The two effects are discussed. Some comments and conclusions are given in Section 6. The assumptions and the technical proofs are provided in the appendices.

Let us end this introduction by noting that it is not clear, a priori, whether a bandwidth that is optimal for the regression function is also optimal for the estimation of the parameter θ. As pointed out by a referee, finding the optimal bandwidth for θ is of theoretical interest but quite difficult, since it involves higher order asymptotic expansions of the semiparametric estimator. This refinement lies beyond the scope of our paper.


Possible choices of ψ

Flexibility in the choice of the function ψ(y,r) could be helpful, for instance, when the interest is focused on efficiency, goodness-of-fit or robustness. Sherman (1994b) and Delecroix and Hristache (1999) seem to be the only papers on semiparametric M-estimation allowing ψ to belong to a large class of functions.

Apart from some technical aspects, our theoretical findings are based on two conditions ensuring that joint maximization in θ and h as in (1.3) is asymptotically equivalent to splitting

Methodology

To ensure the estimability of the parameter $\theta$, let us fix its first component to 1 and identify $\theta$ with its last $d-1$ components. More precisely, from now on $\theta$ will be a vector of $\mathbb{R}^{d-1}$ and $x\theta$, with $x\in\mathbb{R}^d$, denotes the matrix product $(1,\theta^T)x$. Accordingly, the parameter set $\Theta$ is a subset of $\mathbb{R}^{d-1}$. Finally, without loss of generality, assume that $\psi(\cdot,\cdot)\le 0$.

Given $1/8<\beta_1<\beta_2<1/4$ and the constants $c_1,c_2>0$, define
$$H_n=\{h:\ c_1n^{-\beta_2}\le h\le c_2n^{-\beta_1}\}$$
and take $h_n\in H_n$, $n\ge 1$. Let $\theta_n$, $n\ge 1$, be a preliminary consistent estimator of $\theta_0$

The main results

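Assume that the parameter set $\Theta\subset\mathbb{R}^{d-1}$ is compact with nonvoid interior. Define
$$C_1=\frac{K_1^2}{4}\,E\left[\frac{1}{2}\,\partial^2_{22}\psi\big(r_{\theta_0}(X\theta_0),r_{\theta_0}(X\theta_0)\big)\left(r''_{\theta_0}(X\theta_0)+\frac{2\,r'_{\theta_0}(X\theta_0)\,f'_{\theta_0}(X\theta_0)}{f_{\theta_0}(X\theta_0)}\right)^{2}I_A(X)\right],$$
$$C_2=K_2\,E\left[\frac{1}{2}\,\partial^2_{22}\psi\big(r_{\theta_0}(X\theta_0),r_{\theta_0}(X\theta_0)\big)\,\frac{1}{f_{\theta_0}(X\theta_0)}\,v_{\theta_0}(X\theta_0)\,I_A(X)\right]$$
and
$$h_n^{opt}=\arg\max_{h}\left(C_1h^4+C_2n^{-1}h^{-1}\right)=\left(C_2/4C_1\right)^{1/5}n^{-1/5}.$$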
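The closed form of the optimal bandwidth follows from a first-order condition. The short derivation below is a sketch that assumes $C_1<0$ and $C_2<0$, which is what one would expect when the second derivative of $\psi$ in its second argument is negative (for instance, it equals $-2$ for $\psi(y,r)=-(y-r)^2$); this sign assumption is not stated in the excerpt above.

$$\frac{d}{dh}\left(C_1h^4+C_2n^{-1}h^{-1}\right)=4C_1h^3-C_2n^{-1}h^{-2}=0\ \Longleftrightarrow\ h^5=\frac{C_2}{4C_1}\,n^{-1},$$

so that $h=(C_2/4C_1)^{1/5}n^{-1/5}$. The second derivative, $12C_1h^2+2C_2n^{-1}h^{-3}$, is negative for every $h>0$ under the sign assumption, so the stationary point is indeed a maximizer. The ratio $C_2/C_1$ is positive in that case, which makes the fifth root well defined.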

Theorem 4.1

Suppose that the assumptions of Appendix A hold and $X$ is bounded. If $(\hat\theta,\hat h)$ is defined as in (3.2), then $\hat h/h_n^{opt}\to 1$ in probability, and
$$\sqrt{n}\,(\hat\theta-\theta_0)\ \xrightarrow{\ D\ }\ N\left(0,\,W_0^{-1}M_0W_0^{-1}\right).$$
If $X$ is unbounded, consider a sequence of real numbers $\{d_n\}$ such that dnlnn0

Empirical evidence

In order to illustrate the finite sample properties of our estimator, we conducted a simulation study using a SAS 8.1 program. For optimization we used the NLPNRA routine of SAS/IML software. This routine is based on a Newton–Raphson method. All the estimates reported in this section were obtained with a quartic kernel $K(u)=(15/16)(u^2-1)^2\,I_{[-1,1]}(u)$.
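As a quick sanity check on the quartic kernel stated above, the sketch below verifies numerically that it integrates to one and computes two constants that commonly enter kernel bandwidth formulas, its second moment and the integral of its square. The helper `integrate` (a composite midpoint rule) and the variable names are introduced here for illustration; whether these two integrals coincide with the paper's kernel constants is an assumption, since the excerpt does not define them.

```python
def quartic_kernel(u):
    """K(u) = (15/16) * (u^2 - 1)^2 on [-1, 1], zero elsewhere."""
    return (15.0 / 16.0) * (u * u - 1.0) ** 2 if -1.0 <= u <= 1.0 else 0.0


def integrate(f, a=-1.0, b=1.0, m=20000):
    """Composite midpoint rule on [a, b] with m subintervals."""
    step = (b - a) / m
    return step * sum(f(a + (k + 0.5) * step) for k in range(m))


mass = integrate(quartic_kernel)                                # exact value: 1
second_moment = integrate(lambda u: u * u * quartic_kernel(u))  # exact value: 1/7
roughness = integrate(lambda u: quartic_kernel(u) ** 2)         # exact value: 5/7

print(mass, second_moment, roughness)
```

The exact values in the comments come from integrating the polynomial by hand; the numerical results agree with them up to the quadrature error.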

In the first experiment, the data were generated in the following way:

  1. $X_i=(X_i^{(1)},X_i^{(2)},X_i^{(3)},X_i^{(4)})^T\in\mathbb{R}^4$: $X_i^{(1)}\sim N(0,1/4)$, $X_i^{(2)}\sim B(1,1/2)$, $X_i^{(3)}\sim N(0,1/4)$, X

Conclusions

We introduce a large class of semiparametric M-estimators for single-index models and we show their asymptotic normality. The estimates are obtained as maximizers of a criterion
$$\hat S(\theta,h)=\frac{1}{n}\sum_{i=1}^{n}\psi\big(Y_i,\hat r_{\theta,h}(X_i\theta)\big)\,\tau_n(X_i),$$
where a nonparametric kernel estimator $\hat r_{\theta,h}$ is used to estimate the conditional expectation $r_\theta(\cdot)=E(Y\mid X\theta=\cdot)$. It is well known that the (first order) asymptotics of $\hat\theta=\arg\max_{\theta\in\Theta}\hat S(\theta,h)$ do not depend on the choice of $h$, provided that $h$ satisfies some conditions. The decomposition S^(θ

References (38)

  • H. Ichimura. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics (1993)
  • D.W.K. Andrews. Nonparametric kernel estimation for semiparametric models. Econometric Theory (1995)
  • M. Bonneu et al. Estimation semi-paramétrique de quasi-score. Bull. Belg. Math. Soc. (1998)
  • D. Bosq et al. Théorie de l’estimation fonctionnelle (1987)
  • R.J. Carroll et al. Generalized partially linear single-index models. J. Amer. Statist. Assoc. (1997)
  • H. Chen. Asymptotically efficient estimation in semiparametric generalized linear models. Ann. Statist. (1995)
  • J.-M. Chiou et al. Quasi-likelihood regression with unknown link and variance functions. J. Amer. Statist. Assoc. (1998)
  • J.-M. Chiou et al. Nonparametric quasi-likelihood. Ann. Statist. (1999)
  • R.M. Clark. A calibration curve for radiocarbon dates. Antiquity (1975)
  • M. Delecroix et al. M-estimateurs semi-paramétriques dans les modèles à direction révélatrice unique. Bull. Belg. Math. Soc. (1999)
  • Delecroix, M., Hristache, M., Patilea, V., 2004. On semiparametric M-estimation in single-index regression. Working...
  • R. Fraiman et al. Optimal robust M-estimates of location. Ann. Statist. (2001)
  • C. Gouriéroux et al. Pseudo maximum likelihood methods: theory. Econometrica (1984)
  • F.R. Hampel et al. Robust Statistics. The Approach Based on Influence Functions (1986)
  • W. Härdle et al. Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist. (1985)
  • W. Härdle et al. Investigating smooth multiple regression by the method of average derivatives. J. Amer. Statist. Assoc. (1989)
  • W. Härdle et al. Optimal smoothing in single-index models. Ann. Statist. (1993)
  • M. Hristache et al. Structure adaptive approach for dimension reduction. Ann. Statist. (2001)
  • M. Hristache et al. Direct estimation of the index coefficient in a single-index model. Ann. Statist. (2001)
    1

    Part of the work for this paper was accomplished while this author was at LEO, Université d’Orléans.
