On semiparametric M-estimation in single-index regression
Introduction
Consider the problem of estimating a regression function $m(x) = E(Y \mid X = x)$ from independent copies of a random vector $(Y, X)$, where $X$ takes values in $\mathbb{R}^d$. In GLM (generalized linear models; e.g., McCullagh and Nelder, 1989) it is assumed that $m(x) = r(x\theta_0)$ with $r$ known. Hereafter, $x\theta$ denotes the inner product of $x$ and $\theta$ when $x, \theta \in \mathbb{R}^d$. The function $r$ is the inverse of the so-called link function. Moreover, the conditional density of $Y$ given $X = x$ belongs to the linear exponential family, that is, it has an exponential form specified by known functions B, C and D.
A natural extension of GLM is provided by the semiparametric single-index models (SIM), where one only assumes the existence of some $\theta_0$ (unique up to a scale normalization factor) such that $E(Y \mid X) = E(Y \mid X\theta_0)$, that is, $E(Y \mid X = x) = r(x\theta_0)$ with unknown $r$. Since the regression $r$ depends on $\theta_0$, hereafter we shall write $r_{\theta_0}$ instead of $r$. In the SIM framework, both $\theta_0$ and $r_{\theta_0}$ are to be estimated. Numerous semiparametric approaches for root-$n$ consistent estimation of $\theta_0$ have been proposed: M-estimation (e.g., Ichimura, 1993; Sherman, 1994b; Delecroix and Hristache, 1999; Xia and Li, 1999; Xia et al., 1999), direct (average derivative based) estimation (e.g., Powell et al., 1989; Härdle and Stoker, 1989; Hristache et al., 2001a, 2001b), and iterative methods (e.g., Weisberg and Welsh, 1994; Chiou and Müller, 1998; Bonneu and Gba, 1998; Xia and Härdle, 2002).
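Typically, the semiparametric M-estimators mentioned above can be written as
$$\hat\theta = \arg\max_{\theta} \frac{1}{n}\sum_{i=1}^{n} \psi\bigl(Y_i, \hat r^{\,i}_{\theta,h}(X_i\theta)\bigr)\,\tau_n(X_i), \qquad (1.2)$$
where $\hat r^{\,i}_{\theta,h}(X_i\theta)$ is, for instance, the leave-one-out Nadaraya–Watson estimator (with bandwidth $h$) of $E(Y \mid X\theta = X_i\theta)$, $\psi$ is a contrast function and $\tau_n$ is a so-called trimming function introduced to guard against small values for the denominators appearing in $\hat r^{\,i}_{\theta,h}$. Finally, the regression function is estimated by plugging $\hat\theta$ into the smoother. Other smoothers, such as local polynomials and splines, can replace the Nadaraya–Watson estimator.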
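To make the structure of such a criterion concrete, the following is a minimal Python sketch, not the authors' implementation. The Gaussian kernel, the density-threshold trimming rule, the `1/n` normalization and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def loo_nadaraya_watson(index, y, h):
    """Leave-one-out Nadaraya-Watson fit of E(Y | index) at the sample points.

    Returns the fitted values and the leave-one-out kernel density estimates
    of the index, i.e. the rescaled denominators of the smoother.
    """
    u = (index[:, None] - index[None, :]) / h
    weights = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel (assumption)
    np.fill_diagonal(weights, 0.0)                         # leave observation i out
    denom = weights.sum(axis=1)
    fitted = (weights @ y) / np.maximum(denom, 1e-300)     # guard 0/0; such points are trimmed
    density = denom / ((len(y) - 1) * h)
    return fitted, density

def m_criterion(theta, h, x, y, psi, trim_level=1e-3):
    """Trimmed sample average of psi(Y_i, leave-one-out fit at X_i theta)."""
    fitted, density = loo_nadaraya_watson(x @ theta, y, h)
    keep = density > trim_level        # hypothetical trimming rule
    return np.sum(psi(y[keep], fitted[keep])) / len(y)
```

With the least-squares contrast `psi = lambda y, r: -(y - r) ** 2`, maximizing `m_criterion` over `theta` for a fixed `h` corresponds to a semiparametric least-squares fit; other contrasts only change the `psi` argument.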
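In order to estimate $\theta_0$ and $r_{\theta_0}$, two smoothing parameters seem to be necessary. First, after choosing a primary bandwidth $h$, the estimator $\hat\theta$ is computed as in (1.2). Afterwards, $r_{\theta_0}$ is estimated by a kernel estimator, with a second bandwidth, of the expectation of $Y$ given $X\hat\theta$. The rates of decay for the two bandwidths must satisfy some conditions. When $\psi(y, r) = -(y - r)^2$, Härdle et al. (1993) defined more directly
$$(\hat\theta, \hat h) = \arg\max_{\theta, h} \frac{1}{n}\sum_{i=1}^{n} \psi\bigl(Y_i, \hat r^{\,i}_{\theta,h}(X_i\theta)\bigr)\, I_A(X_i). \qquad (1.3)$$
Here, $\hat r^{\,i}_{\theta,h}$ is the leave-one-out smoother with bandwidth $h$, the trimming function is $I_A$, the indicator function of the set $A$, and $A$ is fixed, bounded and strictly included in the support of $X$. The regression can then be estimated by the smoother computed with $(\hat\theta, \hat h)$.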
In this paper we consider a class of semiparametric M-estimators defined by a general function $\psi$. Moreover, we provide an automatic and natural choice of the smoothing parameter $h$ used to define the estimator $\hat\theta$. This bandwidth also has some optimal properties for the nonparametric regression. In particular, it is of order $n^{-1/5}$. To achieve these goals we extend Härdle, Hall and Ichimura's idea, that is, given a function $\psi$, we maximize the semiparametric M-estimation criterion (1.2) simultaneously in $\theta$ and $h$. For simplicity we use a leave-one-out Nadaraya–Watson estimator of the regression function, although this approach could be applied to other smoothers such as local polynomials. Our proofs allow for discrete covariates and do not require a preliminary (pilot) estimator of $\theta_0$ with a suitable rate of convergence in probability.
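The joint choice of the index coefficient and the bandwidth can be sketched as follows. This is an illustrative Python sketch under stated assumptions: a plain grid search stands in for a numerical optimizer, the kernel is Gaussian, trimming is omitted, there are only two covariates with the first coefficient fixed to 1, and the names `loo_fit` and `joint_grid_search` are hypothetical.

```python
import numpy as np

def loo_fit(index, y, h):
    """Leave-one-out Nadaraya-Watson fit with a Gaussian kernel (an assumption)."""
    w = np.exp(-0.5 * ((index[:, None] - index[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)          # observation i does not predict itself
    return (w @ y) / w.sum(axis=1)

def joint_grid_search(x, y, psi, theta_grid, h_grid):
    """Maximise the sample criterion jointly over a scalar coefficient and h.

    x has two columns; the index is x[:, 0] + theta * x[:, 1], so the first
    coefficient is fixed to 1 for identifiability. No trimming is applied.
    """
    best = (-np.inf, None, None)
    for theta in theta_grid:
        index = x[:, 0] + theta * x[:, 1]
        for h in h_grid:
            value = np.mean(psi(y, loo_fit(index, y, h)))
            if value > best[0]:
                best = (value, theta, h)
    return best  # (criterion value, theta, h)
```

Because the fit at each point leaves that point out, letting `h` shrink does not drive the criterion to a trivial optimum, which is what makes a simultaneous search over the coefficient and the bandwidth meaningful.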
The methodology we propose makes it possible to build efficient estimators of $\theta_0$ under suitable additional model assumptions. Moreover, it can be extended and applied to a multi-index framework, that is, when the regression depends on $X$ only through several linear indices (see Ichimura and Lee, 1991; Picone and Butler, 2000). Finally, if the probabilistic results on U-processes we use in the proofs could be extended to non-i.i.d. data, our theoretical results could be adapted easily to such a case.
The paper is organized as follows. Existing results on semiparametric M-estimation are reviewed in Section 2, where the gaps our paper aims to fill are also described. The methodology we use for the theoretical results is described in Section 3. As in Härdle et al. (1993), the basic idea is to show that joint maximization in $\theta$ and $h$ is asymptotically equivalent to separate maximization of a purely parametric term with respect to $\theta$ and of a purely nonparametric term with respect to $h$. In this way we derive the asymptotic normality of $\hat\theta$, while for $\hat h$ we obtain an asymptotic equivalence with a theoretical “optimal” bandwidth maximizing a quantity that involves the density of the index and some positive constant $c$. We call this quantity a $\psi$-CV (cross-validation) function. When $\psi(y, r) = -(y - r)^2$, the usual cross-validation function from nonparametric smoothing is recovered up to a change of sign (Clark, 1975). In general, we show that maximizing the $\psi$-CV function is asymptotically equivalent to minimizing a weighted (mean-squared) cross-validation function. Chiou and Müller (1998, 1999) provide empirical evidence supporting the idea of choosing the bandwidth using criteria other than the usual cross-validation function. Their nonparametric quasi-likelihood criterion is closely related to a $\psi$-CV function. Our theoretical results are stated in Section 4. Section 5 contains some empirical evidence. It is shown that functions $\psi$ other than the usual least-squares choice may provide M-estimators with better performance. The choice of $\psi$ acts on the performance of $\hat\theta$ in two ways, through the asymptotic variance and through the optimal choice of $h$ based on the $\psi$-CV function. The two effects are discussed. Some comments and conclusions are given in Section 6. The assumptions and the technical proofs are provided in the appendices.
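As a worked check of the least-squares case, one can write out the sample criterion for the contrast $\psi(y, r) = -(y - r)^2$. The notation below ($\hat r^{\,i}_{\theta,h}$ for the leave-one-out smoother, the $1/n$ normalization, and $\mathrm{CV}$ for the usual cross-validation sum) is an assumption made for illustration, and the trimming factor is omitted.

```latex
\frac{1}{n}\sum_{i=1}^{n}\psi\bigl(Y_i,\hat r^{\,i}_{\theta,h}(X_i\theta)\bigr)
  = -\frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i-\hat r^{\,i}_{\theta,h}(X_i\theta)\bigr)^{2}
  = -\,\mathrm{CV}(\theta,h)
```

Under these assumptions, maximizing the left-hand side in $h$ is the same as minimizing the leave-one-out squared prediction error, which is the sense in which the usual cross-validation function reappears with its sign changed.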
Let us end this introduction by noting that it is not clear, a priori, whether an optimal bandwidth for the regression function is also optimal for the estimation of the parameter $\theta_0$. As pointed out by a referee, finding the optimal bandwidth for $\hat\theta$ is of theoretical interest but quite difficult, since it involves higher order asymptotic expansions of the semiparametric estimator. This refinement lies beyond the scope of our paper.
Possible choices of $\psi$
Flexibility in the choice of the function $\psi$ could be helpful, for instance, when the interest is focused on efficiency, goodness-of-fit or robustness. Sherman (1994b) and Delecroix and Hristache (1999) seem to be the only papers on semiparametric M-estimation that allow $\psi$ to belong to a large class of functions.
Apart from some technical aspects, our theoretical findings are based on two conditions ensuring that joint maximization in $\theta$ and $h$ as in (1.3) is asymptotically equivalent to splitting
Methodology
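To ensure the estimability of the parameter $\theta_0$, let us fix its first component to 1 and identify $\theta_0$ with its last $d - 1$ components, where $d$ is the dimension of $X$. More precisely, from now on $\theta$ will be a vector of $\mathbb{R}^{d-1}$ and $x\theta$, with $x \in \mathbb{R}^d$, denotes the matrix product $x(1, \theta^\top)^\top$. Accordingly, the parameter set $\Theta$ is a subset of $\mathbb{R}^{d-1}$. Finally, without loss of generality, assume that .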
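The normalization described here, with the first coefficient fixed to 1 and only the remaining coefficients free, can be sketched in Python as follows. The function name `single_index` is hypothetical and the snippet is only an illustration of the convention, not code from the paper.

```python
import numpy as np

def single_index(x, theta_free):
    """Index x (1, theta')' with the first coefficient fixed to 1.

    x          : array of shape (n, d), one row per observation
    theta_free : the d - 1 free coefficients
    """
    full = np.concatenate(([1.0], np.asarray(theta_free, dtype=float)))
    return np.asarray(x, dtype=float) @ full
```

For a single row `x = [[1, 2, 3]]` and `theta_free = [0.5, -1]`, the index is `1*1 + 0.5*2 - 1*3 = -1`.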
Given and the constants , define and take , . Let , be a preliminary consistent estimator of
The main results
Assume that the parameter set $\Theta$ is compact with nonempty interior. Define and
Theorem 4.1. Suppose that the assumptions of Appendix A hold and X is bounded. If is defined as in (3.2), then , in probability, and If X is unbounded, consider a sequence of real numbers such that
Empirical evidence
In order to illustrate the finite sample properties of our estimator, we conducted a simulation study using a SAS 8.1 program. For optimization we used the NLPNRA routine of SAS/IML software. This routine is based on a Newton–Raphson method. All the estimates reported in this section were obtained with a quartic kernel $K(u) = \frac{15}{16}(1 - u^2)^2\, I_{[-1,1]}(u)$.
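For readers who want to experiment outside SAS, the standard quartic (biweight) kernel can be written in a few lines of Python. This is a sketch under the usual textbook definition of that kernel; the name `quartic_kernel` and the numerical check are illustrative additions.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic (biweight) kernel: (15/16) (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

# Numerical check on a fine grid: the kernel integrates to 1 and its
# second moment is 1/7 (both follow from integrating the polynomial).
grid = np.linspace(-1.0, 1.0, 20001)
step = grid[1] - grid[0]
total_mass = np.sum(quartic_kernel(grid)) * step
second_moment = np.sum(grid**2 * quartic_kernel(grid)) * step
```

The compact support means that observations farther than one bandwidth from the evaluation point receive zero weight, which is one reason trimming against small denominators matters for kernel-based criteria.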
In the first experiment, the data were generated in the following way:
- 1.
:
Conclusions
We introduce a large class of semiparametric M-estimators for single-index models and we show their asymptotic normality. The estimates are obtained as maximizers of a criterion where a nonparametric kernel estimator is used to estimate the conditional expectation $E(Y \mid X\theta)$. It is well known that the (first order) asymptotics of $\hat\theta$ do not depend on the choice of $h$, provided that $h$ satisfies some conditions. The decomposition
References (38)
Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics (1993)
Nonparametric kernel estimation for semiparametric models. Econometric Theory (1995)
Estimation semi-paramétrique de quasi-score. Bull. Belg. Math. Soc. (1998)
Théorie de l’estimation fonctionnelle (1987)
Generalized partially linear single-index models. J. Amer. Statist. Assoc. (1997)
Asymptotically efficient estimation in semiparametric generalized linear models. Ann. Statist. (1995)
Quasi-likelihood regression with unknown link and variance functions. J. Amer. Statist. Assoc. (1998)
Nonparametric quasi-likelihood. Ann. Statist. (1999)
A calibration curve for radio carbon dates. Antiquity (1975)
M-estimateurs semi-paramétriques dans les modèles à direction révélatrice unique. Bull. Belg. Math. Soc. (1999)
Optimal robust M-estimates of location. Ann. Statist.
Pseudo maximum likelihood methods: theory. Econometrica
Robust Statistics. The Approach Based on Influence Functions
Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist.
Investigating smooth multiple regression by the method of average derivatives. J. Amer. Statist. Assoc.
Optimal smoothing in single-index models. Ann. Statist.
Structure adaptive approach for dimension reduction. Ann. Statist.
Direct estimation of the index coefficient in a single-index model. Ann. Statist.
- 1
Part of the work for this paper was accomplished while this author was at LEO, Université d’Orléans.