Elsevier

Journal of Econometrics

Volume 214, Issue 2, February 2020, Pages 495-512
Robust estimation with many instruments

https://doi.org/10.1016/j.jeconom.2019.04.040

Abstract

Linear instrumental variables models are widely used in empirical work, but are often associated with low estimator precision. This paper proposes an estimator that is robust to outliers and shows that the estimator is minimax optimal in a class of estimators that includes the limited information maximum likelihood estimator (LIML). Intuitively, this optimal robust estimator combines LIML with Winsorization of the structural residuals, and the Winsorization leads to improved precision under thick-tailed error distributions. Consistency and asymptotic normality of the estimator are established under many instruments asymptotics, and a consistent variance estimator that allows for asymptotically valid inference is provided.

Introduction

The focus of this paper is robust estimation of the structural coefficient in a linear instrumental variables (IV) model with many instruments. Such models have received considerable attention, both because the use of many instruments can lead to efficiency gains (see, e.g., Angrist and Krueger, 1991) and because a flurry of empirical applications are naturally characterized by having a large number of instruments (see, e.g., Kling, 2006, Doyle, 2007, 2008, Chang and Schoar, 2008, Autor and Houseman, 2010, Maestas et al., 2013, French and Song, 2014, Aizer and Doyle, 2015, Dobbie and Song, 2015, Jacobsen and Van Benthem, 2015). In models with multiple instruments, it is well understood that inference based on the commonly employed two-stage least squares estimator (2SLS) and standard asymptotics may give misleading confidence intervals. Inference based on LIML and many instruments asymptotics, where the number of instruments and the sample size are allowed to be proportional, tends to correct this problem (Kunitomo, 1980, Morimune, 1983, Bekker, 1994).

A further motivation of LIML is that it is an efficient estimator when the errors of the model are jointly normal (Chioda and Jansson, 2009). However, the presence of gross outliers among the errors is common in empirical applications, and such thick-tailed error distributions can render LIML inefficient or even inconsistent in extreme cases. As an illustration, this paper takes the model and data from Angrist and Krueger (1991) and documents that the structural errors are distributed roughly like a normal distribution contaminated with gross outliers. This example highlights the need to understand if alternatives to LIML can be more efficient under thick-tailed deviations from normal errors.

The main contribution of this paper is to propose an optimal robust estimator of the structural coefficient in a linear IV model. Intuitively, this estimator combines LIML with Winsorization of the structural residuals, and the Winsorization bounds the influence of outliers among the errors. This robustness property makes the estimator more efficient than LIML when the structural errors have thick tails, and the efficiency gain is approximately 80 percent in the empirical example, in the sense that LIML would need an 80 percent larger sample to achieve the same level of precision.
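The bounding of residuals described above can be illustrated with a Huber-type score, which leaves small residuals unchanged and clips large ones at a threshold. The following is a minimal sketch, not the paper's estimator: the function name `huber_score`, the example residuals, and the tuning constant `c = 1.345` (a conventional choice in robust regression) are assumptions made here for illustration.

```python
import numpy as np

def huber_score(residuals, c=1.345):
    """Winsorize residuals at +/- c (Huber-type score).

    The threshold c is a hypothetical tuning constant, not a value
    taken from the paper.
    """
    r = np.asarray(residuals, dtype=float)
    # Residuals inside [-c, c] pass through; the rest are set to -c or c.
    return np.clip(r, -c, c)

# Hypothetical residuals with two gross outliers (-8 and 12).
residuals = np.array([-8.0, -0.5, 0.0, 0.7, 12.0])
winsorized = huber_score(residuals)
print(winsorized)
```

Because the clipped values enter the estimating equations in place of the raw residuals, a single gross outlier contributes at most `c` in absolute value, which is the sense in which its influence is bounded.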

The proposed robust estimator is shown to be minimax optimal. This optimality result is derived in three steps. First, the paper proposes a new class of generalized method of moments estimators (GMM) for the linear IV model. Two particular members of this class correspond to LIML and the optimal robust estimator. The paper then shows that each estimator of the structural coefficient is consistent and asymptotically normal at the usual parametric rate under many instruments asymptotics. Second, the paper characterizes an optimal estimator within the class which minimizes asymptotic variance when the shape of the joint error distribution is known. For example, if the joint error distribution is normal, then the optimal estimator is LIML. Finally, the paper uses this optimality result to show that the optimal robust estimator is the member of the class which minimizes the maximal asymptotic variance over a neighborhood of contaminated normal distributions, i.e., over mixtures between the standard normal distribution (with high probability) and some unknown contaminating distribution (with low probability). This approach mirrors the one taken by Huber (1964, 1973, 1981) in the context of the classical robust regression model, and the optimal robust estimator derived here treats the structural residuals of the IV model the same way as Huber’s minimax estimator treats the residuals of the regression model.
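The minimax criterion over contaminated normal distributions can be written out schematically. This is a sketch in the style of Huber's gross-error model, shown for a single error for simplicity; the symbols $\epsilon$, $H$, $\mathcal{F}_\epsilon$, $\mathcal{C}$ and $V(\cdot, \cdot)$ are notation assumed here for illustration and need not match the paper's, and any restrictions the paper places on the contaminating distribution $H$ are left unspecified.

```latex
% Contamination neighborhood: standard normal with probability 1 - \epsilon,
% unknown contaminating distribution H with probability \epsilon.
\mathcal{F}_\epsilon
  = \bigl\{ F : F = (1-\epsilon)\,\Phi + \epsilon\, H \bigr\},
  \qquad 0 \le \epsilon < 1 .

% Minimax criterion: among estimators in the class \mathcal{C}, choose the one
% whose worst-case asymptotic variance V over the neighborhood is smallest.
\hat\beta^{\mathrm{robust}}
  \in \arg\min_{\hat\beta \in \mathcal{C}} \;
      \sup_{F \in \mathcal{F}_\epsilon} V(\hat\beta, F) .
```

With $\epsilon = 0$ the neighborhood collapses to the normal distribution, which is consistent with the statement above that LIML is the optimal member of the class under normal errors.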

These contributions add to a growing literature on many instruments asymptotics that started with Kunitomo (1980) and Morimune (1983), who derived asymptotic variances for LIML that are larger than the usual IV formulas and depend on the number of instruments. Bekker (1994) provided consistent estimators of these larger variances under normal errors, and Hansen et al. (2008) extended the variance formulas and estimators to allow for nonnormal errors. This paper expands the class of asymptotically normal estimators to include robust alternatives to LIML and provides formulas for their asymptotic variances that are natural extensions of the existing formulas. In addition, this paper provides consistent variance estimators inspired by the GMM setup. However, these variance estimators differ from the classical “sandwich” type estimators used with GMM (Newey and McFadden, 1994, Section 4) and are also new in the well-studied special case of LIML.

This paper also adds to the literature on efficiency and robust estimation in the linear IV model. Anderson et al. (2010) showed optimality of LIML among estimators that are functions of the sufficient statistics from the normal model. Under normality of the errors, Chioda and Jansson (2009) showed optimality of LIML among estimators that are invariant to rotations of those sufficient statistics. The optimality results of this paper are complementary to the existing literature, as they imply optimality of LIML under normal errors, but for a different class of estimators than previously considered. More importantly, they also bring a new perspective to these results by presenting estimators that are robust and more efficient than LIML under nonnormal errors and many instruments. In models with a fixed number of instruments there exist multiple examples of such robust estimators or estimators that can be more efficient than LIML under nonnormal errors. Examples are the two-stage least absolute deviations estimator (Amemiya, 1982, Powell, 1983), the resistant estimator of Krasker and Welsch (1985), the two-stage quantiles and two-stage trimmed least squares estimators (Chen and Portnoy, 1996), the IV quantile regression estimator (Chernozhukov and Hansen, 2006), the robust estimators of Honoré and Hu (2004), the nonlinear IV estimators of Hansen et al. (2010), and the adaptive estimator of Cattaneo et al. (2012). However, I am unaware of papers that establish consistency or asymptotic normality of these estimators under many instruments asymptotics.

Finally, this paper makes an additional contribution of potentially independent interest. The contribution is to give high-level conditions for asymptotic normality of a single element of a GMM estimator with dimension proportional to the sample size. Following Huber (1967), there have been numerous papers giving high-level conditions in GMM setups. See, e.g., Hansen (1982), Pakes and Pollard (1989), Andrews (1994), Newey (1994), Newey and McFadden (1994), Ai and Chen (2003), Chen et al. (2003), Chen (2007) and Newey and Windmeijer (2009). These papers cover cases of smooth and non-smooth objective functions and parametric and semi-parametric estimators, but all of them rely on an intermediate result of consistency of the estimator for some pseudo-true, non-random value. In contrast, this paper allows for the estimator to have a random, sample-dependent “limit”. This is a necessary extension, as the reduced form parameters in the linear IV model do not settle down around some non-random value under many instruments asymptotics. This paper presents results for smooth and non-smooth objective functions, and verifies the high-level conditions for examples that are differentiable or Lipschitz continuous.

The next section defines the model, describes the class of estimators, and presents the associated variance estimators. Section 3 gives high-level conditions for consistency and asymptotic normality, and Section 4.1 verifies these for each estimator in the class. Section 4.2 shows consistency of the variance estimators, and Section 4.3 derives the minimax property of the optimal robust estimator. Sections 5 and 6 present simulation results and the empirical example provided by Angrist and Krueger (1991). Proofs are in a supplemental appendix (SA).

For a vector $v$, $\|v\| = \sqrt{v'v}$ denotes the Euclidean norm. $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are the smallest and largest eigenvalues for a symmetric matrix $A$. For an arbitrary matrix $A$, $\|A\| = \lambda_{\max}(A'A)^{1/2}$ gives the largest singular value of $A$, and $\sigma_{\min}(A) = \lambda_{\min}(A'A)^{1/2}$ returns the smallest singular value of $A$. For any absolutely continuous function $f : \mathbb{R} \to \mathbb{R}$, let $f'$ be the derivative of $f$ where it exists and zero otherwise. Let $\{a_{ni}\}_{i,n}$ be shorthand for a triangular array $\{a_{ni} : i \in \{1,\dots,n\}, n \in \mathbb{N}\}$, and let $\{a_{ni}\}_{i}$ be shorthand for a row of that array $\{a_{ni} : i \in \{1,\dots,n\}\}$. The distribution function of the standard normal distribution is denoted $\Phi$. Limits are considered as $n \to \infty$ unless otherwise noted.
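The norm definitions above can be checked numerically. The sketch below computes the Euclidean norm of a vector and the largest and smallest singular values of a matrix from the eigenvalues of $A'A$, then compares them with NumPy's built-in routines; the example vector and matrix are arbitrary values chosen here for illustration.

```python
import numpy as np

# Arbitrary example inputs (not from the paper).
v = np.array([3.0, 4.0])
A = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])

# Euclidean norm: square root of v'v.
euclid = np.sqrt(v @ v)

# Eigenvalues of the symmetric matrix A'A.
eigenvalues = np.linalg.eigvalsh(A.T @ A)

# Largest and smallest singular values of A via lambda_max and lambda_min of A'A.
spectral_norm = np.sqrt(eigenvalues.max())
sigma_min = np.sqrt(eigenvalues.min())

# Cross-check against NumPy's singular value decomposition.
singular_values = np.linalg.svd(A, compute_uv=False)
print(euclid, spectral_norm, sigma_min, singular_values)
```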

Section snippets

Model and estimators

Consider a linear IV model with two endogenous and $k_n$ instrumental variables. The model consists of a structural and a reduced form equation given by $y_{in} = x_{in}\beta_0 + w_i'\delta_0 + \varepsilon_i$ and $x_{in} = z_{in}'\pi_{0n} + w_i'\eta_0 + u_i$ $(i = 1,\dots,n)$, where the unobserved stochastic errors $\varepsilon_i, u_i \in \mathbb{R}$ are potentially dependent so that both $y_{in}$ and $x_{in}$ are endogenous. $w_i \in \mathbb{R}^G$ denotes a vector of exogenous variables that includes an intercept and $z_{in} \in \mathbb{R}^{k_n}$ is a vector of instruments. The parameter of interest is $\beta_0 \in \mathbb{R}$ while $(\pi_{0n}', \delta_0', \eta_0')' \in \mathbb{R}^{k_n+2G}$ serves as
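To make the two-equation structure concrete, the following sketch draws one sample from a model of this form. It is an illustration only: the function name `simulate_iv`, all parameter values, the intercept-only choice for the exogenous variables, and the jointly normal errors with correlation `rho` are assumptions made here and are not the paper's simulation design.

```python
import numpy as np

def simulate_iv(n=500, k=50, beta0=1.0, rho=0.5, seed=0):
    """Draw one sample from a two-equation linear IV model (illustrative values)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, k))         # instruments z_in
    w = np.ones((n, 1))                     # exogenous variables: intercept only (G = 1)
    pi0 = np.full(k, 1.0 / np.sqrt(k))      # hypothetical reduced form coefficients
    delta0 = np.array([0.5])                # hypothetical structural coefficient on w
    eta0 = np.array([-0.2])                 # hypothetical reduced form coefficient on w
    # Correlated errors (eps, u): the correlation rho is what makes x endogenous.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    errors = rng.multivariate_normal(np.zeros(2), cov, size=n)
    eps, u = errors[:, 0], errors[:, 1]
    x = z @ pi0 + w @ eta0 + u              # reduced form equation
    y = x * beta0 + w @ delta0 + eps        # structural equation
    return y, x, z, w, eps

y, x, z, w, eps = simulate_iv()
print(y.shape, x.shape, z.shape, w.shape)
```

Replacing the normal draws for `eps` with a thick-tailed or contaminated distribution would give the kind of setting in which the paper argues that Winsorizing the structural residuals improves precision.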

High-level conditions for asymptotic normality

This section gives high-level conditions for asymptotic normality of a single element of a GMM estimator with dimension proportional to the sample size, and Section 4 verifies these high-level conditions for the estimators of the previous section. The results apply to just-identified GMM estimators $\hat\theta$ of $\theta_0 \in \mathbb{R}^p$, where $p/n \to \alpha \in [0,1)$, $\|m_n(\hat\theta)\| \le \inf_{\theta \in \Theta_n \subseteq \mathbb{R}^p} \|m_n(\theta)\| + o_p(n^{-1/2})$, and the first entry of $\theta_0$, say $\beta_0$, is the object of interest. Thus, the results can be seen as extensions of Pakes and Pollard

Asymptotic normality, inference, and optimality

This section presents three results. First, it gives primitive conditions on the model and estimators of Section 2 that are sufficient for the high-level conditions of Section 3 and therefore sufficient for asymptotic normality. Second, it presents a consistency result for the asymptotic variance estimators. Third, it characterizes the functions ϕ and ψ that lead to an optimal estimator or to the optimal robust estimator.

Simulations

This section presents the results of a simulation study which shows that the asymptotic results give good approximations to the finite sample behavior of the estimators considered in this paper. The simulations consider twelve estimators which are the optimal robust estimator, LIML, 2SLS, five other combinations of ϕ and ψ as one of the Huber, Cauchy, or Gauss scores, the NLIV estimator of Hansen et al. (2010), the adaptive estimator of Cattaneo et al. (2012), the LASSO adaptation of 2SLS by 

Quarter of birth and returns to schooling

This section considers the empirical example provided by the Angrist and Krueger (1991) study of the returns to schooling using quarter of birth as an instrument. The data comes from the 1980 U.S. Census and includes 329,509 males born 1930–1939. The structural equation includes a constant, year, and state dummies, and the reduced form equation includes 180 instruments which are quarter of birth times year or state of birth. This model corresponds to table 7 of Angrist and Krueger (1991). In

Summary

This paper proposed an optimal robust estimator in a linear IV model with many instruments and showed that it is consistent and asymptotically normal under many instruments asymptotics. The optimality of the estimator was shown to be in terms of minimax asymptotic variance over a neighborhood of contaminated normal distributions, and the optimal robust estimator can be substantially more efficient than LIML under thick-tailed error distributions. Furthermore, the paper provided a simple to use

Acknowledgments

I am grateful to Michael Jansson, Jim Powell, Demian Pouzo, and Noureddine El Karoui for valuable advice, and thank the editor, two anonymous referees, and seminar participants at UC Berkeley, DAEiNA, University of Michigan, University of Virginia, UNC Chapel Hill, Cornell, Chicago Booth, UW Madison, UC Davis, Aarhus University, University of Copenhagen, and University of Connecticut for helpful comments.

References (63)

  • Wang, W. et al. Bootstrap inference for instrumental variable models with many weak instruments. J. Econometrics (2016)
  • Ai, C. et al. Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica (2003)
  • Aizer, A. et al. Juvenile incarceration, human capital, and future crime: Evidence from randomly-assigned judges. Q. J. Econ. (2015)
  • Amemiya, T. Two stage least absolute deviations estimators. Econometrica (1982)
  • Anatolyev, S. et al. Asymptotics of diagonal elements of projection matrices under many instruments/regressors. Econometric Theory (2016)
  • Angrist, J.D. et al. Does compulsory school attendance affect schooling and earnings? Q. J. Econ. (1991)
  • Autor, D.H. et al. Do temporary-help jobs improve labor market outcomes for low-skilled workers? Evidence from “work first”. Am. Econ. J. Appl. Econ. (2010)
  • Bean, D. et al. Optimal M-estimation in high-dimensional regression. Proc. Natl. Acad. Sci. (2013)
  • Bekker, P.A. Alternative approximations to the distributions of instrumental variable estimators. Econometrica (1994)
  • Bekker, P.A. et al. Instrumental variable estimation based on grouped data. Stat. Neerl. (2005)
  • Belloni, A. et al. Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica (2012)
  • Chamberlain, G. et al. Random effects estimators with many instrumental variables. Econometrica (2004)
  • Chang, T. et al. Judge specific differences in Chapter 11 and firm outcomes
  • Chao, J.C. et al. Consistent estimation with a large number of weak instruments. Econometrica (2005)
  • Chao, J.C. et al. Asymptotic distribution of JIVE in a heteroskedastic IV regression with many instruments. Econometric Theory (2012)
  • Chatterjee, S. A new method of normal approximation. Ann. Probab. (2008)
  • Chen, X. et al. Estimation of semiparametric models when the criterion function is not smooth. Econometrica (2003)
  • Chen, L.-A. et al. Two-stage regression quantiles and two-stage trimmed least squares estimators for structural equation models. Comm. Statist. Theory Methods (1996)
  • Chioda, L. et al. Optimal invariant inference when the number of instruments is large. Econometric Theory (2009)
  • Dobbie, W. et al. Debt relief and debtor outcomes: Measuring the effects of consumer bankruptcy protection. Am. Econ. Rev. (2015)
  • Doyle, J.J. Child protection and child outcomes: Measuring the effects of foster care. Am. Econ. Rev. (2007)