Estimation of panel data models with missing covariate values

Date

2019-05-07

Authors

Coe, Jessie Elizabeth

Abstract

This dissertation presents methods for (Chapters 1 and 3) and empirical applications of (Chapters 2 and 3) the estimation of panel data models in the presence of missing covariate values. Chapter 1 considers estimation of a linear fixed effects model in which covariate values may be missing. Two inverse probability weighted (IPW) estimators are proposed. The main assumption is a missing at random (MAR) assumption, which allows missingness (observation) to be related to the outcome and its shocks but requires that the probability of observation not be related to the missing values themselves. The inverse of the estimated probability of observation is used to reweight the estimating equations, which are then solved in a second stage by either computationally simple pooled OLS or asymptotically more efficient GMM. Both proposed estimators are consistent and √N-asymptotically normal, and their asymptotic variance is derived. The main results are developed for the classical linear fixed effects model under strict exogeneity, and the approach generalizes to many panel models, including dynamic linear unobserved effects models. Chapter 2 revisits the question of the impact of water quality in local water amenities on housing values, as in (22). Water quality, the main covariate of interest, is measured by the level of dissolved oxygen and is missing for many properties in many time periods. This chapter investigates the sensitivity of estimates of the value of local water quality to the treatment of the missing data. The IPW estimator of Chapter 1 is compared to the unweighted estimator used in (22). Empirical evidence suggests that the MAR assumption is more palatable than the assumption required by the more commonly used unweighted estimator. The estimation results change in both magnitude and statistical significance when the IPW estimator is used.
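The reweighting idea from Chapter 1 can be illustrated with a minimal sketch. It uses a deliberately simplified cross-sectional setting without fixed effects (an assumption made here for brevity; the dissertation's estimators handle the panel case and also offer a GMM second stage). The simulated design, the helper names `ols`, `simulate`, and `ipw_estimates`, and the two-cell first-stage model for P(x observed | y) are all hypothetical. Observation of x depends only on the always-observed outcome y, so the MAR condition described above holds while the unweighted complete-case regression does not recover the true slope.

```python
import numpy as np


def ols(X, y, w=None):
    """(Weighted) least squares coefficients; w are observation weights."""
    if w is None:
        w = np.ones(len(y))
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def simulate(n=200_000, seed=0):
    """Hypothetical design: y = 1 + 2x + u, with x sometimes missing."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    y = 1.0 + 2.0 * x + u                  # true intercept 1, true slope 2
    # MAR: the chance that x is observed depends only on the observed y
    p_true = np.where(y < 1.0, 0.9, 0.2)
    s = rng.uniform(size=n) < p_true       # s = True means x is observed
    x_obs = np.where(s, x, np.nan)
    return y, x_obs, s


def ipw_estimates(y, x_obs, s):
    # First stage (assumed correctly specified): estimate P(s = 1 | y)
    # by the observed fraction within the cells {y < 1} and {y >= 1}
    cell = y < 1.0
    p_hat = np.where(cell, s[cell].mean(), s[~cell].mean())
    # Second stage: pooled regression on the complete cases only
    X = np.column_stack([np.ones(s.sum()), x_obs[s]])
    b_cc = ols(X, y[s])                     # unweighted complete-case OLS
    b_ipw = ols(X, y[s], w=1.0 / p_hat[s])  # inverse probability weighted
    return b_cc, b_ipw


y, x_obs, s = simulate()
b_cc, b_ipw = ipw_estimates(y, x_obs, s)
print("complete-case slope:", round(b_cc[1], 3))
print("IPW slope:          ", round(b_ipw[1], 3))
```

Because selection depends on y, the complete-case slope is attenuated toward zero, whereas weighting each complete case by the inverse of its estimated observation probability restores the population moment condition and gives a slope close to 2.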
The third chapter considers estimation of a linear fixed effects model under an ignorable missingness assumption, which requires that observation of the covariates not be directly related to the outcome or the unobserved errors, and which includes missing completely at random as a special case. Under this assumption, the complete-data estimator consistently estimates the coefficients but may be inefficient because it discards observations with missing covariates. I propose a generalized method of moments (GMM) estimator that uses all the data, is not difficult to implement, and offers potential efficiency gains over the complete-data method. For the classical linear fixed effects model with homoskedasticity, efficiency gains are realized in almost all cases. The estimator imputes values for the missing covariates by including an additional moment in the estimation, and thus properly accounts for the uncertainty in imputation, unlike common single imputation methods; it also does not require a distributional assumption, unlike multiple imputation methods. The required assumption is that the linear projection of the missing covariates onto the fully observed variables is the same for observations in which the covariates are observed and those in which they are missing. Simulation results show efficiency gains in finite samples, and an empirical illustration based on (3)'s analysis of the effect of life expectancy on economic growth compares the complete-data method with the proposed GMM estimator.
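The moment-based imputation idea from Chapter 3 can be sketched in a simplified cross-sectional setting with one sometimes-missing covariate x, one always-observed covariate z, and no fixed effects (assumptions made here for brevity, not the dissertation's panel estimator). The sketch stacks three hypothetical sets of moments: complete-case regression moments, moments for the linear projection of x on (1, z) using complete cases, and moments for the incomplete cases in which x is replaced by that projection. The data-generating process, the function names, and the use of Gauss-Newton iterations on a two-step GMM criterion are illustrative choices.

```python
import numpy as np


def simulate(n=50_000, seed=1):
    """Hypothetical design with x missing completely at random."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # always observed
    x = 0.5 + 0.8 * z + rng.normal(size=n)       # sometimes missing
    y = 1.0 + 2.0 * x - 1.0 * z + rng.normal(size=n)
    s = rng.uniform(size=n) < 0.6                # x observed with prob. 0.6
    # missing x set to 0.0 (not NaN) so that s * (...) is exactly zero
    return y, np.where(s, x, 0.0), z, s.astype(float)


def moments(theta, y, x, z, s):
    """Stack the 7 moment functions for each observation (n x 7)."""
    b0, b1, b2, d0, d1 = theta
    one = np.ones_like(y)
    e_full = y - b0 - b1 * x - b2 * z             # complete-case residual
    r = x - d0 - d1 * z                           # projection residual of x on z
    e_imp = y - b0 - b1 * (d0 + d1 * z) - b2 * z  # residual with x imputed
    m1 = s[:, None] * np.column_stack([one, x, z]) * e_full[:, None]
    m2 = s[:, None] * np.column_stack([one, z]) * r[:, None]
    m3 = (1 - s)[:, None] * np.column_stack([one, z]) * e_imp[:, None]
    return np.hstack([m1, m2, m3])


def gmm(y, x, z, s, theta0, W, iters=50, h=1e-6):
    """Gauss-Newton minimization of gbar' W gbar."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        g = moments(theta, y, x, z, s).mean(axis=0)
        # central-difference Jacobian of the averaged moments (7 x 5)
        G = np.empty((g.size, theta.size))
        for k in range(theta.size):
            e = np.zeros_like(theta)
            e[k] = h
            G[:, k] = (moments(theta + e, y, x, z, s).mean(axis=0)
                       - moments(theta - e, y, x, z, s).mean(axis=0)) / (2 * h)
        step = np.linalg.solve(G.T @ W @ G, G.T @ W @ g)
        theta = theta - step
        if np.max(np.abs(step)) < 1e-8:
            break
    return theta


def two_step_gmm(y, x, z, s):
    obs = s == 1
    one = np.ones(obs.sum())
    # consistent starting values from complete cases (valid under MCAR)
    b_cc, *_ = np.linalg.lstsq(np.column_stack([one, x[obs], z[obs]]), y[obs], rcond=None)
    d_cc, *_ = np.linalg.lstsq(np.column_stack([one, z[obs]]), x[obs], rcond=None)
    theta0 = np.concatenate([b_cc, d_cc])
    m = moments(theta0, y, x, z, s)
    W = np.linalg.inv(m.T @ m / len(y))           # efficient weight at theta0
    return b_cc, gmm(y, x, z, s, theta0, W)


y, x, z, s = simulate()
b_cc, theta = two_step_gmm(y, x, z, s)
print("complete-case (b0, b1, b2):", np.round(b_cc, 3))
print("GMM           (b0, b1, b2):", np.round(theta[:3], 3))
```

Under missing completely at random both estimators are consistent; the incomplete-case moments let the GMM estimator exploit the information in z from observations where x is missing, which is the source of the potential efficiency gain described above.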
