Estimating multi-way error components models with unbalanced data structures

https://doi.org/10.1016/S0304-4076(01)00087-2

Abstract

I develop simple matrix algebra techniques that simplify and unify much of the previous literature on estimating error components models (ECMs). In fact, the simple analytic results provided here are useful for analyzing a very broad set of models with complex error structures. To illustrate the techniques, I develop the algebra for three- and four-way ECMs explicitly. In addition, I provide Monte Carlo simulation evidence on the performance of several estimators for the three-way ECM and estimate the model using data from a retail market where the three dimensions of data variation are products selling in many locations over time.

Introduction

The error components model (ECM) is one of the most frequently used econometric models for panel data. Work by Balestra and Nerlove (1973), Wallace and Hussain (1969), Amemiya (1971), Nerlove (1971a, 1971b), Mazodier (1972), and Fuller and Battese (1974), among others, provides a comprehensive analysis of the one- and two-way ECM for balanced panels. In practice, researchers often have to deal with missing observations. Consequently, Baltagi (1985) and Wansbeek and Kapteyn (1989) (henceforth WK) developed the one- and two-way ECMs, respectively, for unbalanced data.

In a related literature, the nested ECM has also been developed. Most recently, Baltagi et al. (2001) and Antweiler (2001) develop the two- and three-way nested ECMs, respectively, for unbalanced panels, building on earlier work by Fuller and Battese (1973). The nested ECM has proven particularly useful for datasets with a natural grouping structure. For example, data on firms may be grouped by industry.

The aim of this paper is to extend the literature in a number of directions. First, I provide a series of mathematical results which allow an elegant analysis of the ECM for unbalanced panel data with an arbitrary number of error components. I illustrate the results by developing estimators for the three-way ECM in the text, while results for the four-way model are presented in the appendix. Because the paper's results are developed as recurrence relations, the extensions to ECMs with five or more error components are easily developed, while the one- and two-way ECMs are obtained as special cases.

Second, I show that the standard and nested ECMs have the same mathematical structure and therefore the results I present can be applied to both. In fact, the results are useful for estimating a broad array of models with complex error structures. In each case, modeling the correlation structure among the errors is important because it allows efficient estimation of the parameters of interest and consistent estimates of the standard errors of the parameters. Naturally, the latter is crucially important for performing inference correctly.

Third, whenever possible, I relax WK's assumption that all regressors are exogenous, thereby allowing estimation of the general ECM using instrumental variables. Applied demand analysis motivated these extensions of the literature, and this aspect of the results is particularly important in that and other contexts.

The paper proceeds as follows. Sections 2 and 3 develop the multi-way fixed- and random-effects models, respectively, for given estimates of the variance of each error component. In Section 4, quadratic unbiased estimators (QUEs), minimum norm quadratic unbiased estimators (MINQUEs) and maximum likelihood estimators (MLEs) of the variance of each error component are developed. Section 5 provides Monte Carlo evidence on estimators of the three-way ECM. In Section 6, I apply the results by estimating a differentiated product demand equation using a new dataset from a retail market. Section 7 concludes.

The fixed effects model

Consider the three-way error components linear regression model

$$y_{ijk} = X_{ijk}\beta + u_{ijk},$$

where

$$u_{ijk} = \mu_i + \lambda_j + \gamma_k + \nu_{ijk}$$

and $i=1,\ldots,N_1$ might index products selling in location $j=1,\ldots,N_2$ at date $k=1,\ldots,N_3$. In that case, $N_1$ would denote the total number of different products ever observed being sold at any location in any time period. Similarly, $N_2$ and $N_3$ would denote the number of locations and time periods for which data are available. Let $n$ denote the total number of observations in the dataset.
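To make the index notation concrete, the following Python sketch counts N1, N2, N3 and n for a small sample. The array `obs` and its values are hypothetical and chosen only for illustration; they are not data from the paper. In an unbalanced design, n is smaller than the N1·N2·N3 cells of the corresponding balanced design.

```python
import numpy as np

# Hypothetical sample: one row per observation, columns are the labels (i, j, k)
# for product, location and date. The values are made up for illustration.
obs = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [2, 1, 1],
    [2, 1, 2],
])

n = obs.shape[0]  # total number of observations in the dataset

# N_m counts the distinct labels ever observed in dimension m.
N1, N2, N3 = (int(np.unique(obs[:, m]).size) for m in range(3))

cells = N1 * N2 * N3  # number of cells in the corresponding balanced design
print(f"N1={N1}, N2={N2}, N3={N3}, n={n}, share of cells observed={n / cells:.2f}")
```

Here product 1 is observed only once, and no location is observed at every date, so the sample is unbalanced in more than one dimension.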

Using matrix

The random effects model

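Eq. (2) describes the error structure of the three-way random effects model when $\mu$, $\lambda$, $\gamma$ and $\nu$ are random vectors. I assume $\mu$, $\lambda$, $\gamma$ and $\nu$ are independent of each other and among themselves, with zero means and variances $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ and $\sigma_0^2$, respectively. Thus,

$$\Omega \equiv E[uu'] = \sigma_0^2 I_n + \sigma_1^2 \Delta_1\Delta_1' + \sigma_2^2 \Delta_2\Delta_2' + \sigma_3^2 \Delta_3\Delta_3'.$$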
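The covariance structure above can be illustrated numerically. The sketch below assumes that each Δ matrix is the n × N_m matrix of 0/1 indicators linking an observation to its level in dimension m; the definition is not shown in this excerpt, so treat that as an assumption. The helper names `indicator_matrix` and `omega`, the sample `obs` and the variance values are all hypothetical.

```python
import numpy as np

def indicator_matrix(labels):
    """Return the n x N_m 0/1 matrix whose row r flags the level of observation r.

    Assumption: this is how the Delta matrices are defined.
    """
    levels, codes = np.unique(labels, return_inverse=True)
    D = np.zeros((len(labels), len(levels)))
    D[np.arange(len(labels)), codes] = 1.0
    return D

def omega(obs, sigma2):
    """Build Omega = s0*I_n + sum_m s_m * Delta_m Delta_m'.

    sigma2 = (s0, s1, s2, s3) holds the variances of nu, mu, lambda and gamma.
    """
    n = obs.shape[0]
    Om = sigma2[0] * np.eye(n)
    for m in range(obs.shape[1]):
        D = indicator_matrix(obs[:, m])
        Om += sigma2[m + 1] * (D @ D.T)
    return Om

# Hypothetical unbalanced sample of (i, j, k) labels and illustrative variances.
obs = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 1], [2, 1, 1], [2, 1, 2]])
Omega = omega(obs, sigma2=(1.0, 0.5, 0.3, 0.2))
print(np.round(Omega, 2))
```

Two observations are correlated only through the components they share: in this sample the first two rows share a product and a date, so their covariance is the sum of the product and date variances, while observations with no label in common have zero covariance.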

Given a consistent estimate of $\Omega^{-1}$, a set of instruments $Z$ and the identification condition that $E[u(X,y;\beta_0)\mid Z,\Delta]=0$ at the true parameter $\beta_0$, we may use the GMM framework developed by Hansen (1982) to

Estimation of the variance components

The expression for $\Omega^{-1}$ reduces computational time considerably relative to inverting $\Omega$ numerically given any specified values of the variances of the error components. Various methods are available to estimate the variance components themselves. In this section, I develop analysis-of-variance-type Quadratic Unbiased Estimators (QUEs) analogous to WK's QUEs for the two-way model. Then I present the Minimum Norm Quadratic Unbiased Estimator (MINQUE) developed by Rao (1971) and finally Maximum

Design of the Monte Carlo study

While previous Monte Carlo studies of ECMs with unbalanced data have considered designs which emphasize the traditional panel data context with observations missing in only one dimension (individuals are only observed in some time periods), I consider the simplest possible regression model with a three-way error component structure and data sets which are symmetrically unbalanced in multiple dimensions. Specifically, I use the data generating process

$$y_{ijk} = x_{ijk}\beta + \mu_i + \lambda_j + \gamma_k + \nu_{ijk},$$

where the exogenous
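A data generating process of this form can be sketched in Python as follows. The excerpt does not give the distributions, parameter values or missing-data design, so everything specific here is an assumption made for illustration: normal draws for the regressor and all error components, unit standard deviations, β = 1, and cells dropped independently at random. The function name `simulate_three_way` is hypothetical.

```python
import numpy as np

def simulate_three_way(obs, dims, beta, sd, rng):
    """Draw y = x*beta + mu_i + lambda_j + gamma_k + nu_ijk for the observed cells.

    sd = (sd0, sd1, sd2, sd3): standard deviations of nu, mu, lambda and gamma.
    Normality of every draw is an assumption of this sketch.
    """
    n = obs.shape[0]
    mu = rng.normal(0.0, sd[1], dims[0])   # one effect per level of i
    lam = rng.normal(0.0, sd[2], dims[1])  # one effect per level of j
    gam = rng.normal(0.0, sd[3], dims[2])  # one effect per level of k
    nu = rng.normal(0.0, sd[0], n)         # idiosyncratic error
    x = rng.normal(size=n)                 # exogenous scalar regressor
    y = x * beta + mu[obs[:, 0]] + lam[obs[:, 1]] + gam[obs[:, 2]] + nu
    return x, y

rng = np.random.default_rng(0)
N1, N2, N3 = 10, 8, 6

# Start from the full grid of (i, j, k) cells, then drop cells at random so the
# sample is unbalanced in all three dimensions (an assumed missing-data design).
grid = np.indices((N1, N2, N3)).reshape(3, -1).T
obs = grid[rng.random(len(grid)) < 0.6]

x, y = simulate_three_way(obs, (N1, N2, N3), beta=1.0, sd=(1.0, 1.0, 1.0, 1.0), rng=rng)

# OLS slope as a simple benchmark that ignores the error components.
b_ols = (x @ y) / (x @ x)
print(f"n={len(obs)} of {N1 * N2 * N3} cells, OLS slope={b_ols:.3f}")
```

Repeating the draw over many replications and comparing estimators on each sample would give a Monte Carlo experiment in the spirit of the one described, though not a reproduction of the paper's design.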

Application: the demand for retail products

Berry (1994) demonstrates that discrete choice random utility models of differentiated product demand may be estimated from the linear equation

$$y_{jht} = x_{jht}'\beta + \psi \ln(s_{jht|g}) + u_{jht},$$

where $y_{jht}=\ln(s_{jht}/s_{ot})$, with $s_{jht}$ the observed market share of product $j$ selling at location $h$ on date $t$, $s_{ot}$ is the fraction of people choosing not to consume any of the

Conclusions

In this paper I show that the econometric techniques required to construct efficient estimators allowing for instrumental variable estimation of multi-way error components models in unbalanced data structures involve simple and elegant generalizations of the panel data methods developed by Wansbeek and Kapteyn (1989). By presenting the results in terms of recurrence relations, feasible estimators for models with an arbitrary number of error components can be easily developed. In addition, the

Acknowledgements

Thanks are due to Steve Berry, Ariel Pakes, Nadia Soboleva, and Tom Stoker for helpful comments and suggestions. This paper is a substantially revised version of Chapter 3 of my Ph.D. dissertation and has benefited greatly from the suggestions of two anonymous referees and an associate editor of this journal. Financial support from Yale University and the Robert M. Leylan Doctoral Fellowship Fund is gratefully acknowledged.

References (36)

  • H. Ahrens et al. (1981). On two measures of unbalancedness in a one-way model and their relation to efficiency. Biometric Journal.
  • T. Amemiya (1971). The estimation of the variances in a variance–covariance model. International Economic Review.
  • T. Amemiya et al. (1986). Instrumental-variable estimation of an error components model. Econometrica.
  • M. Arellano et al. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies.
  • B.H. Baltagi (1987). On estimating from a more general time series cum cross-section data structure. The American...
  • B.H. Baltagi (1995). Econometric Analysis of Panel Data.
  • B.H. Baltagi, S.H. Song, B.C. Jung (1999a). Further evidence on the unbalanced nested error component regression...
  • B.H. Baltagi et al. (2001). The unbalanced nested error component regression model. Journal of Econometrics.