Estimating multi-way error components models with unbalanced data structures

doi:10.1016/S0304-4076(01)00087-2

Journal of Econometrics

Volume 106, Issue 1, January 2002, Pages 67-95

Abstract

I develop simple matrix algebra techniques that simplify and unify much of the previous literature on estimating error components models (ECMs). In fact, the simple analytic results provided here are useful for analyzing a very broad set of models with complex error structures. To illustrate the techniques, I develop the algebra for three- and four-way ECMs explicitly. In addition, I provide Monte Carlo simulation evidence on the performance of several estimators for the three-way ECM and estimate the model using data from a retail market where the three dimensions of data variation are products selling in many locations over time.

Introduction

The error components model (ECM) is one of the most frequently used econometric models for panel data. Work by Balestra and Nerlove (1973), Wallace and Hussain (1969), Amemiya (1971), Nerlove (1971a, 1971b), Mazodier (1972), and Fuller and Battese (1974), among others, provides a comprehensive analysis of the one- and two-way ECM for balanced panels. In practice, researchers often have to deal with missing observations. Consequently, Baltagi (1985) and Wansbeek and Kapteyn (1989) (henceforth WK), respectively, developed the one- and two-way ECM for unbalanced data.

In a related literature, the nested ECM has also been developed. Most recently, Baltagi et al. (2001) and Antweiler (2001), respectively, develop the two- and three-way nested ECM for unbalanced panels, building on earlier work by Fuller and Battese (1973). The nested ECM has proven particularly useful for datasets with a natural grouping structure. For example, data on firms may be grouped by industry.

The aim of this paper is to extend the literature in a number of directions. First, I provide a series of mathematical results which allow an elegant analysis of the ECM for unbalanced panel data with an arbitrary number of error components.¹ I illustrate the results by developing estimators for the three-way ECM in the text while results for the four-way model are presented in the appendix.² By developing the paper's results as recurrence relations, the extensions to the five- or more-way ECMs are easily developed, while the one- and two-way ECMs are observed as special cases.

Second, I show that the standard and nested ECMs have the same mathematical structure and therefore the results I present can be applied to both. In fact, the results are useful for estimating a broad array of models with complex error structures. In each case, modeling the correlation structure among the errors is important because it allows efficient estimation of the parameters of interest and consistent estimates of the standard errors of the parameters. Naturally, the latter is crucially important for performing inference correctly.

Third, whenever possible, I relax WK's assumption that all regressors are exogenous, thereby allowing estimation of the general ECM using instrumental variables. Applied demand analysis motivated these extensions of the literature and this aspect of the results is particularly important in that and other contexts.³

The paper proceeds as follows. Sections 2 and 3 develop the multi-way fixed- and random-effects models, respectively, for given estimates of the variance of each error component. In Section 4, quadratic unbiased estimators (QUEs), minimum norm quadratic unbiased estimators (MINQUEs) and maximum likelihood estimators (MLEs) of the variance of each error component are developed. Section 5 provides Monte Carlo evidence on estimators of the three-way ECM. In Section 6, I apply the results by estimating a differentiated product demand equation using a new dataset from a retail market. Section 7 concludes.

Section snippets

The fixed effects model

Consider the three-way error components linear regression model: $y_{ijk} = X_{ijk} β + u_{ijk}$, where $u_{ijk} = μ_{i} + λ_{j} + γ_{k} + ν_{ijk}$ and $i=1,…,N_{1}$ might index products selling in location $j=1,…,N_{2}$ at date $k=1,…,N_{3}$. In that case, $N_{1}$ would denote the total number of different products ever observed being sold at any location in any time period. Similarly, $N_{2}$ and $N_{3}$ would denote the number of locations and time periods for which data are available. Let $n$ denote the total number of observations in the dataset.

Using matrix

The random effects model

Eq. (2) describes the error structure of the three-way random effects model, when $μ$, $λ$, $γ$ and $ν$ are random vectors. I assume $μ$, $λ$, $γ$ and $ν$ are independent of each other and among themselves with zero means and variances $σ_{1}^{2}$, $σ_{2}^{2}$, $σ_{3}^{2}$ and $σ_{0}^{2}$, respectively. Thus, $Ω ≡ E[uu′] = σ_{0}^{2} I_{n} + σ_{1}^{2} Δ_{1}Δ_{1}′ + σ_{2}^{2} Δ_{2}Δ_{2}′ + σ_{3}^{2} Δ_{3}Δ_{3}′$.
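As an illustration of the covariance structure above, the following minimal Python sketch builds $Ω$ for a small unbalanced sample. It assumes that each $Δ_{m}$ is the $n × N_{m}$ indicator (dummy) matrix assigning observations to the levels of component $m$; that definition is not stated in this snippet, so it should be read as an assumption. The function names `dummy_matrix` and `omega`, the index arrays, and the variance values are all hypothetical and chosen only for the example.

```python
import numpy as np

def dummy_matrix(index, n_levels):
    """n x n_levels indicator matrix: row r has a 1 in column index[r]."""
    index = np.asarray(index)
    delta = np.zeros((index.size, n_levels))
    delta[np.arange(index.size), index] = 1.0
    return delta

def omega(i_idx, j_idx, k_idx, n_levels, sigma2):
    """Omega = s0*I_n + s1*D1 D1' + s2*D2 D2' + s3*D3 D3'.

    sigma2 = (s0, s1, s2, s3) holds the variances of nu, mu, lambda, gamma.
    Assumes each D_m is an observation-to-level indicator matrix.
    """
    n = len(i_idx)
    out = sigma2[0] * np.eye(n)
    for idx, levels, s2 in zip((i_idx, j_idx, k_idx), n_levels, sigma2[1:]):
        d = dummy_matrix(idx, levels)
        out += s2 * (d @ d.T)
    return out

# Hypothetical unbalanced sample: only 5 of the 2*2*2 = 8 cells are observed.
i_idx = [0, 0, 1, 1, 1]
j_idx = [0, 1, 0, 1, 1]
k_idx = [0, 0, 1, 0, 1]
Omega = omega(i_idx, j_idx, k_idx, n_levels=(2, 2, 2),
              sigma2=(1.0, 0.5, 0.3, 0.2))
```

Under this reading, entry $(r, s)$ of $Ω$ sums the variances of every component that observations $r$ and $s$ share, so two observations on the same product at the same date but in different locations have covariance $σ_{1}^{2} + σ_{3}^{2}$.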

Given a consistent estimate of $Ω^{−1}$, a set of instruments $Z$ and the identification condition that $E[u(X,y;β_{0})|Z,Δ]=0$ at the true parameter $β_{0}$, we may use the GMM framework developed by Hansen (1982) to

Estimation of the variance components

The expression for $Ω^{−1}$ reduces computational time considerably relative to inverting $Ω$ numerically given any specified values of the variances of the error components. Various methods are available to estimate the variance components themselves. In this section, I develop analysis of variance type Quadratic Unbiased Estimators (QUEs) analogous to WK's QUEs for the two-way model. Then I present the Minimum Norm Quadratic Unbiased Estimator (MINQUE) developed by Rao (1971) and finally Maximum

Design of the Monte Carlo study

While previous Monte Carlo studies of ECMs with unbalanced data have considered designs which emphasize the traditional panel data context with observations missing in only one dimension (individuals are only observed in some time periods), I consider the simplest possible regression model with a three-way error component structure and data sets which are symmetrically unbalanced in multiple dimensions. Specifically, I use the data generating process $y_{ijk} =x_{ijk} β+μ_{i} +λ_{j} +γ_{k} +ν_{ijk},$ where the exogenous
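The data generating process above can be sketched in Python as follows. This is not the paper's simulation design: the normal distributions, the rule that each $(i, j, k)$ cell is observed independently with probability `keep_prob`, and the function name `simulate_three_way` are assumptions made only for illustration, since the snippet does not show how the regressor is drawn or how observations are dropped.

```python
import numpy as np

def simulate_three_way(n_levels, sd, beta, keep_prob, rng):
    """Draw y_ijk = x_ijk*beta + mu_i + lambda_j + gamma_k + nu_ijk.

    sd = (sd_nu, sd_mu, sd_lambda, sd_gamma) are standard deviations.
    Each cell is kept with probability keep_prob (an assumed missingness
    rule), so the sample is unbalanced in all three dimensions.
    """
    n1, n2, n3 = n_levels
    # Enumerate every (i, j, k) cell, then keep a random subset.
    i, j, k = np.meshgrid(np.arange(n1), np.arange(n2), np.arange(n3),
                          indexing="ij")
    i, j, k = i.ravel(), j.ravel(), k.ravel()
    keep = rng.random(i.size) < keep_prob
    i, j, k = i[keep], j[keep], k[keep]

    sd_nu, sd_mu, sd_lam, sd_gam = sd
    mu = rng.normal(0.0, sd_mu, n1)      # product effects
    lam = rng.normal(0.0, sd_lam, n2)    # location effects
    gam = rng.normal(0.0, sd_gam, n3)    # time effects
    nu = rng.normal(0.0, sd_nu, i.size)  # idiosyncratic error
    x = rng.normal(size=i.size)          # assumed exogenous regressor
    y = x * beta + mu[i] + lam[j] + gam[k] + nu
    return y, x, i, j, k

rng = np.random.default_rng(0)
y, x, i, j, k = simulate_three_way((4, 5, 6), (1.0, 0.5, 0.5, 0.5),
                                   beta=1.0, keep_prob=0.6, rng=rng)
```

The returned index arrays `i`, `j` and `k` identify which cells survived, which is the information an estimator for unbalanced data needs in order to account for the error structure.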

Application: the demand for retail products

Berry (1994) demonstrates that discrete choice random utility models of differentiated product demand may be estimated from the linear equation¹⁵ $y_{jht} = x_{jht}′β + ψ \ln(s_{jht|g}) + u_{jht}$, where $y_{jht} = \ln(s_{jht}/s_{ot})$, with $s_{jht}$ the observed market share of product $j$ selling at location $h$ on date $t$, $s_{ot}$ is the fraction of people choosing not to consume any of the

Conclusions

In this paper I show that the econometric techniques required to construct efficient estimators allowing for instrumental variable estimation of multi-way error components models in unbalanced data structures involve simple and elegant generalizations of the panel data methods developed by Wansbeek and Kapteyn (1989). By presenting the results in terms of recurrence relations, feasible estimators for models with an arbitrary number of error components can be easily developed. In addition, the

Acknowledgements

Thanks are due to Steve Berry, Ariel Pakes, Nadia Soboleva, and Tom Stoker for helpful comments and suggestions. This paper is a substantially revised version of Chapter 3 of my Ph.D. dissertation, which has benefited greatly from the suggestions of two anonymous referees and an associate editor of this journal. Financial support from Yale University and the Robert M. Leyian Doctoral Fellowship Fund is gratefully acknowledged.

References (36)

W. Antweiler, Nested random effects estimation in unbalanced panel data, Journal of Econometrics (2001)
M. Arellano et al., Another look at the instrumental variable estimation of error-components models, Journal of Econometrics (1995)
P. Balestra et al., Pooling cross section and time series data in the estimation of a dynamic model: the demand for natural gas, Journal of Econometrics (1973)
B.H. Baltagi, Simultaneous equations with error components, Journal of Econometrics (1981)
B.H. Baltagi, Pooling cross-sections with unequal time series lengths, Economics Letters (1985)
B.H. Baltagi et al., Incomplete panels: a comparative study of alternative estimators for the unbalanced one-way error component regression model, Journal of Econometrics (1994)
P. Davis, Empirical models of demand for differentiated products, European Economic Review (papers and proceedings) (2000)
W.A. Fuller et al., Estimation of linear functions with crossed-error structure, Journal of Econometrics (1974)
J.R. Magnus, Maximum likelihood estimation of the GLS model with unknown parameters in the disturbance covariance matrix, Journal of Econometrics (1978)
T. Wansbeek et al., Estimation of the error-components model with incomplete panels, Journal of Econometrics (1989)

H. Ahrens et al., On two measures of unbalancedness in a one-way model and their relation to efficiency, Biometric Journal (1981)
T. Amemiya, The estimation of the variances in a variance–covariance model, International Economic Review (1971)
T. Amemiya et al., Instrumental-variable estimation of an error components model, Econometrica (1986)
M. Arellano et al., Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, Review of Economic Studies (1991)
Baltagi, B.H., 1987. On Estimating from a more general time series cum cross-section data structure. The American...
B.H. Baltagi, Econometric Analysis of Panel Data (1995)
Baltagi, B.H., Song, S.H., Jung, B.C., 1999a. Further evidence on the unbalanced nested error component regression...
B.H. Baltagi et al., The unbalanced nested error component regression model, Journal of Econometrics (2001)

