1 Introduction

In recent years fixed effects model specifications have enjoyed great popularity in the panel gravity literature as a way to take into account the specific three-dimensional nature of international trade flow data sets.Footnote 1 These models, differing in the specification of the fixed effects, focus on unobserved heterogeneity, but ignore any potential cross-sectional dependence due to network interactions. Faced with this problem, we propose a panel gravity model that examines multiple network interaction structures, using Bayesian model probabilities to determine those most consistent with the sample data. This is accomplished with the use of computationally efficient Markov Chain Monte Carlo (MCMC) estimation methods that produce a Monte Carlo integration estimate of the log-marginal likelihood that can be used for model comparison.

Empirical international trade modeling has focused on multilateral resistance terms based on Anderson and van Wincoop (2003, 2004), which Feenstra (2002, 2004) argued could be proxied by inclusion of origin- and destination-specific fixed effects parameters in the model specification. Ranjan and Tobias (2007) model the effects parameters as a (semi-parametric) function of an exogenous variable measuring bilateral trade resistance between origin-destination dyads. Koch and LeSage (2015) make a convincing argument that all of these models relax multilateral aspects of the dependence structure in favor of bilateral dependence, or a specification based on heterogeneity rather than simultaneous/multilateral dependence. They argue this is important given Behrens et al. (2012) demonstration that model specifications that control for regional heterogeneity do not necessarily capture the full structure of interdependence between trade flows implied by theory. Behrens et al. (2012) derive a dual version of the Anderson and van Wincoop (2003) model using a model based on quantity competition that also implies a simultaneous structure of interdependence between observed trade flows. Koch and LeSage (2015) show how linearizing the system of nonlinear multilateral resistance relationships around a free trade world leads to a structured random effects interdependence structure that can be used as a Bayesian prior to produce statistical estimates of the inward and outward multilateral resistance indices. They argue their statistical approach has advantages over the non-stochastic numerical approach used by Anderson and van Wincoop (2003) to solve for the multilateral resistance indices.

In contrast to the empirical trade literature, the regional science literature regarding modeling of flows between regions has placed more emphasis on use of spatial autoregressive models that include spatial lags of flows from regions neighboring the origin and destination, an idea promoted by LeSage and Pace (2008), who devote an entire chapter to these models in their book (LeSage and Pace 2009, Chapter 8). This paper extends this type of spatial autoregressive models to a panel data model setting. The literature discussed above did not deal with panel data models where we have a series of flow matrices covering numerous time periods, but rather more conventional gravity models consisting of a single flow matrix.

In a panel data model setting, distance as well as sociocultural factors (which we can view as generalized distance variables) are generally time invariant, so they are treated as fixed effects. Balazsi et al. (2018) explore econometric implications of using a host of alternative multidimensional fixed effects in panel gravity models, but assume trade flows to be independent. We apply two of the more widely used multidimensional fixed effects transformations from the empirical trade literature to a panel of \(N=70\) countries trade flows covering the \(T=38\) years from 1963 to 2000, in a model specification that allows for the presence of simultaneous cross-sectional dependence.

One approach uses N origin and N destination country fixed effects plus T time-specific effects proposed by Mátyás (1997) as an extension of the conventional fixed effects panel data model (e.g., Baltagi 2005) to the multidimensional situation that arises in the case of gravity models. These models take an \(N^2 T \times 1\) vector of dependent variables reflecting the matrix of trade flows between the N countries (assuming flows between all countries) at each time period, resulting in a dummy matrix of fixed effects with column rank of \(2N + T-2\). The second approach makes use of fixed effects proposed by Cheng and Wall (2005) that introduce fixed effects for origin-destination dyads as well as time periods, resulting in a dummy matrix with column rank of \(N^2 + T -1\), frequently adopted in the empirical trade literature.

Apart from accommodating heterogeneity in flows using fixed effects, the dependent variable vector of \(N^2 \times 1\) trade flows for each time period are assumed to be independent, so flows between countries that have a common currency, language, membership in trade organizations, border or colonial ties are no more likely than flows between countries having nothing in common. Cross-sectional dependence in flows suggests that flows between countries with sociocultural similarity are likely to exhibit dependence as opposed to independence. We set forth a model specification that allows for this type of dependence in flows across the \(N^2 T\) country-time dyads, along with computationally efficient MCMC estimation methods.Footnote 2

The nature of dependence that we model would be better labeled network dependence rather than spatial or cross-sectional dependence, because we introduce dependence between network nodes involving origin- and destination-dyads as well as covariance across these. We use the terms network and cross-sectional dependence interchangeably, but note that the network dependence specification introduced here reflects a special case of cross-sectional dependence that can arise in the case of origin-destination flows that has not received a great deal of attention in the literature.

An innovative aspect of our MCMC estimation approach is use of Metropolis-Hastings guided samples from the joint posterior distribution of the dependence parameters to construct a Monte Carlo integration estimate of the log-marginal likelihood useful for model comparison. The MCMC estimation approach allows for estimation and posterior inference on a vector of dependence parameters that determines the relative importance of network dependence. In our case, we rely on Markov Chain Monte Carlo sampling to estimate the model parameters with the dependence parameters sampled using a Metropolis-Hastings procedure. Since this approach produces draws of the dependence parameters that are steered by Metropolis-Hastings accept/reject decisions to areas of high density of the joint posterior, we can produce an efficient Monte Carlo integration of the log-marginal likelihood.Footnote 3

Another methodological innovation is use of multiple network interaction structures constructed to reflect spatial proximity between countries, as well as numerous types of sociocultural proximity such as common currency, language, colonial ties, and so on. An equally weighted combination of these multiple weight matrices is used to form a single weight matrix. This approach allows us to treat sociocultural factors as sources of network dependence in the panel gravity model.

The remainder of this paper is organized as follows. Section 2 outlines conventional panel gravity models as used in the empirical trade literature, along with a discussion of the two multidimensional fixed effects specifications that we explore.Footnote 4 Section 3 introduces network dependence in a conventional multidimensional fixed effects model specification, and describes the prior setup of the MCMC estimation approach. Section 4 sets forth computational challenges to estimation along with MCMC procedures to overcome these challenges. Section 5 applies our model to panel data on trade flows between 70 countries covering the 38 years from 1963 to 2000. In the application of the model we consider combinations of multiple sociocultural connectivity structures, that can be used in conjunction with log-marginal likelihood estimates to determine the relative importance of each type of connectivity. We find strong evidence of network dependence in trade flows pointing to network spillover effects, and suggesting that ignoring the presence of this type of cross-sectional dependence will result in biased estimates from conventional trade flow model specifications. Section 6 concludes the article and a technical appendix provides further details on the MCMC estimation approach.

2 Multi-indexed panel gravity models

In multidimensional panel data sets, the dependent variable of a panel gravity model is observed along three indices, such as \(y_{ijt}, i=1, \ldots , N_i, j=1,\ldots , N_j, t=1,\ldots , T\).

2.1 The Mátyás model

Mátyás (1997) made an early attempt to introduce multidimensional fixed effects for (log-linear) gravity model specifications such as that in (1).Footnote 5 The dependent variable \(y_{ijt}\) in (1) reflects an \(N^2 T \times 1\) vector of (logged) trade flows between N countries i and j at time t, so \(i=1,\ldots ,N, j=1,\ldots ,N\), and \(t=1,\ldots ,T\).

$$\begin{aligned} y_{ijt}= \,& X_{ijt} \ \beta + \alpha _i + \gamma _j + \lambda _t + \varepsilon _{ijt}, \end{aligned}$$
(1)

where the \(\alpha _i\), \(\gamma _j\) parameters are the origin and destination country-specific fixed effects, and \(\lambda _t\) are time-specific fixed effects. \(\beta\) is a \(2K \times 1\) vector of parameters on the \(N^2 T \times 2K\) (logged) covariates \(X_{ijt}=(X_{it}^1, X_{jt}^1, \ldots , X_{it}^K ,X_{jt}^K)\) with \(X_{it}^k\) (\(X_{jt}^k\)) denoting the kth measure for the economic size of the origin country i (destination country j) in the country dyads (ij) at time t. We note that distance between the countries is time-invariant and not in the set of covariates. It is assumed that \(\varepsilon _{ijt}\) are normal i.i.d. idiosyncratic disturbances with zero mean and scalar (\(\sigma ^2_{\varepsilon }\)) variance.Footnote 6

2.2 The Cheng and Wall model

There are of course other specifications for the fixed effects. For example, Egger and Pfaffermayr (2003) propose bilateral specific fixed effects \(\gamma _{ij}\). A variant of this, proposed by Cheng and Wall (2005), popular in the empirical trade literature is shown in (2),

$$\begin{aligned} y_{ijt}= \, & {} X_{ijt} \beta + \gamma _{ij} + \lambda _t + \varepsilon _{ijt}, \end{aligned}$$
(2)

or in matrix form \(y = X\beta +\Delta _\gamma \gamma + \Delta _\lambda \lambda + \varepsilon\) with corresponding dummy design matrices \(\Delta _\gamma = (\begin{array}{c} I_N \otimes I_N \otimes \iota _T \end{array})\) and \(\Delta _\lambda = ( \begin{array}{c} \iota _N \otimes \iota _N \otimes I_T \end{array})\). Balazsi et al. (2018) point out that the model in (1) represents a special case of that in (2), and there is an analogy of this 3-dimensional (3D) situation in (2) to 2-dimensional (2D) panel data models, where individuals in the 2D situation are treated as (ij) pairs in the 3D setting. In other words, individual effects are now assigned to (ij) dyads.

We note that the model specifications in (1) and (2) assume that the dependent variable vector of \(N^2 \times 1\) trade flows for each time period are statistically independent, so flows between countries that have a common currency, language, membership in trade organizations, border or colonial ties are no more likely than flows between countries having nothing in common. Network dependence in flows suggests that flows between countries with sociocultural similarity are likely to exhibit dependence as opposed to independence. In the next section, we set forth a model specification that allows for this type of dependence in flows across the \(N^2 T\) country-time dyads.

3 The econometric framework

3.1 The model

We set forth an extension of the conventional panel gravity model that allows for origin-, destination- and origin-destination-based network dependence. The matrix expressions in (3) represent the network dependence panel gravity model for origin-destination flows introduced in matrix form.

$$\begin{aligned} y= \, & {} \rho _o I_T \otimes (W \otimes I_{N}) y + \rho _d I_T \otimes (I_{N} \otimes W ) y + \rho _{w} I_T \otimes (W \otimes W) y +\\&X_{ot} \beta _o + X_{dt} \beta _d + \Delta _\gamma \gamma + \Delta _\lambda \lambda +\varepsilon , \end{aligned}$$
(3)

where y is the \(N^2 T \times 1\) dependent variable vector of origin-destination flows for each time period, organized with t being the slow index for elements \(y_{ijt}\) in the vector y. The \(N^2 T \times K\) matrix \(X_{ot}\) contains origin-specific covariates with the associated \(K \times 1\) parameter vector \(\beta _o\), where \(X_{ot} = X_t \otimes \iota _N\), and similarly for \(X_{dt} = \iota _N \otimes X_t\) with \(X_t\) being an \(N \times K\) matrix of covariates measuring the (economic) size of each destination country at time t, with associated \(\beta _d\). The \(N^2 T \times 1\) vector \(\varepsilon\) represents the normally distributed i.i.d. scalar variance disturbances. The matrices of fixed effects \(\Delta _\gamma \gamma + \Delta _\lambda \lambda\) are as defined after (2), or we could use fixed effects from Mátyás (1997) shown in (1) as well.

The model in (3) indicates that flows at each time period \(t=1,\ldots ,T\) exhibit dependence on flows of countries that are sociocultural neighbors to the origin country captured by the \(N^2T \times 1\) vector \(I_T \otimes (W \otimes I_{N}) y\) with the associated scalar parameter \(\rho _o\) measuring the strength of that dependence. The matrix W is an \(N \times N\) weight matrix that defines neighboring countries based on sociocultural connectivity structures. A neighboring country is indicated by a non-zero (ij) element in the \(N \times N\) matrix W, which has zeros on the main diagonal. The matrix W is normalized to have row-sums of unity, resulting in the \(N^2 \times 1\) vector \((W \otimes I_N) y_t\) reflecting a linear combination of trade flows at time t from countries that are sociocultural neighbors to the origin country.

The model also allows for dependence of flows in each time period from countries that are sociocultural neighbors to the destination country, captured by the vector \(I_T \otimes (I_{N} \otimes W ) y\), with associated scalar parameter \(\rho _d\), and we note that this vector relies on the same matrix W used to define (sociocultural) neighbors. The matrix \((I_N \otimes W\)) defines neighbors to the destination, the matrix \((W \otimes I_{N})\) identifies those to the origin, when the vector of flows for time t arises from a conventional \(N \times N\) origin-destination flow matrix, organized with dyads (ij) representing flows from origin j to destination i.

Another type of dependence is also included in the model, reflected by the \(N^2 T \times 1\) vector \(I_T \otimes (W \otimes W) y\) and associated scalar parameter \(\rho _w\), which captures dependence of flows from countries that are sociocultural neighbors to both the origin and destination countries. Following ideas expressed in LeSage and Pace (2008) we motivate this type of dependence using the (cross-sectional) specification in (4), where they argue that the matrix A can be viewed as a filter. For notational convenience in (4) (as in the sequel), we have defined the matrix Z to include \(X_{ot}, X_{dt}\) and the fixed effects, with the parameters \(\beta\) adjusted appropriately.

$$\begin{aligned} A \ y&=\, \alpha \ \iota _{N^2} + Z \beta + \varepsilon , \\ A&= \,(I_{N^2} - \rho _d W_d)(I_{N^2} - \rho _o W_o) \\&= \,(I_{N^2} - \rho _d W_d - \rho _o W_o + \rho _w W_w), \\ W_o&=\, W \otimes I_N, \\ W_d&= \,I_N \otimes W, \\ W_w&= \, {} W_d \otimes W_o = W_o \otimes W_d = W \otimes W. \end{aligned}$$
(4)

The argument is that the existence of origin- and destination-based dependence between trade flows (\(W_o y, W_d y\)), logically implies a covariance between these two types of dependence which is reflected in \(W_w y\). LeSage and Pace (2008) note that this filter implies a restriction that \(\rho _w = -\rho _o \rho _d\), but argue this restriction needs not be imposed during estimation, so we address the more general case here and allow for an unrestricted parameter \(\rho _w\).Footnote 7

3.2 MCMC estimation

The model can be written in a computationally efficient form as shown in (5).

$$\begin{aligned} {\tilde{y}} \ \omega= & \,{} Z \beta + \varepsilon , \ \ \varepsilon \sim {\mathcal {N}}(0, \sigma ^2 I_{N^2 T}), \\ {\tilde{y}}= & \,{} \left[ \begin{array}{cccc} y&I_T \otimes (W \otimes I_N) y&I_T \otimes (I_N \otimes W) y&I_T \otimes (W \otimes W) y \end{array} \right] , \\ \omega= \,& {} \left( \begin{array}{cccc} 1&-\rho _o&-\rho _d&-\rho _w \end{array} \right) '. \end{aligned}$$
(5)

A key feature of \({\tilde{y}}\) is that this expression separates dependence parameters to be estimated from sample data describing the simultaneous dependence, with the scalar dependence parameters in the vector \(\omega\). We assume normally distributed, zero mean, constant variance (\(\sigma ^2\)) disturbances.


Likelihood and priors

The likelihood is shown in (6), where \(|R(\omega )|\) is the determinant that depends on the dependence parameters in \(\omega\), as does the expression \(e'e\).

$$\begin{aligned} f(y;\omega , \sigma ^2, \beta )= \,& {} |R(\omega )| \;(2 \pi \sigma ^2)^{-N^2T/2} \text{ exp }\left( -\frac{1}{2 \sigma ^2} e'e\right) , \\ e= & \,{} {\tilde{y}} \ \omega - Z\beta , \\ R(\omega )= & \,{} I_{N^2 T} -\rho _o I_T \otimes (W \otimes I_N) - \rho _d I_T \otimes (I_N \otimes W) - \rho _w I_T \otimes (W \otimes W). \end{aligned}$$
(6)

To ensure that \(R(\omega )\) is non-singular, restrictions need to be placed on the dependence parameters \(\omega\) to ensure that \(R(\omega )^{-1}\) exhibits an underlying stationary process. Specifically, \(\rho _o + \rho _d + \rho _w < 1\). The parameter space \(\Xi\) for the set of parameters \((\omega , \sigma ^2, \beta )\) is: \(\Xi := \Xi _{\omega } \times \Xi _{\sigma ^2} \times \Xi _{\beta } = \Xi _{-1,1} \times (0, \infty ) \times {\mathcal {R}}^{2K+M}\) (where M is the number of fixed effects included in the matrix Z).Footnote 8

Since our focus is on large samples involving \(N^2T\) observations, we rely on uninformative priors for the parameters \(\beta\) and \(\sigma ^2\), as these would not likely impact posterior estimates. By uninformative prior we mean a “flat” prior that is within the range of valid values for these parameters, so all intervals are of equal length, indicating all values are equally likely.Footnote 9 Since the dependence parameters in \(\omega\) are a focus of inference, uniform priors for these dependence parameters are used, which must obey a stability constraint.

Mathematically, the flat or uniform prior for \(\omega , \beta\) can be represented as \(p(\omega ) \propto 1, p(\beta ) \propto 1\). The noise variance \(\sigma ^2\), is restricted to positive values, with a flat prior assigned to the log-transformed value which is denoted \(p(\sigma ^2) \propto 1/\sigma ^2\). Given this prior information, and prior independence, one can write: \(p(\omega ) \times p(\sigma ^2) \times p(\beta ) \propto 1/\sigma ^2\). While this flat prior is improper since the integral over the parameter space \(\Xi\) is not finite, the joint posterior distribution for the dependence parameters \(\omega\) is proper under relatively unrestrictive assumptions. This joint posterior is derived by analytically integrating out the parameters \(\beta , \sigma ^2\), with details regarding this integration found in Hepple (1995a, b).

To derive the joint posterior for the dependence parameters \(\omega\), begin with the full joint posterior \(p(\omega , \beta , \sigma ^2)\) and analytically integrate out \(\beta , \sigma ^2\). This relies on standard techniques from the Bayesian regression literature (Zellner 1971). Combining the likelihood function in (7) with the flat priors (and ignoring the constant \(2 \pi ^{N^2T/2}\)) leads to the joint posterior in (8), from which \(\sigma\) can be integrated out, leading to (9), where \(\Gamma (.)\) denotes the Gamma function.

$$\begin{aligned} f(y;\omega , \sigma ^2, \beta )\propto & \,{} |R(\omega )| (2 \pi \sigma ^2)^{-N^2T/2} \ \text{ exp }\left( -\frac{1}{2 \sigma ^2} \ \omega ' u'u \ \omega \right) , \end{aligned}$$
(7)
$$\begin{aligned} u= \,& {} {\tilde{y}} - Z \beta _d, \\ \beta _d= & \,{} (Z'Z)^{-1} Z' {\tilde{y}}, \end{aligned}$$
$$\begin{aligned} p(\omega , \beta | y)\propto & \,{} |R(\omega )| \int _0^\infty \sigma ^{-(N^2T+1)}\ \text{ exp } \left( - \frac{1}{2\sigma ^2} \ \omega ' u' u \ \omega \right) d \sigma \end{aligned}$$
(8)
$$\begin{aligned}= & \,{} 2^{(N^2T-2)/2} \;\Gamma (N^2T/2) \;|R(\omega )| \;\left( \omega ' \ u' u \ \omega \right) ^{-N^2T/2}. \end{aligned}$$
(9)

To integrate out the \(2K+M\) different \(\beta\) parameters, properties of the multivariate \(t-\)distribution in conjunction with ‘completing the square’ are used (see Zellner 1971). This leads to a joint distribution for the dependence parameters \(\omega\) shown in (10), with the term \(|Z'Z|^{-1/2}\) and the exponent \(-(N^2T-2K-M)/2\) arising from this integration (see Hepple 1995a, b). This expression must be numerically integrated to arrive at the log-marginal likelihood for these models.

$$\begin{aligned} p(\omega | y)= &\, {} 2^{(N^2T-2)/2} \;\Gamma (N^2T/2)\; |R(\omega )|\; |Z'Z|^{-1/2} \;(\omega ' u'u \ \omega )^{-(N^2T-2K-M)/2}. \end{aligned}$$
(10)

Dittrich et al. (2017) show there is no problem regarding posterior propriety for the cross-sectional model case and this result carries over to our model in a relatively straightforward and trivial way. The conditional distributions for the model parameters \(\beta , \sigma ^2, \omega\) needed to implement MCMC estimation are described in “Appendix A”.

A motivation for having analytically integrated out the parameters \(\beta , \sigma ^2\) is that further integration of the joint conditional posterior over the set of dependence parameters in \(\omega\), yields the log-marginal likelihood for these models. We can use Monte Carlo integration to accomplish this task. Monte Carlo integration evaluates the expression to be integrated using random draws of the parameter values. A drawback to this approach is inefficiency because many of the random draws for the parameters are not in areas of high density of the function being integrated. In our case, the Metropolis-Hastings sampling procedure used to produce draws of the dependence parameters steers these parameter values to areas of high density of the joint posterior (described later in Sect. 4.1). This allows us to produce an efficient Monte Carlo integration of the log-marginal likelihood.

Given an estimate of the log-marginal likelihood for a model \(m_i\) (\(log \, m_i\)), we can calculate: \(prob(m_i) = \text{ exp }(log \, m_i) / \sum _{i=1}^Q \text{ exp }(log \, m_{i})\) in the case of Q different models. Of course, there is a great deal of interest in comparing alternative models, for example, models based on different weight matrices, or different fixed effects specifications.

4 Computational challenges to estimating the model

One issue that arises when considering estimation of the model in (3) is that multiple dependence parameters \(\rho _o, \rho _d, \rho _w\) would require use of a multivariate optimization routine to produce estimates based on maximum likelihood. It is also the case that the dependence parameters are (well) defined over the \((-1,1)\) interval, meaning that constrained optimization would be required to ensure values \(-1< \rho _o + \rho _d + \rho _w < 1\).Footnote 10

Another challenge to maximum likelihood estimation is the log-determinant term that arises in the (log) likelihood function, specifically (log): \(|I_{N^2 T} - \rho _o I_T \otimes (W \otimes I_N) - \rho _d I_T \otimes (I_N \otimes W ) - \rho _{w} I_T \otimes (W \otimes W)|\). In the case of conventional spatial regression models involving a single weight matrix, there is a great deal of literature on approaches to efficiently calculating or approximating the log-determinant term that appears in the (log) likelihood \(|I_N - \rho W|\), (see LeSage and Pace 2008, Chapter 4). These approaches are not directly applicable to the model specifications considered here, complicating maximum likelihood estimation, since the log-determinant expression needs to be evaluated for multiple dependence parameter values during optimization. In the case of Markov Chain Monte Carlo estimation, the log-determinant term appears in the conditional distribution for the dependence parameters requiring multiple evaluations during sampling and could be demanding and quickly hit a computational bottleneck. “Appendix B” shows how to side step this computational bottleneck.

Because of the issues outlined above, we set forth estimation based on Markov Chain Monte Carlo, with no prior distributions assigned to the parameters \(\beta , \sigma ^2\). Parameter restrictions are imposed on the dependence parameters during MCMC sampling using methods described in Sect. 4.1. Since emphasis is on modeling situations involving large samples of observations, prior information would not play a role in determining posterior estimates of the parameters, so MCMC is used as a computational device to produce estimates that should be identical to those from maximum likelihood estimation. MCMC estimation involves sequentially sampling each parameter (or set of parameters) from their conditional distributions (or joint conditional distribution in the case of a set of parameters). Expressions for the conditional distributions are frequently easier to calculate than those required to evaluate the (log) likelihood, which is true for the model specifications considered here.

Another aspect of our network dependence model relates to proper interpretation of the partial derivative impacts on the dependent variable vector arising from changes in the explanatory variables, e.g., \(\partial E(y) / \partial X^r\) for the rth explanatory variable. We take this issue up in Sect. 4.2. MCMC estimation proceeds by sampling sequentially from the conditional distributions of each parameter (or set of parameters).

4.1 Block sampling the dependence parameters

One motivation for working with the joint conditional posterior distribution for the dependence parameters is the need to impose stability restrictions on these parameters. Specifically, \(-1< \rho _o + \rho _d + \rho _w < 1\). Working with the joint conditional posterior distribution for these parameters allows us to adopt a block sampling Metropolis-Hastings scheme for the dependence parameters. Block sampling means that a vector of dependence parameters in \(\omega\) is proposed and compared to the current vector of dependence parameters. The proposed vector is either accepted or rejected. This allows proposals of dependence parameters that obey the stability restriction, so any vectors that are accepted by the Metropolis-Hastings procedure will always obey the needed restrictions.

Debarsy and LeSage (2018) set forth a block-sampling approach that proposes a vector of candidate values for a similar set of dependence parameters in the context of a model involving a convex combination of weight matrices. Dependence parameters that do not meet the stability restriction can be rejected, so any values accepted are consistent with stability.

The conditional distributions for the current and proposed dependence vectors that we can label \(\omega ^c, \omega ^p\) are evaluated with a Metropolis-Hastings step used to either accept or reject the newly proposed vector \(\omega ^p\). Block sampling the dependence parameter vector \(\omega\) has the virtue that accepted vectors will obey any restrictions and reduce autocorrelation in the MCMC draws for these parameters. However, block sampling is known to produce lower acceptance rates which may require more MCMC draws in order to collect a sufficiently large sample of draws for posterior inference regarding \(\omega\). To address this issue, Debarsy and LeSage (2018) propose a hybrid approach that begins with a reversible jump sampling procedure and switches to a tuned random-walk proposal procedure for proposing vectors \(\omega\) after some initial number of start-up samples are drawn (see also LeSage et al. 2019).

We rely here on a reversible jump procedure to produce proposal values for the vector of parameters \(\rho _o, \rho _d, \rho _w\). In particular, we rely on a three-headed coin flip for each scalar parameter. By this we mean a uniform random number on the open interval coin flip = U(0, 1), with head #1 equal to a value smaller or equal to 1/3, head #2 a value larger than 1/3, but smaller or equal to 2/3 and head #3 a value larger than 2/3 and smaller than one. Given a head #1 result, we set a proposal \(\rho _o^p\) using a uniform random draw on the open interval \((-1<\rho _o^p< \rho _o^c)\), where \(\rho _o^c\) is the current value. A head #2 results in setting the proposal value equal to the current value \((\rho _o^p = \rho _o^c)\), while a head #3 selects a proposal value based on a uniform random draw on the open interval \((\rho _o^c<\rho _o^p < 1)\). Of course, a similar approach is used to produce proposals for the parameters \(\rho _d, \rho _w\). Proposed vectors of these parameters inconsistent with the stability restrictions are eliminated via rejection sampling.

The reversal jump approach to proposing the block of dependence parameters has the virtue that accepted vectors will obey the stability restriction and will also reduce autocorrelation in the MCMC draws for these parameters. However, proposals from the reversible jump procedure based on the large intervals between \((-1<\rho _o^c)\) and \((\rho _o^c<1)\) will not produce candidates likely to be accepted when these parameters are estimated with a great deal of precision, as would be the case for problems involving large \(N^2 T\). This can result in a failure to move the chain adequately over the parameter space. To address this issue, standard deviations, \(\sigma _{\rho _o}, \sigma _{\rho _d}, \sigma _{\rho _w}\) for each parameter are calculated based on the first 1000 draws (and updated thereafter using an interval of \(m=1000\) draws). These are used in a tuned random-walk procedure to produce candidate/proposed values. Specifically, we use a tuning scalar c for each parameter that is adjusted based on acceptance rates for each parameter. This is used in conjunction with the standard deviations to produce proposals: \(\rho _o^p = \rho _o^c + c \ {\mathcal {N}}(0,1) \ \sigma _{\rho _o}\), with the same approach used for \(\rho _d, \rho _w\).

The proposed estimation method relies on a great many approximations, raising the issue of whether resulting estimates have desirable properties such as small bias and mean-squared error as well as good coverage. By coverage we mean that the (say) 2.5% and 97.5% intervals from the empirical distributions of the effects estimates on which practitioners base conclusions regarding statistical significance of the effects estimates cover the true values 95% of the time. Debarsy and LeSage (2020) provide Monte Carlo evidence on this type of procedure.

4.2 Interpreting the model

The partial derivatives used to interpret how changes in (say the rth) explanatory variable of the model impacts changes in the dependent variable vector are non-linear matrix expressions. The sequence of partial derivatives for this model are shown in (11), where we record the \(N \times N\) matrices of changes in (logged) flows arising from changing the rth variable in each country i \(X_i^{r}\) using \(Y_i, i=1,\ldots ,N\), to denote the \(N \times N\) flow matrices associated with changing the rth variable in each country i. We note that because the matrix W does not change over time in our static panel data model, we have a set of \(N^2 \times N\) matrices describing the partial derivative impacts.

$$\begin{aligned} \left( \begin{array}{c} \partial Y_1 / \partial X_1^r \\ \partial Y_2 / \partial X_2^r \\ \vdots \\ \partial Y_N /\partial X_N^r \end{array} \right)= & \,{} (I_{N^2} - \rho _o W_o - \rho _d W_d - \rho _w W_w)^{-1} \left( \begin{array}{c} Jd_1 \;\beta _d^r + Jo_1 \;\beta _o^r \\ Jd_2 \; \beta _d^r + Jo_2 \; \beta _o^r \\ \vdots \\ Jd_N \; \beta _d^r + Jo_N \; \beta _o^r \end{array} \right) . \end{aligned}$$
(11)

In (11), \(Jd_i\) (\(i=1,\ldots ,N\)) is an \(N \times N\) matrix of zeros with the ith row equal to \(\iota _N' \beta _d\), and \(Jo_i\) is an \(N \times N\) matrix of zeros with the ith column equal to \(\iota _N \beta _o\), where \(\beta _o\) and \(\beta _d\) denote parameters associated with origin and destination size measures. We have N sets of \(N \times N\) outcomes, (one for each change in \(X_i^r, i=1,\ldots ,N\)) resulting in an \(N^2 \times N\) matrix of partial derivatives reflecting the total effect on flows from changing the rth characteristic of all N regions, which in the gravity modeling literature is labelled the total effect (LeSage and Thomas-Agnan 2015, and LeSage and Fischer 2016).

LeSage and Thomas-Agnan (2015) provide a motivation for the expression in (11), noting that changes in the (size) characteristics of a single country i will (potentially) produce impacts on all elements of the \(N \times N\) flow matrix. Intuitively, a change in (say) income of a single country can impact trade flows involving immediate trading partners, as well as, trade flows involving partners to the trading partners, partners to the partners of the trading partners, and so on, potentially impacting the entire \(N \times N\) flow matrix.

Since regression models typically consider changes in characteristics (say income) of all \(i=1,\ldots ,N\) observations/countries, this produces a set of N different \(N \times N\) matrices of partial derivatives associated with changes in each explanatory variable in the model. While LeSage and Thomas-Agnan (2015) propose scalar summary measures for the various types of effects that average over certain dimensions of the sequence of N different \(N \times N\) matrices, we adopt a simpler strategy here for producing scalar summary measures of the partial derivative impacts. We take an average of the diagonal elements of the N different \(N \times N\) matrices in (11) as a measure of own-partial derivative impacts reflecting own-country changes in flows arising from changes in (say) the typical country’s income. And use an average of the cumulative off-diagonal elements from each row of the N different \(N \times N\) matrices in (11) to summarize network effects arising from changes in (say) income in a typical country. Network effects are measured by a scalar summary measure of the spillover impacts on other countries associated with changes in an explanatory variable in the model (say income). The scalar summary averages over all countries, and since the model is a static panel data model, over all time periods as well. We can delineate between origin- and destination-specific effects using the expressions involving \(\beta _o, \beta _d\), which allows us to determine the relative importance of changes in (say) income at origin versus destination countries on trade flows.

In addition to point estimates of the partial derivative impacts, there might also be a need to calculate empirical measures of dispersion for the effects that could be used for inference. An empirical distribution of the scalar own- and cross-partial derivatives (labeled direct and network effects here) can be constructed using MCMC draws for the parameters \(\rho _o, \rho _d, \rho _w, \beta _o, \beta _d\) in expression (11). However, this would require inversion of the \(N^2 \times N^2\) matrix \((I_{N^2} - \rho _o W_o - \rho _d W_d - \rho _w W_w)\) thousands of times for each set of draws for \(\rho _o, \rho _d, \rho _w\), making this computationally intensive.Footnote 11

A compromise approach would be to use posterior means of the estimated parameters \(\rho _o, \rho _d, \rho _w\) to calculate a single matrix inverse: \((I_{N^2} - \rho _o W_o - \rho _d W_d - \rho _w W_w)^{-1}\) in conjunction with the MCMC draws for the parameters \(\beta _o, \beta _d\). However, this would ignore stochastic variation in the effects estimates that arise from the fact that there is uncertainty regarding the parameters \(\rho _o, \rho _d, \rho _w\). Ideally, we would like to use draws for these dependence parameters from their posterior distributions when simulating the empirical distribution of effects estimates.

5 Application of the model

5.1 Context and different interaction structures

We consider panel model specifications that use a panel of trade flows as the dependent variable vector y over the 38 years from 1963 to 2000. The (single) explanatory variable is (logged) gross domestic product per capita (GDP) lagged one year to cover the period from 1962 to 1999. The trade flows are from Feenstra et al. (2005), while the GDP data at market prices (current US$) and population data come from World Bank’s (2002) World Development Indicators. A usable sample of 70 countries (see Table 6 in “Appendix C”) was constructed for which GDP, population and trade flows were available over the 38 years.Footnote 12

Given our sample of 70 countries and 38 years, this results in \(N^2 T = 186,200\); with \(2N + T - 2 = 176\) fixed effects parameters for the case of the Mátyás (1997) model in (1), and \(N^2 + T - 1 = 4,937\) fixed effects parameters in the Cheng and Wall (2005) approach set forth in the model from (2).

We used five different definitions for the matrix W describing alternative structures of network dependence, specifically, \(W_{space}\) based on the three nearest spatial neighboring countries, \(W_{language}\) based on countries sharing a common language, \(W_{currency}\) based on common currency, \(W_{colony}\) based on countries with direct historical colonial ties, and \(W_{trade}\) based on membership in the same trade union (excluding the WTO). Details regarding countries with common borders, language, currency, colonial ties and trade union membership can be found in Tables 7, 8, 9 and 10 in “Appendix C”.Footnote 13

5.2 Model comparison of alternative structures of network dependence

Estimates from the model in (12) where the parameters \(\rho _o, \rho _d, \rho _w\) are (significantly) different from zero point to the existence of cross-sectional dependence.

$$\begin{aligned} y= & \,{} \rho _o \ W_1 \ y + \rho _d \ W_2 \ y + \rho _{w} \ W_3 \ y + GDP_o \ \beta _o + GDP_d \ \beta _d + \Delta _\gamma \gamma + \Delta _\lambda \lambda + \varepsilon , \\ W_1= & \,{} I_T \otimes (W \otimes I_N), \ \ W_2 = I_T \otimes (I_N \otimes W), \ \ W_3 = I_T \otimes (W \otimes W). \end{aligned}$$
(12)

In the presence of dependence, estimates from conventional models that ignore cross-sectional/network dependence can be shown to be biased and inconsistent (see LeSage and Fischer 2020). The presence of network dependence also implies spillover impacts arising from changes in neighboring countries \(j \ne i\) income on country i’s trade flows. In our model, neighbors are defined to include spatial neighbors in the case where \(W_{space}\) is used when estimating the model. More broadly, sociocultural neighbors arise when the matrix W used is based on common language, currency, trade union membership or direct colonial ties. Specifically, changes in income of countries j that have spatial, common language, currency, trade agreements, or colonial ties with country i will impact flows in the spatial autoregressive model, provided that the scalar dependence parameters \(\rho _o, \rho _d, \rho _w\) are different from zero and the parameters \(\beta _o, \beta _d\) are non-zero.

Table 1 Log-marginal likelihood estimates for alternative models

Table 1 shows log-marginal likelihood function values for models based on the alternative definitions of the weight matrix as well as the two alternative approaches to including fixed effects. The sum of posterior means for \(\rho _o + \rho _d + \rho _w\) are also reported, since non-zero values of these parameters point to significant network dependence. From the table, we see that models using the Cheng and Wall (2005) fixed effects have higher log-marginal likelihoods than the corresponding Mátyás (1997) model based on the same weight matrix, indicating these models are more consistent with our sample data. A second finding indicated by the estimated log-marginal likelihoods in the table is that the rank-ordering of preferred models for the various types of weight matrices is very similar for both types of fixed effects. Specifically, the model specifications based on \(W_{trade}\) have the highest log-marginal likelihood, those based on \(W_{space}\) are next highest, followed by \(W_{language}\). Turning to estimates for the dependence parameters, we see that the sums of these are substantially positive, pointing to the presence of network dependence.

5.3 Parameter estimates and partial derivative impacts for the best model

Table 2 presents estimates for the best models based on \(W_{trade}\) using both the Mátyás (1997) fixed effects and those of Cheng and Wall (2005). The table presents the mode of the parameter estimates evaluated using the joint posterior distribution as well as the mean and median based on 5000 retained MCMC draws (with an initial 5000 excluded for burn-in of the sampler). Monte Carlo (MC) error estimates are reported along with Geweke’s diagnostic that compares draws from the first ten percent of the MCMC sampling (after burn-in) and the last 50% of the draws. The test is whether the batched means are equal, which indicates convergence.

Table 2 Estimates for the \(W_{trade}\) models

From the estimates we see that the dependence parameters \(\rho _o, \rho _d, \rho _w\) are different from zero based on the credible intervals calculated from the MCMC draws. As noted in the discussion of model interpretation in Sect. 4.2, the parameters \(\beta _o, \beta _d\) do not represent partial derivative impacts of the elasticity response of trade flows to changes in origin and destination country GDP. These need to be calculated using the non-linear matrix expressions for the own- and cross-partial derivatives. The results from doing this are presented in Table 3, where we see substantial network effects. The network effects reflect cumulated off-diagonal elements of the matrix of partial derivatives (cross-partial derivatives) averaged over all countries as described in Sect. 4.2.

Table 3 Partial derivative impacts for the \(W_{trade}\) models

These estimates show larger direct and network impacts arising from changes in destination country than origin country income on trade flows in the case of both types of fixed effects. The Cheng and Wall (2005) fixed effects lead to larger direct and network destination effects than those from the Mátyás (1997) fixed effects specification, but smaller origin-specific direct and network effects than those from the Mátyás (1997) fixed effects specification.

Least-squares estimates were \({\hat{\beta }}_o = 0.9818, {\hat{\beta }}_d = 1.1562\) for the Mátyás (1997) specification, and \({\hat{\beta }}_o = 0.9822, {\hat{\beta }}_d = 1.3032\) for the Cheng and Wall (2005) specification. The total effects estimates from the cross-sectional dependence models would be comparable to the least-squares estimates, and we see that ignoring network effects that arise from cross-sectional dependence lead to a substantial downward bias in the least-squares estimates.

5.4 Model comparison of extended versions of the model

We produced estimates for models based on averages of all 26 possible combinations of two or more weight matrices. For example, we define the combined weight matrix: \(W_c = W_{space} + W_{trade} + W_{language} + W_{currency} + W_{colony}\), where \(W_c\) is row-normalized to have row-sums of unity. Note that we (implicitly) assign equal weight to all matrices in each model, so a model containing three matrices (e.g., \(W_{space}, W_{trade}, W_{language}\)) assigns a weight of 1/3 to each of these.Footnote 14 Log-marginal likelihoods are presented for these models in Table 4 for the specification based on Mátyás (1997) fixed effects, and in Table 5 for the Cheng and Wall (2005) fixed effects specification.

Table 4 Model comparison of combinations of \(W-\)matrices (based on fixed effects from Mátyás 1997)

In the Table 4 results, model #19 dominates all others leading to a posterior model probability of one assigned to this specification, based on \(W_{currency} + W_{colony} + W_{trade}\). We also note that a comparison of the log-marginal likelihood for the best single weight matrix model from Table 1 shows that combinations of weight matrices produce a specification more consistent with our sample data. That is, the log-marginal likelihood for the model based on \(W_{trade}\) alone was \(-5.1401e+05\), compared to that for model #19 based on three weight matrices of \(-5.0210e+05\). The next best model was model #10 based on \(W_{colony}+W_{trade}\) and the 3rd best model was model #23 based on \(W_{space}+W_{currency}+W_{colony} + W_{trade}\).

Table 5 Model comparison of combinations of \(W-\)matrices (based on fixed effects from Cheng and Wall 2005)

The Table 5 results are based on the extended set of fixed effects from Cheng and Wall (2005) where we see that the best model (#23) is one based on \(W_{space}+W_{currency}+W_{colony}+W_{trade}\), and the next best model (#26) included all five weight matrices, with the third-best model (#16) including \(W_{space}+W_{colony}+W_{trade}\). What seems clear from the results in Tables 4 and 5 is that membership in trade unions and historical colonial ties are an important source of interaction between countries’ trade flows. The results from Table 5 place emphasis on \(W_{space}\) not found for the model based on simpler fixed effects. Recall that the model based on Cheng and Wall (2005) fixed effects whose results are presented in Table 5 represents the preferred model as it has higher log-marginal likelihood values.

6 Conclusions

A computationally efficient approach to MCMC estimation of a multi-indexed network dependence panel gravity model was set forth and used to examine the presence of a specific type of network dependence in trade flows. The network dependence model specifications are based on a combination of multiple weight matrices reflecting different sources of network dependence such as common currency, language, colonial ties, membership in trade unions or spatial proximity of countries. In cross-sectional gravity models these are typically treated as generalized distance variables, with the interpretation being that they reflect heterogeneity impacting the intercept term. In a panel data specification, these types of commonality between countries reflect time-invariant factors that are thought to be modeled by fixed effects.

We show that after including commonly used fixed effects of the type suggested by Mátyás (1997) or Cheng and Wall (2005), there is evidence that significant network dependence in trade flows remains. Conventional gravity models assume the variable vector of \(N^2 \times 1\) trade flows for each time period are independent, so trade flows between countries that have a common currency, language, border, colonial ties or are members of a trade union are no more likely than flows between countries having nothing in common.

Our panel gravity model allows these sociocultural factors to represent a basis for trade interaction between countries, with more similar flows between countries that share common borders, currency, language etc. Application of the model to a panel of trade flows covering 38 years and 70 countries provides evidence that this is the case. Network dependence produces simultaneous dependence, which means that flows from country dyad (ij) depend on flows from other country dyads (say (kl) with \(k\ne i\), \(l \ne j\)), where the dependence structure is based on sociocultural factors. The most important sources of cross-sectional dependence were found to be trade organizations, historical colonial ties, common currency and spatial proximity of countries.