1 Introduction

Real data are often observed at different locations and times, which are called as spatial panel data (SPD). Examples are economic growth rates of major cities in China over last 40 years, monthly unemployment rates of states in USA in the last decade and daily infection rates of COVID-19 in major cites in Hubei province in China over 3 months since December 31, 2019. These data may be modelled by SPD models. The research to various SPD models can be found in Anselin (1988), Elhorst (2003), Baltagi et al. (2003), Baltagi and Li (2006), Chen and Conley (2001), Pesaran (2004), Kapoor et al. (2007), Baltagi et al. (2007), Lee and Yu (2010a), Mutl and Pfaffermayr (2011), Parent and LeSage (2011) and Baltagi et al. (2013), among others. By adding a dynamic element into a SPD model Anselin (2001) proposes a spatial dynamic panel data (SDPD) model, which increases the flexibility of a SPD model. Obviously, SPD models belong to SDPD models. There has been a growing interest in the statistical inferences for SDPD models since then. For an overview on the SPDP models, refer to Yu et al. (2008) and Lee and Yu (2010b), Su and Yang (2015), Lee and Yu (2015a, b), Yu and Lee (2010), Elhorst (2010), Elhorst (2005), Yang et al. (2006), Mutl (2006), Su and Yang (2007), and Lee and Yu (2010c), among others. There are two popular methods for the estimation and test of SPD and SDPD models: quasi-maximum likelihood (QML) approach and generalized method of moments (GMM), which can be seen in above references.

In this article, we use the empirical likelihood (EL) method, proposed by Owen (1988, 1990), to construct the confidence regions for the parameters in a SDPD model. It is observed, in the case of independent observations, that the EL method to construct confidence intervals/regions has many advantages over its counterparts like the normal-approximation-based method and the bootstrap method (e.g., Hall and La Scala 1990; Hall 1992). A excellent review on EL for regressions can be found in Chen and Keilegom (2009). There are a lot of references on EL methods for independent samples or in the context of sample surveys. To save space, we list a few of them such as Owen (2001), Qin and Lawless (1994), Chen and Qin (1993), Zhong and Rao (2000) and Wu (2004). The date are dependent and follow certain structures in SDPD models. The study of the EL method for some SPDP models enjoys a certain progress. For example, there are a few articles studying the EL method for the pure spatial data (PSD, the special case of SPD with a fixed time). For instance, Nordman (2008a, b) and Bandyopadhyay et al. (2015) use the blockwise EL (BEL) proposed by Kitamura (1997) to PSD. Recently, by exploring inherent martingale structures, Qin (2021) and Jin and Lee (2019) use the EL method to construct confidence intervals/regions in PSD models. Further, Li and Qin (2020) extends the EL method proposed by Qin (2021) and Jin and Lee (2019) to SPD models. We note that there is no research work on the EL method for SDPD models.

There are many kinds of SDPD models. Lee and Yu (2010c) gives a review on the classification and research development of some SDPD models. Su and Yang (2015) introduces the QML method to SDPD models with spatial errors, where different types of space-specific effects and different ways that initial observations being generated (exogenously or endogenously) are investigated. Since SDPD models are quite complicated, as a starting point, in this article, we study the EL method for the SDPD models in Su and Yang (2015) with the restriction that there is no space-specific effects (or named as zero drift) and initial observations are generated exogenously. The study of the EL method for general SDPD models without above restriction is left for our future study. Our research results show that the EL based confidence regions generally outperform the normal approximation (NA) based confidence regions when the space units are large enough.

The rest of the article is organized as follows. Section 2 presents the main results. Results from a simulation study are reported in Sect. 3. Section 4 gives the analysis of real data. All technical details are presented in Sect. 5.

2 Main results

In this article, we suppose that there are n individual units and T time periods and the sampling data satisfy the following SDPD model with spatial error:

$$\begin{aligned} y_t= & {} \rho y_{t-1}+x_t\beta +z\gamma +\epsilon _t, \end{aligned}$$
(1)
$$\begin{aligned} \epsilon _t= & {} \lambda W_n\epsilon _t+\nu _t, t=1, 2, \ldots , T, \end{aligned}$$
(2)

where \(y_t=(y_{1t},\ldots , y_{nt})'\) is an n-dimensional column vector of observed dependent variables, \(\rho (|\rho |< 1)\) characterizes the dynamic effect, \(x_t=(x_{1t},\ldots , x_{nt})'\) is an \(n\times p\) matrix of time-varying exogenous variables, \(z=(z_1,\ldots , z_n)'\) is an \(n\times q\) matrix of time-invariant exogenous variables, and \(\beta \) and \(\gamma \) are \(p\times 1\) and \(q\times 1\) regression coefficients, respectively. The disturbance vector \(\epsilon _t=(\epsilon _{1t},\ldots , \epsilon _{nt})'\) is an \(n\times 1\) vector of errors. The parameter \(\lambda \) is a spatial autoregressive coefficient and \(W_n\) is an \(n\times n\) spatial weighting matrix of constants, \(\nu _t=(\nu _{1t},\ldots , \nu _{nt})'\) is an \(n\times 1\) column vector, and \(\{\nu _{it}\}\) are i.i.d. across t and i with zero mean and variance \(\sigma ^2_\nu \). The spatial weighting matrix is also called contiguity matrix, which is determined by the spatial dependence of n spatial units. There are many ways to define \(W_n\) (e.g. pages 17–19 in Anselin 1988). Let \(W_{ij}\) be the (ij) element of \(W_n\). Commonly used \(W_n\) includes Rook contiguity, Bishop contiguity and Queen contiguity as follows. Rook contiguity: define \(W_{ij} = 1\) if the units i and j share a common side and \(W_{ij} = 0\), otherwise. Bishop contiguity: define \(W_{ij} = 1\) if the units i and j share a common vertex and \(W_{ij} = 0\), otherwise. Queen contiguity: define \(W_{ij} = 1\) if the units i and j share a common side or vertex and \(W_{ij} = 0\), otherwise. The choice of \(W_n\) is important. Our results hold true for all these commonly used \(W_n\).

The models (2.1)–(2.3) in Su and Yang (2015) are as follows:

$$\begin{aligned} y_t= & {} \rho y_{t-1}+x_t\beta +z\gamma +\mu +\epsilon _t, \\ \epsilon _t= & {} \lambda W_n\epsilon _t+\nu _t, t=1, 2, \ldots , T, \end{aligned}$$

where \(\mu =(\mu _1, \ldots , \mu _n)'\) represent the unobservable individual or space-specific effects and other notations are the same as in model (1) and (2). Compared to models (2.1)–(2.3) in Su and Yang (2015), the only difference is that there is no space-specific effects \(\mu \) in model (1) and (2). Our initial investigation shows that the EL method for SDPD models with space-specific effects may need an adjusted EL method. Further research is needed and left for our future work.

We develop the EL method for the SDPD model when \(y_0\) is exogenous. In this case, we can treat \(y_0\) as a fixed constant vector as it contains no information about the model parameters. For convenience, we use \(\mathbf{1} _k\) to denote a \(k\times 1\) vector of ones, \(\mathbf{0} _k\) to denote a \(k\times 1\) vector of zeros, and \(J_k=\mathbf{1} _k \mathbf{1} '_k\), where \(\otimes \) is the Kronecker product.

Let \(Y=(y'_1, y'_2, \ldots , y'_T)'\), \(Y_{-1}=(y'_0, y'_1, \ldots , y'_{T-1})'\), \(X=(x'_1, x'_2, \ldots , x'_T)'\), \(\nu =(\nu '_1, \nu '_2, \ldots , \nu '_T)'\), \(Z=\mathbf{1} _T\otimes z\), \(B=B(\lambda )=I_n-\lambda W_n\) and \(\epsilon =(\epsilon '_1, \epsilon '_2, \ldots , \epsilon '_T)'\). Then model (1) and (2) can be written in a matrix form as:

$$\begin{aligned} \left( \begin{array}{l} y_1\\ y_2\\ \vdots \\ y_T \end{array} \right) = \rho \left( \begin{array}{c} y_0\\ y_1\\ \vdots \\ y_{T-1}\end{array} \right) +\left( \begin{array}{c} x_1\\ x_2\\ \vdots \\ x_T \end{array} \right) \beta +\left( \begin{array}{l} z\\ z\\ \vdots \\ z \end{array} \right) \gamma + \left( \begin{array}{c} \epsilon _1\\ \epsilon _2\\ \vdots \\ \epsilon _T \end{array} \right) , \end{aligned}$$

with

$$\begin{aligned} \left( \begin{array}{l} \epsilon _1\\ \epsilon _2\\ \vdots \\ \epsilon _T \end{array} \right) =\left( \begin{array}{llllll}B^{-1} &{} 0 &{} 0 &{} \cdots &{} 0 &{} 0\\ 0 &{} B^{-1}&{} 0 &{} \cdots &{} 0 &{} 0\\ \vdots \\ 0&{} 0&{} 0&{} \cdots &{} 0 &{} B^{-1}\end{array} \right) \left( \begin{array}{l} \nu _1\\ \nu _2\\ \vdots \\ \nu _T \end{array} \right) , \end{aligned}$$

or

$$\begin{aligned} Y=\rho Y_{-1}+X\beta +Z\gamma +\epsilon , \end{aligned}$$
(3)

with

$$\begin{aligned} \epsilon =(I_T\otimes B^{-1})\nu , \end{aligned}$$
(4)

where \(\epsilon \sim (0, \sigma ^2_\nu \Omega )\), with

$$\begin{aligned} \Omega =\Omega (\lambda )=I_T\otimes (B'B)^{-1}. \end{aligned}$$
(5)

Let \(\theta =(\beta ', \gamma ', \rho )'\) and \(\psi =(\theta ', \sigma ^2_\nu , \lambda )'\). We adopt the QML method to derive the estimating equations for the EL method. Under the assumption of normality (which is only used at this moment), based on (3) and (4), the log-likelihood function (ignoring constants) is

$$\begin{aligned} \widetilde{L}(\psi )=-{nT\over 2}\log \sigma _\nu ^2 -\frac{1}{2}\log |\Omega |-{1\over 2\sigma _\nu ^2}\epsilon '\Omega ^{-1}\epsilon , \end{aligned}$$
(6)

where \(\epsilon =Y-\rho Y_{-1}-X\beta -Z\gamma \). It can be shown that

$$\begin{aligned}&\partial \widetilde{ L}(\psi )/\partial \theta = \sigma _\nu ^{-2}\widetilde{X}'\Omega ^{-1}\epsilon ,\\&\partial \widetilde{L}(\psi )/\partial \sigma _\nu ^2 =-\frac{nT}{2\sigma _\nu ^2}+\frac{1}{2\sigma _\nu ^4}\epsilon '\Omega ^{-1}\epsilon , \\&\partial \widetilde{L}(\psi )/\partial \lambda = -\frac{1}{2}tr(\Omega ^{-1}(I_T\otimes A))+\frac{1}{2\sigma ^{2}_\nu }\epsilon '\Omega ^{-1}(I_T\otimes A)\Omega ^{-1}\epsilon , \end{aligned}$$

where \(\widetilde{X}=(X, Z, Y_{-1})\), \(A=(B' B)^{-1}(W'_nB+B' W_n)(B' B)^{-1}\). Letting above derivatives be 0, we obtain the following estimating equations of the QML method:

$$\begin{aligned}&\widetilde{X}'\Omega ^{-1}\epsilon =0, \end{aligned}$$
(7)
$$\begin{aligned}&-nT\sigma _\nu ^2+\epsilon '\Omega ^{-1}\epsilon =0, \end{aligned}$$
(8)
$$\begin{aligned}&-\sigma ^{2}_\nu tr(\Omega ^{-1}(I_T\otimes A))+ \epsilon '\Omega ^{-1}(I_T\otimes A)\Omega ^{-1}\epsilon =0, \end{aligned}$$
(9)

Substituting (4) into (7)–(9), we have

$$\begin{aligned}&\widetilde{X}'(I_T\otimes B')\nu =0, \end{aligned}$$
(10)
$$\begin{aligned}&-nT\sigma _\nu ^2+\nu ' \nu =0, \end{aligned}$$
(11)
$$\begin{aligned}&-\sigma ^{2}_\nu tr(I_T\otimes (BAB'))+\nu '(I_T\otimes (BAB'))\nu =0. \end{aligned}$$
(12)

Noting that \(\widetilde{X}=(X, Z, Y_{-1})\) and \(Y_{-1}\) contains Xz and \(\nu \), we need to separate out \(\nu \) from \(\widetilde{X}\). To this end, denote \(l_{\rho }=(0, c_{\rho , 1}, \ldots , c_{\rho ,T-1})'\), \(c_{\rho , t}=(1-\rho ^t)/(1-\rho )\), \(Y_0=(Y'_{0, 0}, Y'_{0, 1}, \ldots , Y'_{0, T-1})'\), \(Y_{0, t}=\rho ^ty_0\),

$$\begin{aligned} F_{\rho }=\left( \begin{array}{lllll}0 &{} 1 &{} \rho &{}\cdots &{} \rho ^{T-2} \\ 0&{} 0 &{} 1&{}\cdots &{}\rho ^{T-3}\\ \vdots &{} \vdots &{} \vdots &{}\ddots &{}\vdots \\ 0 &{}0 &{}0&{}\cdots &{}1\\ 0 &{}0 &{}0&{}\cdots &{}0 \end{array} \right) , \end{aligned}$$

\(A_x=F_{\rho }' \otimes I_n \) and \(A_\nu =F_{\rho }' \otimes B^{-1} \). We use (B.3) in Su and Yang (2015) to obtain that

$$\begin{aligned} Y_{-1}=A_xX\beta +(l_{\rho }\otimes I_n )z\gamma +A_\nu \nu +Y_0, \end{aligned}$$

Let \(\widetilde{X}_1=\left( X,\ Z \right) \) and \(\widetilde{X}_2= A_xX\beta +(l_{\rho }\otimes I_n )z\gamma +Y_0\). Then (10) can be decomposed into

$$\begin{aligned}&\widetilde{X}'_1(I_T\otimes B')\nu =0, \end{aligned}$$
(13)
$$\begin{aligned}&\widetilde{X}'_2(I_T\otimes B')\nu +\nu 'A'_\nu (I_T\otimes B')\nu =0. \end{aligned}$$
(14)

For convenience, let \(e=\nu \), i.e.

$$\begin{aligned} e_{(nT)\times 1}=\left( \begin{array}{l} e_1\\ e_2\\ \vdots \\ e_{nT}\\ \end{array} \right) =\left( \begin{array}{l} \nu _1\\ \nu _2\\ \vdots \\ \nu _{T}\\ \end{array} \right) .\end{aligned}$$
(15)

Then (11)–(14) can be rewritten as

$$\begin{aligned}&\widetilde{X}'_1(I_T\otimes B')e=0, \end{aligned}$$
(16)
$$\begin{aligned}&\widetilde{X}'_2(I_T\otimes B')e+e'A'_\nu (I_T\otimes B')e=0, \end{aligned}$$
(17)
$$\begin{aligned}&-nT\sigma _\nu ^2+e'e=0, \end{aligned}$$
(18)
$$\begin{aligned}&-\sigma ^{2}_\nu tr(I_T\otimes (BAB'))+e'(I_T\otimes (BAB'))e=0. \end{aligned}$$
(19)

Observing that the above estimating equations include the quadratic forms of e, to use the EL method, we need to change the quadratic forms into the linear forms of a well behaved random variables. To this end, we let \(H_1={1\over 2}\left( A'_\nu (I_T\otimes B')+(I_T\otimes B)A_\nu \right) \) and \(H_2=I_T\otimes (BAB')\). Use \({h}_{ij,k}\), \(a_{i,1}\) and \(a_{i,2}\) to denote the (ij) element of the matrix \(H_k\) (\(k=1, 2\)), the i-th column of the matrix \(\widetilde{X}_1'(I_T\otimes B')\) and the i-th element of the vector \(\widetilde{X}_2'(I_T\otimes B')\), respectively, and adapt the convention that any sum with an upper index of less than one is zero. To deal with the quadratic form in (17) and (19), we follow Kelejian and Prucha (2001) to introduce a martingale difference array. Define the \(\sigma \)-fields: \({\mathcal {F}}_{0}=\{ {\emptyset }, \Omega \}, {\mathcal {F}}_{i}=\sigma (e_1, e_2, \ldots , e_i), 1\le i\le nT\). Let

$$\begin{aligned} M_{ik}=h_{ii,k}(e^2_i-\sigma _\nu ^2)+2e_i\sum ^{i-1}_{j=1}h_{ij,k}e_j, k=1,2. \end{aligned}$$
(20)

Then \( {\mathcal {F}}_{i-1} \subseteq {\mathcal {F}}_{i}, M_{ik}\) is \({\mathcal {F}}_{i}\)-measurable and \(E(M_{ik}|{\mathcal {F}}_{i-1})=0\). Thus \(\{M_{ik}, {\mathcal {F}}_{i}, 1\le i\le nT\}\) form a martingale difference array and

$$\begin{aligned} e'H_k e-\sigma ^{2}_\nu tr(H_k)=\sum ^{nT}_{i=1}M_{ik}, k=1,2. \end{aligned}$$
(21)

Based on (16)–(21), we propose the following EL ratio statistic for \(\psi \in R^{p+q+3}\):

$$\begin{aligned} L(\psi )=\sup _{p_i, 1\le i\le nT}\prod ^{nT}_{i=1}((nT)p_i), \end{aligned}$$

where \(\{p_i\}\) satisfy

$$\begin{aligned}&p_i\ge 0, 1\le i\le nT, \sum ^{nT}_{i=1}p_i=1, \\&\sum ^{nT}_{i=1}p_i a_{i,1}e_i=0, \\&\sum ^{nT}_{i=1}p_i \left\{ a_{i,2}e_i+ h_{ii,1}(e^2_i-\sigma _\nu ^2)+2e_i\sum ^{i-1}_{j=1}h_{ij,1}e_j \right\} =0, \\&\sum ^{nT}_{i=1}p_i(e^2_i-\sigma ^2_\nu )=0,\\&\sum ^{nT}_{i=1}p_i \left\{ h_{ii,2}(e^2_i-\sigma _\nu ^2)+2e_i\sum ^{i-1}_{j=1}h_{ij,2}e_j \right\} =0. \end{aligned}$$

Let

$$\begin{aligned} \omega _{i}(\psi )=\left( \begin{array}{l} a_{i,1}e_i\\ e^2_i-\sigma ^2_\nu \\ a_{i,2}e_i+h_{ii,1}(e^2_i-\sigma _\nu ^2)+2e_i\sum ^{i-1}_{j=1}h_{ij,1}e_j\\ h_{ii,2}(e^2_i-\sigma _\nu ^2)+2e_i\sum ^{i-1}_{j=1}h_{ij,2}e_j\\ \end{array} \right) _{(p+q+3)\times 1}, \end{aligned}$$

where \(e_i\) is the ith component of \((I_T\otimes B)(Y-\rho Y_{-1}-X\beta -Z\gamma )\). Following Owen (1990), one can show that

$$\begin{aligned} \ell (\psi )\hat{=}-2\log L(\psi )=2\sum ^{nT}_{i=1}\log \{1+\tilde{\lambda }'(\psi )\omega _{i}(\psi )\}, \end{aligned}$$
(22)

where \(\tilde{\lambda }(\psi )\in R^{p+q+3}\) is the solution of the following equation:

$$\begin{aligned} {1\over nT}\sum ^{nT}_{i=1}{\omega _{i}(\psi )\over 1+\tilde{\lambda }'(\psi )\omega _{i}(\psi )}=0. \end{aligned}$$
(23)

Let \(\vartheta _j=E\nu _{11}^j, j=3, 4\). Use Vec(diagA) to denote the vector formed by the diagonal elements of a matrix A, ||a|| to denote the \(L_2\)-norm of a vector a, and \(\lambda _{min}(H)\) and \(\lambda _{max}(H)\) to denote the minimum and maximum eigenvalues of a matrix H, respectively. To obtain the asymptotic distribution of \(\ell (\psi )\), we need following assumptions.

  • A1. (1) \(\nu _{jt}\) are mutually independent, and they are independent of \(x_{ks}\) and \(z_k\) for all jkts;

  • (2) All elements in \((x_{it}, z_i)\) have \(4+\eta _1\) moments for some \(\eta _1>0\).

  • A2. (1) \(\{\nu _{it}, t=1,\ldots , T, i=1,\ldots , n \}\) are independent and identically distributed for all i and t with mean 0, variance \(\sigma ^2_\nu >0\) and \(E|\nu _{it}|^{4+\eta _1}<\infty \) for some \(\eta _1>0\).

  • (2) \(\{x_{it}, t=\ldots , -1, 0, 1, \ldots \}\) and \(\{z_i\}\) are strictly exogenous and independent across i.

  • (3) \(|\rho |<1\).

  • A3. Let \(W_n\) and \(\{B^{-1} \}\) be as described above. They satisfy the following conditions:

  • (1) The row and column sums of \(W_n\) are uniformly bounded in absolute value;

  • (2) \(\{B^{-1}\}\) are uniformly bounded in either row or column sums, uniformly in \(\lambda \) in a compact parameter space \(\Lambda \), and \(\underline{c}_\lambda \le \inf _{\lambda \in \Lambda }\lambda _{\max }(B' B)\le \sup _{\lambda \in \Lambda }\lambda _{\max }(B' B)\le \overline{c}_\lambda <\infty \).

  • A4. There are constants \(c_j>0, j=1, 2\), such that

    $$\begin{aligned} 0<c_1\le \lambda _{min}\left( (nT)^{-1}\Sigma _{p+q+3} \right) \le \lambda _{max}\left( (nT)^{-1}\Sigma _{p+q+3} \right) \le c_2<\infty , \end{aligned}$$

    where

    $$\begin{aligned}&\Sigma _{p+q+3}=\Sigma '_{p+q+3} =Cov\left\{ \sum ^{nT}_{i=1}\omega _{i}(\psi ) \right\} \nonumber \\&\quad =\left( \begin{array}{llll} \Sigma _{11}&{} \Sigma _{12} &{} \Sigma _{13} &{} \Sigma _{14}\\ *&{} \Sigma _{22} &{} \Sigma _{23}&{} \Sigma _{24}\\ *&{} *&{} \Sigma _{33} &{}\Sigma _{34}\\ *&{} *&{} *&{} \ \Sigma _{44}\\ \end{array} \right) _{(p+q+3)\times (p+q+3)}, \end{aligned}$$
    (24)

    where

    $$\begin{aligned} \Sigma _{11}= & {} \sigma ^2_\nu E\left( \widetilde{X}_1'\Omega ^{-1}\widetilde{X}_1\right) , \Sigma _{12}=\vartheta _3 E(\widetilde{X}_1')(I_T\otimes B')\mathbf{1} _{nT}, \\ \Sigma _{13}= & {} \sigma ^2_\nu E\left( \widetilde{X}_1'\Omega ^{-1}\widetilde{X}_2\right) , \Sigma _{14}=\vartheta _3E(\widetilde{X}_1')(I_T\otimes B')vec_D(H_2) \\ \Sigma _{22}= & {} nT(\vartheta _4-\sigma _\nu ^4),\ \ \Sigma _{23}=(\vartheta _4-3\sigma _\nu ^4)\mathbf{1} '_{nT}vec_D(H_1)+2\sigma _\nu ^4tr(H_1)+\vartheta _3E(\widetilde{X}_2')(I_T\otimes B')\mathbf{1} _{nT}, \\ \Sigma _{24}= & {} (\vartheta _4-3\sigma _\nu ^4)\mathbf{1} '_{nT}vec_D(H_2)+2\sigma _\nu ^4tr(H_2),\\ \Sigma _{33}= & {} (\vartheta _4-3\sigma _\nu ^4)||vec_D(H_1)||^2+2\sigma _\nu ^4tr(H_1^2)+\sigma ^2_\nu E\left( \widetilde{X}_2'\Omega ^{-1}\widetilde{X}_2\right) +2\vartheta _3 E(\widetilde{X}_2')(I_T\otimes B')vec_D(H_1), \\ \Sigma _{34}= & {} (\vartheta _4-3\sigma _\nu ^4)vec'_D(H_1)vec_D(H_2)+2\sigma _\nu ^4tr(H_1H_2)+\vartheta _3 E(\widetilde{X}_2')(I_T\otimes B')vec_D(H_2), \\ \Sigma _{44}= & {} (\vartheta _4-3\sigma _\nu ^4)||vec_D(H_2)||^2+2\sigma _\nu ^4tr(H_2^2). \end{aligned}$$
  • A5. \(n\rightarrow \infty \) but T is fixed.

Remark 1

Conditions A1–A3 are common assumptions for spatial models, which are used in Su and Yang (2015), and the analog of \(0<c_1\le \lambda _{min}\left( (nT)^{-1}\Sigma _{p+q+3} \right) \) is employed in the assumption of Theorem 1 in Kelejian and Prucha (2001).

We now state the main results.

Theorem 1

Suppose that Assumptions A1–A5 are satisfied. Then under model (1)–(2), as \(n\rightarrow \infty,\)

$$\begin{aligned} \ell ({\psi } ){\mathop {\longrightarrow }\limits ^{d}}\chi ^2_{p+q+3}, \end{aligned}$$

where \(\chi ^2_{p+q+3}\) is a chi-squared distributed random variable with \(p+q+3\) degrees of freedom.

Let \(z_{\alpha }(p+q+3)\) satisfy \(P(\chi ^2_{p+q+3}\le z_{\alpha }(p+q+3))=\alpha \) for \(0<\alpha <1\). It follows from Theorem 1 that an EL based confidence region for \(\psi \) with asymptotically correct coverage probability \(\alpha \) can be constructed as

$$\begin{aligned} \{ \psi : \ell (\psi )\le z_{\alpha }(p+q+3) \}. \end{aligned}$$

3 Simulations

Recall that \(\theta =(\beta ', \gamma ', \rho )'\) and \(\psi =(\theta ', \sigma ^2_\nu , \lambda )'\). Denote \(P_\lambda =\Omega ^{-1}\Omega _\lambda \Omega ^{-1}\), \(\Omega _\lambda =I_T\otimes A\) and \(\Omega _{\lambda \lambda }=I_T\otimes \{2(B'B)^{-1}[(W'_nB+B' W_n)A-W_n'W_n(B'B)^{-1}]\}\). It can be shown that

$$\begin{aligned}&\frac{\partial ^2\widetilde{L}(\psi )}{\partial \theta \partial \theta '}=-\frac{1}{\sigma ^2_\nu }\widetilde{X}'\Omega ^{-1}\widetilde{X},\ \\&\frac{\partial ^2\widetilde{L}(\psi )}{\partial \theta \partial \sigma _\nu ^2}=-\frac{1}{\sigma ^4_\nu }\widetilde{X}'\Omega ^{-1}\epsilon , \\&\frac{\partial ^2\widetilde{L}(\psi )}{\partial \theta \partial \lambda }=-\frac{1}{\sigma ^2_\nu }\widetilde{X}'P_\lambda \epsilon ,\ \\&\frac{\partial ^2\widetilde{L}(\psi )}{\partial \sigma _\nu ^2\partial \sigma _\nu ^2}=-\frac{1}{\sigma ^6_\nu }\epsilon '\Omega ^{-1}\epsilon +\frac{nT}{2\sigma _\nu ^4}, \\&\frac{\partial ^2\widetilde{L}(\psi )}{\partial \sigma _\nu ^2\partial \lambda }=-\frac{1}{2\sigma ^4_\nu }\epsilon 'P_\lambda \epsilon , \\&\frac{\partial ^2\widetilde{L}(\psi )}{\partial \lambda ^2}=\frac{1}{2}tr(P_\lambda \Omega _\lambda -\Omega ^{-1}\Omega _{\lambda \lambda })\\&- \frac{1}{2\sigma _\nu ^2}\epsilon '(2P_\lambda \Omega _\lambda -\Omega ^{-1}\Omega _{\lambda \lambda })\Omega ^{-1}\epsilon . \end{aligned}$$

According to Su and Yang (2015), the QMLE \(\widehat{\psi }\) of \(\psi \) satisfies:

$$\begin{aligned} \sqrt{nT}(\widehat{\psi }-\psi ){\mathop {\longrightarrow }\limits ^{d}}N(0,-\Sigma ^{-1}), \end{aligned}$$

where \(\Sigma =\lim _{n\rightarrow \infty }\frac{1}{nT}E[\Sigma _n(\psi )]\) and \(\Sigma _n(\psi )=\frac{\partial ^2}{\partial \psi \partial \psi '}\widetilde{L}(\psi )\).

Based on the above asymptotic result, we can obtain the NA based confidence region for \(\psi \). However, we note that the NA method depends on the availability of a consistent estimator of the asymptotic covariance matrix in practical applications, while the EL method does not. This can save the implementation time for the EL method and the EL method outperforms the NA method.

We conducted a small simulation study to compare the finite sample performances of the confidence regions based on EL and NA methods with confidence level \(\alpha =0.95\), and report the proportion of \(\ell (\psi ) \le z_{0.95}(p+q+3)\) and \((\widehat{\psi }-\psi )'(-\Sigma )(\widehat{\psi }-\psi )\le z_{0.95}(p+q+3)\) respectively in 1000 replications.

In the simulations, we used the following two models:

  1. (1)

    Model 1:

    \(y_t=\rho y_{t-1}+x_t\beta +z\gamma +\epsilon _t, \epsilon _t=\lambda W_n\epsilon _t+\nu _t,\ t=1, 2,3\), where \(x_t\) were generated from N(0, 4), alternatively, \(x_t\) can be randomly generated in a similar fashion as in Hsiao et al. (2002), and the elements of z were randomly generated from Bernoulli(0.5). We selected \(\beta =1\), \(\gamma =1\), \(\sigma _\nu ^2=1\) and \((\rho , \lambda )\) were taken as \((-0.8, -0.7)\), \((-0.2, -0.1)\), (0.2, 0.1), (0.8, 0.7), \((-0.8, 0.7)\) and \((0.2, -0.1)\) respectively, and \(\nu _{it}'s\) were i.i.d. from N(0, 1), t(5) and \(\chi ^2_{4}-4\), respectively;

  2. (2)

    Model 2:

    \(y_t=\rho y_{t-1}+x_t\beta +z\gamma +\epsilon _t, \epsilon _t=\lambda W_n\epsilon _t+\nu _t,\ t=1, 2,3\), where \(x_t=\left( x_t^{(1)}, x_t^{(2)}\right) \) is an \(n\times 2\) matrix, where \(x_t^{(1)}\) were randomly generated from N(0, 1) and \(x_t^{(2)}\) were randomly generated from N(0, 4). Moreover, \(z=\left( z^{(1)}, z^{(2)}\right) \) is an \(n\times 2\) matrix, the elements of \(z^{(1)}\) were randomly generated from Bernoulli(0.3) and the elements of \(z^{(2)}\) were randomly generated from Bernoulli(0.6). We selected \(\beta =(1.5, 1.0)'\), \(\gamma =(2, 1.2)'\), \((\rho , \lambda )\) were taken as \((-0.8, -0.7)\), \((-0.2, -0.1)\), (0.2, 0.1), (0.8, 0.7), \((-0.8, 0.7)\) and \((0.2, -0.1)\) respectively, and \(\nu _{it}'s\) were i.i.d. from N(0, 1), \(t(5), \chi ^2_{4}-4, 0.1N(0, 4)+0.9N(0, 1)\) and \(0.1t(3)+0.9t(5)\), respectively.

The results of simulations under model 1 are reported in Tables 1, 2 and 3, and the results of simulations under model 2 are reported in Tables 4, 5, 6, 7 and 8.

Table 1 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}\sim N(0,1)\) under model 1
Table 2 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}\sim t(5)\) under model 1
Table 3 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}+4\sim \chi ^2_4\) under model 1
Table 4 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}\sim N(0,1)\) under model 2
Table 5 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}\sim t(5)\) under model 2
Table 6 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}+4\sim \chi ^2_4\) under model 2
Table 7 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}\sim 0.1N(0,4)+0.9N(0,1)\) under model 2
Table 8 Coverage probabilities of the NA and EL confidence regions with \(\nu _{it}\sim 0.1t(3)+0.9t(5)\) under model 2

For the contiguity weight matrix \(W_n=(W_{ij})\), we took \(W_{ij}=1\) if spatial units i and j are neighbours by queen contiguity rule (namely, they share common border or vertex), \(W_{ij}=0\) otherwise (Anselin 1988, P.18). We considered five ideal cases of spatial units: \(n=m\times m\) regular grid with \(m=7, 10, 13,16, 20\), denoting \(W_n\) as \(grid_{49}, grid_{100}, grid_{169}, grid_{256} \) and \( grid_{400}\), respectively. A transformation is often used in applications to convert the matrix \(W_n\) to the unity of row-sums. We used the standardized version of \(W_n\) in our simulations, namely \(W_{ij}\) was replaced by \(W_{ij}/\sum _{j=1}^nW_{ij}\).

Simulation results under model 1 show that the confidence regions based on NA behave well with coverage probabilities being very close to the nominal level 0.95 when the error term \(\epsilon _i\) is normally distributed and n is large, but not well in other cases. The coverage probabilities of the confidence regions based on NA fall to the range [0.812, 0.862] for the t distribution and [0.809, 0.868] for the \(\chi ^2\) distribution, which are far from the nominal level 0.95. Simulation results under model 2 are similar to those under model 1.

We can see, from Tables 1, 2, 3, 4, 5, 6, 7 and 8, that the coverage probabilities of confidence regions based on EL method converge to the nominal level 0.95 as the number of spatial units n is large enough, whether the error term \(\epsilon _i\) is normally distributed or not. These results show that the EL based confidence regions generally outperform the NA based confidence regions when n is large enough.

4 A real data example

In order to illustrate the proposed method in Sect. 2, we conducted a real data analysis. The data come from 288 prefecture-level cities in China, collected from National Bureau of Statistics of China and Anjuke. There were three variables: the logarithm of housing price per square meter (\(y_t\)), the logarithm of income per household (\(x_t\)) and the urbanization rate (z) from the years of 2010 to 2017. In order to ensure the stability and eliminate the influence of dimension, we first did difference and standardization on the above data, and then considered fitting the data via the following model: \(y_t=\rho y_{t-1}+x_t\beta +z\gamma +\epsilon _t, \epsilon _t=\lambda W_n\epsilon _t+\nu _t,\ t=1, 2, \ldots , 8\), where \(n=288\) and the spatial weighting matrix \(W_n\) was selected by the method in Sect. 3.

We separately employed the EL method in Sect. 2 and the NA method in Sect. 3 to obtain the confidence intervals for parameters \(\beta , \gamma , \rho , \lambda \) and \(\sigma ^2_\nu \) with confidence level 0.95, which were shown in Table 9.

Table 9 Analysis results for the average price of commercial housing data (with ALs shown in brackets)

Table 9 shows that the estimator of the spatial parameter is \(\lambda = 0.3743\), and 0 is not in its confidence interval, which implies that there exists a spatial relationship among the disturbances. The results also show that the lengths of the EL based intervals are uniformly shorter than those of the NA based intervals, which implies that the EL based method performs better than the NA based method for the real data.