A non-parametric spatial independence test using symbolic entropy

https://doi.org/10.1016/j.regsciurbeco.2009.11.003

Abstract

In the present paper, we construct a new, simple, consistent and powerful test for spatial independence, called the SG test, which uses the new concept of symbolic entropy as a measure of spatial dependence. Under the null hypothesis, an affine transformation of the symbolic entropy has a standard asymptotic distribution. With the proposed symbolization procedure, the test statistic and its standard limit distribution have appealing theoretical properties that guarantee the general applicability of the test. An important aspect is that the test does not require the specification of a W matrix and makes few a priori assumptions. We include a Monte Carlo study comparing our test with the well-known Moran's I and with two non-parametric tests, the SBDS test (de Graaff et al., 2001) and the τ test (Brett and Pinkse, 1997), to better appreciate the properties and behaviour of the new test. Besides showing that the new test is competitive with the other tests, the results underline its outstanding power against non-linearly dependent spatial processes.

Introduction

Dependence is one of the most prominent characteristics of spatial data. Gould (1970, pp. 443–444) asks, ‘Why should we expect independence in spatial observations?’, and his answer is simple: ‘I cannot imagine’. Tobler (1970, p. 237) goes a step further when he refers to ‘the first law of geography: everything is related to everything else’. Along the same lines, Anselin (1988, p. 12) proclaims that ‘The essence of regional science (...) is that location and distance matter, and result in a variety of interdependencies in space–time’, and Paelinck and Klaassen (1979, p. 5) state that ‘(...) it is good to start out in spatial econometric modelling with an interdependent model’. In short, there is a strong consensus about the importance of spatial dependence (Getis, 2007), and testing for it is already a routine part of any spatial econometric application.

The first problem is to detect when the hypothesis of independence is untenable, which requires one of the many tests proposed in the literature. These tests can be grouped into five categories. (1) The traditional approach based on the space–time interaction coefficient of Knox (1964), of which the tests of Moran (1950), Geary (1954) and Dacey (1965), among others, can be regarded as particular cases. (2) The maximum-likelihood methodology, fully introduced into this field by Anselin's (1988) text, along with a new generation of more specific and flexible tests (Anselin et al., 1996; Anselin and Bera, 1998; Anselin, 2001; Leung et al., 2003; Baltagi et al., 2003). (3) Tests based, directly or indirectly, on the GMM principle: Kelejian and Robinson (1993) propose using instrumental variables in spatial models, which has led to a new battery of tests of spatial dependence (Anselin and Kelejian, 1997; Kelejian and Prucha, 1999; Conley, 1999; Saavedra, 2003; Kelejian and Prucha, 2004; Kelejian and Prucha, 2007; Fingleton, 2008). (4) The KR test of Kelejian and Robinson (1992) and the Lagrange Multiplier test that Anselin and Moreno (2003) derive for the error components model of Kelejian and Robinson (1995). (5) Finally, the most recent additions to the analysis of spatial data are non-parametric tests such as the SBDS test (de Graaff et al., 2001) and the τ test (Brett and Pinkse, 1997; Pinkse et al., 2002).

In this paper, we first present a general theoretical framework for symbolic analysis applied to spatial structures. The basic idea is to divide the phase space into a finite number of regions and to label each region with a number, a letter or any other symbol. To this end, we introduce the concept of a symbolization map, which provides the basis for encoding any spatial data set into symbols. In general, these symbols are intended to capture potential dynamic structures. The structure of the data is then interpreted in the light of information theory via the new concept of symbolic entropy (which is naturally based on the well-known concept of entropy).¹ As far as we know, this is the first time that this approach has been used in the context of spatial data. As a result, we derive the asymptotic distribution of an affine transformation of the symbolic entropy measure.
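To make the notions of a symbolization map and of symbolic entropy concrete, the following Python sketch encodes a geo-referenced variable into symbols and computes the Shannon entropy of the symbol frequencies. The particular map is a hypothetical choice made only for illustration: each location is described by median indicators over itself and its m − 1 nearest neighbours (m playing the role of an embedding dimension). It is not claimed to be the standard symbolization proposed later in the paper, and the names `symbolize` and `symbolic_entropy` are likewise illustrative.

```python
import math
from collections import Counter

import numpy as np


def symbolize(values, coords, m):
    """Map each location s to a symbol built from s and its m - 1 nearest neighbours.

    Hypothetical symbolization (illustration only): the symbol is the tuple of
    indicators "value >= sample median" over s and its m - 1 nearest
    neighbours (Euclidean distance, ties broken by index), listed from s
    outwards. At most 2**m distinct symbols can occur.
    """
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    above = (values >= np.median(values)).astype(int)
    # All pairwise distances; adequate for small n.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    symbols = []
    for s in range(len(values)):
        order = np.argsort(dist[s], kind="stable")
        neighbours = [int(j) for j in order if j != s][: m - 1]
        surrounding = [s] + neighbours
        symbols.append(tuple(int(above[j]) for j in surrounding))
    return symbols


def symbolic_entropy(symbols):
    """Shannon entropy h = -sum p_sigma * ln(p_sigma) of the observed symbol frequencies."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
x = rng.standard_normal(200)
symbols = symbolize(x, coords, m=3)
print(symbolic_entropy(symbols), math.log(2 ** 3))  # h never exceeds ln(8)
# A strictly increasing transformation leaves every symbol unchanged.
print(symbolize(np.exp(x), coords, m=3) == symbols)  # True
```

The second print shows True because a strictly increasing transformation preserves which observations lie above the median; a strictly decreasing one would essentially swap the two indicator values, relabelling the symbols rather than changing their frequencies.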

Based on this asymptotic distribution, we propose a new test for spatial dependence, called SG, which uses symbolic entropy as a measure of cross-sectional dependence. The new test relies on a symbolization process appropriate for dealing with either linear or non-linear spatial dependence structures. It is a non-parametric test that makes few a priori assumptions. Furthermore, with the proposed symbolization, it is consistent and invariant to any monotonic transformation of the series, and its asymptotic distribution is standard. Since it is also easy to compute and competitive with well-established tests in the literature, we believe that it could play a useful role in the toolbox of spatial data analysis.

The immediate predecessor of the SG test is the test of serial independence for time series developed by Matilla-García and Ruiz Marín (2008). However, that test is a particular case of the one presented here: from a general point of view, the present paper provides a theoretical framework for analyzing spatial data in terms of symbols and, in this regard, generalizes previous symbolic approaches.²

To better appreciate the characteristics of the SG test, and hence its contribution to Regional Science relative to the other tests in the literature, we discuss the following aspects:

  • Dependence vs. spatial autocorrelation

  • Linearity vs. nonlinearity

  • Normality

  • The role of the weighting matrix.

The tests referred to at the beginning of this section are mostly tests of non-autocorrelation. For example, Moran's test checks whether the covariance between the series and its spatial lag is statistically different from zero, and the maximum-likelihood tests are linked directly to an autocorrelation coefficient. However, the absence of autocorrelation is equivalent to independence only under restrictive conditions (as in Gaussian stationary random fields; see Arbia, 1989, 2006, for a deeper discussion of the concept of spatial random fields). The SG test is, in a strict sense, a test of independence, like the SBDS and τ tests, although it is more general than both (the SBDS test requires an absolutely regular dependence structure, while the test of Brett and Pinkse requires strongly mixing processes).

Linearity is not an essential requirement for non-parametric tests. This is an important characteristic, because tests based on linear correlation are not consistent against non-linear dependence alternatives with zero autocorrelation, such as non-linear moving average processes or spatial ARCH processes (called SARCH processes by Bera and Simlai, 2004).

Normality is a minor restriction, but it is part of the set of assumptions on which the tests habitually used in a spatial context are based. The assumption is of the utmost importance for the maximum-likelihood tests and is very useful for those linked to the Knox statistic. Even under normality, the exact distribution of Moran's test is not standard and depends on the eigenvalues of the weighting matrix (Tiefelsdorf and Boots, 1995; Tiefelsdorf, 2000; Kelejian and Prucha, 2001). Under relatively weak conditions, this distribution converges to the normal distribution (Sen, 1976).
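The point that tests built on linear correlation can miss non-linear dependence with zero autocorrelation can be checked numerically. The Python sketch below computes Moran's I in its usual form, I = (n/S0)·z′Wz/z′z with z the deviations from the mean and S0 the sum of the weights, and applies it to a hypothetical non-linear moving average on a transect. This process is a simple stand-in chosen for illustration, not one of the paper's simulation designs; the helper names, the chain contiguity matrix and the uniform noise are assumptions of this example.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I = (n / S0) * z'Wz / z'z, with z = x - mean(x) and S0 = sum of weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def chain_weights(n):
    """Binary contiguity matrix for n sites on a transect: s is linked to s-1 and s+1."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


# Hypothetical non-linear moving average: X_s = e_s * e_{s-1}, e i.i.d. Uniform(-1, 1).
# Neighbours are uncorrelated, since E[X_s X_{s+1}] = E[e_{s-1}] E[e_s^2] E[e_{s+1}] = 0,
# yet dependent: X_s^2 and X_{s+1}^2 share the factor e_s^2 (correlation 5/14).
rng = np.random.default_rng(1)
e = rng.uniform(-1.0, 1.0, size=2001)
x = e[1:] * e[:-1]
w = chain_weights(len(x))
print(morans_i(x, w))       # near 0: no linear spatial autocorrelation
print(morans_i(x ** 2, w))  # clearly positive: dependence shows up in the squares
```

On the raw series Moran's I stays close to zero, so a test of non-autocorrelation has little power here, while on the squared series it is close to the theoretical value 5/14 ≈ 0.36. A test of independence is meant to detect the dependence without being told to look at the squares.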
If the assumption of normality is not acceptable, Cliff and Ord (1981) propose, in order to avoid biases in the test, a type of bootstrap that they call ‘randomisation’. For the three non-parametric tests (SBDS, τ and SG), the discussion is simpler. Their exact distributions are unknown but, whatever their finite-sample behaviour, their asymptotic distributions are standard: normal for the SBDS test and chi-squared for the τ and SG tests.
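The randomisation idea can be sketched as a Monte Carlo permutation test: under the null, every assignment of the observed values to the locations is equally likely, so Moran's I is recomputed after random reassignments and the observed value is compared with that reference distribution. Cliff and Ord work with the moments of I over all such reassignments; the sampling version below, with its one-sided alternative, 999 permutations and helper names, is an assumption of this example rather than their exact procedure.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I = (n / S0) * z'Wz / z'z with z = x - mean(x)."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def randomisation_pvalue(x, w, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for positive spatial autocorrelation.

    The observed values are randomly reassigned to the locations n_perm
    times and Moran's I is recomputed for each reassignment.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    draws = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    # Counting the observed value as one draw keeps p in (0, 1].
    p_value = (1 + np.count_nonzero(draws >= observed)) / (n_perm + 1)
    return observed, p_value


# Usage: a transect of 100 sites with a smooth trend (strong positive dependence).
n = 100
w = np.zeros((n, n))
idx = np.arange(n - 1)
w[idx, idx + 1] = w[idx + 1, idx] = 1.0
trend = np.linspace(0.0, 1.0, n)
print(randomisation_pvalue(trend, w))  # I close to 1, p = 1 / (999 + 1) = 0.001
```

No distributional assumption is needed for the reference distribution, but the test still depends on the chosen weighting matrix `w`.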

The absence of a natural ordering of the data is an inevitable source of problems when dealing with spatial series. The usual solution is to specify a weighting matrix using ‘a set of weights which he (the investigator) deems appropriate from prior considerations’ (Cliff and Ord, 1981, p. 17). This is undesirable because the test then examines not only whether the data are spatially dependent but also whether the weighting matrix itself is adequate. In fact, as Pinkse (2004) indicates, this matrix forms part of the null hypothesis. Florax and Rey (1995) show that the tests tend to lose reliability when the matrix is misspecified, and the more serious the misspecification, the more severe the consequences (Cliff and Ord, 1981, p. 168, with respect to Moran's I). The key term in this case is ‘uncertainty’, although some authors prefer to speak of ‘flexibility’, and it remains one of the fundamental problems in applied spatial econometric modeling (see Griffith, 1996; Bavaud, 1997; Haining, 2003, for a discussion). In this context, we wish to underline that the SG test, like the SBDS test, does not require the specification of a weighting matrix, unlike the other tests (in the τ test it is necessary to specify the neighbors of each point, which is equivalent to constructing the whole weighting matrix).
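Constructing a weighting matrix makes the number of a priori decisions explicit. The Python sketch below builds a k-nearest-neighbour matrix, which is only one common choice among many (contiguity, distance bands, inverse distance). The helper `knn_weights` and its defaults are assumptions of this example. Changing the number of neighbours alone changes W, and with it what a W-based test such as Moran's I actually tests.

```python
import numpy as np


def knn_weights(coords, n_neighbours, row_standardize=True):
    """k-nearest-neighbour weighting matrix (hypothetical helper).

    Every choice here is made a priori by the analyst: the distance metric
    (Euclidean), the number of neighbours, how ties are broken (lowest index
    first) and whether rows are standardized to sum to one.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a location is not its own neighbour
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbours]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), n_neighbours), nearest.ravel()] = 1.0
    if row_standardize:
        w /= w.sum(axis=1, keepdims=True)
    return w


# Four locations on a line; the isolated site at x = 10 still receives a neighbour.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
w1 = knn_weights(coords, n_neighbours=1)
w2 = knn_weights(coords, n_neighbours=2)
print(w1)
print(w2)
```

The resulting matrix need not be symmetric: with one neighbour, the site at x = 10 takes the site at x = 2 as its neighbour, but not the other way round.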

This paper consists of seven sections. In the second, we introduce the concepts and basic notation used in the rest of the paper. In the third, we construct the test of independence based on symbolic entropy that motivates our research. The fourth section discusses the most important properties of the test. The fifth section is dedicated to the symbolization procedure of the series, for which the user has considerable flexibility. The sixth section presents the results of a Monte Carlo experiment in which we examine the behavior of the SG test together with Moran's I and the SBDS and τ tests. The paper ends with a section of conclusions and future perspectives.

Section snippets

General framework for symbolic analysis

Spatial dependence is a crucial topic in regional and urban economics and, in general, in fields where location plays a central role. But it is also important in local public finance, environmental and resource economics, international trade and industrial organization, among others. One can consider any variable of interest that contains geo-referenced information of any type. For example, think of housing values in each county, the location of production in Europe, variation in votes cast

Spatial independence test

In this section, we construct a spatial independence test with all the machinery defined in Section 2. We also prove that, under the null of independence, an affine transformation of the symbolic entropy defined in Eq. (5) is asymptotically χ² distributed.

Let $\{X_s\}_{s\in S}$ be a spatial process and $m$ a fixed embedding dimension. To construct a test for spatial independence in $\{X_s\}_{s\in S}$, we consider the null hypothesis

$$H_0: \{X_s\}_{s\in S} \text{ is independent}$$

against any other alternative.
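As a rough illustration of how an affine transformation of symbolic entropy can yield a χ² statistic, consider the likelihood-ratio (G) goodness-of-fit statistic against equiprobable symbols, G = 2n(ln k − h), where n is the number of symbols, k the size of the symbol alphabet and h the symbolic entropy. The Python sketch below computes it with a closed-form χ² tail probability so that only the standard library is needed. It rests on stated assumptions (all k symbols equally likely under H0, symbols treated as independent draws, k − 1 degrees of freedom) and is not a reproduction of the paper's SG statistic, whose exact form and treatment of overlapping neighbourhoods are not shown in this excerpt. `chi2_sf` and `entropy_g_test` are hypothetical names.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # df = 2j: Q(j, y) = exp(-y) * sum_{i < j} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # df = 2j + 1: start from Q(1/2, y) = erfc(sqrt(y)) and step the shape up by one
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def entropy_g_test(symbols, k):
    """Return (G, p_value) with G = 2n(ln k - h) and h the symbolic entropy.

    Assumptions: k symbols, all equally likely under H0, and the n symbols
    treated as independent draws, so G is asymptotically chi-squared with
    k - 1 degrees of freedom.
    """
    n = len(symbols)
    counts = Counter(symbols)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    g = 2.0 * n * (math.log(k) - h)
    return g, chi2_sf(g, k - 1)


balanced = [0, 1, 2, 3] * 25          # equal frequencies: h = ln 4
skewed = [0] * 70 + [1, 2, 3] * 10    # one symbol dominates: h well below ln 4
print(entropy_g_test(balanced, k=4))  # G close to 0, p close to 1
print(entropy_g_test(skewed, k=4))    # large G, tiny p
```

Because G depends on the data only through the symbol frequencies, it is cheap to compute. Symbols built from overlapping neighbourhoods are generally not independent, however, so the reference distribution used in practice has to account for that.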

Now, for a

Consistency of the SG(m)-test

In the previous section we introduced the SG(m) test for spatial independence. The statistic in Eq. (9) is relatively simple to compute (in the end, it depends only on the frequencies of the symbols), and its asymptotic distribution is well defined under the null of independence. We now add the property of consistency, under mild conditions and for a very wide variety of spatial dependence processes.

Consistency is a very interesting property for any statistical test (Rohatgi, 1976) given

A proposed standard symbolization procedure

In previous sections, we have provided a general setting for dealing with spatial structures by means of standard symbolization maps. As a result, the asymptotic distribution of the SG test has been derived. We now propose a particular standard symbolization map f for the spatial process $\{X_s\}_{s\in S}$. Several standard symbolization maps are possible, so this novel framework can be adapted to the needs of the problem at hand, and the procedure below can be refined in

Finite sample behavior of SG(m) and comparison with other tests for independence

In this section, we examine the finite-sample behavior of the SG(m) test under the standard symbolization map given in the previous section. We also compare its power with that of other non-parametric tests for spatial independence: the τ test of Brett and Pinkse (1997) and the SBDS test of de Graaff et al. (2001). In Cliff and Ord (1981), Anselin and Rey (1991), Anselin and Florax (1995), Florax et al. (2003) and Florax and de Graaff (2004), different simulations of some of the most

Conclusions

The present paper attempts to analyze limited and noisy data using minimal hypotheses, looking specifically at the assumption of independence. More concretely, we are interested in the usefulness of a non-parametric approach based on entropy measures, which is well established in mainstream econometrics but, as far as we know, has not previously been used in a spatial context. Hong and White (2005) present some tests for independence obtained by using entropy measures and provide their asymptotic distribution. The

Acknowledgements

Mariano Matilla-García was partially supported by MCI (Ministerio de Ciencia e Innovación) and FEDER (Fondo Europeo de Desarrollo Regional), grant MTM2008-03679/MTM. Manuel Ruiz Marín was partially supported by MEC (Ministerio de Educación y Ciencia), grant MTM2009-07373 and Fundación Séneca of Región de Murcia. Fernando López and Jesús Mur were partially supported by MIC, grant ECO-2009-10534-ECON.

References (64)

  • H. Kelejian et al.

    HAC estimation in a spatial framework

    Journal of Econometrics

    (2007)
  • H. Kelejian et al.

    A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model

    Papers in Regional Science

    (1993)
  • M. King

    A point optimal test for autoregressive disturbances

    Journal of Econometrics

    (1985)
  • M. Matilla-García et al.

    A non-parametric independence test using permutation entropy

    Journal of Econometrics

    (2008)
  • J. Pinkse

    A consistent nonparametric test for serial dependence

    Journal of Econometrics

    (1998)
  • L. Saavedra

    Tests for spatial lag dependence based on method of moments estimation

    Regional Science and Urban Economics

    (2003)
  • L. Anselin

    Spatial Econometrics: Methods and Models

    (1988)
  • L. Anselin et al.

    Spatial dependence in linear regression models with an introduction to spatial econometrics

  • L. Anselin et al.

    Small sample properties of tests for spatial dependence in regression models: some further results

  • L. Anselin et al.

    Testing for spatial error autocorrelation in the presence of endogenous regressors

    International Regional Science Review

    (1997)
  • L. Anselin et al.

    Properties of tests for spatial dependence in linear regression models

    Geographical Analysis

    (1991)
  • G. Arbia

    Spatial Data Configuration in Statistical Analysis of Regional Economics and Related Problems

    (1989)
  • G. Arbia

    Spatial Econometrics: Statistical Foundations and Applications to Regional Convergence

    (2006)
  • A.D. Barbour et al.

    Poisson Approximation

    Oxford Studies in Probability 2

    (1992)
  • F. Bavaud

    Models for spatial weights: a systematic approach

    Geographical Analysis

    (1997)
  • A. Bera et al.

    Testing Spatial Autoregressive Model and a Formulation of Spatial ARCH (SARCH) Model with...

    (2004)
  • C. Brett et al.

    Those taxes are all over the map! A test for spatial independence of municipal tax rates in British Columbia

    International Regional Science Review

    (1997)
  • W. Brock et al.

    Nonlinear Dynamics, Chaos and Instability: Statistical Theory and Evidence

    (1991)
  • W. Brock et al.

    A test for independence based on the correlation dimension

    Econometric Reviews

    (1996)
  • P. Burridge

    On the Cliff–Ord test for spatial autocorrelation

    Journal of the Royal Statistical Society B

    (1980)
  • A. Cliff et al.

    Spatial Processes: Models and Applications

    (1981)
  • M. Dacey

    A review of measures of contiguity for two and k-color maps
