Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM)

https://doi.org/10.1016/j.ecolmodel.2006.02.015

Abstract

Spatial structures of ecological communities may originate from the dependence of community structure on environmental variables, from community-based processes, or from both. To assess the importance of these two sources, spatial relationships must be explicitly introduced into statistical models. Recently, a new approach called principal coordinates of neighbour matrices (PCNM) has been proposed to create spatial predictors that can be easily incorporated into regression or canonical analysis models, providing a flexible tool, especially when contrasted with the family of autoregressive models and trend surface analysis, which are commonly used in ecological and geographical analysis. In this paper, we explore the theory of the PCNM approach and demonstrate how it is linked to spatial autocorrelation structure functions. The method consists of diagonalizing a spatial weighting matrix, then extracting the eigenvectors that maximize Moran's index of autocorrelation. These eigenvectors can then be used directly as explanatory variables in regression or canonical models. We propose improvements and extensions of the original method, and illustrate them with examples that will help ecologists choose the variant that best suits their needs.

Introduction

One of the major current questions in ecology concerns the identification and explanation of the spatial variability of ecological structures (Cormack and Ord, 1979, Smith, 2002). Space can be considered either as a factor responsible for ecological structures, or as a confounding variable leading to bias when analyzing a process of particular interest. This realization leads ecologists to introduce space as either a predictor or a covariable in statistical models. These two approaches have been used in various contexts such as the analysis of patterns of species richness (Blackburn and Gaston, 1996b, Selmi and Boulinier, 2001), species range sizes (Blackburn and Gaston, 1996a), species associations (Roxburgh and Matsuki, 1999), metacommunity analysis (Olden et al., 2001), population (Pettorelli et al., 2003) and community ecology (Borcard et al., 1992, Wagner, 2003, Borcard et al., 2004, Peres-Neto, 2004, Wagner, 2004).

Spatial structures observed in ecological communities can arise from two independent processes (Legendre, 1993, Legendre and Legendre, 1998, Section 1.1, Fortin and Dale, 2005, Chapters 1 and 5). Environmental factors that influence species distributions are usually spatially structured and then, through an indirect process, communities of species are also spatially structured; this process is called induced spatial dependence. Spatial autocorrelation can also be created directly at the community level as a result of contagious biotic processes such as growth, differential mortality, seed dispersal, or competition dynamics. In most situations, the spatial heterogeneity of communities is due to the simultaneous action of these two processes. Variation partitioning (Borcard et al., 1992, Borcard and Legendre, 1994, Méot et al., 1998) can be used to assess the importance of these two sources of spatial structure.

Incorporating spatial variation in ecological models requires tools to explicitly describe spatial relationships as predictors or covariables. Sokal (1979) used various functions of geographic distances among sites in Mantel tests to account for autocorrelation due to isolation by distance in population genetics models. Polynomial functions of the geographic coordinates can also be used as regressors to generate trend surfaces (Student [Gosset] 1914, Gittins, 1968). These spatial base functions have been used to model spatial relationships (often called “space” for short in scientific papers) in multivariate analyses such as canonical correlation analysis (Gittins, 1985, Pélissier et al., 2002, Gimaret-Carpentier et al., 2003), canonical correspondence analysis (CCA, Borcard et al., 1992, Borcard and Legendre, 1994, Méot et al., 1998) or redundancy analysis (RDA, Legendre, 1993). However, the use of trend surfaces is only satisfactory when the sampling area is roughly homogeneous, the sampling design is nearly regular, the number of spatial locations is “reasonable” (Norcliffe, 1969, Scarlett, 1972), and the spatial structure to be modelled is rather simple, such as a gradient, a single wave, or a saddle (Legendre and Legendre, 1998, Section 13.2). Moreover, the use of trend surfaces introduces an arbitrary choice for the degree of the polynomial function. For instance, Wartenberg (1985a) used a second-degree polynomial while Borcard et al. (1992) used a polynomial of degree 3. In any case, polynomial trend surfaces of these degrees only allow the modelling of broad-scale spatial structures. Another problem concerns the correlations between these spatial predictors, which can be addressed by using an orthogonalization procedure to obtain orthogonal polynomials, but the higher-degree terms may be difficult to interpret in the case of surfaces.

Recently, a new approach called principal coordinates of neighbour matrices (PCNM) has been proposed as an alternative to trend surface analysis (Borcard and Legendre, 2002). This method has already been used with success in several ecological applications (Borcard et al., 2004, Brind’Amour et al., 2005, Legendre et al., 2005). PCNM base functions are obtained by a principal coordinate analysis (PCoA, Gower, 1966) of a truncated pairwise geographic distance matrix between sampling sites. Eigenvectors associated with the positive eigenvalues and corresponding to the Euclidean representation of the truncated distance matrix are used as spatial predictors in multivariate regression or canonical analysis (e.g., RDA, CCA). Even though this approach produces interesting and ecologically interpretable results (e.g., Borcard et al., 2004), it suffers from a lack of mathematical formalism. Indeed, these authors stated in the original description of the methods that: “This paper raises a number of mathematical questions […] We hope that the paper will attract the interest of mathematicians who can help us understand these properties and develop methods of spatial modeling further” (Borcard and Legendre, 2002, p. 67).

In the present paper, we investigate the mathematical foundations of PCNM analysis and show that this approach is closely related to spatial autocorrelation structure functions. Using these theoretical properties, we develop improvements and extensions of the original approach. We hope this paper will help ecologists use the full potential of PCNM analysis for ecological applications and perceive the method as an extremely flexible and robust technique for the analysis of spatial problems.

The original PCNM approach

Generation of PCNM base functions is quite straightforward, requiring the following three main steps (Borcard and Legendre, 2002):

  • (1) Compute a pairwise Euclidean (geographic) distance matrix between the n sampling locations (D = [dij]).

  • (2) Choose a threshold value t and construct a truncated distance matrix $D^* = [d_{ij}^*]$ using the following rule:

    $$d_{ij}^* = \begin{cases} d_{ij} & \text{if } d_{ij} \le t \\ 4t & \text{if } d_{ij} > t \end{cases}$$

  • (3) Perform principal coordinate analysis (PCoA) of the truncated distance matrix D*. This analysis consists in the diagonalization of Δ where: $\Delta = -\tfrac{1}{2}\,(d_{ij}^{*2}$
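The three steps above can be sketched in a few lines of standard-library Python (3.8+ for `math.dist`). This is a minimal illustration and not the authors' implementation: the function names `truncated_distances` and `gower_centre`, the five-site transect, and the threshold `t = 1` are all hypothetical. Because the expression for Δ is cut off in the text above, the sketch assumes the standard Gower double-centring of −½ d*ij², and it stops before the eigen-decomposition, which would normally be handed to a linear algebra routine.

```python
import math


def truncated_distances(coords, t):
    """Steps 1-2: Euclidean distances, with every distance > t replaced by 4t."""
    n = len(coords)
    d_star = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            d_star[i][j] = d if d <= t else 4.0 * t
    return d_star


def gower_centre(d_star):
    """Step 3, first half: double-centre the matrix of -0.5 * d*_ij^2.

    Assumption: this is the standard Gower (1966) centring used in PCoA.
    The eigenvectors of the returned matrix would be the PCNM base functions.
    """
    n = len(d_star)
    a = [[-0.5 * d_star[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(a[i]) / n for i in range(n)]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [a[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical transect: five sites spaced one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
d_star = truncated_distances(coords, t=1.0)  # only adjacent sites keep their distance
delta = gower_centre(d_star)                 # symmetric, rows and columns sum to zero
```

With t equal to the spacing between adjacent sites, only nearest neighbours retain their true distance and all other pairs are set to 4t, which is what makes the resulting matrix a "neighbour" matrix.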

Distances, similarities, and spatial weighting matrices

PCoA is usually computed on a distance matrix but Gower (1966) showed that this analysis can also be computed from a similarity matrix (also shown in Legendre and Legendre, 1998, p. 431). For instance, consider the similarity matrix S derived from a distance matrix D:

$$S = [s_{ij}] = \left[1 - \left(\frac{d_{ij}}{\max(d_{ij})}\right)^2\right] = \mathbf{1}\mathbf{1}^t - \frac{D_2}{\max(d_{ij})^2} \quad \text{with} \quad D_2 = [d_{ij}^2]$$

These similarities vary between 0 (for dij = max(dij)) and 1 (for dij = 0). It is easy to show that a PCoA performed on the distance matrix D is equivalent to the diagonalization of Δ

Moran's eigenvector maps (MEM)

In this section, we consider the n-by-1 vector $x = [x_1 \ldots x_n]^t$ containing measurements of a quantitative variable of interest at n sites and an n-by-n symmetric spatial weighting matrix W. The usual formulation for Moran's index of spatial autocorrelation (Moran, 1948, Cliff and Ord, 1973) is:

$$I(x) = \frac{n \sum_{(2)} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{(2)} w_{ij} \, \sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{where} \quad \sum_{(2)} = \sum_{i=1}^{n} \sum_{j=1}^{n} \ \text{with} \ i \ne j$$

The values wij are weights from matrix W. In Moran's I autocorrelation analysis, we usually make wij = 1 for sites i and j that are
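Moran's index as formulated above can be computed directly from its definition. The following sketch uses only the Python standard library; the function names `morans_i` and `chain_weights`, the four-site chain of binary neighbour weights, and the two test vectors are illustrative assumptions, not data from the paper.

```python
def morans_i(x, w):
    """Moran's I for values x and a symmetric weight matrix w (list of lists).

    Sums run over all ordered pairs (i, j) with i != j, as in the formula.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    cross = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    weight_sum = sum(w[i][j] for i, j in pairs)
    sum_sq = sum(d * d for d in dev)
    return n * cross / (weight_sum * sum_sq)


def chain_weights(n):
    """Hypothetical binary weights: w_ij = 1 for adjacent sites on a line, else 0."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = chain_weights(4)
gradient = morans_i([1.0, 2.0, 3.0, 4.0], w)       # smooth trend along the chain
alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # neighbours always differ
```

The gradient yields a positive index (neighbouring values are similar) and the alternating pattern yields a negative one, which matches the usual reading of positive and negative spatial autocorrelation.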

Notes on the original PCNM approach

We have shown that the PCNM approach is closely related to Moran's index of spatial autocorrelation. This observation provides elements that will help improve the original method proposed by Borcard and Legendre (2002). As shown in the previous section, negative eigenvalues correspond to negative autocorrelation and their associated eigenvectors can be used to describe local structures. These structures can be produced by biotic processes such as species territoriality and competition.

Choice of a spatial weighting matrix

Our interpretation of PCNM base functions as a particular case of MEM generalizes the original approach because “the use of a generalised weighting matrix […] allows the investigator to choose a set of weights which he deems appropriate from prior considerations. This allows great flexibility” (Cliff and Ord, 1973, p. 12). Indeed, the spatial weighting matrix can be defined in different ways according to particular ecological hypotheses of interest and their spatial interactions (Sokal, 1979).
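As one possible illustration of this flexibility, a spatial weighting matrix can be assembled from two choices: which pairs of sites are connected, and how strongly. The sketch below assumes a distance threshold for connectivity and a linear distance-decay weight 1 − dij/max(dij); both choices, the function name `spatial_weights`, and the coordinates are hypothetical and stand in for whatever ecological hypothesis the investigator wants to encode.

```python
import math


def spatial_weights(coords, threshold):
    """Build W with w_ij = 1 - d_ij / max(d) for pairs closer than `threshold`.

    Assumptions (not from the paper): connectivity is defined by a distance
    threshold, weights decay linearly with distance, and w_ii = 0.
    """
    n = len(coords)
    d = [[math.dist(p, q) for q in coords] for p in coords]
    d_max = max(max(row) for row in d)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            connected = i != j and d[i][j] <= threshold
            if connected:
                w[i][j] = 1.0 - d[i][j] / d_max
    return w


# Hypothetical sampling sites; the last one lies beyond the threshold of all others.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
w = spatial_weights(coords, threshold=2.5)
```

Changing the threshold or the decay function changes W, and therefore the eigenvectors derived from it, so each variant corresponds to a different assumption about how sites interact.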

Ecological illustration

Here we illustrate the use of MEM and the data-driven process for selection of the spatial weighting matrix with a real data set. We re-examine data concerning the distribution of oribatid mites in the peat blanket of a bog lake. This data set has been used to illustrate the variation partitioning method with space modelled as a third-order polynomial of geographic coordinates (Borcard et al., 1992, Borcard and Legendre, 1994) as well as the original PCNM approach (Borcard and Legendre, 2002,

Relationships with other eigenvector-based approaches

The new interpretation of the PCNM approach provided in this paper highlights relationships with other existing approaches. For instance, if all sites are connected (i.e., ∀i, j [bij] = 1) and A = [1−(dij/max(dij))], the approach is equivalent to a PCoA based on dij, proposed by Critchley (1978) as an alternative to multi-dimensional scaling. In the context of spatial analyses, Méot et al. (1993) diagonalized $D_w - W$ where $D_w = \mathrm{Diag}(p_i)$ is a diagonal matrix containing the row sums of W, $p_i = \sum_{j=1}^{n} W_{ij} = \sum_{i=1}^{n} W$

MEM and spatial modelling

Autocorrelation is often regarded as a statistical problem because it biases standard statistical inference methods. Because the value observed at one site is influenced by the values at neighbouring sites, these values are not independent of one another. Since individual observations convey information about their neighbours, the number of degrees of freedom for a given set of observations may be reduced. That is the reason why, in the presence of positive autocorrelation,

Future directions

This paper provides new insights into the original formulation of the PCNM method, and places it within the framework of Moran's eigenvector maps. This formalism extends the original PCNM approach by allowing various definitions of spatial weighting matrices and other aspects related to this definition, as well as making it possible to consider negative spatial autocorrelation. Some questions remain to be solved, however. The first one concerns the choice of the eigenvectors to be introduced as

Supplement

An R package “spacemakeR” containing functions to perform the analyses presented in the paper is available online. It includes detailed documentation explaining how to create and manage spatial weighting matrices, compute their Moran's eigenvectors, and use the model selection procedure.

Acknowledgements

We would like to thank Daniel Borcard and the two reviewers for their comments on our manuscript. This research was supported by NSERC grant OGP0007738 to P. Legendre.

References (81)

  • R. Bivand (1980). A Monte Carlo study of correlation coefficient estimation with spatially autocorrelated observations. Quaest. Geogr.
  • T.M. Blackburn et al. (1996). Spatial patterns in the geographic range sizes of bird species in the New World. Phil. Trans. Roy. Soc. Lond. Ser. B – Biol.
  • T.M. Blackburn et al. (1996). Spatial patterns in the species richness of birds in the New World. Phil. Trans. Roy. Soc. Lond. Ser. B – Biol.
  • D. Borcard et al. (1994). Environmental control and spatial structure in ecological communities: an example using oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics.
  • D. Borcard et al. (2004). Dissecting the spatial structure of ecological data at multiple scales. Ecology.
  • D. Borcard et al. (1992). Partialling out the spatial component of ecological variation. Ecology.
  • A. Brind’Amour et al. (2005). Multiscale spatial distribution of a littoral fish community in relation to environmental variables. Limnol. Oceanogr.
  • F.K.R. Chung (1997). Spectral graph theory.
  • A.D. Cliff et al. (1973). Spatial autocorrelation.
  • A.D. Cliff et al. (1981). Spatial processes.
  • P. Clifford et al. (1989). Assessing the significance of the correlation between two spatial processes. Biometrics.
  • J.B. Copas et al. (1991). Estimating the residual error variance in orthogonal regression with variable selection. The Statistician.
  • R.M. Cormack et al. (1979). Spatial and temporal analysis in ecology.
  • F. Critchley. Multidimensional scaling: a short critique and a new method.
  • P. de Jong et al. (1984). On extreme values of Moran's I and Geary's c. Geogr. Anal.
  • P. Dutilleul (1993). Modifying the t-test for assessing the correlation between two spatial processes. Biometrics.
  • M.-J. Fortin et al. (2005). Spatial analysis: a guide for ecologists.
  • L.S. Freedman et al. (1992). The problem of underestimating the residual error variance in forward stepwise regression. The Statistician.
  • R.C. Geary (1954). The contiguity ratio and statistical mapping. The Incorporated Statistician.
  • A. Getis et al. (2004). Constructing the spatial weights matrix using a local statistic. Geogr. Anal.
  • A. Getis et al. (2002). Comparative spatial filtering in regression analysis. Geographical Analysis.
  • C. Gimaret-Carpentier et al. (2003). Broad-scale biodiversity pattern of the endemic tree flora of the Western Ghats (India) using canonical correlation analysis of herbarium records. Ecography.
  • R. Gittins (1968). Trend-surface analysis of ecological data. J. Ecol.
  • R. Gittins (1985). Canonical Analysis, A Review with Applications in Ecology.
  • E. Godinez-Dominguez et al. (2003). Information-theoretic approach for selection of spatial and temporal models of community organization. Mar. Ecol.–Prog. Ser.
  • J.C. Gower (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika.
  • D.A. Griffith. Some guidelines for specifying the geographic weights matrix contained in spatial statistical models.
  • D.A. Griffith (1996). Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Can. Geogr.
  • D.A. Griffith (2000). A linear regression solution to the spatial autocorrelation problem. J. Geogr. Syst.
  • D.A. Griffith (2003). Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization.