Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM)
Introduction
One of the major current questions in ecology concerns the identification and explanation of the spatial variability of ecological structures (Cormack and Ord, 1979, Smith, 2002). Space can be considered either as a factor responsible for ecological structures, or as a confounding variable leading to bias when analyzing a process of particular interest. This realization leads ecologists to introduce space as either a predictor or a covariable in statistical models. These two approaches have been used in various contexts such as the analysis of patterns of species richness (Blackburn and Gaston, 1996b, Selmi and Boulinier, 2001), species range sizes (Blackburn and Gaston, 1996a), species associations (Roxburgh and Matsuki, 1999), metacommunity analysis (Olden et al., 2001), population (Pettorelli et al., 2003) and community ecology (Borcard et al., 1992, Wagner, 2003, Borcard et al., 2004, Peres-Neto, 2004, Wagner, 2004).
Spatial structures observed in ecological communities can arise from two independent processes (Legendre, 1993, Legendre and Legendre, 1998, Section 1.1, Fortin and Dale, 2005, Chapters 1 and 5). Environmental factors that influence species distributions are usually spatially structured and then, through an indirect process, communities of species are also spatially structured; this process is called induced spatial dependence. Spatial autocorrelation can also be created directly at the community level as a result of contagious biotic processes such as growth, differential mortality, seed dispersal, or competition dynamics. In most situations, the spatial heterogeneity of communities is due to the simultaneous action of these two processes. Variation partitioning (Borcard et al., 1992, Borcard and Legendre, 1994, Méot et al., 1998) can be used to assess the importance of these two sources of spatial structure.
Incorporating spatial variation in ecological models requires tools to explicitly describe spatial relationships as predictors or covariables. Sokal (1979) used various functions of geographic distances among sites in Mantel tests to account for autocorrelation due to isolation by distance in population genetics models. Polynomial functions of the geographic coordinates can also be used as regressors to generate trend surfaces (Student [Gosset] 1914, Gittins, 1968). These spatial base functions have been used to model spatial relationships (often called “space” for short in scientific papers) in multivariate analyses such as canonical correlation analysis (Gittins, 1985, Pélissier et al., 2002, Gimaret-Carpentier et al., 2003), canonical correspondence analysis (CCA, Borcard et al., 1992, Borcard and Legendre, 1994, Méot et al., 1998) or redundancy analysis (RDA, Legendre, 1993). However, the use of trend surfaces is only satisfactory when the sampling area is roughly homogeneous, the sampling design is nearly regular, the number of spatial locations is “reasonable” (Norcliffe, 1969, Scarlett, 1972), and the spatial structure to be modelled is rather simple, such as a gradient, a single wave, or a saddle (Legendre and Legendre, 1998, Section 13.2). Moreover, the use of trend surfaces introduces an arbitrary choice for the degree of the polynomial function. For instance, Wartenberg (1985a) used a second-degree polynomial while Borcard et al. (1992) used a polynomial of degree 3. In any case, polynomial trend surfaces of these degrees only allow the modelling of broad-scale spatial structures. Another problem concerns the correlations between these spatial predictors. These can be removed by an orthogonalization procedure that yields orthogonal polynomials, but the higher-degree terms may be difficult to interpret in the case of surfaces.
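As an illustration of the trend-surface predictors discussed above, the following minimal sketch builds the monomials of the geographic coordinates up to a chosen degree and then orthogonalizes them. The function names (`trend_surface_terms`, `orthogonalize`), the use of a QR decomposition for the orthogonalization, and the 5 x 4 grid of sites are assumptions made for the example; they are not taken from the cited papers.

```python
import numpy as np

def trend_surface_terms(xy, degree=3):
    """Return the monomials x^p * y^q with 1 <= p + q <= degree as columns."""
    xy = np.asarray(xy, dtype=float)
    # Centre the coordinates first; this reduces collinearity among monomials.
    x = xy[:, 0] - xy[:, 0].mean()
    y = xy[:, 1] - xy[:, 1].mean()
    columns, names = [], []
    for total in range(1, degree + 1):
        for q in range(total + 1):
            p = total - q
            columns.append(x**p * y**q)
            names.append(f"x^{p} y^{q}")
    return np.column_stack(columns), names

def orthogonalize(terms):
    """Orthonormalize the centred columns by QR (assumes full column rank)."""
    q, _ = np.linalg.qr(terms - terms.mean(axis=0))
    return q

# Hypothetical sampling sites on a 5 x 4 regular grid.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(4.0))
sites = np.column_stack([gx.ravel(), gy.ravel()])

terms, names = trend_surface_terms(sites, degree=3)  # 9 monomials for degree 3
ortho = orthogonalize(terms)                         # uncorrelated predictors
```

A degree-3 surface in two coordinates already needs nine regressors, and each orthogonalized column is a mixture of several monomials, which is one way to see why the higher-degree terms become hard to interpret.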
Recently, a new approach called principal coordinates of neighbour matrices (PCNM) has been proposed as an alternative to trend surface analysis (Borcard and Legendre, 2002). This method has already been used with success in several ecological applications (Borcard et al., 2004, Brind’Amour et al., 2005, Legendre et al., 2005). PCNM base functions are obtained by a principal coordinate analysis (PCoA, Gower, 1966) of a truncated pairwise geographic distance matrix between sampling sites. Eigenvectors associated with the positive eigenvalues and corresponding to the Euclidean representation of the truncated distance matrix are used as spatial predictors in multivariate regression or canonical analysis (e.g., RDA, CCA). Even though this approach produces interesting and ecologically interpretable results (e.g., Borcard et al., 2004), it suffers from a lack of mathematical formalism. Indeed, these authors stated in the original description of the methods that: “This paper raises a number of mathematical questions […] We hope that the paper will attract the interest of mathematicians who can help us understand these properties and develop methods of spatial modeling further” (Borcard and Legendre, 2002, p. 67).
In the present paper, we investigate the mathematical foundations of PCNM analysis and show that this approach is closely related to spatial autocorrelation structure functions. Using these theoretical properties, we develop improvements and extensions of the original approach. We hope this paper will help ecologists use the full potential of PCNM analysis for ecological applications and perceive the method as an extremely flexible and robust technique for the analysis of spatial problems.
The original PCNM approach
Generation of PCNM base functions is quite straightforward, requiring the following three main steps (Borcard and Legendre, 2002):
- (1) Compute a pairwise Euclidean (geographic) distance matrix between the n sampling locations (D = [dij]).
- (2) Choose a threshold value t and construct a truncated distance matrix using the following rule:
- (3) Perform principal coordinate analysis (PCoA) of the truncated distance matrix D*. This analysis consists in the diagonalization of Δ where:
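The three steps can be sketched in Python as follows. The truncation rule and the definition of Δ are not reproduced in this excerpt, so the sketch relies on two labelled assumptions: distances larger than t are replaced by 4t, as in the commonly cited description of the original method (Borcard and Legendre, 2002), and Δ is the usual double-centred matrix of classical PCoA (Gower, 1966). The function name `pcnm` and the transect example are hypothetical.

```python
import numpy as np

def pcnm(coords, threshold, tol=1e-8):
    """Sketch of PCNM base functions for sites given as an (n, k) array."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]

    # Step 1: pairwise Euclidean distance matrix D = [dij].
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff**2).sum(axis=-1))

    # Step 2: truncated matrix D*.
    # ASSUMPTION: distances above the threshold t are replaced by 4t.
    d_star = np.where(d <= threshold, d, 4.0 * threshold)

    # Step 3: PCoA of D*.
    # ASSUMPTION: Delta = -1/2 * H (D*^2) H with H = I - 11'/n (Gower centring).
    h = np.eye(n) - np.ones((n, n)) / n
    delta = -0.5 * h @ (d_star**2) @ h
    eigval, eigvec = np.linalg.eigh(delta)

    # Sort by decreasing eigenvalue and keep only the positive ones.
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol
    # Scale each eigenvector to a norm of sqrt(eigenvalue), as in PCoA.
    return eigval[keep], eigvec[:, keep] * np.sqrt(eigval[keep])

# Hypothetical example: 20 sites on a transect with unit spacing.
transect = np.column_stack([np.arange(20.0), np.zeros(20)])
eigenvalues, pcnm_vectors = pcnm(transect, threshold=1.0)
```

The columns of `pcnm_vectors` are centred and mutually orthogonal, so they can be used directly as spatial predictors in a regression or canonical analysis.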
Distances, similarities, and spatial weighting matrices
PCoA is usually computed on a distance matrix but Gower (1966) showed that this analysis can also be computed from a similarity matrix (also shown in Legendre and Legendre, 1998, p. 431). For instance, consider the similarity matrix S derived from a distance matrix D:
These similarities vary between 0 (for dij = max(dij)) and 1 (for dij = 0). It is easy to show that a PCoA performed on the distance matrix D is equivalent to the diagonalization of Δ
Moran's eigenvector maps (MEM)
In this section, we consider the n-by-1 vector x = [x1 … xn]ᵗ containing measurements of a quantitative variable of interest at n sites and an n-by-n symmetric spatial weighting matrix W. The usual formulation for Moran's index of spatial autocorrelation (Moran, 1948, Cliff and Ord, 1973) is:
The values wij are weights from matrix W. In Moran's I autocorrelation analysis, we usually make wij = 1 for sites i and j that are
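Since the formula itself is not reproduced in this excerpt, the sketch below uses the standard textbook form of Moran's index, I(x) = (n / Σij wij) · Σij wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², and obtains eigenvector maps by diagonalizing the doubly centred weighting matrix. The function names and the binary-neighbour transect are assumptions made for illustration, and the sketch should not be read as the interface of the package mentioned in the Supplement.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I of vector x for a symmetric spatial weighting matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def moran_eigenvector_maps(w, tol=1e-8):
    """Eigenvectors of H W H (H = I - 11'/n) with non-zero eigenvalues."""
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    eigval, eigvec = np.linalg.eigh(h @ w @ h)
    keep = np.abs(eigval) > tol
    order = np.argsort(eigval[keep])[::-1]  # decreasing eigenvalue
    return eigval[keep][order], eigvec[:, keep][:, order]

# Hypothetical example: binary weights linking adjacent sites on a transect.
n = 10
w = np.zeros((n, n))
idx = np.arange(n - 1)
w[idx, idx + 1] = w[idx + 1, idx] = 1.0

eigval, mem = moran_eigenvector_maps(w)
moran_values = np.array([morans_i(mem[:, k], w) for k in range(mem.shape[1])])
```

Each eigenvector's Moran's I equals its eigenvalue multiplied by n / Σij wij, so positive eigenvalues correspond to positively autocorrelated maps and negative eigenvalues to negatively autocorrelated ones.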
Notes on the original PCNM approach
We have shown that the PCNM approach is closely related to Moran's index of spatial autocorrelation. This observation provides elements that will help improve the original method proposed by Borcard and Legendre (2002). As shown in the previous section, negative eigenvalues correspond to negative autocorrelation and their associated eigenvectors can be used to describe local structures. These structures can be produced by biotic processes such as species territoriality and competition.
Choice of a spatial weighting matrix
Our interpretation of PCNM base functions as a particular case of MEM generalizes the original approach because “the use of a generalised weighting matrix […] allows the investigator to choose a set of weights which he deems appropriate from prior considerations. This allows great flexibility” (Cliff and Ord, 1973, p. 12). Indeed, the spatial weighting matrix can be defined in different ways according to particular ecological hypotheses of interest and their spatial interactions (Sokal, 1979).
Ecological illustration
Here we illustrate the use of MEM and the data-driven process for selection of the spatial weighting matrix with a real data set. We re-examine data concerning the distribution of oribatid mites in the peat blanket of a bog lake. This data set has been used to illustrate the variation partitioning method with space modelled as a third-order polynomial of geographic coordinates (Borcard et al., 1992, Borcard and Legendre, 1994) as well as the original PCNM approach (Borcard and Legendre, 2002,
Relationships with other eigenvector-based approaches
The new interpretation of the PCNM approach provided in this paper highlights relationships with other existing approaches. For instance, if all sites are connected (i.e., ∀i, j [bij] = 1) and A = [1−(dij/max(dij))], the approach is equivalent to a PCoA based on , proposed by Critchley (1978) as an alternative to multi-dimensional scaling. In the context of spatial analyses, Méot et al. (1993) diagonalized Dw − W where Dw = Diag(pi) is a diagonal matrix containing the row sums of W
MEM and spatial modelling
Autocorrelation is often regarded as a statistical problem because it biases standard methods of statistical inference. Because the value observed at one site is influenced by the values at neighbouring sites, these values are not independent of one another. Since individual observations convey information about their neighbours, the number of degrees of freedom for a given set of observations may be reduced. That is the reason why, in the presence of positive autocorrelation,
Future directions
This paper provides new insights on the original formulation of the PCNM method, and introduces it in the framework of Moran's eigenvector maps. This formalism extends the original PCNM approach by allowing various definitions of spatial weighting matrices and other aspects related to this definition, as well as making it possible to consider negative spatial autocorrelation. Some questions remain to be solved, however. The first one concerns the choice of the eigenvectors to be introduced as
Supplement
An R package “spacemakeR” containing functions to perform the analyses presented in the paper is available online. It includes detailed documentation explaining how to create and manage spatial weighting matrices, compute their Moran's eigenvectors, and use the model selection procedure.
Acknowledgements
We would like to thank Daniel Borcard and the two reviewers for their comments on our manuscript. This research was supported by NSERC grant OGP0007738 to P. Legendre.
References (81)
- et al. Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra Applications (1999)
- et al. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling (2002)
- Eigenfunction properties and approximations of selected incidence matrices employed in spatial analyses. Linear Algebra Appl. (2000)
- A spatial filtering specification for the auto-Poisson model. Stat. Prob. Lett. (2002)
- et al. On the quality of likelihood-based estimators in spatial autoregressive models when the data dependence structure is misspecified. J. Stat. Planning Inference (1998)
- et al. Fast maximum likelihood estimation of very large spatial autoregression models. A characteristic polynomial approach. Comput. Stat. Data Anal. (2001)
- A close look at the spatial structure implied by the CAR and SAR models. J. Stat. Planning Inference (2004)
- Aubry, P., 2000. Le traitement des variables régionalisées en écologie. Apports de la géomatique et de la...
- Models for spatial weights: A systematic look. Geogr. Anal. (1998)
- et al. An annotated bibliography of canonical correspondence analysis and related constrained ordination methods 1986–1993. Abstr. Bot. (1996)
- A Monte Carlo study of correlation coefficient estimation with spatially autocorrelated observations. Quaest. Geogr.
- Spatial patterns in the geographic range sizes of bird species in the New World. Phil. Trans. Roy. Soc. Lond. Ser. B – Biol.
- Spatial patterns in the species richness of birds in the New World. Phil. Trans. Roy. Soc. Lond. Ser. B – Biol.
- Environmental control and spatial structure in ecological communities: an example using oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics
- Dissecting the spatial structure of ecological data at multiple scales. Ecology
- Partialling out the spatial component of ecological variation. Ecology
- Multiscale spatial distribution of a littoral fish community in relation to environmental variables. Limnol. Oceanogr.
- Spectral graph theory
- Spatial autocorrelation
- Spatial processes
- Assessing the significance of the correlation between two spatial processes. Biometrics
- Estimating the residual error variance in orthogonal regression with variable selection. The Statistician
- Spatial and temporal analysis in ecology
- Multidimensional scaling: a short critique and a new method
- On extreme values of Moran's I and Geary's c. Geogr. Anal.
- Modifying the t-test for assessing the correlation between two spatial processes. Biometrics
- Spatial analysis: a guide for ecologists
- The problem of underestimating the residual error variance in forward stepwise regression. The Statistician
- The contiguity ratio and statistical mapping. The Incorporated Statistician
- Constructing the spatial weights matrix using a local statistic. Geogr. Anal.
- Comparative spatial filtering in regression analysis. Geographical Analysis
- Broad-scale biodiversity pattern of the endemic tree flora of the Western Ghats (India) using canonical correlation analysis of herbarium records. Ecography
- Trend-surface analysis of ecological data. J. Ecol.
- Canonical Analysis, A Review with Applications in Ecology
- Information-theoretic approach for selection of spatial and temporal models of community organization. Mar. Ecol.–Prog. Ser.
- Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika
- Some guidelines for specifying the geographic weights matrix contained in spatial statistical models
- Spatial autocorrelation and eigenfunctions of the geographic weights matrix accompanying geo-referenced data. Can. Geogr.
- A linear regression solution to the spatial autocorrelation problem. J. Geogr. Syst.
- Spatial autocorrelation and spatial filtering: gaining understanding through theory and scientific visualization