Short Note
On some properties of the Bray-Curtis dissimilarity and their ecological meaning
Introduction
Ecologists routinely use dissimilarity measures between pairs of plots (or species assemblages, communities, sites, quadrats, etc.) to explore community assembly processes. Given two plots U and V, a logical starting point for evaluating any other dissimilarity measure is the Euclidean distance because it corresponds to our everyday feeling about interpoint distances in the visible and easily measurable 3D (physical) world:

$$ED_{UV} = \sqrt{\sum_{j=1}^{S} \left( x_{Uj} - x_{Vj} \right)^2}$$

where $x_{Uj}$ and $x_{Vj}$ are the abundance values of species j in plots U and V, respectively, S is the total number of species recorded in these two plots, $S = |S_U \cup S_V|$, and $S_U$ is the set of species in plot U.
However, community ecologists have repeatedly argued that this coefficient may provide misleading results for species abundance data containing zeros (e.g. Orlóci, 1972, Orlóci, 1978, Legendre and Gallagher, 2001). As an example, let us consider an artificial community composition matrix composed of four species (S1–S4) in three plots (U–W):
If we use Euclidean distance to measure dissimilarity, we find that the distance between plots U and W, which share species S1 and S2, is larger than that between plots U and V, which have no species in common: $ED_{UV} = 3.162$; $ED_{UW} = 4.472$; $ED_{VW} = 7.071$. This is counter-intuitive ecologically, because the plots U and W contain the same species while plot V hosts a unique set of species. That is, abundance differences completely override a more fundamental issue: agreement in presence of species. This effect may be more substantial for large data matrices in which many species may easily have just a few records, leading to sparse matrices predominantly filled with zeros (Legendre and Gallagher, 2001).
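The three distances quoted above can be checked numerically. The sketch below is only illustrative: the abundance values in `PLOTS` are an assumption, chosen because they reproduce the reported distances, and they are not necessarily the values of the original matrix. The function name `euclidean` is likewise hypothetical.

```python
import math

# Hypothetical abundances for species S1-S4 (one list per plot). These values
# are an assumption: they reproduce the distances quoted in the text, but they
# are not taken from the original community matrix.
PLOTS = {
    "U": [1, 2, 0, 0],
    "V": [0, 0, 1, 2],
    "W": [3, 6, 0, 0],
}


def euclidean(x, y):
    """Euclidean distance between two abundance vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


for p, q in [("U", "V"), ("U", "W"), ("V", "W")]:
    print(f"ED_{p}{q} = {euclidean(PLOTS[p], PLOTS[q]):.3f}")
# ED_UV = 3.162
# ED_UW = 4.472
# ED_VW = 7.071
```

With these assumed values, U and W hold the same two species in the same 1:2 proportion, yet they end up farther apart than U and V, which share no species.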
In order to eliminate the problems inherent to the Euclidean distance, ecologists have developed a rich arsenal of alternative coefficients (see Legendre and Legendre, 2012, for a review). These indices incorporate some operation involving data standardization, i.e. modification of data such that each new score depends on other values in the matrix. If the plot vectors (columns in the example above) are first standardized to unit length by dividing each value by the length of the vector according to $x_{Uj} / \sqrt{\sum_{k=1}^{S} x_{Uk}^2}$, and then the Euclidean distance is calculated from the normalized quantities, we get the chord distance (Orlóci, 1967) given by the formula:

$$CH_{UV} = \sqrt{\sum_{j=1}^{S} \left( \frac{x_{Uj}}{\sqrt{\sum_{k=1}^{S} x_{Uk}^2}} - \frac{x_{Vj}}{\sqrt{\sum_{k=1}^{S} x_{Vk}^2}} \right)^2}$$

CH is equivalent to the (Euclidean) length of the chord between two objects (plots) projected onto the surface of a hypersphere of unit radius (Orlóci, 1978, Legendre and Gallagher, 2001). Therefore, in the above example, while the Euclidean distance between U and W is 4.472, their chord distance is zero ($CH_{UW} = 0$), because these plots contain the same species in the same proportions. Since plot V has no species in common with plots U and W, we get the maximum distance between them ($CH_{UV} = CH_{VW} = \sqrt{2} \cong 1.414$). Therefore, for an ecologist this index captures information on community composition in a much more meaningful way than ED.
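A minimal sketch of the chord distance follows, written as "normalize to unit length, then take the Euclidean distance". The abundance values are hypothetical (chosen to be consistent with the distances quoted in the text, not taken from the original matrix), and the function name `chord` is an assumption for illustration.

```python
import math

# Hypothetical abundances for species S1-S4; an assumption consistent with the
# values quoted in the text, not the original data.
PLOTS = {
    "U": [1, 2, 0, 0],
    "V": [0, 0, 1, 2],
    "W": [3, 6, 0, 0],
}


def chord(x, y):
    """Chord distance: Euclidean distance between unit-length plot vectors.

    Undefined for an empty plot (all abundances zero), because the vector
    length used for normalization would be zero.
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return math.sqrt(sum((a / norm_x - b / norm_y) ** 2 for a, b in zip(x, y)))


for p, q in [("U", "V"), ("U", "W"), ("V", "W")]:
    print(f"CH_{p}{q} = {chord(PLOTS[p], PLOTS[q]):.3f}")
# CH_UV = 1.414
# CH_UW = 0.000
# CH_VW = 1.414
```

Because normalization removes total abundance, any two plots with proportional abundance vectors collapse to the same point on the unit hypersphere, which is why the U-W distance vanishes.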
Another measure of multivariate plot-to-plot dissimilarity that can be calculated by first transforming the plot vectors in an appropriate way and then taking the Euclidean distance of the transformed vectors is the Hellinger distance (Legendre and Gallagher, 2001). In this case, the raw values $x_{Uj}$ are first transformed by dividing each value by the plot sum and then taking the square root of the resulting values, such that each value becomes $\sqrt{x_{Uj} / \sum_{k=1}^{S} x_{Uk}}$. Then, the Euclidean distance is calculated from the transformed quantities as:

$$HD_{UV} = \sqrt{\sum_{j=1}^{S} \left( \sqrt{\frac{x_{Uj}}{\sum_{k=1}^{S} x_{Uk}}} - \sqrt{\frac{x_{Vj}}{\sum_{k=1}^{S} x_{Vk}}} \right)^2}$$
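The two-step recipe described above (square root of relative abundances, then Euclidean distance) can be sketched as follows. The abundance values and the function name `hellinger` are assumptions introduced for illustration; the text itself reports no Hellinger values for the example.

```python
import math

# Hypothetical abundances for species S1-S4; an assumption, not the original
# community matrix.
PLOTS = {
    "U": [1, 2, 0, 0],
    "V": [0, 0, 1, 2],
    "W": [3, 6, 0, 0],
}


def hellinger(x, y):
    """Hellinger distance: Euclidean distance between square-rooted
    relative abundances. Undefined for a plot whose abundances sum to zero."""
    total_x = sum(x)
    total_y = sum(y)
    return math.sqrt(
        sum(
            (math.sqrt(a / total_x) - math.sqrt(b / total_y)) ** 2
            for a, b in zip(x, y)
        )
    )


for p, q in [("U", "V"), ("U", "W"), ("V", "W")]:
    print(f"HD_{p}{q} = {hellinger(PLOTS[p], PLOTS[q]):.3f}")
# HD_UV = 1.414
# HD_UW = 0.000
# HD_VW = 1.414
```

For these assumed data the Hellinger distance behaves like the chord distance: it is zero for plots with identical species proportions and reaches its maximum, the square root of 2, for plots with no species in common.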
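Raw data may be transformed in many other ways, however. The formula suggested by Bray and Curtis (1957) implies relativization of species-wise differences by the total abundance of species in the two plots:

$$BC_{UV} = \frac{\sum_{j=1}^{S} \left| x_{Uj} - x_{Vj} \right|}{\sum_{j=1}^{S} \left( x_{Uj} + x_{Vj} \right)}$$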
This index reflects the proportion of the total species abundances in which the two plots differ. For the above example, BC also outperforms ED because the maximum distance is obtained when the plots being compared have no species in common ($BC_{UV} = 1$ and $BC_{VW} = 1$), whereas $BC_{UW} = 0.5$. This latter example suggests that, unlike CH, BC takes the value zero only if the two plots being compared are identical.
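The Bray-Curtis values quoted above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the abundance values are hypothetical (they are consistent with the values reported in the text but are not taken from the original matrix), and `bray_curtis` is an illustrative name.

```python
# Hypothetical abundances for species S1-S4; an assumption consistent with the
# BC values quoted in the text, not the original data.
PLOTS = {
    "U": [1, 2, 0, 0],
    "V": [0, 0, 1, 2],
    "W": [3, 6, 0, 0],
}


def bray_curtis(x, y):
    """Sum of absolute species-wise differences divided by the total
    abundance of the two plots. Undefined if both plots are empty."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


for p, q in [("U", "V"), ("U", "W"), ("V", "W")]:
    print(f"BC_{p}{q} = {bray_curtis(PLOTS[p], PLOTS[q]):.3f}")
# BC_UV = 1.000
# BC_UW = 0.500
# BC_VW = 1.000
```

Unlike the chord distance, BC does not vanish for U and W: the two plots have the same species proportions but differ in total abundance, and that difference enters the numerator.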
These three coefficients illustrate well that, although dissimilarity may appear an intuitively simple concept, there is no single, unequivocal way to measure it. The literature of numerical ecology treats many more, even hundreds of dissimilarity functions (see e.g., Orlóci, 1978, Podani, 2000, Legendre and Legendre, 2012) and selection among them is often arbitrary, dictated by fashion, availability in commercial software or personal preference. The choice of a dissimilarity index best suited for a specific ecological problem is a complex question that has no clear and unambiguous answer. Moreover, while these references provide some information to help ecologists decide, the properties of even the best known indices are not fully understood.
The aim of this paper is thus to review some of the properties of the Bray-Curtis dissimilarity relevant for ecologists. The paper is organized as follows: first, we discuss the relationships of the Bray-Curtis dissimilarity with the Canberra dissimilarity family (sensu Podani, 2000). Next, we show the ability of the Bray-Curtis dissimilarity to conform to a generalization of Dalton’s (1920) principle of transfers to a pair of plots.
An unconventional genealogy of the Bray-Curtis dissimilarity
The Euclidean distance is a special case of a more general parametric family of dissimilarity functions called Minkowski distance:

$$M_{UV} = \left( \sum_{j=1}^{S} \left| x_{Uj} - x_{Vj} \right|^{\alpha} \right)^{1/\alpha}$$

where α ≥ 1. For α = 2, we have the Euclidean distance. For α = 1, we obtain the so-called city-block (or Manhattan) distance, which is the sum of absolute differences in species abundances:

$$CB_{UV} = \sum_{j=1}^{S} \left| x_{Uj} - x_{Vj} \right|$$
An advantage of this formula over ED is that species-wise differences are not exaggerated by squaring (Orlóci, 1972). Division by the number of
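As an illustration of the Minkowski family described in this section, the sketch below evaluates the distance for α = 1 (city-block) and α = 2 (Euclidean). The abundance values and the function name `minkowski` are assumptions introduced for the example, not part of the original text.

```python
# Hypothetical abundances for species S1-S4 in two plots; an assumption used
# only to illustrate the formula.
U = [1, 2, 0, 0]
W = [3, 6, 0, 0]


def minkowski(x, y, alpha):
    """Minkowski distance of order alpha (alpha >= 1) between two vectors."""
    if alpha < 1:
        raise ValueError("alpha must be >= 1")
    return sum(abs(a - b) ** alpha for a, b in zip(x, y)) ** (1 / alpha)


print(f"alpha = 1 (city-block): {minkowski(U, W, 1):.3f}")
print(f"alpha = 2 (Euclidean):  {minkowski(U, W, 2):.3f}")
# alpha = 1 (city-block): 6.000
# alpha = 2 (Euclidean):  4.472
```

With α = 1 each species contributes its absolute difference directly (2 + 4 = 6), whereas with α = 2 the differences are squared before summation, so the larger difference carries proportionally more weight.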
A modified principle of transfers for a pair of plots
In the previous section we showed that the Bray-Curtis dissimilarity is sensitive to differences in abundance between species, and that abundant species are weighted more than rare species. The aim of this section is now to analyze how BC is influenced by differences in species abundances between plots.
The question whether a given index is a suitable measure of dissimilarity is usually answered axiomatically by assessing whether the index meets some properties that are intuitively considered to
Discussion
Ecologists have proposed an extensive arsenal of coefficients for summarizing different aspects of plot-to-plot dissimilarity. In this view, the behavior of such measures must be understood to assess whether these measures allow useful biological distinctions between a pair of plots. In this paper we thus reviewed some of the properties of the Bray-Curtis dissimilarity that may be relevant in the context of ecology.
We started from the suggestion that the BC index is additively decomposable into
Acknowledgments
We wish to thank Paulo Inácio Prado, Dave Roberts and one anonymous reviewer for their very constructive comments on a previous version of our paper.
References (24)
- Bray and Curtis (1957). An ordination of the upland forest communities of southern Wisconsin. Ecol. Monogr.
- et al. (1958). An analysis of the taxonomist’s judgement of affinity. Proc. Zoolog. Soc. Lond.
- (1993). Non-parametric multivariate analyses of changes in community structure. Aust. J. Ecol.
- Dalton (1920). Measurement of the inequality of incomes. Econ. J.
- et al. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio.
- et al. (1976). A similarity measure sensitive to the contribution of rare species and its use in investigation of variation in marine benthic communities. Oecologia.
- Notes on the Marczewski-Steinhaus coefficient of similarity.
- et al. (1967). Mixed data classificatory programs. I. Agglomerative systems. Aust. Comput. J.
- Legendre and Gallagher (2001). Ecologically meaningful transformations for ordination of species data. Oecologia.
- Legendre and Legendre (2012). Numerical Ecology.
- Interpreting the replacement and richness difference components of beta diversity. Global Ecol. Biogeogr.
- An ordination of phytoplankton populations in ponds of varying salinity and temperature. Ecology.