Skip to main content
Advertisement
  • Loading metrics

Natural Constraints to Species Diversification

Abstract

Identifying modes of species diversification is fundamental to our understanding of how biodiversity changes over evolutionary time. Diversification modes are captured in species phylogenies, but characterizing the landscape of diversification has been limited by the analytical tools available for directly comparing phylogenetic trees of groups of organisms. Here, we use a novel, non-parametric approach and 214 family-level phylogenies of vertebrates representing over 500 million years of evolution to identify major diversification modes, to characterize phylogenetic space, and to evaluate the bounds and central tendencies of species diversification. We identify five principal patterns of diversification to which all vertebrate families hold. These patterns, mapped onto multidimensional space, constitute a phylogenetic space with distinct properties. Firstly, phylogenetic space occupies only a portion of all possible tree space, showing family-level phylogenies to be constrained to a limited range of diversification patterns. Secondly, the geometry of phylogenetic space is delimited by quantifiable trade-offs in tree size and the heterogeneity and stem-to-tip distribution of branching events. These trade-offs are indicative of the instability of certain diversification patterns and effectively bound speciation rates (for successful clades) within upper and lower limits. Finally, both the constrained range and geometry of phylogenetic space are established by the differential effects of macroevolutionary processes on patterns of diversification. Given these properties, we show that the average path through phylogenetic space over evolutionary time traverses several diversification stages, each of which is defined by a different principal pattern of diversification and directed by a different macroevolutionary process. The identification of universal patterns and natural constraints to diversification provides a foundation for understanding the deep-time evolution of biodiversity.

Author Summary

Are there universal laws in the evolution of biodiversity? Why do some clades go extinct and others flourish? These questions are fundamental to our understanding of present-day biodiversity. In a meta-analysis of nearly 12,000 species spanning ~500 million years of evolution, we find that there are five principal patterns of diversification to which all vertebrate families hold, and that these patterns can be mapped into a multidimensional phylogenetic space. Importantly, because certain diversification patterns invariably lead to extinction, clades do not explore all possible phylogenetic space, and thus the evolution of biodiversity is constrained by a set of loose but inviolable rules. We characterize the biotic and abiotic factors precipitating those rules with important implications for our knowledge of the emergence and maintenance of the diversity of life around us.

Introduction

The radiations and extinctions of species are recorded in the patterns of the tree of life [1,2]. Those patterns are fundamentally important to testing classical evolutionary hypotheses (e.g., the Red Queen [3] and adaptive radiation [4]), for understanding species distributions across clades [5] and regions [6], and, more generally, for piecing together how life evolves on Earth. Accordingly, the development of models for understanding how diversification unfolds using phylogenies of extant taxa has exploded in the last decade [7], and several meta-analyses have attempted to identify general principles of diversification [812].

Despite these recent and rapid developments, we do not yet have a coherent, panoramic view of the patterns of diversification across the tree of life. There is a general consensus that many phylogenies have a signal of slowdown in speciation rates through time, although this consensus is not universally accepted, and the processes behind the general pattern remain equivocal [813]. There is also broad agreement that diversification rates vary considerably across lineages [5,14], although approaches for detecting where diversification shifts happen [5,15] and what determines these shifts continue to be deliberated and debated [16]. Although model-based approaches have been crucial for testing specific hypotheses, a thorough description of the diversification patterns seen in empirical phylogenetic trees requires approaches free of any a priori assumption of how those trees behave.

We developed a novel, model-free framework rooted in spectral graph theory that allows for the direct comparison of phylogenetic trees across various groups of organisms (see Materials and Methods) [17]. In essence, we summarize the information contained in tree shape by a distribution—the spectral density—representing the spectra of eigenvalues of the modified graph Laplacian (MGL), which is a matrix constructed from the evolutionary distances between phylogenetic nodes specially designed to retain maximal information from the tree. This allows us to measure the similarities and dissimilarities between trees from different species clades by calculating the dissimilarity between their respective spectral densities. Phylogenetic trees can then be clustered according to how similar their shapes are. In addition, spectral density profiles can be used to compute summary statistics. In particular, the ln-transformed principal eigenvalue (λ*), skewness (ψ), and peak height (η) are indicative of phylogenetic expansion, stem-to-tip distribution of branching events, and heterogeneity of branch lengths, respectively [17]. Phylogenetic expansion λ* is highly correlated with measures of phylogenetic diversity and (less so) to species richness; the stem-to-tip distribution ψ is loosely correlated with the classical gamma summary statistic; and the heterogeneity of branch lengths as measured by η is not correlated with any known measure of phylogenetic balance but instead is representative of regular branching events in the tree (such that, for example, trees with localized radiations or diversification rate shifts will appear irregular). Importantly, trees constructed from different models of diversification cluster separately according to their spectral densities, and the three spectral density summary statistics are collectively sufficient to distinguish between trees governed by different diversification processes [17]. Thus, empirical trees can be grouped into clusters representing distinct diversification patterns and mapped with minimal information loss onto a three-dimensional space—which we call phylogenetic space—defined by λ*, ψ, and η.

Results

We computed the spectral density profiles of 214 family-level trees representing mammals [18], birds [19], squamates [20], amphibians [21], and ray-finned fishes [22] (see Materials and Methods), totaling 11,930 species. By clustering these profiles, we found that there are five principal ways in which all vertebrate families diversify, significantly supported by both hierarchical and k-medoids clustering (Fig 1A and 1B, respectively). Iterating our analyses on samples from a Bayesian posterior distribution of trees showed that these results were robust to phylogenetic uncertainty (Methods): we identified five clusters 100% of the time; in 75% of the iterations, the distribution of trees across clusters was identical; and in the iteration with maximum mismatch, only 6% of the trees were placed in different clusters. The five distinct clusters are characterized by both global-scale differences related to overall tree shape (as measured by λ, η, and ψ) and local-scale differences in branching patterns restricted to specific parts of the tree that are not encapsulated by the three main summary statistics but appear as smaller details in the shape of the spectral density profile (Fig 1C and 1D). Diversification types fall on a positively correlated gradient of λ* and η, with ψ varying considerably and unsystematically between types. The most abundant type (III) is defined by an intermediate λ* and η and is distinguished from the second most abundant type (I) by a relatively low ψ. The third and fourth most abundant (II and V) are virtually antipodal, the former having extremely low λ* and η and the latter being principally defined by a remarkably high ψ. The least abundant type (IV) is distinguished by very high λ* and η. Notably, as estimated by the eigengap heuristic, phylogenies were most frequently unimodal (40%), with 20% having two or three modes and the rest having between four and thirteen (S1 Data). No diversification type shows a prevalence for a number of modalities (D < 0.25, p > 0.1).

thumbnail
Fig 1. Five principal patterns of vertebrate diversification.

(A) Heatmap and hierarchical clustering of spectral density profiles for 214 vertebrate phylogenies. Bootstrap probabilities are shown above each bifurcation. (B) Silhouette widths (si) for each tree calculated from k-medoids clustering on spectral density profiles; the number of trees belonging to each cluster (n); and average silhouette width for each cluster (). (C) The median spectral density profile for each cluster (where the median profile is that which is closest to the median value for λ*, ψ, and η) and (D) apposite tree. Median values for ψ and η are listed in (C). (E) The percentage of each class of vertebrates belonging to each diversification cluster. The phylogenies included in the analyses are listed in S1 Data.

https://doi.org/10.1371/journal.pbio.1002532.g001

Despite variation in the abundance of types across families, we see no single diversification type that dominates the vertebrate landscape (Fig 1E, S2 Data). We observe, however, certain diversification types to be over- or under-represented within each vertebrate class, even when accounting for sample size (S1A Fig, S3 Data). Ray-finned fish deviate the least from an even distribution. Type III trees are absent in squamates and amphibians, but only significantly so in amphibians. This may be due, in part, to mean differences in crown age between classes (S1B Fig).

Mapping empirical phylogenies in the space defined by λ*, ψ, and η reduces the high dimensionality of phylogenies to only three dimensions, thus allowing the first visualization of vertebrate phylogenetic space (Fig 2A). The geometry of phylogenetic space reveals that vertebrate phylogenies are able to occupy a broad, multidimensional range of tree space and therefore differ from one another along multiple axes of variation rather than fall along a single line. As expected, phylogenies from distinct diversification types cluster together in phylogenetic space and, accordingly, vertebrate classes are differently distributed in space (S1C Fig). Phylogenies differing in the number of missing extant taxa are spread across phylogenetic space, suggesting that undersampling does not bias our representation of this space (S2 Fig).

thumbnail
Fig 2. Vertebrate phylogenetic space.

(A) Spectral density profiles and trees representative of different regions of phylogenetic space are shown: (top to bottom) Acanthizidae, Muridae, Pittidae, and Leporidae. (B) The distribution of trees in phylogenetic space defined by spectral density profile summary statistics: skewness (ψ), ln-transformed principal eigenvalue (λ*), and peak height (η). Points in phylogenetic space corresponding to trees in (A) are circled. Points in (B) are colored according to Fig 1. Summary statistics are listed in S1 Data.

https://doi.org/10.1371/journal.pbio.1002532.g002

The dimensionality of phylogenetic space can be further reduced to only two dimensions, as revealed by a polytope test that estimates the statistical significance of a given number of vertices (i.e., archetypes) encompassing input data (see Materials and Methods) [23]. We identified three archetypes as the minimum number that could significantly delimit phylogenetic space (p < 0.01). These three archetypes present a Pareto-optimal situation: all phylogenies fall within the optimal polytope (here, a triangle) and a different phylogenetic property is maximal nearest each archetype (i.e., vertex of the triangle) (Fig 2B). Each archetype is variably contributed to by λ*, ψ, and η—archetype a (10%, 89%, 1%); archetype b (44%, 6%, 49%); archetype c (45%, 4%, 51%)—such that, as a tree approaches an archetype, its properties change according to the proportional contributions of each property at that archetype. Direct trade-offs between λ*, ψ, and η reveal constraints on how certain phylogenetic properties covary (Fig 3A). The positive linear relationship between λ* and η reveals the tendency for phylogenies to become more regular as they accumulate phylogenetic diversity, with regularity increasing at a slower rate than diversity. The negative logarithmic describing η as a function of ψ draws a steep slope around ψ = 0, where a wide range of η is possible, but then abates, signifying the tendency for phylogenies (with ψ > 0) to become increasingly irregular as they become more tippy. Taken together, the archetypes underline phylogenetic trade-offs in the movement through phylogenetic space: as trees move from archetype a to b, they become more stemmy, and as they move from b to c, they become more expansive and regular.

thumbnail
Fig 3. Constraints and trade-offs in vertebrate phylogenetic space.

(A) The geometry of phylogenetic space (in Fig 2B) is best fit by a triangle, with each vertex corresponding to an archetype defind by a unique set of phylogenetic properties. (A, inset) The explained variability of the data distribution elbows at three vertices. (B) The trade-offs between phylogenetic properties in vertebrate phylogenetic space. (C) The set of empirical trees (black) and simulated phylogeny-type networks (grey) plotted in phylogenetic space. (D) Barplots of the percentage of empirical phylogenies of each diversification type with maximum likelihood support from a particular diversification model. Points in (A) are colored according to Fig 1. See S1 Data for summary statistic and model AICc values and S4 Data for model percentages.

https://doi.org/10.1371/journal.pbio.1002532.g003

The trade-offs we observe in phylogenetic space reveal biological rather than structural constraints. Comparing the three-dimensional empirical phylogenetic space to the entirety of the space available to phylogeny-type networks (i.e., bifurcating, ultrametric, and rooted trees restricted to the λ* range of vertebrate trees; see Materials and Methods) shows that empirical phylogenies occupy only a small portion (36%) of this possible space (Fig 3B). In fact, any combination of high, low, and medium values of λ*, η, and ψ can be reached by simulated phylogeny-type networks. The polytope encompassing phylogeny-type networks ranges more extensively in high-η, low-ψ phylogenetic space, resulting in a symmetrical, Pareto-poor geometry (S3 Fig). The difference between empirical and potential space is not linked to the restricted number of taxonomic families we analyzed in comparison with the total number of vertebrate families. Indeed, rarefaction analyses suggest that the sample of trees we analyze saturates empirical phylogenetic space, with coordinates of the two-dimensional polytope reaching an asymptote (S4 Fig). Hence, empirical phylogenetic space is truly evolutionarily constrained, with the parts of space epitomizing trees that are regular (high η) and either particularly stemmy (low ψ) or particularly tippy (high ψ) not reached by diversification trajectories. To understand what types of trees, in diversification terms, occupy these empty spaces, we simulated ultrametric trees under varying diversification parameters. We found that much of the regular space could be occupied by the simulated trees, with the stemmy space occupied by trees with a declining speciation rate and the tippy one occupied by trees with an increasing speciation rate (S5 Fig). There is overlap between the simulated and empirical trees, but most trees simulated with β = ±0.1 fall outside of vertebrate phylogenetic space.

In order to interpret different regions of phylogenetic space in traditional macroevolutionary terms, we estimated the statistical support for state-of-the-art, processed-based phylogenetic diversification models that are comparable in a likelihood framework (see Materials and Methods). The processes featured by these models were mass extinction events (constant speciation and extinction punctuated by events when a fraction of species is lost [24]), diversity dependence (diversification rates varying as a function of the number of extant species [25]), temperature dependence (diversification rates varying as a function of temperature [26]), and speciation by genetic differentiation (SGD) (speciation emerging from the accumulation of mutations [27]). Each diversification type is predominantly (or exclusively) supported by a different model. Specifically, 100% of type II phylogenies are best supported by an exponential diversity-dependent model; 100% of type V by a mass extinction model; 80% and 20% of type I by SGD and temperature dependence, respectively; phylogenies of type III and IV show support from all four diversification models, although they are disproportionately supported by SGD and temperature dependence, respectively (Fig 3C, S4 Data). Linear diversity dependence and linear temperature dependence were never selected above their exponential counterparts. On average, corrected Akaike Information Criterion differences (ΔAICc) between models showing the best and second-best likelihood support was modest (ΔAICc = 1.49±1.22; S6 Fig, S1 Data), which illustrates the well-recognized difficulty of unambiguously disentangling different diversification scenarios with model-based phylogenetic comparative methods [7]. Trees simulated under the same four diversification processes, plus a trait-dependent process [28], and mapped onto phylogenetic space (see Materials and Methods) corroborate that certain processes coincide with certain diversification types, but that the matching between type and process is not perfect. Trees generated under diversity-dependent and mass extinction models are the most localized in phylogenetic space, whereas other models are more widespread (S7 Fig). Nonetheless, the phylogenetic space of model-based trees largely corresponds (~85% overlap) with empirical phylogenetic space, and different regions of phylogenetic space are disproportionately associated with different diversification processes. Accordingly, these processes have shaped the geometry of vertebrate phylogenetic space by constraining family-level vertebrate phylogenies to diversify in particular ways.

We estimated the trajectory that, on average, families traverse through phylogenetic space over time by binning empirical phylogenies by crown age (see Materials and Methods). The trajectory traverses four diversification types starting from type II (the diversity-dependent type) followed by type I, III, and finally V (the mass extinction type) (Fig 4A). This is confirmed by a significant age dependence to each type (S8A Fig), although there is considerable variation within each bin (S8B Fig). The trajectory is characterized by an exponential increase in λ* and a linear increase in ψ (Fig 4B), but no significant trend in η or, despite its relevance to λ* [17], species richness (p = 0.34). However, when we look at taxonomic classes separately, we see that each takes a divergent average trajectory through phylogenetic space (S9 Fig), which conforms with their different distributions in space (S1B Fig) and suggests phylogenetic constraints to how diversification evolves. The trajectory for birds traverses a path involving types II/III, I, and V that resembles that of the global trajectory, whereas the mammalian trajectory takes a different path, traversing types II, III, and IV. It is notable that phylogenies that predate the Cretaceous–Paleogene (K–Pg) boundary (~66 million years [my]), which here comprise only non-mammalian families, fall within the region of phylogenetic space associated with mass extinction, whereas the most ancient mammalian phylogenies (<60 my) fall within the region associated with temperature dependence.

thumbnail
Fig 4. The average trajectory through phylogenetic space over time.

All phylogenies were binned according to crown age. (A) The average position in phylogenetic space of each binned group. (B) The explicit relationship between crown age and λ* (y ∼ −0.42x + 0.41x2 − 0.02x3, R2 = 0.34, p < 0.001), ψ (y ∼ 0.6x, R2 = 0.15, p < 0.001), and η (p = 0.20). (C) Tree plots and spectral density profiles of the median phylogenies in each binned group.

https://doi.org/10.1371/journal.pbio.1002532.g004

Discussion

Finding general patterns in the way that clades diversify has been notoriously difficult, limiting our ability to understand how biodiversity emerges in deep time. Mapping clades in phylogenetic space allows us to have a much clearer view of the landscape of diversification. Importantly, we find that there are constraints to diversification. Vertebrate phylogenetic space fills only a third of all possible tree space—akin to the constrained morphospace of Raup‘s mollusc shells [29]—due to ostensible limitations to how vertebrate families successfully diversify. Biodiversity might well follow, if not entirely predictable, at least constrained trajectories through evolutionary time.

We find that there are five principal types of diversification, which are bound in phylogenetic space by three vertices characterized by: tippy trees; stemmy, irregular trees; and large, regular trees (vertices a, b, and c from Fig 2A, which correspond to type V, II, and IV trees, respectively). The bird family Acanthizidae is an example of archetype a; the mammalian family Muridae is an example of archetype b; and the mammalian family Nycteridae is an example of archetype c, while the mammalian family Leporidae is not optimized at any archetype but rather at an intermediate position in phylogenetic space. The support for mass extinction events among tippy trees suggests that the tippy part of phylogenetic space (high ψ) is occupied by clades that radiated after mass extinction left niche space unoccupied. The support for diversity-dependent models among disproportionally irregular stemmy trees (type II trees) suggests that the slowdown in speciation is differentially distributed across lineages due, perhaps, to differences in species abundances. Finally, support for temperature-dependent speciation among regular trees (type IV trees) suggests that radiation events resulting from environmental factors may similarly and pervasively affect all lineages. As certain regions of phylogenetic space are dominated by particular biotic or abiotic processes, it is ultimately the combined effect of intrinsic and extrinsic limiting factors on each family that shapes the geometry of phylogenetic space.

The three vertices bounding phylogenetic space are the result of trade-offs between various characteristics of the trees. There is a long history of work showing how phenotypic evolution is constrained in every instance: there are always trade-offs between optimizing different fitness strategies [30,31]. As a result, species adapt to one fitness optimum at the expense of others, or to a concessionary optimum that serves several while optimizing none [32]. While phylogenetic trees are not adaptive in the sense that phenotypes are, there are trade-offs and combinations of tree characteristics that are not seen in nature, suggesting that certain diversification patterns are unstable (i.e., they lead to clade-wide extinction). Trees simulated with a speciation rate > α * e±0.1, for example, fall overwhelmingly outside of vertebrate phylogenetic space. Therefore, successful diversification in vertebrate families is approximately bounded by these speciation rates, and the reason we see trade-offs in phylogenetic properties and a constrained phylogenetic space is because families that surpass those bounds go extinct. Consequently, we may expect phylogenies of extinct clades to violate these rules and fall outside of phylogenetic space.

Specifically, there are trade-offs between both stemminess and expansiveness with regularity among vertebrate families. Trees become more irregular as they become tippier. If tippyness is indeed the mark of mass extinction events, this trade-off could be the result of (mass) extinction unevenly culling parts of trees. We also observe a lower boundary for stemminess, which suggests that clades that diversify only early in their history go extinct. Finally, trees that are expansive tend to be more regular, suggesting that the various processes generating irregularity (e.g., runaway diversification in one lineage versus others, multiple slowdowns in diversification) result in contracted clades or widespread extinction.

As a consequence of these phylogenetic trade-offs, there are parts of phylogenetic space unoccupied by vertebrate families. Those parts are predominantly characterized by both stemmy and tippy regular trees and can be occupied by trees simulated with considerably decreasing and increasing rates of speciation, respectively. The fact that vertebrate family phylogenies are wedged between these two extremes of speciation provides the first evidence for the range of speciation possible at this level of vertebrate evolution. Such limitations to species diversification must be the consequence of a set of limiting factors such as minimum generation time, responsiveness to abiotic events, spatiotemporal availability of niche space, or some other facets of vertebrate speciation that favor certain diversification patterns while prohibiting others.

There is a significant positive correlation between crown age and both expansiveness and tippyness. The former correlation is unsurprising and suggests, firstly, that expansiveness is not bounded by any kind of upper ecological limit and, secondly, that branch length accumulates at a more or less similar rate across families, such that older families are typically more expansive than younger ones simply as a result of their having had more time to expand. The observation that tippyness also increases as a function of crown age is somewhat more surprising, although it is consistent with the observation that young clades tend to carry a stronger signal of early burst radiations (captured by stemminess) than older, more inclusive ones [9,19]. This suggests that young clades show a signal of early burst radiations, with speciation rates declining as ecological niches are filled, but also that this limitation in niche space is not absolute and radiations can happen later in a clade‘s history. As clades age, new opportunities for radiations may arise if, for example, ecological opportunity is recovered after a mass extinction event, or if new factors allow the exploitation of a part of ecological space that was not previously explored (e.g., new interspecific interactions prompting character displacement, fluctuations in temperature producing new niches under favorable environmental conditions for speciation).

The distinct distributions in and evolutionary trajectories through phylogenetic space of birds and mammals may reveal something about evolutionary differences between classes. Bird families are distributed nearly homogeneously in phylogenetic space and their average trajectory is similar to that of squamate, amphibian, and fish families. Mammals, on the other hand, are disproportionately underrepresented at archetype a and show an average trajectory through phylogenetic space orthogonal to those of other vertebrates. The relative preponderance of mammal families in diversification type IV is consistent with the metabolic theory of ecology [33] as well as work suggesting that mammalian diversification is inversely correlated with temperature [34]. We can speculate that the evolutionary novelty swaying mammalian biodiversity towards temperature dependency is connected to endothermic viviparity, which has led to considerable reproductive diversity in mammals [35] and has been shown to directly impact diversification in other vertebrates [36]. It is also worth noting that the clear age dependence of diversification types may bias the distribution of mammal families in phylogenetic space and, therefore, their relative absence near archetype a may be explained, at least in part, by their median youngness.

The benefit of generating an absolute phylogenetic space for species trees is that it provides a reference frame for describing and contextualizing phylogenies generally. It furthermore advances the idea that phylogenies are not simply the result of a single biotic or abiotic factor, but that a time-sensitive series of factors differentially influences diversification during a clade’s evolutionary history. By mapping non-vertebrate taxa in this space, it will be interesting to see whether the constraints observed here are unique to vertebrates or extend to other animal or even all organismal life.

Materials and Methods

Empirical Phylogenies

We compiled a dataset of dated, family-level ultrametric trees from an exhaustive literature search of species-level vertebrate phylogenies. In order to avoid biased comparisons between trees, we included only phylogenies that were ≥80% sampled, which is shown to be a sufficient sampling estimate to make use of the MGL [17]. We restricted our analyses to family-level trees because, after discarding insufficiently sampled trees, this provided us with the largest number of trees of adequate size (>20 species) while allowing us to cover all vertebrate classes. We used the most recent molecular mammalian tree with 4,160 extant species (v.1001) [18]. We used a molecular bird tree with 6,670 extant species constructed on the Hackett backbone [19]. To avoid known (and unknown) polytomies and negative branch-lengths in the mammal and bird phylogenies, we computed maximum clade credibility (MCC) trees in BEAST 2 [37] that selected trees with the highest posterior probability product. We used a phylogeny of 4,161 extant squamate species [20] and one of 3,309 extant amphibian species [21], both of which were constructed with a supermatrix analysis of molecular data. Finally, we used a phylogeny of 7,822 extant actinopterygians (ray-finned fishes), which was constructed with a maximum-likelihood approach and made ultrametric using penalized likelihood [22]. Our compilation resulted in 72 Mammalia, 102 Aves, 7 Squamata, 9 Amphibia, and 24 Actinopterygii family-level phylogenies for a total of 214 trees. To assess phylogenetic uncertainty, we randomly sampled 100 trees from posterior distributions of 1,000 mammal trees [18] and 10,000 bird trees (birdtree.org) and from each sample parsed family-level trees.

Clustering of Trees Based on Spectral Density Profiles

Following the non-parametric approach described in detail in [17], given a phylogenetic tree, we defined its standard MGL as the difference between its degree matrix (the diagonal matrix where diagonal element i is the sum of the branch lengths from node i to all the other nodes in the phylogeny) and its distance matrix (where element (i,j) is the branch length between nodes i and j). We then obtained the spectral density profile of the tree by convolving the spectra of eigenvalues of the MGL with a Gaussian kernel. Finally, we measured the distance between phylogenies using the Jensen–Shannon index [38] and subjected these distances to unsupervised hierarchical and k-medoids clustering. For hierarchical clustering, significance was calculated with bootstrap resampling and clusters were considered significant at α ≥ 0.95. For k-medoids clustering, the number of clusters was estimated by optimum average silhouette width. Average silhouette widths greater than 0.5 were considered significantly supported. These analyses are implemented in the R package RPANDA [39].

To assess the effect of phylogenetic uncertainty on clustering results, we repeated both clustering protocols over 100 iterations by randomly drawing from the mammal and bird trees sampled from posterior distributions. We tallied the number of times we duplicated the original result using the MCC trees versus those where at least one tree was placed in a different cluster. When there were mismatches, we measured the percentage of trees that were placed in a different cluster.

We determined whether the distribution of trees across clusters was more or less than expected by chance for each taxonomic class. Given a specific taxonomic class represented by x trees, we iteratively distributed x hypothetical trees in the five clusters, where the probability of being placed in a particular cluster was defined by the relative size of that cluster (as calculated above). For each cluster, 500 iterations of this process yielded a null distribution of the expected number of trees in the cluster. We deemed significant the deviation of the actual number of families assigned to each cluster from the expected if the actual number fell in the lower or upper tail of the distribution (α < 0.05).

Defining Phylogenetic Space

We computed the spectral density profile summary statistics (λ*, ψ, and η) of phylogenies as described in [17] (S1 Data). Briefly, each eigenvalue computed from the MGL of a phylogeny describes the distance between nodes, such that larger and smaller eigenvalues describe greater and shorter distances, which in practice represent speciation-poor and -rich regions of the tree, respectively. The largest distance, therefore, is denoted by the largest eigenvalue, λ*. When the spectrum of eigenvalues is convolved with a Gaussian kernel, it is possible to identify patterns in its distribution. The skewness, ψ, of the distribution indicates whether the eigenvalues of the MGL are preponderantly large or small and thus whether the phylogeny is comprised of mostly speciation-rich or -poor regions. Likewise, the peak height, η, demonstrates whether branch lengths in the phylogeny are heterogeneous (low η) or homogeneous (high η)—that is, irregular or regular, respectively. These statistics were used to map empirical phylogenies into a three-dimensional space. Spectral density profile summary statistics were regressed against each other and best-fit slopes were selected using a stepwise AIC. Modality in the MGL, which was not used to define phylogenetic space, was calculated using the eigengap heuristic (S1 Data) [17], which determines the number of modes of division in a phylogeny by the position of the greatest difference between ranked eigenvalues.

We determined the geometry of the space by finding the polytope with the smallest number of vertices (i.e., archetypes) that could explain the most variance (or beyond which little explanatory power was added) in the distribution of phylogenies [32]. Archetypes were positioned around the data using the SISAL algorithm and allowing for up to six archetypes. Best fits at p < 0.01 were determined by t-ratio tests. The phylogenetic features optimized at each archetype were determined by the relative contributions of each summary statistic at that archetype [23]. To assess how sufficiently our sample of phylogenies captured the entirety of vertebrate phylogenetic space, we performed rarefaction analyses. We calculated the maximum coordinates of the archetypes for bootstrapped samples of phylogenies while fixing the number of dimensions at three and fit saturation curves for those coordinates as a function of sample size. Bootstrapped samples were replicated 50 times.

Characterizing the Phylogenetic Space Available to Trees

To assess the entirety of phylogenetic space available to trees consistent with the empirical distribution of λ*, we extensively generated phylogeny-type networks (i.e., bifurcating, ultrametric, rooted networks) while constraining λ* to be in the empirical distribution. Given a λ* from the empirical distribution, we constructed a graph Laplacian by populating a symmetric, positive, semidefinite M-matrix with zero-sum rows and columns. We computed a distance matrix from the graph Laplacian using the classical Dijkstra algorithm, which finds distances between nodes in a graph given an adjacency matrix [40]. We obtained the corresponding non-ultrametric tree from the distance matrix using complete linkage clustering [41], and the resulting tree was made ultrametric using mean path length [42]. Finally, the tree was rooted at the stem and, where necessary, polytomies were resolved using the R package ape [43]. We constructed 10,000 such trees. The proportion of phylogenetic space available to trees that is occupied by empirical phylogenies was calculated as the volume of three-dimensional space occupied between the most distal empirical points along each axis as a percentage of the total space occupied by the simulated trees. The geometry of tree-type space was determined with a polytope analysis as above, using 1,000 trees randomly sampled from the 10,000 for computing purposes.

To interpret phylogenetic space in terms of statistical diversification processes, we simulated ultrametric birth–death trees using the R package TESS [44]. We simulated 500 trees at ages 10–60 my under constant speciation rates (b = 0.1–0.25), decreasing speciation rates (α * eβt for α = 0.01–−1 and β = 0.05–−0.2, and increasing speciation rates (α * eβt for α = 0.01–−0.05 and β = 0.05–−0.2); the extinction rate was held constant (d = 0–0.1). We discarded trees outside the range of species richness for empirical phylogenies (20–700). The final set of trees for each birth–death model had a comparable mean species richness to the empirical set. We then computed their spectral density profiles and plotted them in phylogenetic space as above.

Interpreting Phylogenetic Space in Macroevolutionary Terms

We selected phylogenetic diversification models meant to directly model a process (rather than a temporal trend) and that were comparable in a likelihood framework. This resulted in five models featuring mass extinction events, linear and exponential diversity dependence of speciation rates, exponential temperature dependence, and SGD. The mass extinction model was fitted using the |bd.shifts.optim| function in the R package TreePar, which maximizes the likelihood of a model with constant speciation and extinction rates and one or more sampling events (i.e., mass extinctions) at discrete time points t1,t2,…,tn in the history of the clade [24]. We allowed the mass extinction events to occur at any time. Diversity-dependent models were fitted using the dd_ML function in DDD [25]. We tested models with speciation rates showing both linear and exponential dependence to the number of extant species. Temperature-dependent models were fitted with the fit_env function in RPANDA with constant, linear, and exponential speciation—and extinction-rate dependence to temperature [26]. Temperature data were inferred from benthic foraminiferal δ18O measurements extending to 108 my ago [45] at 0.1 my time-steps. Finally, the SGD model was fit using the fit_sgd function in RPANDA with Nelder–Mead optimization [27]. We fitted these models to all empirical phylogenies with ≥50 species (a total of 62 phylogenies). Likelihood values were normalized after [46] in order to make the likelihood estimates directly comparable. Models with the lowest AICc scores were considered the most supportive of the phylogeny [47]. Trait-dependent models were not considered, as their likelihood functions are not, in their current form, comparable to the likelihoods of the other models.

We simulated trees under five diversification models—birth–death with mass extinction, exponential diversity dependence (which was always favored to linear dependence), exponential temperature dependence, SGD, and continuous trait dependence [28]—and mapped those trees in phylogenetic space. In order to restrict the parameter space to realistic values, we first estimated parameters by fitting models to the 62 empirical phylogenies with ≥50 species as above. We fitted the continuous trait-dependent model with ln-transformed body mass (or, for ray-finned fish, maximum length) as trait data using QuaSSE in diversitree [48]. Data were collected from the literature for mammals [49], birds [50], and ray-finned fish [51]. For each empirical parameter estimate, 30 trees were simulated, resulting in 1,500 simulated trees per diversification model. We calculated the distribution of simulated trees in phylogenetic space by drawing polygons around each empirical diversification type and counting the number of trees simulated under a diversification model falling in each polygon. Polygon vertices were selected using the most distal points in each type. The volume of phylogenetic space for simulated trees was estimated as above and the percentage correspondence of space between empirical and simulated trees was calculated as the volume of overlapping space over the total volume occupied by both empirical and simulated trees.

Drawing Evolutionary Trajectories through Phylogenetic Space

We binned empirical phylogenies by crown age using a natural breaks optimization algorithm [52] without specifying a number of desired bins. For each bin, we calculated the arithmetic mean for λ*, ψ, and η and then plotted those values in empirical phylogenetic space. The diversification type corresponding to each bin was determined by drawing polygons around each diversification type as above. Median phylogenies (Fig 4C) were defined as those falling nearest the arithmetic means for λ*, ψ, and η. The same was done for mammals and birds separately. There were too few trees in the other taxonomic classes to analyze them similarly.

Supporting Information

S1 Data. Statistics for vertebrate phylogenies.

https://doi.org/10.1371/journal.pbio.1002532.s001

(CSV)

S2 Data. Distribution of diversification types across classes.

https://doi.org/10.1371/journal.pbio.1002532.s002

(CSV)

S3 Data. Deviation of classes across diversification types from null distribution.

https://doi.org/10.1371/journal.pbio.1002532.s003

(CSV)

S4 Data. Percentage of support for each model among empirical trees in different regions of phylogenetic space.

https://doi.org/10.1371/journal.pbio.1002532.s004

(CSV)

S5 Data. Spectral density profile summary statistics for trees simulated under different models.

https://doi.org/10.1371/journal.pbio.1002532.s005

(CSV)

S6 Data. Percentage of models falling within each region of phylogenetic space.

https://doi.org/10.1371/journal.pbio.1002532.s006

(CSV)

S1 Fig.

(A) The deviation of the number of families of each diversification type in each class (see Fig 1E) from an expected null distribution based on the number of families in each class and in each diversification type. Asterisks denote a significant deviation at p < 0.05. (B) Boxplot of mean crown age for each phylogenetic class (S1 Data). Phylogenies sampled for squamates, amphibians, and fish are significantly (T > 2.51, p < 0.05) older than those sampled for birds and mammals. (C) Plots of the distribution of mammals (top), birds (middle), and other vertebrate families (bottom) in phylogenetic space. See S1 Data and S3 Data.

https://doi.org/10.1371/journal.pbio.1002532.s007

(TIF)

S2 Fig. The distribution of sampling fractions in phylogenetic space.

(A) Empirical phylogenies in phylogenetic space, colored according to the sampling fraction available for that phylogeny (S1 Data): 100% (black), 90%–99% (red), and 80%–89% (green). (B) Histograms of λ* (top), η (middle), and ψ (bottom) across trees of a given sampling fraction. Differences between the distributions of trees of different sampling fractions, calculated by a Kolmogorov-Smirnov test, are not significant along either axis (D > 0.15, p > 0.42).

https://doi.org/10.1371/journal.pbio.1002532.s008

(TIF)

S3 Fig. The geometry of tree-type space computed on spectral density summary statistics for 1,000 simulated MGLs (randomly sampled from 10,000 simulated).

The distribution cannot be entirely encompassed by a polytope; variability is best explained by a single point.

https://doi.org/10.1371/journal.pbio.1002532.s009

(TIF)

S4 Fig. Max coordinates for each vertex in 2-D triangular phylogenetic space (see Fig 2B) for bootstrapped samples of empirical phylogenies.

Error bars represent standard deviations calculated from 50 replicates. Logarithmic curves were fit to data for each vertex (p < 0.01).

https://doi.org/10.1371/journal.pbio.1002532.s010

(TIF)

S5 Fig.

(A) Ultrametric trees simulated under decreasing rates of speciation (0.8 * e−0.1t; red) and increasing rates of speciation (0.04 * e0.1t; green) projected into empirical phylogenetic space. Constant-rate trees (not shown) fell within the range of empirical trees. (B) Trade-offs between phylogenetic properties in phylogenetic space. Statistics for best-supported relationships for decreasing speciation rate trees are shown in red on the right. No significant (p < 0.05) relationships were found between phylogenetic properties for increasing speciation-rate trees.

https://doi.org/10.1371/journal.pbio.1002532.s011

(TIF)

S6 Fig. Boxplot of the ΔAICc between models fit to empirical phylogenies (see Fig 2C).

For all phylogenies best supported by a certain model, we computed the ΔAICc as the difference between the best and second-best supported model for each phylogeny. See S1 Data.

https://doi.org/10.1371/journal.pbio.1002532.s012

(TIF)

S7 Fig. The percentage of trees simulated under different diversification models that fall within the phylogenetic space occupied by each empirical diversification type (numerals refer to types).

See S5 Data and S6 Data.

https://doi.org/10.1371/journal.pbio.1002532.s013

(TIF)

S8 Fig.

(A) Boxplot of mean crown age for each diversification type. (B) Boxplot of mean λ*, ψ, and η for each binned age group. See S1 Data.

https://doi.org/10.1371/journal.pbio.1002532.s014

(TIF)

S9 Fig. The average trajectory of families from different classes through phylogenetic space over time.

(A) Families from mammals and birds were separately binned into seven bins by crown age based on Jenks optimization. A line is drawn through the average position in phylogenetic space for each binned group. (B) The best-fit regression slope for λ* (mammals, y ∼ −0.644x + 0.338x2, R2 = 0.32, p < 0.01; birds, y ∼ 0.243x, R2 = 0.23, p < 0.01), ψ (mammals, R2 = 0.03, p = 0.07; birds, y ∼ 0.10x, R2 = 0.05, p < 0.015), and η (mammals, y ∼ 0.108x, R2 = 0.05, p = 0.03; birds, R2 = −0.01, p = 0.94) as a function of crown age for each clade (S1 Data). (C) Representative spectral density profiles and trees for each binned group in mammals (top) and birds (bottom). Boxes around trees are colored according to the diversification type that bin coincides with in phylogenetic space.

https://doi.org/10.1371/journal.pbio.1002532.s015

(TIF)

Acknowledgments

We would like to thank Olivier Billaud, Julien Clavel, Jonathan Drury, Odile Maliet, Marc Manceau, and Olivier Missa for helpful comments on the manuscript. EL would like to thank Evan Charles for helpful discussion.

Author Contributions

  1. Conceived and designed the experiments: EL HM.
  2. Performed the experiments: EL.
  3. Analyzed the data: EL.
  4. Contributed reagents/materials/analysis tools: HM.
  5. Wrote the paper: EL HM.

References

  1. 1. Benton MJ, Pearson PN. Speciation in the fossil record. Trends Ecol Evolut. 2001 Jul;16(7):405–411.
  2. 2. Alroy J. Dynamics of origination and extinction in the marine fossil record. Proc Nat Acad Sci. 2008 Aug;105(Supplement 1):11536–11542.
  3. 3. Van Valen L. A new evolutionary law. Evolutionary Theory. 1973;1:1–30.
  4. 4. Schluter D. The ecology of adaptive radiation. Oxford University Press; 2000.
  5. 5. Alfaro ME, Santini F, Brock C, Alamillo H, Dornburg A, Rabosky DL, et al. Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates. Proc Nat Acad Sci. 2009;106(32):13410–13414. pmid:19633192
  6. 6. Rolland J, Condamine FL, Jiguet F, Morlon H. Faster Speciation and Reduced Extinction in the Tropics Contribute to the Mammalian Latitudinal Diversity Gradient. PLoS Biol. 2014 Jan;12(1):e1001775. pmid:24492316
  7. 7. Morlon H. Phylogenetic approaches for studying diversification. Ecol letters. 2014 Apr;17(4):508–525.
  8. 8. McPeek MA. The ecological dynamics of clade diversification and community assembly. The Am Nat. 2008 Dec;172(6):E270–284. pmid:18851684
  9. 9. Hedges SB, Marin J, Suleski M, Paymer M, Kumar S. Tree of life reveals clock-like speciation and diversification. Mol Bio Evol. 2015 Apr;32(4):835–845.
  10. 10. Morlon H, Potts MD, Plotkin JB. Inferring the dynamics of diversification: a coalescent approach. PLoS Biol. 2010;8(9).
  11. 11. Phillimore AB, Price TD. Density-dependent cladogenesis in birds. PLoS Biol. 2008 Mar;6(3):e71. pmid:18366256
  12. 12. Rabosky DL, Slater GJ, Alfaro ME. Clade age and species richness are decoupled across the eukaryotic tree of life. PLoS Biol. 2012 Aug;10(8):e1001381. pmid:22969411
  13. 13. Moen D, Morlon H. Why does diversification slow down? Trends in Ecol Evolut. 2014 Apr;29(4):190–197.
  14. 14. Blum MGB, François O. Which Random Processes Describe the Tree of Life? A Large-Scale Study of Phylogenetic Tree Imbalance. Syst Bio. 2006;55(4):685–691.
  15. 15. Rabosky DL. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PloS ONE. 2014;9(2):e89543. pmid:24586858
  16. 16. Rabosky DL, Goldberg EE. Model inadequacy and mistaken inferences of trait-dependent speciation. Syst Bio. 2015 Mar;64(2):340–355.
  17. 17. Lewitus E, Morlon H. Characterizing and Comparing Phylogenies from their Laplacian Spectrum. Syst Bio. 2016 May;65(3):495–507.
  18. 18. Faurby S, Svenning JC. A species-level phylogeny of all extant and late Quaternary extinct mammals using a novel heuristic-hierarchical Bayesian approach. Mol Phylo Evol. 2015 Mar;84:14–26.
  19. 19. Jetz W, Thomas GH, Joy JB, Hartmann K, Mooers AO. The global diversity of birds in space and time. Nature. 2012 Nov;491(7424):444–448. pmid:23123857
  20. 20. Pyron R, Burbrink FT, Wiens JJ. A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes. BMC Evol Bio. 2013;13(1):93.
  21. 21. Pyron RA. Biogeographic analysis reveals ancient continental vicariance and recent oceanic dispersal in amphibians. Syst Bio. 2014 Sep;63(5):779–797.
  22. 22. Rabosky DL, Santini F, Eastman J, Smith SA, Sidlauskas B, Chang J, et al. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nature Comm. 2013 Jun;4.
  23. 23. Hart Y, Sheftel H, Hausser J, Szekely P, Ben-Moshe NB, Korem Y, et al. Inferring biological tasks using Pareto analysis of high-dimensional data. Nature Methods. 2015 Mar;12(3):233–235, 3 p following 235. pmid:25622107
  24. 24. Stadler T. Mammalian phylogeny reveals recent diversification rate shifts. Proc Nat Acad Sci. 2011 Apr;108(15):6187–6192. pmid:21444816
  25. 25. Etienne RS, Haegeman B, Stadler T, Aze T, Pearson PN, Purvis A, et al. Diversity-dependence brings molecular phylogenies closer to agreement with the fossil record. Proceedings Biological Sciences / The Royal Society. 2012 Apr;279(1732):1300–1309. pmid:21993508
  26. 26. Condamine FL, Rolland J, Morlon H. Macroevolutionary perspectives to environmental change. Ecol Letters. 2013 May;p. 72–85.
  27. 27. Manceau M, Lambert A, Morlon H. Phylogenies support out-of-equilibrium models of biodiversity. Ecol Letters. 2015 Feb;p. n/a–n/a.
  28. 28. FitzJohn RG. Quantitative Traits and Diversification. Syst Bio. 2010 Dec;59(6):619–633.
  29. 29. Raup DM. Geometric Analysis of Shell Coiling: General Problems. Journal of Paleontology. 1966;40(5):1178.
  30. 30. Parker GA, Smith JM. Optimality theory in evolutionary biology. Nature. 1990 Nov;348(6296):27–33.
  31. 31. Whitlock MC, Phillips PC, Moore FB, Tonsor SJ. Multiple Fitness Peaks and Epistasis. Ann Rev Ecol Syst. 1995 Nov;26(1):601–629.
  32. 32. Shoval O, Sheftel H, Shinar G, Hart Y, Ramote O, Mayo A, et al. Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science (New York, NY). 2012 Jun;336(6085):1157–1160.
  33. 33. Allen AP, Gillooly JF, Savage VM, Brown JH. Kinetic effects of temperature on rates of genetic divergence and speciation. Proc Nat Acad Sci of the United States of America. 2006 Jun;103(24):9130–9135.
  34. 34. Machac A, Zrzavý J, Smrckova J, Storch D. Temperature dependence of evolutionary diversification: differences between two contrasting model taxa support the metabolic theory of ecology. J Evol Bio. 2012 Dec;25(12):2449–2456.
  35. 35. Lewitus E, Soligo C. Life-History Correlates of Placental Structure in Eutherian Evolution. Evol Bio. 2011;38(3):287–305.
  36. 36. Helmstetter AJ, Papadopulos AST, Igea J, Van Dooren TJM, Leroi AM, Savolainen V. Viviparity stimulates diversification in an order of fish. Nature Comm. 2016 Apr;7:11271.
  37. 37. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol. 2014 Apr;10(4):e1003537. pmid:24722319
  38. 38. Endres DM, Schindelin JE. A new metric for probability distributions. IEEE Transactions on Information Theory. 2003;49(7):1858–60.
  39. 39. Morlon H, Lewitus E, Condamine FL, Manceau M, Clavel J, Drury J. RPANDA: an R package for macroevolutionary analyses on phylogenetic trees. Method Ecol Evol. 2016 May;7(5):589–597.
  40. 40. Dijkstra EW. A note on two problems in connexion with graphs. Numerische Mathematik. 1959 Dec;1(1):269–271.
  41. 41. Day WE, Edelsbrunner H. Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification. 1984 Dec;1(1):7–24.
  42. 42. Britton T, Anderson CL, Jacquet D, Lundqvist S, Bremer K. Estimating divergence times in large phylogenetic trees. Syst Bio. 2007 Oct;56(5):741–752.
  43. 43. Paradis E, Claude J, Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics (Oxford, England). 2004 Jan;20(2):289–290.
  44. 44. Höhna S. Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes. Bioinformatics (Oxford, England). 2013 Jun;29(11):1367–1374.
  45. 45. Cramer BS, Miller KG, Barrett PJ, Wright JD. Late Cretaceous–Neogene trends in deep ocean temperature and continental ice volume: Reconciling records of benthic foraminiferal geochemistry (δ 18 O and Mg/Ca) with sea level history. J of Geophys Res. 2011 Dec;116(C12).
  46. 46. Stadler T. How Can We Improve Accuracy of Macroevolutionary Rate Estimates? Syst Bio. 2013 Mar;62(2):321–329.
  47. 47. Burnham KP, Anderson DR, editors. Model Selection and Multimodel Inference. New York, NY: Springer New York; 2004.
  48. 48. FitzJohn RG. Diversitree: comparative phylogenetic analyses of diversification in R. Methods Ecol Evol. 2012 Dec;3(6):1084–1092.
  49. 49. Jones KE, Bielby J, Cardillo M, Fritz SA, O’Dell J, Orme CDL, et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecol. 2009 Aug;90(9):2648–2648.
  50. 50. Wilman H, Belmaker J, Simpson J, de la Rosa C, Rivadeneira MM, Jetz W. EltonTraits 1.0: Species-level foraging attributes of the world’s birds and mammals. Ecol. 2014 Jul;95(7):2027–2027.
  51. 51. Frimpong EA, Angermeier PL. Fish Traits: A Database of Ecological and Life-history Traits of Freshwater Fishes of the United States. Fisheries. 2009 Oct;34(10):487–495.
  52. 52. Jenks GF, Caspall FC. Error on Choroplethic Maps: Definition, Measurements, Reduction. Ann Assoc of Am Geo. 1971 Jun;61(2):217–244.