Trends in Genetics
Review
Molecular phylogenetics: state-of-the-art methods for looking into the past
Section snippets
Modelling evolution
To be both powerful and robust, statistical inference techniques require accurate probabilistic models of the biological processes that generate the data observed. For the phylogenetic analysis of aligned sequences, virtually all methods describe sequence evolution using a model that consists of two components: a phylogenetic tree and a description of the way individual sequences evolve by nucleotide or amino acid replacement along the branches of that tree. These replacements are usually
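The snippet above is cut off before it names any particular replacement model. As a hedged illustration of what "a description of the way individual sequences evolve along the branches" can look like in practice, the sketch below assumes the Kimura two-parameter (K80) nucleotide model, which is not named in this excerpt. It takes a branch length measured in expected substitutions per site and a transition/transversion rate ratio κ. The function name and argument names are illustrative only.

```python
import math


def k80_transition_probs(branch_length, kappa):
    """Replacement probabilities along one branch under the K80 model.

    Assumptions (not taken from the article): equal base frequencies,
    branch_length in expected substitutions per site, and kappa defined
    as the ratio of the transition rate to the transversion rate.

    Returns (p_same, p_transition, p_transversion), where p_transversion
    is the probability of EACH of the two possible transversions.
    """
    if branch_length < 0 or kappa <= 0:
        raise ValueError("branch_length must be >= 0 and kappa must be > 0")
    # Decay terms of the closed-form K80 solution.
    e_tv = math.exp(-4.0 * branch_length / (kappa + 2.0))
    e_ts = math.exp(-2.0 * branch_length * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e_tv + 0.5 * e_ts
    p_transition = 0.25 + 0.25 * e_tv - 0.5 * e_ts
    p_transversion = 0.25 - 0.25 * e_tv
    return p_same, p_transition, p_transversion


if __name__ == "__main__":
    same, ts, tv = k80_transition_probs(branch_length=0.1, kappa=2.3)
    print(f"no change: {same:.4f}, transition: {ts:.4f}, each transversion: {tv:.4f}")
```

With κ = 1 the transition and transversion probabilities coincide, so the model reduces to one with a single replacement rate. The likelihood of a tree is built by combining such per-branch probabilities over all sites.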
Inferential methodology
All the models of sequence evolution described above can be used to estimate the phylogenetic tree that generated the observed sequences. Ideally, the inference method used will extract the maximum amount of information available in the sequence data, will combine this with prior knowledge of patterns of sequence evolution (encapsulated in the evolutionary models), and will deal with model parameters (e.g. the transition/transversion bias κ) whose values are not known a priori. The three major
Statistical testing in phylogenetics
In the past decade, one of the most important topics in evolutionary sequence analysis was the development of methods for the statistical testing of phylogenetic hypotheses. These advances are available almost exclusively within the likelihood framework. They permit assessment of which model provides the best fit for a given dataset – vital for the selection of the optimal model with which to perform phylogenetic inference. Additionally, the rejection of simpler models in favour of those that
Model comparisons
The likelihood framework permits estimation of parameter values and their standard errors from the observed data, with no need for any a priori knowledge [8]. For example, a transition/transversion bias estimated as κ=2.3±0.16 effectively excludes the possibility that there is no such bias (κ=1), whereas κ=2.3±1.6 does not.
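The arithmetic behind this example can be checked with a short sketch. It assumes that the ± values are standard errors and that the estimator is approximately normally distributed, so that an approximate 95% interval is the estimate ± 1.96 standard errors. Both assumptions, and the function names, are illustrative and are not stated in the text.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval, assuming a roughly normal estimator."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def excludes(value, estimate, std_error, z=1.96):
    """True if `value` lies outside the approximate interval."""
    low, high = wald_interval(estimate, std_error, z)
    return not (low <= value <= high)


if __name__ == "__main__":
    for se in (0.16, 1.6):
        low, high = wald_interval(2.3, se)
        print(
            f"kappa = 2.3 +/- {se}: interval [{low:.2f}, {high:.2f}], "
            f"excludes kappa = 1: {excludes(1.0, 2.3, se)}"
        )
```

With a standard error of 0.16 the estimate lies about eight standard errors above κ=1. With 1.6 it lies less than one standard error above, so κ=1 remains well inside the interval.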
Comparisons of two competing models are also possible, using likelihood ratio tests [6, 8, 62] (LRTs; Fig. 3). Competing models are compared (using their maximized likelihoods)
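As a minimal sketch of how such a comparison is commonly carried out, the code below computes the statistic 2(ln L₁ − ln L₀) from two maximized log-likelihoods and refers it to a χ² distribution. This assumes the models are nested and that the simpler model does not sit on a boundary of the parameter space; where those assumptions fail, the χ² approximation may not apply. The log-likelihood values in the example are made up for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Base cases: df = 1 (a = 1/2) or df = 2 (a = 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p_value) for nested models.

    lnl_null: maximized log-likelihood of the simpler model
    lnl_alt: maximized log-likelihood of the more general model
    extra_params: number of additional free parameters in the general model
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, extra_params)


if __name__ == "__main__":
    # Hypothetical values: kappa fixed at 1 versus kappa estimated freely.
    stat, p = likelihood_ratio_test(lnl_null=-2345.6, lnl_alt=-2332.1, extra_params=1)
    print(f"LRT statistic = {stat:.2f}, p = {p:.3g}")
```

A small p-value is read as evidence that the extra parameter improves the fit by more than chance alone would explain.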
Non-parametric bootstrapping of phylogeny
In many applications, the primary interest is in the topology of the inferred evolutionary tree. As with estimates of model parameters, a single point-estimate is of little value without some measure of the confidence we can place in it. A popular way of assessing the robustness of a tree is by the method of non-parametric bootstrapping [14, 65] (Fig. 4). Comparisons of an inferred tree with the set of bootstrap replicate trees, typically in the form of tabulation of the proportion of the
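The resampling step can be sketched as follows. Alignment columns are drawn with replacement to build each replicate dataset, the same inference procedure is applied to every replicate, and the proportion of replicates that recover each grouping found in the original analysis is tabulated. Real tree inference is out of scope here, so `closest_pair` is a deliberately trivial, hypothetical stand-in that reports only the two most similar sequences as a single group. The alignment is invented for the example.

```python
import random
from collections import Counter
from itertools import combinations


def resample_columns(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def closest_pair(alignment):
    """Toy stand-in for tree inference (hypothetical, not a real method):
    returns a set containing one group, the pair with fewest differences."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    best = min(
        combinations(sorted(alignment), 2),
        key=lambda pair: hamming(alignment[pair[0]], alignment[pair[1]]),
    )
    return {frozenset(best)}


def bootstrap_support(alignment, infer_groups, n_replicates=200, seed=0):
    """Proportion of replicates recovering each group found in the original data."""
    rng = random.Random(seed)
    observed = infer_groups(alignment)
    counts = Counter()
    for _ in range(n_replicates):
        counts.update(infer_groups(resample_columns(alignment, rng)))
    return {group: counts[group] / n_replicates for group in observed}


if __name__ == "__main__":
    toy = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTAT",
        "C": "TCGAACGTTT",
        "D": "TCGATCGTTA",
    }
    for group, support in bootstrap_support(toy, closest_pair).items():
        print(sorted(group), f"{support:.2f}")
```

In a real analysis `infer_groups` would be replaced by the full tree-building method used on the original data, returning every grouping in the inferred tree.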
Increasing the robustness of a tree
The best possible phylogenetic estimates will arise from using robust inference methods allied with accurate evolutionary models. However, after statistical assessment of the results it could still be necessary to attempt to improve the quality of inferences drawn. The two most obvious ways of increasing the accuracy of a phylogenetic inference are to include more sequences in the data or to increase the length of the sequences used. Until recently, the likely effects of these approaches had
Conclusion
Molecular evolutionary studies are central to a huge range of biological areas; this is increasingly true as sequence databases grow (and include numerous whole genomes and proteomes). The phylogenetic methodology required for these studies has progressed greatly in the past few years. Maximum likelihood methods permit the application of mathematical models that incorporate our prior knowledge of typical patterns of sequence evolution accumulated over more than 30 years, resulting in more
Glossary
- Bootstrap: A statistical method by which distributions that are difficult to calculate exactly can be estimated by the repeated creation and analysis of artificial datasets. In the non-parametric bootstrap, these datasets are generated by resampling from the original data, whereas in the parametric bootstrap, the data are simulated according to the hypothesis being tested. The name derives from the near-miraculous way in which the method can ‘pull itself up by its bootstraps’ and generate
References (75)
- et al. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics (1998)
- Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics (1998)
- Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. (1996)
- Bayesian statistics in genetics: a guide for the uninitiated. Trends Genet. (1999)
- et al. Bayesian phylogenetic inference using DNA sequences: Markov chain Monte Carlo methods. Mol. Biol. Evol. (1997)
- Loss of information in genetic distances. Nature (1988)
- et al. Parsimony, likelihood, and the role of models in molecular phylogenetics. Mol. Biol. Evol. (2000)
- Philosophy and the transformation of cladistics revisited. Cladistics (1985)
- et al. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science (1997)