Trends in Genetics
Volume 17, Issue 5, 1 May 2001, Pages 262-272
Review
Molecular phylogenetics: state-of-the-art methods for looking into the past

https://doi.org/10.1016/S0168-9525(01)02272-7

Abstract

As the amount of molecular sequence data in the public domain grows, so does the range of biological topics that it influences through evolutionary considerations. In recent years, a number of developments have enabled molecular phylogenetic methodology to keep pace. Likelihood-based inferential techniques, although controversial in the past, lie at the heart of these new methods and are producing the promised advances in the understanding of sequence evolution. They allow both a wide variety of phylogenetic inferences from sequence data and robust statistical assessment of all results. It cannot remain acceptable to use outdated data analysis techniques when superior alternatives exist. Here, we discuss the most important and exciting methods currently available to the molecular phylogeneticist.

Section snippets

Modelling evolution

To be both powerful and robust, statistical inference techniques require accurate probabilistic models of the biological processes that generate the data observed. For the phylogenetic analysis of aligned sequences, virtually all methods describe sequence evolution using a model that consists of two components: a phylogenetic tree and a description of the way individual sequences evolve by nucleotide or amino acid replacement along the branches of that tree. These replacements are usually
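As an illustration of the second component (a description of nucleotide replacement along a branch), the sketch below computes replacement probabilities under the Kimura two-parameter model. The choice of this model, the function name `k80_probabilities`, and the parameterisation (branch length `d` in expected substitutions per site, and `kappa` as the ratio of the transition rate to the rate of each of the two transversions) are assumptions made for the example, not details taken from the text above.

```python
import math


def k80_probabilities(d, kappa):
    """Replacement probabilities along one branch under the Kimura 2-parameter model.

    d     -- branch length in expected substitutions per site (assumed convention)
    kappa -- transition rate divided by the rate of each transversion (assumed convention)

    Returns (p_same, p_transition, p_transversion), where p_transversion is the
    probability of each one of the two possible transversions.
    """
    if d < 0 or kappa <= 0:
        raise ValueError("d must be non-negative and kappa must be positive")
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_transition = 0.25 + 0.25 * e1 - 0.5 * e2
    p_transversion = 0.25 - 0.25 * e1
    return p_same, p_transition, p_transversion


if __name__ == "__main__":
    same, ts, tv = k80_probabilities(d=0.1, kappa=2.3)
    print(f"no change: {same:.4f}  transition: {ts:.4f}  each transversion: {tv:.4f}")
```

With `kappa = 1` the three probabilities reduce to those of the simpler model in which all replacements are equally likely, which is why κ=1 corresponds to the absence of a transition/transversion bias.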

Inferential methodology

All the models of sequence evolution described above can be used to estimate the phylogenetic tree that generated the observed sequences. Ideally, the inference method used will extract the maximum amount of information available in the sequence data, will combine this with prior knowledge of patterns of sequence evolution (encapsulated in the evolutionary models), and will deal with model parameters (e.g. the transition/transversion bias κ) whose values are not known a priori. The three major

Statistical testing in phylogenetics

In the past decade, one of the most important topics in evolutionary sequence analysis was the development of methods for the statistical testing of phylogenetic hypotheses. These advances are available almost exclusively within the likelihood framework. They permit assessment of which model provides the best fit for a given dataset – vital for the selection of the optimal model with which to perform phylogenetic inference. Additionally, the rejection of simpler models in favour of those that

Model comparisons

The likelihood framework permits estimation of parameter values and their standard errors from the observed data, with no need for any a priori knowledge 8. For example, a transition/transversion bias estimated as κ=2.3±0.16 effectively excludes the possibility that there is no such bias (κ=1), whereas κ=2.3±1.6 does not.
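The arithmetic behind this example can be sketched as follows, assuming that the ± values are standard errors and that an approximate 95% interval of estimate ± 1.96 standard errors is an acceptable summary. The function names are hypothetical and the normal approximation is an assumption of the sketch.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Approximate confidence interval: estimate +/- z standard errors."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


def excludes(value, estimate, std_error, z=1.96):
    """True if `value` lies outside the approximate interval around `estimate`."""
    lower, upper = wald_interval(estimate, std_error, z)
    return not (lower <= value <= upper)


if __name__ == "__main__":
    for se in (0.16, 1.6):
        lower, upper = wald_interval(2.3, se)
        print(f"kappa = 2.3 +/- {se}: interval ({lower:.2f}, {upper:.2f}), "
              f"excludes kappa = 1: {excludes(1.0, 2.3, se)}")
```

With a standard error of 0.16 the interval lies entirely above 1, whereas with a standard error of 1.6 it comfortably contains 1.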

Comparisons of two competing models are also possible, using likelihood ratio tests 6, 8, 62 (LRTs; Fig. 3). Competing models are compared (using their maximized likelihoods)
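A minimal sketch of a likelihood ratio test is given below. It assumes that the simpler model is nested within the more complex one and that the test statistic can be compared with a chi-squared distribution whose degrees of freedom equal the number of extra free parameters; where that approximation is doubtful, a simulated null distribution would be used instead. The function names and the log-likelihood values in the usage lines are invented for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step up two degrees of freedom at a time.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnL_null, lnL_alt, extra_params):
    """Return (statistic, p_value) for nested models with maximized log-likelihoods."""
    statistic = 2.0 * (lnL_alt - lnL_null)
    return statistic, chi2_sf(statistic, extra_params)


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods; the alternative has one extra parameter.
    stat, p = likelihood_ratio_test(lnL_null=-2500.0, lnL_alt=-2490.0, extra_params=1)
    print(f"2 * delta lnL = {stat:.2f}, p = {p:.2e}")
```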

Non-parametric bootstrapping of phylogeny

In many applications, the primary interest is in the topology of the inferred evolutionary tree. As with estimates of model parameters, a single point-estimate is of little value without some measure of the confidence we can place in it. A popular way of assessing the robustness of a tree is by the method of non-parametric bootstrapping 14, 65 (Fig. 4). Comparisons of an inferred tree with the set of bootstrap replicate trees, typically in the form of tabulation of the proportion of the
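The resampling step of the non-parametric bootstrap can be sketched as below: each replicate dataset is built by drawing alignment columns (sites) with replacement until it has as many sites as the original. The dictionary representation of the alignment and the function names are assumptions of this example, and the tree inference that would be run on each replicate is deliberately left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one replicate by resampling alignment columns with replacement.

    alignment -- dict mapping sequence name to an aligned sequence string
    rng       -- a random.Random instance, passed in for reproducibility
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # The same sampled column indices are used for every sequence, so sites stay intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of n_replicates resampled alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "TCGAAC"}
    for replicate in bootstrap_replicates(toy, n_replicates=3, seed=1):
        print(replicate)
```

Each replicate would then be analysed with the same tree inference method as the original data, and the resulting trees compared with the tree inferred from the original alignment.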

Increasing the robustness of a tree

The best possible phylogenetic estimates will arise from using robust inference methods allied with accurate evolutionary models. However, after statistical assessment of the results it could still be necessary to attempt to improve the quality of inferences drawn. The two most obvious ways of increasing the accuracy of a phylogenetic inference are to include more sequences in the data or to increase the length of the sequences used. Until recently, the likely effects of these approaches had

Conclusion

Molecular evolutionary studies are central to a huge range of biological areas; this is increasingly true as sequence databases grow (and include numerous whole genomes and proteomes). The phylogenetic methodology required for these studies has progressed greatly in the past few years. Maximum likelihood methods permit the application of mathematical models that incorporate our prior knowledge of typical patterns of sequence evolution accumulated over more than 30 years, resulting in more

Glossary

Bootstrap:
A statistical method by which distributions that are difficult to calculate exactly can be estimated by the repeated creation and analysis of artificial datasets. In the non-parametric bootstrap, these datasets are generated by resampling from the original data, whereas in the parametric bootstrap, the data are simulated according to the hypothesis being tested. The name derives from the near-miraculous way in which the method can ‘pull itself up by its bootstraps’ and generate

References (75)

  • P. Liò et al.

    Models of molecular evolution and phylogeny

    Genome Res.

    (1998)
  • J. Adachi et al.

    Model of amino acid substitution in proteins encoded by mitochondrial DNA

    J. Mol. Evol.

    (1996)
  • J. Adachi

    Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA

    J. Mol. Evol.

    (2000)
  • N. Goldman

    Statistical tests of models of DNA substitution

    J. Mol. Evol.

    (1993)
  • Z. Yang

    Estimating the pattern of nucleotide substitution

    J. Mol. Evol.

    (1994)
  • Z. Yang

    Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation

    Mol. Biol. Evol.

    (1994)
  • Z. Yang

    Models of amino acid substitution and applications to mitochondrial protein evolution

    Mol. Biol. Evol.

    (1998)
  • B.S. Gaut et al.

    Success of maximum-likelihood phylogeny inference in the 4-taxon case

    Mol. Biol. Evol.

    (1995)
  • J.P. Huelsenbeck

    Performance of phylogenetic methods in simulation

    Syst. Biol.

    (1995)
  • Kuhner, M.K. and Felsenstein, J. (1994) Simulation comparison of phylogeny algorithms under equal and unequal...
  • Z. Yang

    Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem

    Syst. Biol.

    (1995)
  • D.L. Swofford

    Phylogenetic inference

  • R.D.M. Page et al.

    Molecular Evolution

    (1998)
  • W.M. Brown

    Mitochondrial DNA sequences of primates: tempo and mode of evolution

    J. Mol. Evol.

    (1982)
  • M. Kimura

    A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences

    J. Mol. Evol.

    (1980)
  • Z. Yang

    Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods

    J. Mol. Evol.

    (1994)
  • Z. Yang

    Among-site rate variation and its impact on phylogenetic analysis

    Trends Ecol. Evol.

    (1996)
  • Z. Yang

    Maximum-likelihood models for combined analyses of multiple sequence data

    J. Mol. Evol.

    (1996)
  • M. Hasegawa

    Dating of the human-ape splitting by a molecular clock of mitochondrial DNA

    J. Mol. Evol.

    (1985)
  • M. Nei

    Molecular Evolutionary Genetics

    (1987)
  • M.O. Dayhoff

    A model of evolutionary change in proteins

  • M.O. Dayhoff

    A model of evolutionary change in proteins

  • D.T. Jones

    The rapid generation of mutation data matrices from protein sequences

    CABIOS

    (1992)
  • Whelan, S. and Goldman, N. A general empirical model of protein evolution derived from multiple protein families using...
  • Y. Cao

    Phylogenetic relationships among eutherian orders estimated from inferred sequences of mitochondrial proteins: instability of a tree based on a single gene

    J. Mol. Evol.

    (1994)
  • J.L. Thorne

    Models of protein sequence evolution and their applications

    Curr. Opin. Genet. Dev.

    (2000)
  • Z. Yang et al.

    Synonymous and nonsynonymous rate variation in nuclear genes of mammals

    J. Mol. Evol.

    (1998)