Statistical tests of models of DNA substitution

Journal of Molecular Evolution

Summary

Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models is of such great importance because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
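The general idea of assessing a model-adequacy statistic by Monte Carlo simulation can be illustrated with a toy sketch. The example below is not the procedure of the paper, whose details are not given in this summary. It assumes a deliberately simplified setting: two aligned sequences, a Jukes-Cantor model reduced to a single parameter (the proportion of differing sites), and a statistic defined as the gap between an unconstrained multinomial log-likelihood and the maximised model log-likelihood. All function names and the example counts are hypothetical.

```python
import math
import random
from collections import Counter

BASES = "ACGT"

def log_lik_unconstrained(counts, n):
    # Multinomial log-likelihood with each site-pattern probability
    # set to its observed frequency (no evolutionary model assumed).
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)

def log_lik_jc69(counts, n):
    # Maximised log-likelihood under a two-sequence Jukes-Cantor model.
    # Each identical pair has probability (1 - p) / 4 and each differing
    # pair p / 12, where p is the proportion of differing sites (p <= 3/4).
    diff = sum(c for (a, b), c in counts.items() if a != b)
    same = n - diff
    p = min(diff / n, 0.75)
    ll = 0.0
    if same:
        ll += same * math.log((1 - p) / 4)
    if diff:
        ll += diff * math.log(p / 12)
    return ll, p

def adequacy_statistic(counts):
    # Gap between the unconstrained and model log-likelihoods (always >= 0).
    n = sum(counts.values())
    ll_model, p_hat = log_lik_jc69(counts, n)
    return log_lik_unconstrained(counts, n) - ll_model, p_hat

def simulate_pairs(n, p, rng):
    # Draw n site patterns from the fitted toy model.
    counts = Counter()
    for _ in range(n):
        a = rng.choice(BASES)
        b = rng.choice([x for x in BASES if x != a]) if rng.random() < p else a
        counts[(a, b)] += 1
    return counts

def monte_carlo_p_value(counts, n_sim=199, seed=0):
    # Compare the observed statistic with statistics from data sets
    # simulated under the fitted model, re-estimating p each time.
    rng = random.Random(seed)
    n = sum(counts.values())
    observed, p_hat = adequacy_statistic(counts)
    exceed = 0
    for _ in range(n_sim):
        simulated, _ = adequacy_statistic(simulate_pairs(n, p_hat, rng))
        if simulated >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)

# Hypothetical counts with a strongly skewed base composition,
# which the equal-frequency Jukes-Cantor model cannot reproduce.
skewed = Counter({("A", "A"): 400, ("C", "C"): 50, ("G", "G"): 30,
                  ("T", "T"): 20, ("A", "G"): 40})
stat, p_value = monte_carlo_p_value(skewed)
print(f"statistic = {stat:.1f}, Monte Carlo p-value = {p_value:.3f}")
```

Because the model parameter is re-estimated on every simulated data set, no parameter value has to be fixed in advance, which mirrors the advantage noted in the summary. A full phylogenetic version would need tree estimation and a richer substitution model, neither of which is attempted here.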

References

  • Atkinson AC (1970) A method for discriminating between models. J R Statist Soc B 32:323–345

  • Avery PJ (1987) The analysis of intron data and their use in the detection of short signals. J Mol Evol 26:335–340

  • Bailey WJ, Fitch DFA, Tagle DA, Czelusniak J (1991) Molecular evolution of the ψη-globin gene locus: gibbon phylogeny and the hominoid slowdown. Mol Biol Evol 8:155–184

  • Bartlett MS (1963) The spectral analysis of point processes. J R Statist Soc B 25:264–296

  • Bishop MJ, Friday AE (1985) Evolutionary trees from nucleic acid and protein sequences. Proc R Soc Lond B 226:271–302

  • Bross ID (1990) How to eradicate fraudulent statistical methods: statisticians must do science. Biometrics 46:1213–1225

  • Bulmer M (1987) A statistical analysis of nucleotide sequences in introns and exons in human genes. Mol Biol Evol 4:395–405

  • Bulmer M (1989) Estimating the variability of substitution rates. Genetics 123:615–619

  • Cavender JA (1989) Mechanized derivation of linear invariants. Mol Biol Evol 6:301–316

  • Churchill GA (1989) Stochastic models for heterogeneous DNA sequences. Bull Math Biol 51:79–94

  • Cox DR (1961) Tests of separate families of hypotheses. Proceedings of the 4th Berkeley Symposium (University of California Press) 1:105–123

  • Cox DR (1962) Further results on tests of separate families of hypotheses. J R Statist Soc B 24:406–424

  • Cox DR, Miller HD (1977) The theory of stochastic processes. Chapman and Hall, London, pp 146–198

  • Dams E, Hendriks L, Van de Peer Y, Neefs JM, Smits G, Vanderbempt I, de Wachter R (1988) Compilation of small subunit RNA subsequences. Nucl Acids Res 16:r87-r174

  • Edwards AWF (1972) Likelihood. Cambridge University Press, Cambridge, pp 31, 70–102

  • Efron B (1982) The jackknife, the bootstrap and other resampling plans. Soc Ind Appl Math CBMS-Natl Sci Found Monogr 38

  • Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Statistician 37:36–48

  • Efron B, Tibshirani R (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci 1:54–77

  • Felsenstein J (1973) Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249

  • Felsenstein J (1978) The number of evolutionary trees. Syst Zool 27:27–33

  • Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376

  • Felsenstein J (1983) Statistical inference of phylogenies. J R Statist Soc A 146:246–272

  • Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Ann Rev Genet 22:521–565

  • Felsenstein J (1991a) Counting phylogenetic invariants in some simple cases. J Theor Biol 152:357–376

  • Felsenstein J (1991b) PHYLIP (Phylogenetic Inference Package) version 3.4, documentation. University of Washington, Seattle

  • Gillespie JH (1986) Rates of molecular evolution. Ann Rev Ecol Syst 17:637–665

  • Gillespie JH (1989) Lineage effects and the index of dispersion of molecular evolution. Mol Biol Evol 6:636–647

  • Goldman N (1990) Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst Zool 39:345–361

  • Goldman N (1991) Statistical estimation of phylogenetic trees. PhD Thesis, University of Cambridge, Cambridge, pp 70–73

  • Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762

  • Hasegawa M, Horai S (1991) Time of the deepest root for polymorphism in human mitochondrial DNA. J Mol Evol 32:37–42

  • Hasegawa M, Iida Y, Yano T, Takaiwa F, Iwabuchi M (1985a) Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J Mol Evol 22:32–38

  • Hasegawa M, Kishino H, Yano T (1985b) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174

  • Hasegawa M, Kishino H, Yano T (1987) Man's place in Hominoidea as inferred from molecular clocks of DNA. J Mol Evol 26:132–147

  • Hasegawa M, Kishino H, Yano T (1988) Phylogenetic inference from DNA sequence data. In: Matusita K (ed) Statistical theory and data analysis II. Elsevier, Holland, pp 1–13

  • Hasegawa M, Kishino H, Yano T (1989) Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J Hum Evol 18:461–476

  • Hasegawa M, Kishino H, Hayasaka K, Horai S (1990) Mitochondrial DNA evolution in primates: transition rate has been extremely low in lemur. J Mol Evol 31:113–121

  • Hasegawa M, Yano T, Kishino H (1984) A new molecular clock of mitochondrial DNA and the evolution of hominoids. Proc Jpn Acad B 60:95–98

  • Holmes EC, Pesole G, Saccone C (1989) Stochastic models of molecular evolution and the estimation of phylogeny and rates of nucleotide substitution in the hominoid primates. J Hum Evol 18:775–794

  • Hope ACA (1968) A simplified Monte Carlo significance test procedure. J R Statist Soc B 30:582–598

  • Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism, vol 3. Academic Press, New York, pp 21–132

  • Kendall M, Stuart A (1979) The advanced theory of statistics, vol 2. 4th ed. Charles Griffin, London, pp 240–252

  • Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge, pp 65–89

  • Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179

  • Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Meth Enz 183:550–570

  • Koop BF, Goodman M, Xu P, Chan K, Slightom JL (1986) Primate eta-globin DNA sequences and man's place among the great apes. Nature 319:234–238

  • Lake JA (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191

  • Lake JA (1988) Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184–186

  • Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93

  • Langley CH, Fitch WM (1974) An examination of the constancy of the rate of molecular evolution. J Mol Evol 3:161–177

  • Li W-H, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239

  • Lindgren BW (1976) Statistical theory. 3rd ed. Macmillan, New York, pp 307–308, 331, 424

  • Lindsay JK (1974a) Comparison of probability distributions. J R Statist Soc B 36:38–44

  • Lindsay JK (1974b) Construction and comparison of statistical models. J R Statist Soc B 36:418–425

  • Lockhart PJ, Penny D, Hendy MD, Howe CJ, Beanland TJ, Larkum AD (1992) Controversy on chloroplast origins. FEBS Lett 301:127–131

  • Loh W-Y (1985) A new method for testing separate families of hypotheses. J Am Stat Assoc 80:362–368

  • Maeda N, Wu CI, Bliska J, Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol 5:1–20

  • Marriott FHC (1979) Barnard's Monte Carlo tests: how many simulations? Appl Statist 28:75–77

  • McCullagh P, Nelder JA (1989) Generalized linear models. 2nd ed. Chapman and Hall, London, pp 119, 174

  • Navidi WC, Churchill GA, von Haeseler A (1991) Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol Biol Evol 8:128–143

  • Oliver JL, Marín A, Medina J-R (1989) SDSE: a software package to simulate the evolution of a pair of DNA sequences. CABIOS 5:47–50

  • Penny D (1982) Towards a basis for classification: the incompleteness of distance measures, incompatibility analysis and phenetic classification. J Theor Biol 96:129–142

  • Penny D, Hendy MD (1986) Estimating the reliability of evolutionary trees. Mol Biol Evol 3:403–417

  • Penny D, Hendy MD, Steel MA (1992) Progress with methods for constructing evolutionary trees. TREE 7:73–79

  • Pesole G, Bozzetti MP, Lanave C, Preparata G, Saccone C (1991) Glutamine synthetase gene evolution: a good molecular clock. Proc Natl Acad Sci USA 88:522–526

  • Ripley BD (1987) Stochastic simulation. John Wiley and Sons, New York, pp 171–174, 176

  • Ritland K, Clegg MT (1987) Evolutionary analysis of plant DNA sequences. Am Nat 130:S74-S100

  • Rodríguez F, Oliver JL, Marín A, Medina JR (1990) The general stochastic model of nucleotide substitution. J Theor Biol 142:485–501

  • Silvey SD (1975) Statistical inference. Chapman and Hall, London, pp 108–114

  • Thorne JL, Kishino H, Felsenstein J (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J Mol Evol 33:114–124 and Erratum, J Mol Evol (1992) 34:91

  • Thorne JL, Kishino H, Felsenstein J (1992) Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol 34:3–16

  • Williams DA (1970) Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures. Biometrics 26:23–32

  • Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Ann Rev Biochem 46:573–639

Cite this article

Goldman, N. Statistical tests of models of DNA substitution. J Mol Evol 36, 182–198 (1993). https://doi.org/10.1007/BF00166252

  • DOI: https://doi.org/10.1007/BF00166252
