Estimating Maximum Likelihood Phylogenies with PhyML

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 537)

Abstract

Our understanding of the origins, functions, and structures of biological sequences depends strongly on our ability to decipher the mechanisms of molecular evolution. These complex processes can be described by comparing homologous sequences in a phylogenetic framework. Moreover, phylogenetic inference provides sound statistical tools for revealing the main features of molecular evolution from the analysis of actual sequences. This chapter focuses on phylogenetic tree estimation under the maximum likelihood (ML) principle. Phylogenies inferred under this probabilistic criterion are usually reliable, and important biological hypotheses can be tested by comparing different models. Estimating ML phylogenies is computationally demanding, and careful examination of the results is warranted. The chapter concentrates on PhyML, a software package that implements recent ML phylogenetic methods and algorithms, and illustrates the strengths and pitfalls of this program through the analysis of a real data set. PhyML v3.0 is available from http://atgc-montpellier.fr/phyml/.

Acknowledgements

This work was supported by the “MITOSYS” grant from ANR. The chapter itself is the contribution 2007–08 of the Institut des Sciences de l'Evolution (UMR5554-CNRS).

Copyright information

© 2009 Humana Press, a part of Springer Science+Business Media, LLC

Cite this protocol

Guindon, S., Delsuc, F., Dufayard, J.-F., Gascuel, O. (2009). Estimating Maximum Likelihood Phylogenies with PhyML. In: Posada, D. (ed.) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_6

  • DOI: https://doi.org/10.1007/978-1-59745-251-9_6

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-910-9

  • Online ISBN: 978-1-59745-251-9

  • eBook Packages: Springer Protocols
