DOI: 10.1145/1854776.1854806

Inferring species trees from gene duplication episodes

Published: 02 August 2010

ABSTRACT

Gene tree parsimony, which infers a species tree that implies the fewest gene duplications across a collection of gene trees, is a method for inferring phylogenetic trees from paralogous genes. However, it assumes that all duplications are independent, and therefore, it does not account for large-scale gene duplication events like whole genome duplications. We describe two methods to infer species trees based on gene duplication events that may involve multiple genes. First, gene episode parsimony seeks the species tree that implies the fewest possible gene duplication episodes. Second, adjusted gene tree parsimony corrects the number of gene duplications at each node in the species tree by treating the largest possible gene duplication episode as a single duplication. We test both new methods, as well as gene tree parsimony, using 7,091 gene trees representing 7 plant taxa. Gene tree parsimony and adjusted gene tree parsimony both perform well, returning the species tree after an exhaustive search of the tree space. By contrast, gene episode parsimony fails to rank the true species tree within the top third of all possible topologies. Furthermore, gene trees with randomly permuted leaf labels can imply fewer duplication episodes than gene trees with the correct leaf labels. Adjusted gene tree parsimony reflects a potentially more realistic and, at least for small data sets, computationally feasible model for counting gene duplication events than treating each duplication independently or minimizing the number of possible duplication episodes.
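The gene tree parsimony criterion and the exhaustive search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reconciliation details, so the code assumes rooted binary gene trees and the standard least-common-ancestor (LCA) mapping, under which a gene tree node counts as a duplication when it maps to the same species tree node as one of its children. Trees are nested tuples whose leaves are species names. The function names and the toy gene trees are hypothetical and are not taken from the paper or its data. The sketch covers plain gene tree parsimony only, not gene episode parsimony or the adjusted score.

```python
def clusters(tree):
    """Return the set of leaf sets (clusters) of a rooted tree of nested tuples."""
    out = set()

    def walk(node):
        if isinstance(node, str):
            c = frozenset([node])
        else:
            c = frozenset().union(*(walk(child) for child in node))
        out.add(c)
        return c

    walk(tree)
    return out


def lca_cluster(species_clusters, leaf_set):
    """Smallest species tree cluster containing leaf_set (the LCA mapping)."""
    return min((c for c in species_clusters if leaf_set <= c), key=len)


def count_duplications(gene_tree, species_clusters):
    """Count gene tree nodes that map to the same species node as a child."""
    dups = 0

    def walk(node):
        nonlocal dups
        if isinstance(node, str):
            return lca_cluster(species_clusters, frozenset([node]))
        left, right = (walk(child) for child in node)  # binary trees assumed
        here = lca_cluster(species_clusters, left | right)
        if here == left or here == right:
            dups += 1
        return here

    walk(gene_tree)
    return dups


def parsimony_score(species_tree, gene_trees):
    """Total duplications implied by a species tree over all gene trees."""
    sc = clusters(species_tree)
    return sum(count_duplications(g, sc) for g in gene_trees)


def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching leaf above a node of tree."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """Yield every rooted binary topology on taxa, one leaf insertion at a time."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    *rest, new = taxa
    for tree in all_rooted_trees(rest):
        yield from insert_leaf(tree, new)


# Toy input (hypothetical): leaves name the species each gene copy comes from.
gene_trees = [
    (("A", "B"), "C"),
    (("A", "C"), ("B", "C")),
    (("A", "A"), "B"),
]

# Exhaustive search: score every rooted topology and keep the cheapest one.
best = min(all_rooted_trees(["A", "B", "C"]),
           key=lambda s: parsimony_score(s, gene_trees))
print(best, parsimony_score(best, gene_trees))
```

Enumerating every topology is practical here because 7 taxa admit only 10,395 rooted binary trees. The count grows as (2n-3)!!, so larger data sets generally call for heuristic search instead.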


Published in

BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
August 2010
705 pages
ISBN: 9781450304382
DOI: 10.1145/1854776

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 254 of 885 submissions, 29%
