ABSTRACT
Gene tree parsimony, which infers a species tree that implies the fewest gene duplications across a collection of gene trees, is a method for inferring phylogenetic trees from paralogous genes. However, it assumes that all duplications are independent, and therefore, it does not account for large-scale gene duplication events like whole genome duplications. We describe two methods to infer species trees based on gene duplication events that may involve multiple genes. First, gene episode parsimony seeks the species tree that implies the fewest possible gene duplication episodes. Second, adjusted gene tree parsimony corrects the number of gene duplications at each node in the species tree by treating the largest possible gene duplication episode as a single duplication. We test both new methods, as well as gene tree parsimony, using 7,091 gene trees representing 7 plant taxa. Gene tree parsimony and adjusted gene tree parsimony both perform well, returning the species tree after an exhaustive search of the tree space. By contrast, gene episode parsimony fails to rank the true species tree within the top third of all possible topologies. Furthermore, gene trees with randomly permuted leaf labels can imply fewer duplication episodes than gene trees with the correct leaf labels. Adjusted gene tree parsimony reflects a potentially more realistic and, at least for small data sets, computationally feasible model for counting gene duplication events than treating each duplication independently or minimizing the number of possible duplication episodes.
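The duplication count that gene tree parsimony minimizes can be illustrated with a minimal sketch. It assumes rooted, binary gene trees whose leaves are labeled with species names, and it uses the standard least-common-ancestor (LCA) reconciliation: a gene tree node is counted as a duplication when it maps to the same species tree node as one of its children. Trees are written as nested Python tuples, and the function names are hypothetical, chosen only for illustration. This is not the authors' implementation, and it does not group duplications into episodes.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(species_tree):
    """List every clade of the species tree as a set of species names."""
    clades = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clades.extend(species_clades(child))
    return clades


def lca(clades, species):
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count duplications implied by LCA reconciliation (illustrative only)."""
    clades = species_clades(species_tree)

    def walk(node):
        # Returns (clade the node maps to, duplications at or below the node).
        if isinstance(node, str):
            return lca(clades, frozenset([node])), 0
        left, right = node
        left_map, left_dups = walk(left)
        right_map, right_dups = walk(right)
        node_map = lca(clades, left_map | right_map)
        # A node mapping to the same clade as a child implies a duplication.
        is_dup = node_map == left_map or node_map == right_map
        return node_map, left_dups + right_dups + int(is_dup)

    return walk(gene_tree)[1]


def gene_tree_parsimony_score(gene_trees, species_tree):
    """Total duplications over all gene trees; each is counted independently."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


# Hypothetical toy data: two gene trees and two candidate species trees.
gene_trees = [(("A", "B"), "C"), (("A", "B"), ("A", "C"))]
candidates = [(("A", "B"), "C"), (("A", "C"), "B")]
best = min(candidates, key=lambda s: gene_tree_parsimony_score(gene_trees, s))
```

On this toy input, `best` is the candidate with the lower total duplication count. An exhaustive search like the one described above would apply the same scoring to every possible topology, which is only practical for a small number of taxa.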
Index Terms
- Inferring species trees from gene duplication episodes