EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates
- Albert J. Vilella¹,
- Jessica Severin¹,³,
- Abel Ureta-Vidal¹,⁴,
- Li Heng²,
- Richard Durbin² and
- Ewan Birney¹,⁵
- ¹ EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom;
- ² Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, United Kingdom
Abstract
We have developed a comprehensive gene-orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project.
Footnotes
- 3 Present addresses: RIKEN Yokohama Institute, Genomic Sciences Center (GSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan;
- 4 Eagle Genomics, 19 Forge End, Stapleford, Cambridge CB22 5BN, UK.
- 5 Corresponding author. E-mail birney{at}ebi.ac.uk; fax 44-1223-494919.
- [Supplemental material is available online at www.genome.org.]
- Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.073585.107.
- Received October 26, 2007.
- Accepted November 18, 2008.
- Freely available online through the Genome Research open access option.
- Copyright © 2009 by Cold Spring Harbor Laboratory Press