EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

  1. Albert J. Vilella (1),
  2. Jessica Severin (1,3),
  3. Abel Ureta-Vidal (1,4),
  4. Li Heng (2),
  5. Richard Durbin (2) and
  6. Ewan Birney (1,5)
  1. EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom;
  2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, United Kingdom

    Abstract

    We have developed a comprehensive, gene-oriented phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline that handles clustering, multiple alignment, and tree generation, including for large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and used them to benchmark a number of tree methods. The TreeBeST method from TreeFam performed best in our benchmarks. We also compared this phylogenetic approach with clustering approaches for ortholog prediction and found that the phylogenetic approach gives a large increase in coverage. All data are made available in a number of formats and will be kept up to date with the Ensembl project.

    Footnotes

    • 3 Present addresses: RIKEN Yokohama Institute, Genomic Sciences Center (GSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan;

    • 4 Eagle Genomics, 19 Forge End, Stapleford, Cambridge CB22 5BN, UK.

    • 5 Corresponding author.

      E-mail birney@ebi.ac.uk; fax 44-1223-494919.

    • [Supplemental material is available online at www.genome.org.]

    • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.073585.107.

      • Received October 26, 2007.
      • Accepted November 18, 2008.
    • Freely available online through the Genome Research open access option.
