A benchmark for evaluation of phylogeny reconstruction programs

Sergei Spirin

doi:10.7287/peerj.preprints.2628v1

Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia

Published: 2016-12-08
Accepted: 2016-12-08

Subject Areas: Bioinformatics, Computational Biology, Evolutionary Studies
Keywords: Phylogeny reconstruction, Benchmark, Evolutionary protein domain

Copyright: © 2016 Spirin
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Spirin S. 2016. A benchmark for evaluation of phylogeny reconstruction programs. PeerJ Preprints 4:e2628v1 https://doi.org/10.7287/peerj.preprints.2628v1

Abstract

Many algorithms and programs exist for reconstructing the phylogeny of a set of proteins from a multiple sequence alignment. Many programs let users choose a number of parameters, for example, the model used by maximum likelihood programs. Different programs and different parameter settings often produce different results. However, all benchmarks published to date for evaluating the relative accuracy of programs or of parameter choices are based on simulated sequences. The aim of the present work is to create a benchmark that allows phylogenetic programs to be compared on large sets of alignments of natural protein sequences.

Author Comment

This is a preprint submission to PeerJ Preprints. A poster describing this work was presented at the CHARME / EMBnet / NETTAB 2016 Workshop in Rome.

Supplemental Information

Reference trees

Reference trees for three sets of organisms

DOI: 10.7287/peerj.preprints.2628v1/supp-1

Alignments

The archive contains alignments of orthologous series of protein domains. In each alignment, the sequences are orthologs from different organisms. Sequence names are the UniProt mnemonics of the corresponding organisms; see the tables in the file Organisms.xlsx for the full names of the organisms. File names are Pfam accession numbers with numerals (1, 2, etc.) appended to distinguish different orthologous series from the same Pfam family. The folders Metazoa25, Fungi45 and Proteobacteria45 contain alignments of the full-size orthologous series; the other folders contain random selections of 10 and 15 (for Metazoa) or 15 and 30 (for Fungi and Proteobacteria) sequences from each orthologous series.

DOI: 10.7287/peerj.preprints.2628v1/supp-2
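
The file naming scheme described above (a Pfam accession plus a series number) could be parsed along the following lines. This is only a minimal sketch: the exact separator between the accession and the number, the file extension, and the example file names are assumptions, since the description does not specify them. The helper names parse_series_name and group_by_family are hypothetical.

```python
import re
from collections import defaultdict

# Pfam accessions are "PF" followed by five digits. The optional one-character
# separator and the trailing series number are ASSUMPTIONS about the archive's
# naming; adjust the pattern to the actual file names.
_SERIES_RE = re.compile(r"(PF\d{5})\D?(\d+)")


def parse_series_name(file_name):
    """Return (pfam_accession, series_number), or None if the name does not fit."""
    match = _SERIES_RE.match(file_name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def group_by_family(file_names):
    """Map each Pfam accession to the sorted series numbers found for it."""
    families = defaultdict(list)
    for name in file_names:
        parsed = parse_series_name(name)
        if parsed is not None:
            accession, series = parsed
            families[accession].append(series)
    return {acc: sorted(nums) for acc, nums in families.items()}


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = ["PF00001_1.fasta", "PF00001_2.fasta", "PF00069_1.fasta"]
    print(group_by_family(example))
```

Grouping by accession in this way shows how many orthologous series were drawn from each Pfam family, which may be useful when summarising benchmark results per family.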

Tables of organisms

The file contains tables of the organisms from which the protein domains were taken. There are three tables: 25 Metazoa, 45 Fungi and 45 Proteobacteria. For each organism, its UniProt mnemonic is given; these mnemonics are used in the alignments and trees. For Metazoa, the taxonomic divisions listed allow a binary (i.e. fully resolved) tree to be constructed. For Fungi and Proteobacteria, some taxonomic divisions are also listed, simply to make the data easier to navigate.

DOI: 10.7287/peerj.preprints.2628v1/supp-3
