Fishing for minimum evolution trees with Neighbor-Nets
Introduction
A phylogenetic tree on a given set of species X is a connected, acyclic graph such that its leaf set is X and all its non-leaf vertices have degree at least three [24]. Such trees are used by biologists to represent the evolutionary history of the species in X. An important problem in phylogenetics is to construct such trees, and various methods have been developed for this purpose [18]. A common approach to tackling this problem is to search through the space of phylogenetic trees, trying to find a tree (or trees) that optimizes some score such as the minimum evolution criterion [23]. However, a straightforward exhaustive search is hampered by the fact that the space of phylogenetic trees on X grows exponentially in |X|. Moreover, it has been shown that finding an optimal tree is NP-hard for many of the popular optimization criteria (see e.g. [6], [8]).
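To give a feel for how quickly tree space grows, the following minimal sketch counts unrooted binary phylogenetic trees on n labelled leaves using the standard formula (2n-5)!! for n ≥ 3. The restriction to binary trees and the function name `num_binary_trees` are assumptions made for illustration; the count is a well-known combinatorial fact and is not taken from this paper.

```python
def num_binary_trees(n: int) -> int:
    """Number of unrooted binary phylogenetic trees on n labelled leaves.

    Uses the double factorial (2n-5)!! = 1 * 3 * 5 * ... * (2n-5) for n >= 3.
    For n < 3 there is exactly one tree.
    """
    if n < 3:
        return 1
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5 (the factor 1 is implicit).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_binary_trees(n))
```

Already for 10 taxa there are more than two million binary trees, which is why exhaustive search is feasible only for very small sets X.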
Interestingly, there is an alternative approach to searching through tree space, which was studied quite early on in the development of phylogenetics (see e.g. [10], [21]), and more recently in [3], but that has received somewhat less attention in the literature. In particular, instead of searching through the set of all possible trees on the set X, we look for trees within a collection of bipartitions or splits of X. The rationale behind this approach is that any phylogenetic tree induces a set of splits of X in which every split corresponds to a branch of the tree, and that this set of splits uniquely determines the tree (cf. [24]). Intriguingly, in [4] a dynamic programming framework is developed to search for trees in a given collection of splits of X, also called a split system. Although still requiring exponential time in general, this approach has the advantage that it can yield polynomial time algorithms when restricted to split systems having size that is polynomial in |X|. It is therefore of interest to develop efficient algorithms to search for trees in special classes of split systems, as well as ways to generate split systems which capture salient information.
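The correspondence between trees and split sets can be made concrete with the classical pairwise compatibility condition: two splits A|B and C|D of X are compatible if at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty, and a set of splits arises from a phylogenetic tree exactly when its splits are pairwise compatible. The sketch below is illustrative only. Representing a split by one of its sides, and the helper names `compatible` and `is_tree_like`, are assumptions and not part of the framework in [4].

```python
from itertools import combinations


def compatible(side1, side2, X):
    """Return True if the splits side1|X-side1 and side2|X-side2 are compatible.

    Two splits are compatible when at least one of the four pairwise
    intersections of their sides is empty.
    """
    a, c = set(side1), set(side2)
    b, d = set(X) - a, set(X) - c
    return any(not (p & q) for p in (a, b) for q in (c, d))


def is_tree_like(sides, X):
    """Return True if all splits in the collection are pairwise compatible."""
    return all(compatible(s, t, X) for s, t in combinations(sides, 2))


if __name__ == "__main__":
    X = {1, 2, 3, 4, 5}
    print(is_tree_like([{1, 2}, {1, 2, 3}], X))  # nested sides
    print(is_tree_like([{1, 2}, {2, 3}], X))     # overlapping sides
```

Searching for a tree inside a split system then amounts to selecting a pairwise compatible subset of its splits that optimizes the chosen score.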
In this vein, here we develop an algorithm for searching for a tree that locally optimizes the minimum evolution criterion by searching in a circular split system. This is a special type of split system that can be generated, for example, by the NeighborNet algorithm [5] for constructing phylogenetic networks (see Fig. 1 for an example). In particular, we show that for a circular split system there is an time algorithm for computing an optimal minimum evolution tree, which improves on the run time of for the more general minimum evolution algorithm presented by Bryant in [4, Section 5.5]. We also present some simulations which indicate that minimum evolution trees in circular split systems generated by NeighborNet can compare favorably with those obtained by searching through the whole of tree space.
Before continuing, we note that, in view of the fact that split systems are often displayed by phylogenetic networks such as the one in Fig. 1, it might appear that the problem of searching for trees in split systems is closely related to the problem of finding optimal subtrees in phylogenetic networks. While some recent results on this latter problem can be found in [16], [17], it is, in fact, quite different from the problem we study here since, for example, the minimum evolution tree in a circular split system generated by NeighborNet is not necessarily a subtree of the network used to display this split system.
The structure of the rest of this paper is as follows. After recalling some background material on the minimum evolution problem in the next section, in Section 3, we recall Bryant's dynamic programming algorithm for finding minimum evolution trees in a split system. We then describe our new algorithm in Section 4 and, in the following section, we present a short investigation into how the minimum evolution trees within split systems generated by NeighborNet and some related methods compare with those generated by FastME [11], one of the leading programs for finding minimum evolution trees by searching through tree space. We conclude with a discussion of some possible future directions in Section 6.
Section snippets
The minimum evolution problem
We begin by recalling some relevant terminology and notation (cf. also [24]). Let X be a finite, non-empty set, usually corresponding to some set of species or taxa. A phylogenetic tree T (on X) is a connected, acyclic graph with leaf set X. Any non-leaf vertex of T is called an internal vertex of T, a branch incident to a leaf is called an external branch of T and a branch whose endpoints are both internal vertices is called an internal branch of T. In this paper, we consider only binary
Bryant's algorithm
In the following, we denote the split of X into two non-empty subsets A and B by () and the set of all possible splits of X by . Any subset is called a split system on X. As mentioned in the introduction, we are interested in the problem of searching for trees in split systems. More specifically, given a distance matrix D and a split system Σ on X, the restricted ME-problem requires us to find the minimum of over all binary phylogenetic trees T on X with ,
Computing ME-trees in circular split systems
We now focus on the restricted ME-problem for a circular split system. This is a special type of split system that can be generated, for example, from a distance matrix D using the NeighborNet algorithm [5]. More specifically, a split system is circular [1] if there exists an ordering of the elements in X such that, for every split , there exist with or . If such an ordering of X exists it can be computed in time, , [12]
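As an illustration of circularity, the sketch below tests whether a single split fits a given cyclic ordering of X. It assumes the usual reading of the definition, namely that one side of the split consists of consecutive elements of the ordering when the ordering is viewed as a cycle. The function names and the representation of a split by one of its sides are hypothetical, and the code only checks a candidate ordering; it does not implement the algorithm from [12] for finding one.

```python
def is_circular_split(side, ordering):
    """Return True if `side` (a proper, non-empty subset of the taxa) forms a
    block of consecutive elements in the cyclic `ordering`.

    Walking once around the cycle, a contiguous block is entered exactly once
    and left exactly once, so membership changes exactly twice.
    """
    inside = [x in side for x in ordering]
    n = len(ordering)
    changes = sum(inside[i] != inside[(i + 1) % n] for i in range(n))
    return changes == 2


def is_circular_system(sides, ordering):
    """Return True if every split in the collection fits the given ordering."""
    return all(is_circular_split(s, ordering) for s in sides)


if __name__ == "__main__":
    ordering = ["a", "b", "c", "d", "e"]
    print(is_circular_split({"a", "b"}, ordering))  # consecutive block
    print(is_circular_split({"e", "a"}, ordering))  # block wrapping around
    print(is_circular_split({"a", "c"}, ordering))  # not consecutive
```

Because either side of a split may be used (the complement of a cyclic block is again a cyclic block), the check does not depend on which side is passed in.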
Simulations
To measure the computational performance of the algorithm, we tested it on simulated data sets and compared it with FastME [11], one of the leading methods to construct an approximation of an ME tree. Note that FastME performs a local search in tree space using a neighborhood based on certain types of tree edit operations. In FastME we chose the NeighborJoining-tree option as the start topology for the local search together with nearest neighbor interchange (NNI) tree edit operations, and
Discussion
We have presented an efficient algorithm for finding a restricted ME tree in a circular split system which improves on the run time of a more general algorithm presented in [4]. We have also seen that the restricted ME trees obtained in split systems generated by NeighborNet compare favorably with the ones produced by FastME. This is of some interest since the split systems generated by NeighborNet only represent a tiny fraction of the total number of all possible splits ( vs. on a
Acknowledgements
We would like to thank the three reviewers for their helpful comments.
References (24)
- et al. A canonical decomposition theory for metrics on a finite set. Adv. Math. (1992)
- et al. Split decomposition: A new and useful approach to phylogenetic analysis of distance data. Mol. Phylogenet. Evol. (1992)
- et al. Locating a tree in a phylogenetic network. Inf. Process. Lett. (2010)
- et al. Seeing the trees and their branches in the network is hard. Theor. Comput. Sci. (2008)
- Hunting for trees in binary character sets: Efficient algorithms for extraction, enumeration and optimization. J. Comput. Biol. (1996)
- Building trees, hunting for trees and comparing trees. (1997)
- et al. Neighbor-net: An agglomerative method for the construction of phylogenetic networks. Mol. Biol. Evol. (2004)
- The minimum evolution problem: Overview and classification. Networks (2009)
- Estimating phylogenies from molecular data.
- et al. Finding a maximum likelihood tree is hard. J. ACM (2006)