Elsevier

Information Processing Letters

Volume 114, Issues 1–2, January–February 2014, Pages 13-18

Fishing for minimum evolution trees with Neighbor-Nets

https://doi.org/10.1016/j.ipl.2013.10.003

Highlights

  • New heuristic for the minimum evolution problem arising in phylogenetics.

  • Improved run time of a dynamic programming approach for circular split systems.

  • Comparison with a local-search-based heuristic through simulations.

Abstract

In evolutionary biology, biologists commonly use a phylogenetic tree to represent the evolutionary history of some set of species. A common approach taken to construct such a tree is to search through the space of all possible phylogenetic trees on the set so as to find one that optimizes some score function, such as the minimum evolution criterion. However, this is hampered by the fact that the space of phylogenetic trees is extremely large in general. Interestingly, an alternative approach, which has received somewhat less attention in the literature, is to instead search for trees within some set of bipartitions or splits of the set of species in question. Here we consider the problem of searching through a set of splits that is circular. Such sets can, for example, be generated by the NeighborNet algorithm for constructing phylogenetic networks. More specifically, we present an O(n^4) time algorithm for finding an optimal minimum evolution tree in a circular set of splits on a set of species of size n. In addition, using simulations, we compare the performance of this algorithm when applied to NeighborNet output with that of FastME, a leading method for searching for minimum evolution trees in tree space. We find that, even though a circular set of splits represents just a tiny fraction of the total number of possible splits of a set, the trees obtained from circular sets compare quite favorably with those obtained with FastME, suggesting that the approach could warrant further investigation.

Introduction

A phylogenetic tree on a given set of species X is a connected, acyclic graph such that its leaf set is X and all its non-leaf vertices have degree at least three [24]. Such trees are used by biologists to represent the evolutionary history of the species in X. An important problem in phylogenetics is to construct such trees, and various methods have been developed for this purpose [18]. A common approach to tackling this problem is to search through the space of phylogenetic trees, trying to find a tree (or trees) that optimizes some score such as the minimum evolution criterion [23]. However, a straightforward exhaustive search is hampered by the fact that the space of phylogenetic trees on X grows exponentially in n=|X|. Moreover, it has been shown that finding an optimal tree is NP-hard for many of the popular optimization criteria (see e.g. [6], [8]).
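To give a feel for this growth, the following minimal sketch counts only the binary (fully resolved) unrooted trees on n labelled leaves, using the standard formula (2n-5)!! for n ≥ 3. The function name is ours and is not taken from the paper. Since non-binary trees are ignored, the value is a lower bound on the size of the tree space described above.

```python
def num_binary_trees(n: int) -> int:
    """Number of unrooted binary phylogenetic trees on n >= 3 labelled leaves.

    Uses the double factorial (2n-5)!! = 1 * 3 * 5 * ... * (2n-5).
    """
    if n < 3:
        raise ValueError("need at least three leaves")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_binary_trees(n))
```

Already for n = 10 there are more than two million binary trees, which illustrates why an exhaustive search quickly becomes impractical.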

Interestingly, there is an alternative approach to searching through tree space, which was studied quite early on in the development of phylogenetics (see e.g. [10], [21]), and more recently in [3], but that has received somewhat less attention in the literature. In particular, instead of searching through the set of all possible trees on the set X, we look for trees within a collection of bipartitions or splits of X. The rationale behind this approach is that any phylogenetic tree induces a set of splits of X in which every split corresponds to a branch of the tree, and that this set of splits uniquely determines the tree (cf. [24]). Intriguingly, in [4] a dynamic programming framework is developed to search for trees in a given collection of splits of X, also called a split system. Although still requiring exponential time in general, this approach has the advantage that it can yield polynomial time algorithms when restricted to split systems having size that is polynomial in n=|X|. It is therefore of interest to develop efficient algorithms to search for trees in special classes of split systems, as well as ways to generate split systems which capture salient information.
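The correspondence between branches and splits can be illustrated with a short sketch. It assumes the tree is given as an adjacency dictionary, and the names `tree_splits` and `quartet` are hypothetical, chosen only for this example. Removing a branch disconnects the tree, and the leaves on the two sides form the split associated with that branch.

```python
from typing import Dict, FrozenSet, List, Set

Split = FrozenSet[FrozenSet[str]]


def tree_splits(adj: Dict[str, List[str]], leaves: Set[str]) -> Set[Split]:
    """Return the splits A|B induced by the branches of a tree."""

    def side(start: str, blocked: str) -> FrozenSet[str]:
        # Leaves reachable from `start` without crossing the branch to `blocked`.
        seen, stack, found = {blocked, start}, [start], set()
        while stack:
            v = stack.pop()
            if v in leaves:
                found.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return frozenset(found)

    splits: Set[Split] = set()
    for u in adj:
        for v in adj[u]:
            a = side(u, v)
            # A|B and B|A are the same split, so store it as an unordered pair.
            splits.add(frozenset({a, frozenset(leaves) - a}))
    return splits


# Hypothetical example: the quartet tree with cherries {a, b} and {c, d}.
quartet = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["u", "c", "d"],
}
splits = tree_splits(quartet, {"a", "b", "c", "d"})
```

For this quartet the function returns five splits: one for each of the four external branches and the split {a, b}|{c, d} for the single internal branch.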

In this vein, here we develop an algorithm for searching for a tree that locally optimizes the minimum evolution criterion by searching in a circular split system. This is a special type of split system that can be generated, for example, by the NeighborNet algorithm [5] for constructing phylogenetic networks (see Fig. 1 for an example). In particular, we show that for a circular split system there is an O(n^4) time algorithm for computing an optimal minimum evolution tree, which improves on the run time of O(n^7) for the more general minimum evolution algorithm presented by Bryant in [4, Section 5.5]. We also present some simulations which indicate that minimum evolution trees in circular split systems generated by NeighborNet can compare favorably with those obtained by searching through the whole of tree space.

Before continuing, we note that, in view of the fact that split systems are often displayed by phylogenetic networks such as the one in Fig. 1, it might appear that the problem of searching for trees in split systems is closely related to the problem of finding optimal subtrees in phylogenetic networks. While some recent results on this latter problem can be found in [16], [17], it is, in fact, quite different from the problem we study here since, for example, the minimum evolution tree in a circular split system generated by NeighborNet is not necessarily a subtree of the network used to display this split system.

The structure of the rest of this paper is as follows. After recalling some background material on the minimum evolution problem in the next section, in Section 3, we recall Bryantʼs dynamic programming algorithm for finding minimum evolution trees in a split system. We then describe our new algorithm in Section 4 and, in the following section, we present a short investigation into how the minimum evolution trees within split systems generated by NeighborNet and some related methods compare with those generated by FastME [11], one of the leading programs for finding minimum evolution trees by searching through tree space. We conclude with a discussion of some possible future directions in Section 6.

Section snippets

The minimum evolution problem

We begin by recalling some relevant terminology and notation (cf. also [24]). Let X be a finite, non-empty set, usually corresponding to some set of species or taxa. A phylogenetic tree (on X) is a connected, acyclic graph T=(V,E) with leaf set X. Any non-leaf vertex of T is called an internal vertex of T, a branch incident to a leaf is called an external branch of T and a branch whose endpoints are both internal vertices is called an internal branch of T. In this paper, we consider only binary

Bryantʼs algorithm

In the following, we denote the split of X into two non-empty subsets A and B by A|B (=B|A) and the set of all possible splits of X by Σ(X). Any subset Σ ⊆ Σ(X) is called a split system on X. As mentioned in the introduction, we are interested in the problem of searching for trees in split systems. More specifically, given a distance matrix D and a split system Σ on X, the restricted ME-problem requires us to find the minimum σ(D,Σ) of σ_D(T) over all binary phylogenetic trees T on X with Σ_T ⊆ Σ,

Computing ME-trees in circular split systems

We now focus on the restricted ME-problem for a circular split system. This is a special type of split system that can be generated, for example, from a distance matrix D using the NeighborNet algorithm [5]. More specifically, a split system Σ ⊆ Σ(X) is circular [1] if there exists an ordering x_1, x_2, …, x_n of the elements in X such that, for every split A|B ∈ Σ, there exist 1 ≤ i ≤ j < n with A = {x_i, x_{i+1}, …, x_j} or B = {x_i, x_{i+1}, …, x_j}. If such an ordering of X exists it can be computed in O(nk) time, k=|Σ|, [12]
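The definition can be checked directly when a candidate ordering is already known. The sketch below is only an illustration of the definition, not the ordering-finding algorithm of [12]; the function names and the representation of a split as a pair of Python sets are our assumptions. It relies on the observation that exactly one side of each split avoids the last element x_n, so that side must be a contiguous interval of the ordering.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple


def is_interval(part: Set[str], pos: Dict[str, int]) -> bool:
    """True if `part` occupies consecutive positions in the ordering."""
    idx = sorted(pos[x] for x in part)
    return bool(idx) and idx[-1] - idx[0] + 1 == len(idx)


def is_circular(splits: Iterable[Tuple[Set[str], Set[str]]],
                order: Sequence[str]) -> bool:
    """Check that every split A|B fits the given ordering x_1, ..., x_n."""
    pos = {x: i for i, x in enumerate(order)}
    last = order[-1]
    for a, b in splits:
        # The side that avoids x_n must equal {x_i, ..., x_j} with j < n.
        side = b if last in a else a
        if not is_interval(side, pos):
            return False
    return True


order = ["a", "b", "c", "d", "e"]
good = [({"a", "b"}, {"c", "d", "e"}), ({"e", "a"}, {"b", "c", "d"})]
bad = [({"a", "c"}, {"b", "d", "e"})]
print(is_circular(good, order), is_circular(bad, order))
```

Because the admissible pairs (i, j) satisfy 1 ≤ i ≤ j < n, at most n(n-1)/2 distinct splits can fit any one ordering.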

Simulations

To measure the computational performance of the algorithm, we tested it on simulated data sets and compared it with FastME [11], one of the leading methods to construct an approximation of an ME tree. Note that FastME performs a local search in tree space using a neighborhood based on certain types of tree edit operations. In FastME we chose the NeighborJoining-tree option as the start topology for the local search together with nearest neighbor interchange (NNI) tree edit operations, and

Discussion

We have presented an efficient algorithm for finding a restricted ME tree in a circular split system which improves on the run time of a more general algorithm presented in [4]. We have also seen that the restricted ME trees obtained in split systems generated by NeighborNet compare favorably with the ones produced by FastME. This is of some interest since the split systems generated by NeighborNet only represent a tiny fraction of the total number of all possible splits (\binom{n}{2} vs. 2^{n-1} - 1 on a

Acknowledgements

We would like to thank the three reviewers for their helpful comments.

References (24)

  • T. Cormen et al., Introduction to Algorithms (2009)
  • W. Day et al., Computational complexity of inferring phylogenies by compatibility, Syst. Biol. (1986)