Gene tree reconciliation including transfers with replacement is NP-hard and FPT

Journal of Combinatorial Optimization

Abstract

Phylogenetic trees illustrate the evolutionary history of genes and species. Although genes evolve along with the species they belong to, a species tree and a gene tree are often not identical. The reasons for this are evolutionary events at the gene level, such as duplication or transfer. These differences are handled by phylogenetic reconciliation, which formally is a mapping between the nodes of a gene tree and the nodes and branches of a species tree. We investigate models of reconciliation with gene transfers replacing existing genes, a biologically important event that has never been included in reconciliation models. The problem is close to the dated version of the classical subtree prune and regraft (SPR) distance problem, where a pruned subtree has to be regrafted only on a branch closer to the root. We prove that the reconciliation problem including transfer with replacement is NP-hard, and that, if speciations and transfers with replacement are the only allowed evolutionary events, it is fixed-parameter tractable with respect to the reconciliation’s weight. We prove that the results extend to the dated SPR problem.


Acknowledgements

ET was supported by the French Agence Nationale de la Recherche (ANR) through Grant No. ANR-10-BINF-01–01 ‘Ancestrome’.

Corresponding author

Correspondence to Damir Hasić.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs

Here we give proofs that are omitted in the main text.

Proof of Lemma 1

We will use a reduction from the Max 2-Sat, where every variable appears in at most three clauses. This problem is NP-hard (Raman et al. 1998).

Let F be an instance of Max 2-Sat in which every variable appears in at most three clauses. If there is a variable x that appears exactly once, in a clause C, then we can assign it the value that makes C true. Similarly, if there is a variable y that has only positive, or only negative, literals, then we can assign it the value that makes the corresponding clauses true.

In this way we eliminate all the variables that appear exactly once, or have only positive or only negative literals. Therefore, we can assume that F has variables that appear two or three times, and have both positive and negative literals.

Let \(x_0\) be a variable that appears in exactly two clauses. After inserting \((x_0\vee x^0_1)\wedge (x^0_1\vee x^0_2)\wedge (\lnot x^0_1\vee x^0_3)\wedge (\lnot x^0_2\vee x^0_3)\wedge (x^0_2\vee \lnot x^0_3)\) into F, we obtain a logical formula with \(x_0\) in exactly three clauses. The new variables \(x^0_1,x^0_2,x^0_3\) also appear in exactly three clauses, and each has both a positive and a negative literal present.

In this way we obtain a logical formula \(F'\) that has every variable in exactly three clauses, with both a positive and a negative literal present. If the added variables are true, then all the new clauses will be true. Therefore the number of true clauses in F is maximized if and only if the number of true clauses in \(F'\) is maximized.

The previous reduction is obviously polynomial in n and m, where n is the number of variables, and m is the number of clauses in F. \(\square \)
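The properties claimed for the padding clauses can be checked mechanically. The following is a minimal sketch, not part of the original proof: the names `GADGET`, `occurrences`, `count_true` and `max_true` are illustrative, the variables \(x^0_1,x^0_2,x^0_3\) are written `x1`, `x2`, `x3`, and the two-clause formula `F` is a hypothetical toy instance.

```python
from itertools import product

# A literal is a (variable_name, is_positive) pair; a clause is a pair of literals.
# The five clauses inserted for a variable x0 that occurs in exactly two clauses of F.
GADGET = [
    (("x0", True), ("x1", True)),
    (("x1", True), ("x2", True)),
    (("x1", False), ("x3", True)),
    (("x2", False), ("x3", True)),
    (("x2", True), ("x3", False)),
]


def occurrences(clauses, var):
    """Polarities with which `var` occurs, one entry per occurrence."""
    return [pos for clause in clauses for name, pos in clause if name == var]


def count_true(clauses, assignment):
    """Number of clauses satisfied by a dict mapping variable names to booleans."""
    return sum(any(assignment[name] == pos for name, pos in clause) for clause in clauses)


def max_true(clauses):
    """Brute-force Max 2-Sat value; only suitable for tiny formulas."""
    names = sorted({name for clause in clauses for name, _ in clause})
    return max(
        count_true(clauses, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


# Hypothetical toy formula in which x0 occurs twice, once with each polarity.
F = [(("x0", True), ("y", True)), (("x0", False), ("y", False))]

for v in ("x1", "x2", "x3"):
    polarities = occurrences(GADGET, v)
    print(v, len(polarities), sorted(set(polarities)))
print(max_true(F + GADGET) - max_true(F))
```

On this toy instance the optimum shifts by exactly the five added clauses, which is the behaviour the lemma relies on; the brute-force search is only meant for formulas with a handful of variables.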

Proof of Lemma 4

To prove (a) and (b), we identify two cases, according to the positions of \(r_{j_1}\) and \(r_{j_2}\).

Case 1. Node \(r_{j_1}\) or \(r_{j_2}\) is above the border line. In order to obtain \(\omega (F_j)<4\), we need at least one of the nodes \(r^0_{j_1}, r^0_{j_2}, r^1_{j_1}, r^1_{j_2}\) to be neither a duplication nor incident with a transfer. The only way to achieve this is to place one of them in \(s^0_{C_j}=root(S_{C_j})\).

Let us take \(\rho (r^0_{j_1}) = s^0_{C_j}\). Then \(\rho (r_{j_1}) > s^0_{C_j}\), or \(\rho (r_{j_1})\) and \(s^0_{C_j}\) are incomparable. If \(\rho (r^1_{j_1}) < s^0_{C_j}\), then \((r_{j_1},r^1_{j_1})\) is a transfer, and the weight of \(F_j\) is not decreased. If \(\rho (r^1_{j_1}) = s^0_{C_j}\), then \(r_{j_1}\) is a duplication, or one of the edges \((r_{j_1}, r^0_{j_1})\) and \((r_{j_1}, r^1_{j_1})\) contains a transfer. In this way we eliminate two transfers (that were incident with \(r^0_{j_1}\) and \(r^1_{j_1}\)), and obtain one transfer or duplication. But we generate at least one non-free loss in \(S_{C_j}\). Similar considerations apply to the other nodes of \(F_j\). Hence we cannot obtain \(\omega (F_j)<4\).

Case 2. Both nodes \(r_{j_1}\) and \(r_{j_2}\) are under the border line. Then none of the nodes \(r^0_{j_1}, r^1_{j_1}, r^0_{j_2}, r^1_{j_2}\) is placed in \(s^0_{C_j}\), therefore every one of them is incident with at least one transfer. If we wish to eliminate transfers starting at \(r_{j_1}\) or \(r_{j_2}\), then we need to place them both in \(lca(B_5,B_6,B_7,B_8)\), i.e. in the minimal node in \(S_{C_j}\) that is an ancestor of \(B_5\), \(B_6\), \(B_7\), and \(B_8\) (Fig. 19). In this case we increase the number of non-free losses. Whichever placement we choose, we have \(\omega (F_j)\ge 5\).
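The node \(lca(B_5,B_6,B_7,B_8)\) used in Case 2 is the lowest node that is an ancestor of all four leaves. A minimal sketch of that computation follows, assuming the tree is given as a child-to-parent map; the tree shape and the internal node names (`p`, `q`, `m`, `root`) are hypothetical and do not reproduce \(S_{C_j}\) from the paper.

```python
def ancestors(parent, node):
    """Path from `node` up to the root, inclusive, following the parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, nodes):
    """Lowest node that is an ancestor of (or equal to) every node in `nodes`."""
    common = None
    for n in nodes:
        path = ancestors(parent, n)
        if common is None:
            common = path
        else:
            keep = set(path)
            # Filtering preserves the bottom-up order of the first path.
            common = [a for a in common if a in keep]
    return common[0]


# Hypothetical toy tree (child -> parent); not the gadget tree of the paper.
PARENT = {
    "B5": "p", "B6": "p",
    "B7": "q", "B8": "q",
    "p": "m", "q": "m",
    "m": "root", "B1": "root",
}

print(lca(PARENT, ["B5", "B6", "B7", "B8"]))
```

Because the common ancestors are kept in bottom-up order, the first surviving element is the minimal one, matching the definition quoted in the text.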

Fig. 19. Lemma 4a, b, Case 2. When \(r_{j_1}\) and \(r_{j_2}\) are positioned in \(lca(B_5,B_6,B_7,B_8)\) we obtain extra non-free losses

(c) The proof is similar in spirit to the proof of (a). See Figs. 4 and 5. The idea is to see what happens if one of the 17 transfers present in a proper reconciliation that belongs to \(G_{x_i}\) is not present in some other reconciliation.

First, note that if some of the nodes \(d^j_i\) are not placed in \(D^i_j\) (\(j=1,\ldots ,P(n)\)), then we would have transfers that are not present in a proper reconciliation. Also, if none of the nodes \(d^j_i\) is placed in \(D^i_j\) (\(j=1,\ldots ,P(n)\)), then we would have a reconciliation more expensive than any proper reconciliation. Hence we can assume that, for the anchoring nodes \(d^1_i\), we have \(\rho (d^1_i)=D^i_j\) (\(j=1,\ldots ,P(n)\)).

In a proper reconciliation, there are 14 transfers incident with \(b^s_i\) (\(s=1,\ldots ,14\)). In an arbitrary reconciliation, we can achieve that no transfer or duplication is incident with \(b^s_i\) only if \(\rho (b^s_i)=s^0_{x_i}\). Then a parent of \(b^s_i\) (i.e. \(c^{s-1}_i\)), as well as \(c^0_i\), is a duplication, or is incident with a transfer, and two or more non-free losses are created. Therefore having \(\rho (b^s_i)=s^0_{x_i}\), for some values of s, does not give \(\omega (G_{x_i})<17\).

Assume that the nodes \(b^s_i\) (\(s=1,\ldots ,14\)) are placed as in the proper reconciliation. Observe nodes \(x^1_i, x^2_i\), and assume that they are not incident with a transfer, and the edge \((x^2_i, x^1_i)\) does not contain a transfer. Then we have at least two transfers at the edges \((c^4_i, x^2_i)\) and \((x^1_i, c^3_i)\), or at some other edges leading to some of the \(b^s_i\). Similar considerations apply for \(x^3_i\). Therefore, in this case too we cannot decrease the number of transfers.

Can we have fewer than 17 transfers if we take \(\rho (b^s_i)=s^0_{x_i}\), for some values of s, and eliminate transfers incident with \(x^1_i, x^2_i\)? Let us take \(\rho (b^7_i)=s^0_{x_i}\). Then the nodes \(c^6_i\) and \(c^0_i\) are not placed as in the proper reconciliation. Hence we have at least two transfers or duplications, and non-free losses not present in the proper reconciliation. Also, if the nodes \(x^1_i,x^2_i\) are not placed as in the proper reconciliation, we have a transfer, different from the previous two, that is not present in the proper reconciliation. Therefore, we have at least three evolutionary events not present in the proper reconciliation, and we cannot obtain fewer than 17 events.

(d) Assume that \(x^1_i\) and \(x^3_i\) are under the border line. Then at least three of the nodes \(c^1_i,\ldots , c^{12}_i\) are not in their gadget positions. Among these nodes are \(c^1_i,c^2_i,c^3_i\), because they are descendants of \(x^1_i\) in G. The paths \((c^1_i,b^2_i, A^3_i)\), \((c^2_i,b^3_i, A^5_i)\), \((c^3_i,b^4_i, A^7_i)\) generate three extra transfers. An extra transfer is created on the edge \((x^2_i,x^1_i)\), or on some other edge that is an ancestor of \(x^2_i\). Even if we eliminate the two transfers incident with \(x^1_i\) and \(x^2_i\), we gain 4 more. Hence \(\omega (G_{x_i})\ge 19\). \(\square \)

Proof of Theorem 1

Let \({\mathfrak {R}}\) be a minimum \(DTLCT_R\) reconciliation. We use \({\mathfrak {R}}\) to construct \({\mathfrak {R}}'\) that is both minimum and proper.

The construction of a proper reconciliation is described earlier. The only thing that we need to specify in \({\mathfrak {R}}'\) is the positions of \(x^1_i,x^2_i\), and \(x^3_i\) with respect to the border line, as well as the positions of \(r_{j_1}\) and \(r_{j_2}\).

If \(x^1_i\) and \(x^2_i\) are not on the same side of the border line as \(x^3_i\) (in \({\mathfrak {R}}\)), then they are on the same side in \({\mathfrak {R}}'\) as in \({\mathfrak {R}}\). If \(x^1_i\) or \(x^2_i\) is on the same side as \(x^3_i\) (in \({\mathfrak {R}}\)), then \(x^1_i\) and \(x^2_i\) are above, and \(x^3_i\) is under the border line (in \({\mathfrak {R}}'\)).

Next, the vertices of \(F_j\) are placed in \(S_{C_j}\) as in the description of the proper reconciliation (Definition 16), so that the nodes \(r_{j_1}\) and \(r_{j_2}\) are placed on the same side of the border line as \(x'_{j_1}\) and \(x'_{j_2}\) (in \({\mathfrak {R}}'\)), respectively. We denote the reconciliation obtained in this way by \({\mathfrak {R}}'\). By construction, it is a proper reconciliation. Let us prove that it is a minimum reconciliation.

We have \(\omega _{{\mathfrak {R}}}(G_{x_i})\ge 17 = \omega _{{\mathfrak {R}}'}(G_{x_i})\), \(\omega _{{\mathfrak {R}}}(F_j)\ge 4\), and \(\omega _{{\mathfrak {R}}'}(F_j)\in \{4, 5\}\) (Lemma 4).

Let \(i\in \{1,\ldots ,n\}\), and let \(x^1_i,x^2_i,x^3_i\) be connected with \(r_{a_1}\in V(F_a)\), \(r_{b_1}\in V(F_b)\), \(r_{c_1}\in V(F_c)\) via transfers. We introduce the notation \(\varOmega _{{\mathfrak {R}}}(i)=\omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c)\).

Case 1. Assume that \(\omega _{{\mathfrak {R}}}(F_a)\ge \omega _{{\mathfrak {R}}'}(F_a)\), \(\omega _{{\mathfrak {R}}}(F_b)\ge \omega _{{\mathfrak {R}}'}(F_b)\), \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). Then \(\omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \ge \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c)\), i.e. \(\varOmega _{{\mathfrak {R}}}(i) \ge \varOmega _{{\mathfrak {R}}'}(i)\).

Case 2. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=5\), \(\omega _{{\mathfrak {R}}}(F_b)\ge \omega _{{\mathfrak {R}}'}(F_b)\), \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). Since \(\omega _{{\mathfrak {R}}'}(F_a)=5\), we have that \(x^1_i\) is under the border line (in \({\mathfrak {R}}'\)). Because of the transformation rules, at the beginning of the proof, we have that \(x^1_i\), \(x^2_i\) are under the border line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)), while \(x^3_i\) is above the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)).

Let \(y_1\) be a literal of variable \(x_s\) (i.e. \(y_1\in \{x^1_s,x^2_s,x^3_s\}\)) connected with \(r_{a_2}\in V(F_a)\) via a transfer. Since \(\omega _{{\mathfrak {R}}}(F_a)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=5\), we have that \(y_1\) is above the border line in \({\mathfrak {R}}\), and under the line in \({\mathfrak {R}}'\), hence \(y_1=x^3_s\).

Assume that \(F_{a'},F_{b'}\) are connected with \(x^1_s,x^2_s\) via transfers. Then \(\omega _{{\mathfrak {R}}'}(F_{a'})=\omega _{{\mathfrak {R}}'}(F_{b'})=4\), \(\omega _{{\mathfrak {R}}}(G_{x_s}) \ge 19\). We have \(\omega _{{\mathfrak {R}}}(F_{a'})\ge 4=\omega _{{\mathfrak {R}}'}(F_{a'})\) and \(\omega _{{\mathfrak {R}}}(F_{b'})\ge 4=\omega _{{\mathfrak {R}}'}(F_{b'})\).

From the previous arguments, \(\omega _{{\mathfrak {R}}}(G_{x_s})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_a) \ge \)\(19+4+4 =\)\(17+5+5 =\)\(\omega _{{\mathfrak {R}}'}(G_{x_s})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_a)\).

Finally, \(\big ( \omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \big ) + \)\(\big ( \omega _{{\mathfrak {R}}}(G_{x_s})+\omega _{{\mathfrak {R}}}(F_{a'})+\omega _{{\mathfrak {R}}}(F_{b'})+\omega _{{\mathfrak {R}}}(F_{a}) \big ) \ge \)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_i})+ \omega _{{\mathfrak {R}}'}(F_a) +\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c) \big ) + \)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_s}) + \omega _{{\mathfrak {R}}'}(F_{a'}) +\omega _{{\mathfrak {R}}'}(F_{b'})+\omega _{{\mathfrak {R}}'}(F_{a}) \big )\), i.e. \(\varOmega _{{\mathfrak {R}}}(i)+\varOmega _{{\mathfrak {R}}}(s) \ge \varOmega _{{\mathfrak {R}}'}(i)+\varOmega _{{\mathfrak {R}}'}(s)\).

The next cases use the approach of Case 2.

Case 3. Assume that \(\omega _{{\mathfrak {R}}}(F_b)=4\), \(\omega _{{\mathfrak {R}}'}(F_b)=5\), \(\omega _{{\mathfrak {R}}}(F_a)\ge \omega _{{\mathfrak {R}}'}(F_a)\), \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). This case is analogous to Case 2.

Case 4. Assume that \(\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_c)=5\), \(\omega _{{\mathfrak {R}}}(F_a)\ge \omega _{{\mathfrak {R}}'}(F_a)\), \(\omega _{{\mathfrak {R}}}(F_b)\ge \omega _{{\mathfrak {R}}'}(F_b)\). Then \(x^3_i\) is under, and \(x^1_i,x^2_i\) are above the border line in \({\mathfrak {R}}'\). We have two subcases.

Case 4.1. Assume that \(x^1_i\) or \(x^2_i\) was on the same side of the line as \(x^3_i\) (in \({\mathfrak {R}}\)). Then \(\omega _{{\mathfrak {R}}}(G_{x_i})\ge 19\). Hence \(\omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \ge \)\(19 + \omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+ 4>\)\(17 + \omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+ 5 =\)\(\omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c)\), i.e. \(\varOmega _{{\mathfrak {R}}}(i) > \varOmega _{{\mathfrak {R}}'}(i)\).

Case 4.2. Assume that \(x^1_i\) and \(x^2_i\) were not on the same side of the line as \(x^3_i\) (in \({\mathfrak {R}}\)). Then \(x^3_i\) is under the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)). Now we proceed similarly to Case 2.

Let \(y_3\in \{x^1_l,x^2_l,x^3_l\}\) be connected with \(r_{c_2}\in V(F_c)\) via a transfer. From \(\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_c)=5\), we have that \(y_3\) was above the line in \({\mathfrak {R}}\), and is under the line in \({\mathfrak {R}}'\), hence \(y_3=x^3_l\), \(\omega _{{\mathfrak {R}}}(G_{x_l}) \ge 19\), \(\omega _{{\mathfrak {R}}'}(F_{a''}) = \omega _{{\mathfrak {R}}'}(F_{b''}) = 4\), where \(F_{a''}\) and \(F_{b''}\) are connected with \(x^1_l\) and \(x^2_l\) via transfers.

It follows that \(\omega _{{\mathfrak {R}}}(G_{x_l}) + \omega _{{\mathfrak {R}}}(F_c) + \omega _{{\mathfrak {R}}}(F_c) \ge 19+4+4 =\)\(17+5+5=\)\(\omega _{{\mathfrak {R}}'}(G_{x_l}) + \omega _{{\mathfrak {R}}'}(F_c) + \omega _{{\mathfrak {R}}'}(F_c)\).

Next, \(\big ( \omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \big ) + \)\(\big ( \omega _{{\mathfrak {R}}}(G_{x_l})+\omega _{{\mathfrak {R}}}(F_{ a''})+\omega _{{\mathfrak {R}}}(F_{b''})+\omega _{{\mathfrak {R}}}(F_{c}) \big ) \ge \)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c) \big ) + \)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_l})+\omega _{{\mathfrak {R}}'}(F_{a''})+\omega _{{\mathfrak {R}}'}(F_{b''})+\omega _{{\mathfrak {R}}'}(F_{c}) \big )\), i.e. \(\varOmega _{{\mathfrak {R}}}(i)+\varOmega _{{\mathfrak {R}}}(l) \ge \varOmega _{{\mathfrak {R}}'}(i)+\varOmega _{{\mathfrak {R}}'}(l)\).

Case 5. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=\omega _{{\mathfrak {R}}}(F_b)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=\omega _{{\mathfrak {R}}'}(F_b)=5\), and \(\omega _{{\mathfrak {R}}}(F_c)\ge \omega _{{\mathfrak {R}}'}(F_c)\). By a similar argument as in the previous cases, we have that \(x^1_i,x^2_i\) are under the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)), while \(x^3_i\) is above the line (in \({\mathfrak {R}}\) and \({\mathfrak {R}}'\)). Let \(y_1\in \{x^1_r,x^2_r,x^3_r\}\) be connected with \(r_{a_2}\in V(F_a)\), and \(y_2\in \{x^1_t,x^2_t,x^3_t\}\) be connected with \(r_{b_2}\in V(F_b)\). As in the previous cases, we have \(y_1=x^3_r\), \(y_2=x^3_t\), and they were above the line in \({\mathfrak {R}}\), and under the line in \({\mathfrak {R}}'\). Hence \(\omega _{{\mathfrak {R}}}(G_{x_r}) \ge 19\) and \(\omega _{{\mathfrak {R}}}(G_{x_t}) \ge 19\). Let \(x^1_r,x^2_r,x^1_t,x^2_t\) be connected with \(F_{a_r},F_{b_r},F_{a_t},F_{b_t}\). Then \(\omega _{{\mathfrak {R}}'}(F_{a_r})=\omega _{{\mathfrak {R}}'}(F_{b_r})=\omega _{{\mathfrak {R}}'}(F_{a_t})=\omega _{{\mathfrak {R}}'}(F_{b_t})=4\).

Therefore \(\omega _{{\mathfrak {R}}}(G_{x_r}) + \omega _{{\mathfrak {R}}}(G_{x_t}) + \omega _{{\mathfrak {R}}}(F_a) + \omega _{{\mathfrak {R}}}(F_a) + \omega _{{\mathfrak {R}}}(F_b) + \omega _{{\mathfrak {R}}}(F_b) \ge \)\(19+19+4+4+4+4 = 17+17+5+5+5+5 = \)\(\omega _{{\mathfrak {R}}'}(G_{x_r}) + \omega _{{\mathfrak {R}}'}(G_{x_t}) + \omega _{{\mathfrak {R}}'}(F_a) + \omega _{{\mathfrak {R}}'}(F_a) + \omega _{{\mathfrak {R}}'}(F_b) + \omega _{{\mathfrak {R}}'}(F_b)\).

Hence \(\big ( \omega _{{\mathfrak {R}}}(G_{x_i})+\omega _{{\mathfrak {R}}}(F_a)+\omega _{{\mathfrak {R}}}(F_b)+\omega _{{\mathfrak {R}}}(F_c) \big ) +\)\(\big ( \omega _{{\mathfrak {R}}}(G_{x_r})+\omega _{{\mathfrak {R}}}(F_{a_r})+\omega _{{\mathfrak {R}}}(F_{b_r})+\omega _{{\mathfrak {R}}}(F_a) \big ) +\)\(\big ( \omega _{{\mathfrak {R}}}(G_{x_t})+\omega _{{\mathfrak {R}}}(F_{a_t})+\omega _{{\mathfrak {R}}}(F_{b_t})+\omega _{{\mathfrak {R}}}(F_b) \big ) \ge \)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_i})+\omega _{{\mathfrak {R}}'}(F_a)+\omega _{{\mathfrak {R}}'}(F_b)+\omega _{{\mathfrak {R}}'}(F_c) \big ) +\)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_r})+\omega _{{\mathfrak {R}}'}(F_{a_r})+\omega _{{\mathfrak {R}}'}(F_{b_r})+\omega _{{\mathfrak {R}}'}(F_a) \big ) +\)\(\big ( \omega _{{\mathfrak {R}}'}(G_{x_t})+\omega _{{\mathfrak {R}}'}(F_{a_t})+\omega _{{\mathfrak {R}}'}(F_{b_t})+\omega _{{\mathfrak {R}}'}(F_b) \big )\), i.e. \(\varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(r) + \varOmega _{{\mathfrak {R}}}(t) \ge \)\(\varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(r) + \varOmega _{{\mathfrak {R}}'}(t)\).

The next three cases are not possible, because \(x^3_i\) cannot be on the same side of the line as \(x^1_i\) or \(x^2_i\) in \({\mathfrak {R}}'\).

Case 6. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=\omega _{{\mathfrak {R}}'}(F_c)=5\), \(\omega _{{\mathfrak {R}}}(F_b) \ge \omega _{{\mathfrak {R}}'}(F_b)\).

Case 7. Assume that \(\omega _{{\mathfrak {R}}}(F_b)=\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_b)=\omega _{{\mathfrak {R}}'}(F_c)=5\), \(\omega _{{\mathfrak {R}}}(F_a) \ge \omega _{{\mathfrak {R}}'}(F_a)\).

Case 8. Assume that \(\omega _{{\mathfrak {R}}}(F_a)=\omega _{{\mathfrak {R}}}(F_b)=\omega _{{\mathfrak {R}}}(F_c)=4\), \(\omega _{{\mathfrak {R}}'}(F_a)=\omega _{{\mathfrak {R}}'}(F_b)=\omega _{{\mathfrak {R}}'}(F_c)=5\).

Every \(i\in \{1,\ldots , n\}\) belongs to exactly one case. Variables s (from Cases 2 and 3), l (Case 4.2), t and r (Case 5) are equal to some \(i\in \{1,\ldots , n\}\), but are different among themselves, i.e. no value is repeated among the variables s, l, r, t. Let \(A_1\) be the set of all values of i from Case 1 that are different from all of s, l, r, t. In a similar manner we introduce the sets \(A_{2,3}\), \(A_{4.1}\), \(A_{4.2}\), \(A_{5}\).

We will use the previous cases to prove \(\omega ({\mathfrak {R}}) \ge \omega ({\mathfrak {R}}')\). We have \(2\cdot \omega ({\mathfrak {R}}) = \sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i}) + \sum _{i}\varOmega _{{\mathfrak {R}}}(i) = \)\(\sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i})\)\(+\)\(\sum _{A_1} \varOmega _{{\mathfrak {R}}}(i)\)\(+\)\(\sum _{A_{2,3}}\big ( \varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(s) \big )\)\(+\)\(\sum _{A_{4.1}} \varOmega _{{\mathfrak {R}}}(i)\)\(+\)\(\sum _{A_{4.2}}\big ( \varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(l)\big )\)\(+\)\(\sum _{A_5}\big ( \varOmega _{{\mathfrak {R}}}(i) + \varOmega _{{\mathfrak {R}}}(r) + \varOmega _{{\mathfrak {R}}}(t) \big )\)\(\ge \)\(\sum _{i}\omega _{{\mathfrak {R}}'}(G_{x_i})\)\(+\)\(\sum _{A_1} \varOmega _{{\mathfrak {R}}'}(i)\)\(+\)\(\sum _{A_{2,3}}\big ( \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(s) \big )\)\(+\)\(\sum _{A_{4.1}} \varOmega _{{\mathfrak {R}}'}(i)\)\(+\)\(\sum _{A_{4.2}}\big ( \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(l)\big )\)\(+\)\(\sum _{A_5}\big ( \varOmega _{{\mathfrak {R}}'}(i) + \varOmega _{{\mathfrak {R}}'}(r) + \varOmega _{{\mathfrak {R}}'}(t) \big )\)\(=\)\(2\cdot \omega ({\mathfrak {R}}')\).
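The factor 2 in the identity above can be read as a double-counting argument. The following sketch spells it out under two assumptions that are not restated in this appendix: that the weight decomposes over the gadgets as \(\omega ({\mathfrak {R}})=\sum _{i}\omega _{{\mathfrak {R}}}(G_{x_i})+\sum _{j}\omega _{{\mathfrak {R}}}(F_j)\), and that every clause gadget \(F_j\) occurs in exactly two of the sums \(\varOmega _{{\mathfrak {R}}}(i)\), once for each literal of the two-literal clause \(C_j\).

```latex
\begin{align*}
\sum_{i=1}^{n}\varOmega_{\mathfrak{R}}(i)
  &= \sum_{i=1}^{n}\omega_{\mathfrak{R}}(G_{x_i})
     + 2\sum_{j}\omega_{\mathfrak{R}}(F_j),\\
\sum_{i=1}^{n}\omega_{\mathfrak{R}}(G_{x_i})
  + \sum_{i=1}^{n}\varOmega_{\mathfrak{R}}(i)
  &= 2\Big(\sum_{i=1}^{n}\omega_{\mathfrak{R}}(G_{x_i})
     + \sum_{j}\omega_{\mathfrak{R}}(F_j)\Big)
   = 2\cdot\omega(\mathfrak{R}).
\end{align*}
```

The same computation with \({\mathfrak {R}}'\) in place of \({\mathfrak {R}}\) gives the final equality in the chain.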

Finally, \(\omega ({\mathfrak {R}}) \ge \omega ({\mathfrak {R}}')\). Therefore \({\mathfrak {R}}'\) is a minimum reconciliation. \(\square \)

Proof of Theorem 4

We will transform \({\mathfrak {R}}\) into \({\mathfrak {R}}'\) in two steps. First, we adjust all transfers that need to be adjusted, and then we alternately raise and translocate nodes.

Note that the definitions of the operations give the sufficient conditions for executing them. We assume that we perform these operations only if the conditions are satisfied. Additionally, we will translocate nodes only if \(y'\) (\(y'\) is from Definition 21) is a diagonal transfer child. This results in a raisable node after translocation.

Step 1. We adjust all transfers whose transfer parent is not from G, in an arbitrary order. This procedure will end in a polynomial number of steps.

Indeed, we have two situations. In the first situation (Fig. 13a) we obtain another transfer that needs adjusting (transfer \((x'',x'_2)\)), but the transfer parent \(x''\) is positioned above x in \(G'\). In the second situation (Fig. 13b), the total number of transfers waiting to be adjusted decreases by 1. Hence the effect of adjusting a transfer is either positioning the transfer parent higher in \(G'\), or reducing the number of unadjusted transfers. Since we are bounded by the size of \(G'\) and the number of transfers, the number of adjustments is finite and polynomial in the size of G and S.

Therefore all transfers will be adjusted.

Step 2. Take an arbitrary transfer parent or a transfer child \(x\in V(G')\) that we can raise, and raise it. Repeat the previous procedure, as long as there is a node that we can raise.

When there are no more transfer parents or children that can be raised, translocate some node, if there is such a node. Then, again, raise all nodes that can be raised. Note that we translocate a node only if it results in a raisable node (see the second paragraph of this proof).

Repeat the previous procedure of raising and translocating nodes as long as possible.

This procedure will end in a polynomial number of steps. Indeed, by raising a node x, \(\tau (x)\) increases. Since \(\tau (x)<\tau (root(G))\), Step 2 must end in a polynomial number of steps, i.e. we will obtain a reconciliation in which no transfer parent or child can be raised.
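The termination argument is a bounded-potential argument: every raise strictly increases a date, and dates are bounded by \(\tau (root(G))\). The toy model below illustrates only that counting step. It assumes integer dates and a simplified raising rule (a node can be raised whenever its date plus one is still below the root date); the function `raise_until_stuck` and its rule are hypothetical stand-ins, not the operations of Definitions 20 and 21, and translocation is not modelled.

```python
import random


def raise_until_stuck(dates, root_date, seed=0):
    """Repeatedly raise an arbitrary raisable node by one time unit.

    `dates` maps node names to integer dates. In this toy model a node is
    raisable while date + 1 < root_date (an assumption made for illustration).
    Returns the final dates and the number of raise operations performed.
    """
    rng = random.Random(seed)
    dates = dict(dates)
    steps = 0
    while True:
        raisable = sorted(x for x, t in dates.items() if t + 1 < root_date)
        if not raisable:
            return dates, steps
        x = rng.choice(raisable)  # arbitrary order, as in Step 2
        dates[x] += 1             # the potential sum(dates.values()) grows by 1
        steps += 1


final_dates, steps = raise_until_stuck({"a": 0, "b": 2, "c": 1}, root_date=5)
print(final_dates, steps)
```

Whatever order is chosen, the number of raises is at most the number of nodes times the root date, which is the polynomial bound used in the text.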

By applying Steps 1 and 2 we obtain a reconciliation \({\mathfrak {R}}'\). Since the number of transfers is not changed (Lemmas 9, 10, and 11), we have \(\omega ({\mathfrak {R}}')=\omega ({\mathfrak {R}})\), i.e. \({\mathfrak {R}}'\) is a minimum reconciliation. We need to prove that \({\mathfrak {R}}'\) is a normalized reconciliation.

Let \((x,x')\in E(G')\) be a transfer, \(y\in V(G')\) be the maximal element such that \(x \le y\), \(\rho (x) \le \rho (y)\), and \(\tau (y)\le \tau (x)+1\).

Let us prove that y is not a transfer parent. Assume the opposite. Then \(y\in V(G)\). Let \(x''\) be the element as described in Definition 20 obtained by raising y. Then \(\tau (y)=\tau (x'')\) and \(x''\) is a transfer parent or child; or \(\tau (y)=\tau (x'')-1\) and \(x''\) is a speciation or root(G). In both cases we obtain a contradiction with the maximality of y.

Let y be a transfer child. We need to prove that \(\tau (x)=\tau (y)\). Assume the opposite, i.e. \(\tau (x)<\tau (y)\). Let us take the maximal \(x_1\) such that \(x\le x_1\le y\) and \(\tau (x_1)=\tau (x)\). Since all the transfers are adjusted, we have \(x_1\in V(G)\) and \(x_1\) is a transfer parent. Since \(\tau (x_1)<\tau (y)\), node \(x_1\) can be raised, which is a contradiction with Step 2, where we raise and translocate nodes as long as possible. Therefore \(\tau (x)=\tau (y)\).

Let y be a diagonal transfer child. We need to prove that \((x,x')\) is a diagonal transfer. Assume the opposite, i.e. \(\tau (x)=\tau (x')\). Let \(\rho (y)=E_1\in E(S')\) and \(\rho (x')=E_2\in E(S')\). We have \(\tau (E_1)=\tau (E_2)=\tau (x)\). Since there are no speciations from S with the same date (see a comment after Definition 1), one of the edges \(E_1\) or \(E_2\) is not incident with a speciation from S. If \(E_1\) is not incident with a speciation, then we can raise y. If \(E_2\) is not incident with a speciation, then we can translocate x to \(E_2\) and raise x. In both situations we have a contradiction with Step 2. Therefore, \((x,x')\) is a diagonal transfer.

Let y be a speciation, or \(y=root(G)\). Then \(\tau (x)<\tau (y)\). Since \(\tau (y)\le \tau (x)+1\), we have \(\tau (y) = \tau (x)+1\).

Let \((x,x')\) be a diagonal transfer, l a loss assigned to \(x'\), \(T_l\) a lost subtree with a leaf l. From Step 2 we have that we cannot raise \(x'\). This is possible only if \(\tau (T_l)=\tau (l)+1\), and therefore \(T_l\) has only one edge.

We proved that the properties of a normalized reconciliation are satisfied. Hence \({\mathfrak {R}}'\) is a normalized reconciliation.\(\square \)

Proof of Theorem 5

Since \({\mathfrak {R}}\) is a normalized reconciliation, it is also, by definition, minimum. If I is a time slice, then \({\mathfrak {R}}_I\) denotes the partial reconciliation induced by I, i.e. the part of \({\mathfrak {R}}\) that is inside I and all other time slices before I, where “before” refers to those lower in the tree. We will prove that the algorithm constructs \({\mathfrak {R}}_I\) during the execution. We will use mathematical induction on I.

Let \(I_0\) be the first time slice (i.e. the lowest time slice), and \(s_0\in V(S)\) be a speciation such that \(\tau (s_0)\in I_0\) (Fig. 17), \(E_1, E_2\in E(S)\) are incident with \(s_0\). Next, \(e_1,e_2\) are the minimal edges of \(G'\) contained in \(E_1\) and \(E_2\). More precisely, \(e_1=(x_1,x_2)\), \(e_2=(y_1,y_2)\) are the minimal edges of \(G'\) such that \(\rho (x_2)\le E_1 \le \rho (x_1)\) and \(\rho (y_2)\le E_2 \le \rho (y_1)\). Edges \(e_1\) and \(e_2\) are unique (Lemma 6).

Let us prove that we can obtain \({\mathfrak {R}}_{I_0}\) during the execution of the algorithm. We have several cases.

Case 1. Edges \(e_1\) and \(e_2\) are incident. Let us prove that \(e_1\) and \(e_2\) coalesce at \(s_0\). Assume the opposite, \(\rho (x)\ne s_0\), where \(x\in V(G)\) is incident with both \(e_1\) and \(e_2\). Then \(e_1\) or \(e_2\) is a transfer, hence we can construct a reconciliation with smaller weight by placing x in \(s_0\), which contradicts the minimality of \({\mathfrak {R}}\). Figure 20 gives a more detailed argument.

Fig. 20. Situations that cannot occur in a normalized reconciliation. Time slice \(I_k\) contains \(E_1,E_2\), and \(s_0\). Edges \(e_1,e_2\) are active edges, and are incident with x. Edges \(E_1,E_2 \in E(S')\) are incident with \(s_0\). Node \(x\in V(G)\) must be positioned in \(s_0\in V(S)\), i.e. \(\rho (x)=s_0\). (a) If \(\rho (x)=E_1\), then the reconciliation is not minimal, since we can eliminate one transfer by taking \(\rho (x)=s_0\). Here, \(x''={{\textsc {p}}}^{{\textsc {v}}}_{G'}(x)\) is a transfer child, and \(x'={{\textsc {p}}}^{{\textsc {v}}}_{G'}(x'')\) is a transfer parent. (b) In a similar way we obtain a reconciliation with smaller weight. (c, d) These situations are not possible, since the reconciliation is normalized, and we cannot have a transfer \((x_1,x_2)\) with \(x_1\notin V(G)\). Also, we could obtain a reconciliation with smaller weight, by taking \(\rho (x)=s_0\). (e) Here we have \(\tau (x')=\tau (x)\). Since the reconciliation is normalized, and we have only one speciation from S (i.e. \(s_0\)) in the time slice \(I_k\), we have \(\tau (x')>\tau (s_0)\), hence this situation is not possible

Case 2. Edges \(e_1\) and \(e_2\) are not incident. We will investigate subcases. Some subcases are not obtainable by the algorithm. For them, we will prove they cannot occur in \({\mathfrak {R}}\). Let x be the minimal element from V(G) that is an ancestor of \(e_1\) and \(e_2\).

Case 2.1. Let \(\rho (x)=s_0\), \(\rho (x'_{i_1})=E_1\), \(\rho (x''_{i_2})=E_2\) (\(i_1=1,\ldots ,k_1\); \(i_2=1,\ldots ,k_2\)), where \(x'_{i_1}\) and \(x''_{i_2}\) are explained in Sect. 4.2. This case refers to Case 3b of Sect. 4.2.

We will prove that there is a random choice such that the random expansion of \(x'_1\) produces placement of the nodes identical to the one in \({\mathfrak {R}}\).

Assume the opposite, there is no such random choice. This means that we cannot obtain a situation depicted by Fig. 18b. Then there are descendants of \(x'_1\), denoted by \(y'_j\) (\(j=1,\ldots ,k\)) (Fig. 21) such that \(y'_1,\ldots ,y'_{k-1}\in V(G)\), \(y'_j={{\textsc {p}}}_{G'}(y'_{j+1})\) (\(j=1,\ldots ,k-1\)), \(y'_k\) is a transfer parent, \(y'_k \in V(G')\backslash V(G)\), and \(\rho (y'_1)=\ldots =\rho (y'_k)=E_3\). Let \(E_4\in E(S')\) be the edge that contains \(y'_{k+1}\), which is a child of \(y'_k\). By translocating \(y'_1,\ldots y'_k\) to \(E_4\) we obtain a reconciliation with one transfer less, which contradicts the minimality of \({\mathfrak {R}}\). Another reason why this case is not possible is that transfer \((y'_k,y'_{k+1})\) is not adjusted, which contradicts the fact that \({\mathfrak {R}}\) is a normalized reconciliation.

Fig. 21. (a) A situation impossible for an optimal \(T_R\) reconciliation. Node \(y'_1\) is a transfer child, and its descendant \(y'_k\in V(G')\backslash V(G)\) is a transfer parent. (b) If we translocate \(y'_1,\ldots ,y'_k\) to \(E_4\), we eliminate one transfer, hence obtain a reconciliation with smaller weight

Therefore, we can obtain the expansion of \(x'_1\). The same reasoning applies for \(x'_2,\ldots ,x'_{k_1}\) and \(x''_1,\ldots ,x''_{k_2}\) and their children.

Case 2.2. Assume that \(E_i\in E(S')\) receives a diagonal transfer and \(e_{3-i}\in E(G)\) is propagated to the next time slice (\(i=1\) or \(i=2\)). This case is also obtainable by the algorithm (Cases 3\(a_1\) and 3\(a_2\)).

Case 2.3. Both \(e_1\) and \(e_2\) are propagated to the next time slice. Then \(s_0\) contains two unaligned edges from G, which is impossible for a \(T_R\) reconciliation (Lemma 6). Therefore this case cannot occur.

Case 2.4. We have \(\rho (x)=s_0\), and there is \(y_1\in \{x'_1,\ldots ,x'_{k_1}\}\), or \(y_2\in \{x''_1,\ldots ,x''_{k_2}\}\) such that \(\rho (y_1)\ne E_1\), or \(\rho (y_2)\ne E_2\). Since \(s_0\) is the only speciation in S in the current time slice, all \(x'_1,\ldots ,x'_{k_1}\) and \(x''_1,\ldots ,x''_{k_2}\) are transfers.

Let \({\mathfrak {R}}'\) be a reconciliation such that \(\rho _{{\mathfrak {R}}'}(x'_1)=\ldots =\rho _{{\mathfrak {R}}'}(x'_{k_1})=E_1\), \(\rho _{{\mathfrak {R}}'}(x''_1)=\ldots =\rho _{{\mathfrak {R}}'}(x''_{k_2})=E_2\), and \(\rho _{{\mathfrak {R}}'}(y)=\rho _{{\mathfrak {R}}}(y)\) for all the remaining \(y\in V(G)\). Then \({\mathfrak {R}}'\) is a reconciliation with smaller weight than \({\mathfrak {R}}\), which contradicts the optimality of \({\mathfrak {R}}\).

Case 2.5. Assume that \(\tau (x) > \tau (s_0)\), \(\tau (y_1)\le \tau (s_0)\), and \(\tau (y_2)\le \tau (s_0)\) for some \(y_1\in \{x'_1,\ldots ,x'_{k_1}\}\), \(y_2\in \{x''_1,\ldots ,x''_{k_2}\}\). Then \({\mathfrak {R}}\) is not a normalized reconciliation. Hence this case is not possible.

Case 2.6. If x is in \(I_0\) and \(\rho (x) \ne s_0\), then by taking \(\rho (x)=s_0\) we get a reconciliation with fewer transfers (similarly to Case 1 and Fig. 20), contrary to the minimality of \({\mathfrak {R}}\).

For the inductive step, assume that the statement is true for time slices \(I_0, I_1,\ldots , I_{k-1}\). Let us prove that it is true for \(I_k\). The proof for \(I_k\) is the same as for \(I_0\), so we do not repeat it.

Hence \({\mathfrak {R}}_I\) is obtainable by the procedure. Since \({\mathfrak {R}}_I={\mathfrak {R}}\) for the final time slice I, \({\mathfrak {R}}\) is also obtainable by the algorithm. Since it is a minimal reconciliation, \({\mathfrak {R}}\) is a possible output of the algorithm. \(\square \)

We will use the next lemma in the proof of Theorem 6. Basically, it states that it is not important which random choice we select in Case 3b of the algorithm (see Sect. 4.2).

Lemma 13

The random choice in Case 3b of the algorithm does not affect the weight of an output reconciliation.

Proof

Let \(I_k\) be the observed time interval, and \(I_0,\ldots ,I_{k-1}\) be the time intervals before \(I_{k}\) (Fig. 22).

Let \(x'_1\) be a node that we randomly expand, and assume we have more than one choice for a random active edge with maximal \(\tau \)-value that is a descendant of \(x'_1\). Let \(e_{31}\) and \(e_{32}\) be two of those edges.

We will use notations from Sect. 4.2. When constructing \({\mathfrak {R}}_{I_k}\) from \({\mathfrak {R}}_{I_{k-1}}\), we are adding some new nodes from G. Only x is a speciation, and all other nodes are transfer parents. Since every transfer has a parent from V(G), we obtain that the number of newly added transfers is equal to the number of newly added nodes from V(G) minus one. Therefore \(\omega ({\mathfrak {R}}_{I_k})\) is not affected by a choice of \(e_3\).

Note that the active edges in \(I_{k+1}\) are not affected by a choice of \(e_3\). \(\square \)
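
The count used in the proof of Lemma 13 can be restated as a short worked equation. Here \(n\) denotes the number of nodes of \(V(G)\) newly added when passing from \({\mathfrak {R}}_{I_{k-1}}\) to \({\mathfrak {R}}_{I_k}\); the symbol \(n\) is introduced only for this illustration, and we assume that the weight counts transfers, as in the bound \(t>k\) used in the proof of Theorem 6. Exactly one of the \(n\) nodes is the speciation \(x\), and each of the remaining \(n-1\) nodes is the parent of one new transfer, so

\[ \omega ({\mathfrak {R}}_{I_k}) = \omega ({\mathfrak {R}}_{I_{k-1}}) + (n-1). \]

The choice of \(e_3\) enters this expression only through \(n\), which the argument treats as the same for \(e_{31}\) and \(e_{32}\).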

Fig. 22

Case 3b of the algorithm from Sect. 4.2. When randomly expanding \(x'_1\), we can have more than one candidate for \(e_3\) (denoted by \(e_{31}\) and \(e_{32}\)). The choice of \(e_3\) does not affect minimality of an output reconciliation. a We choose \(e_{31}\). b We choose \(e_{32}\). We can obtain this choice from (a) by translocating \(x^1_1\). A transfer is lost, and another is gained. Hence the number of transfers is unchanged

Proof of Theorem 6

It is obvious that \(\omega ({\mathfrak {R}})\le k\), because the algorithm cuts an edge of the branch and bound tree if \(t>k\), where t is the number of transfers in a partially constructed reconciliation.
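
The pruning rule mentioned above (cut a branch as soon as \(t>k\)) can be illustrated with a minimal sketch. This is not the algorithm of Sect. 4.2: the function `min_transfers` and the input `steps` are hypothetical, and each "step" simply lists how many transfers each available option would add. The sketch only shows why any returned value is at most \(k\).

```python
from typing import List, Optional


def min_transfers(steps: List[List[int]], k: int) -> Optional[int]:
    """Toy branch and bound: pick one option per step, minimizing transfers.

    steps[i] lists the number of transfers each (hypothetical) option at
    step i would add. Returns the smallest total not exceeding k, or None
    if every branch is cut.
    """

    def search(i: int, t: int) -> Optional[int]:
        if t > k:
            return None  # cut: the partial solution already exceeds k
        if i == len(steps):
            return t  # complete solution with t <= k transfers
        best: Optional[int] = None
        for added in steps[i]:
            result = search(i + 1, t + added)
            if result is not None and (best is None or result < best):
                best = result
        return best

    return search(0, 0)
```

For example, `min_transfers([[1, 0], [1, 1], [0, 2]], 1)` returns `1`, while the same input with `k = 0` returns `None` because every branch is cut.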

Now we will prove that the conditions of Definition 22 are satisfied. Let \((x, x')\in E(G')\) be a transfer in \({\mathfrak {R}}\), and \(y\in V(G')\) be the maximal element such that \(x\le y\), \(\rho (x)\le \rho (y)\), \(\tau (y)\le \tau (x)+1\).

In the algorithm, transfers are created when nodes are randomly expanded. Since only nodes in V(G) are randomly expanded, every transfer starts in a node from V(G). Hence \(x\in V(G)\).

Transfers are constructed in Cases 3b and 4 (see Sect. 4.2). Therefore, y can be a speciation from G, transfer child, or root(G). If y is a speciation, then \(x\in \{x'_1,\ldots , x'_{k_1}, x''_1,\ldots , x''_{k_2}\}\), where \(x'_1,\ldots , x'_{k_1}, x''_1,\ldots , x''_{k_2}\) are explained in Sect. 4.2, and \(\tau (y)=\tau (x)+1\). If y is a transfer child, then \(\tau (y)=\tau (x)\). If y is root(G), then \(\tau (y)=\tau (x)+1\).

Let y be a diagonal transfer child. Diagonal transfers are made by using edges from G that were on hold. From Case 3a we have that a loss l, assigned to y, belongs to a lost subtree \(T_l\) with one edge and \(\tau (root(T_l))=\tau (y)+1\). Also, \((x,x')\) cannot be a horizontal transfer, because when we put an edge on hold all other edges are propagated to the next time slice, leaving no room for accepting a transfer.

Now we will prove that \({\mathfrak {R}}\) is a minimal reconciliation. The algorithm given in Sect. 4.2 is branch and bound, and it exhaustively examines every case possible for a normalized reconciliation, as stated in the proof of Theorem 5. Also, which random option it takes in Case 3b does not affect the optimality of an output (Lemma 13). The algorithm always chooses a reconciliation of a smaller weight, if it finds one. Therefore, if it returns a reconciliation as an output, then it is a minimal reconciliation, i.e. \({\mathfrak {R}}\) is a minimal reconciliation. \(\square \)

Proof of Lemma 12

Note that if \((a_2,a_1)\in E(G)\), then there is a path \((a_2, b_1, \ldots , b_s, a_1)\) in \(G'\). The length of this path is at least 1, i.e. \(s\ge 0\). Hence every edge from G is a path in \(G'\). Also, \((a_2, a_1)\) can contain a transfer. In this proof we assume that all transfers are adjusted (as described by Definition 19 and Fig. 13), i.e. all transfers start in V(G).

We introduce a coloring of edges and nodes that were involved in some SPR operation. Let \(spr((a_2, a_1), (b_2, b_1)) = a'_2\) be the i-th SPR operation \(T_i \rightarrow T_{i+1}\). Then we color the edge \((a'_2, a_1)\) and node \(a'_2\) with color \(C_i\). If the edge \((b_2,b_1)\) was colored, then edges \((b_2,a'_2)\) and \((a'_2,b_1)\) are colored with the same color. Let \(c_1\) be the child of \(a_2\) (in \(T_i\)) different from \(a_1\), and \(c_2\) be the parent of \(a_2\) (in \(T_i\)). Then \(c_2\) is the parent of \(c_1\) (in \(T_{i+1}\)). If edge \((c_2,a_2)\) was colored with a color, then the edge \((c_2,c_1)\) is colored with the same color.
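
The coloring rules above can be sketched in code. The following is a minimal illustration, not the paper's construction: the tree is stored as a hypothetical child-to-parent map, each edge is identified by its child endpoint, dates are ignored, and the function name `spr_with_coloring` and its parameters are assumptions. It also assumes that \(a_2\) is not the root and that the target edge is neither incident to \(a_2\) nor inside the pruned subtree.

```python
from typing import Dict, Hashable

Node = Hashable


def spr_with_coloring(
    parent: Dict[Node, Node],
    edge_color: Dict[Node, str],   # edge (p, c) is keyed by its child c
    node_color: Dict[Node, str],
    a1: Node,
    b1: Node,
    new_node: Node,
    color: str,
) -> None:
    """Prune edge (a2, a1), regraft it on edge (b2, b1); mutate maps in place."""
    a2 = parent[a1]
    c2 = parent.get(a2)
    if c2 is None:
        raise ValueError("sketch assumes a2 is not the root")
    if b1 not in parent or b1 == a2 or parent[b1] == a2:
        raise ValueError("sketch assumes the target edge is not incident to a2")
    # c1 is the child of a2 different from a1.
    c1 = next(v for v, p in parent.items() if p == a2 and v != a1)

    # Suppress a2: c2 becomes the parent of c1.
    upper = edge_color.pop(a2, None)  # color of (c2, a2), if any
    node_color.pop(a2, None)
    del parent[a2]
    parent[c1] = c2
    if upper is not None:
        edge_color[c1] = upper  # (c2, c1) inherits the color of (c2, a2)
    # Otherwise (c2, c1) keeps whatever (a2, c1) had: an assumption,
    # since the text does not specify this case.

    # Subdivide (b2, b1) with new_node (a2' in the text) and attach a1.
    b2 = parent[b1]
    parent[new_node] = b2
    parent[b1] = new_node
    parent[a1] = new_node
    if b1 in edge_color:
        edge_color[new_node] = edge_color[b1]  # both halves keep the old color
    edge_color[a1] = color       # edge (a2', a1) gets color C_i
    node_color[new_node] = color  # node a2' gets color C_i
```

Applying the function once per SPR operation, with a fresh color each time, yields at most one newly colored node per operation, which matches the bound of \(k\) colored nodes used later in the proof.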

To a minimum SPR scenario we will assign a minimum \(T_R\) reconciliation. Colored edges will represent transfers, colored nodes will be transfer parents, non-colored edges will coincide with the edges of the species tree, and non-colored nodes will be speciations.

Let us first demonstrate the reduction from k-Minimum \(T_R\) Reconciliation to k-Minimum Dated SPR Scenario. Let S and G be a species tree and a gene tree, and let \(S=T_0\rightarrow T_1\rightarrow \cdots \rightarrow T_k=G\) be a minimum SPR scenario transforming S into G. Using this minimum SPR scenario, we will construct a minimum \(T_R\) reconciliation.

Note that in \(T_k\) we have at most k nodes that are colored. Also, colored edges form (colored) subtrees of \(T_k\) with colored roots and inner nodes, while the leaves of these trees are not colored.

If \(a\in V(T_k)\) is a non-colored node, then it can be regarded both as a node of S and as a node of G. Take \(\rho (a)=a\in V(S)\) for all non-colored nodes \(a\in V(T_k)=V(G)\). Non-colored paths connect non-colored nodes. Place all non-colored edges of \(T_k=G\) inside S so that they contain no transfer. Note that the leaves of \(T_k\) are non-colored.

Now, inside S we will place colored nodes and colored edges. Let \(T_c\) be an arbitrary colored tree, and \(c_0\) be its root. Then \(c_0\) is on a non-colored path of G, and we will leave it there in S. Next, we place the inner nodes of \(T_c\) inside S. Let \(L(T_c)=\{l_1,\ldots , l_s\}\), and \(\tau (l_1)\ge \cdots \ge \tau (l_s)\). Assume that \(c^1_1, c^1_2, \ldots , c^1_{i_1}\) are the inner nodes of \(T_c\) on the path from \(l_1\) to \(c_0\) whose placement inside S is not yet defined. Then place these nodes in the edge of \(S'\) just above \(l_1\), i.e. \(\rho (c^1_1) = \ldots =\rho (c^1_{i_1})={{\textsc {p}}}^{{\textsc {e}}}_{S'}(\rho (l_1))\). Repeat the previous process for the leaves \(l_2, \ldots , l_s\). In this way we obtain a reconciliation with transfers, and every edge of S at any moment contains at most one lineage from \(G'\); hence, if we extend losses, we obtain a \(T_R\) reconciliation. Since a transfer can start only at a colored node, we have at most k transfers, i.e. \(\omega ({\mathfrak {R}}) \le k\).
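
The placement of the inner nodes of one colored tree can be sketched as follows. This is an illustrative sketch under stated assumptions: the colored tree is given as a hypothetical child-to-parent map, `tau` holds the \(\tau \)-values of its leaves, and `edge_above[l]` stands in for \({{\textsc {p}}}^{{\textsc {e}}}_{S'}(\rho (l))\); none of these names come from the paper.

```python
from typing import Dict, Hashable, Iterable

Node = Hashable


def place_inner_nodes(
    parent: Dict[Node, Node],
    root: Node,
    leaves: Iterable[Node],
    tau: Dict[Node, float],
    edge_above: Dict[Node, Hashable],
) -> Dict[Node, Hashable]:
    """Assign each inner node of a colored tree to an edge of S'."""
    placement: Dict[Node, Hashable] = {}
    # Process leaves in order of non-increasing tau, as in the text.
    for leaf in sorted(leaves, key=lambda l: tau[l], reverse=True):
        node = parent[leaf]
        # Walk from the leaf towards the root c0; c0 itself stays where it is.
        while node != root:
            if node not in placement:
                # Not yet placed: put it in the edge just above this leaf.
                placement[node] = edge_above[leaf]
            node = parent[node]
    return placement
```

Each inner node is placed the first time a leaf below it is processed, so it ends up in the edge above its descendant leaf with the largest \(\tau \)-value.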

After the next reduction, we will prove that \({\mathfrak {R}}\) is a minimum reconciliation.

In the second part, we demonstrate a reduction from k-Minimum Dated SPR Scenario to k-Minimum \(T_R\) Reconciliation. Let T be a dated and \(T'\) an undated binary rooted tree. We need a minimum dated SPR scenario \(T=T_0\rightarrow T_1\rightarrow \cdots \rightarrow T_k=T'\).

Take \(S=T\) and \(G=T'\). Let \({\mathfrak {R}}\) be a minimum \(T_R\) reconciliation, and \(\omega ({\mathfrak {R}})=k\). We will prove that the length of a minimum dated SPR scenario is k, and reconstruct such a scenario using \({\mathfrak {R}}\).

First, let us construct a scenario of the length k. Adjust all transfers in \({\mathfrak {R}}\), so they start at the nodes from V(G), just like in the first step of the proof of Theorem 4 (Definition 19, Fig. 13).

Take \(T_k=T'\), \(G_k=G\), \(G'_k=G'\), and \({\mathfrak {R}}_k={\mathfrak {R}}\). Let \((x_2, x_1)\) be an arbitrary transfer, \(x'_1\) be the child of \(x_1\) in \(G'\), l be the loss assigned to \(x_1\), and \(l_0=root(T_l)\), where \(T_l\) is a lost subtree such that \(l\in L(T_l)\). Let \(p_k = (l_0, l_1, \ldots , l_{s-1}, l_s=l)\) be a path in \(G'\) (i.e. in \(T_l\)), and therefore a lost path. Remove \((x_2, x_1)\) from \(G'_k\), suppress \(x_2\), include the path \(p_k\) into \(G_k\) (\(p_k\) is not a lost path anymore), suppress \(x_1\). Thus we eliminate one transfer, and obtain \(G_{k-1}, G'_{k-1}, {\mathfrak {R}}_{k-1}\), where \(\omega ({\mathfrak {R}}_{k-1})=\omega ({\mathfrak {R}}_k)-1\). By repeating this procedure, we obtain an SPR scenario \(T'=T_k\rightarrow T_{k-1}\rightarrow \cdots \rightarrow T_0=T\), i.e. \(T=T_0\rightarrow T_1\rightarrow \cdots \rightarrow T_k=T'\).

Since the transfers can be horizontal or diagonal, the corresponding SPR operations are dated. We have proved that an optimal dated SPR scenario transforming T into \(T'\) has length at most k.

Let us prove that the previous reductions construct a minimum reconciliation (the first reduction) and a minimum SPR scenario (the second reduction). Let \(T_1 \rightarrow \cdots \rightarrow T_k\) be a minimum SPR scenario. Take \(S=T_1\), \(G=T_k\), and let \({\mathfrak {R}}\) be the reconciliation obtained in the first reduction. We have \(k'=\omega ({\mathfrak {R}})\le k\). Now, let \(T_1=T'_1\rightarrow T'_2\rightarrow \cdots \rightarrow T'_{k''}=T_k\) be an SPR scenario obtained from G and S in the second reduction. Then \(k''\le k' \le k\). Since there is no SPR scenario transforming \(T_1\) into \(T_k\) with length less than k, we have \(k''=k'=k\). \(\square \)
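
The closing step of the proof of Lemma 12 can be written as a single chain. Minimality of the scenario from \(T_1\) to \(T_k\) gives \(k\le k''\), and the two reductions give \(k''\le k'\le k\), so

\[ k \le k'' \le k' \le k \quad \Longrightarrow \quad k'' = k' = k. \]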

Cite this article

Hasić, D., Tannier, E. Gene tree reconciliation including transfers with replacement is NP-hard and FPT. J Comb Optim 38, 502–544 (2019). https://doi.org/10.1007/s10878-019-00396-z

  • DOI: https://doi.org/10.1007/s10878-019-00396-z
