Abstract
Reconstructing the evolutionary history of a set of species is a central task in computational biology. In real data, some information is often missing: the Incomplete Directed Perfect Phylogeny (IDPP) problem asks, given a collection of species described by a set of binary characters with some unknown states, to complete the missing states in such a way that the result can be explained by a directed perfect phylogeny. Pe’er et al. [SICOMP 2004] proposed a solution that takes \(\tilde{\mathcal {O}}(nm)\) time (the \(\tilde{\mathcal {O}}(\cdot )\) notation suppresses polylog factors) for \(n\) species and \(m\) characters. Their algorithm relies on pre-existing dynamic connectivity data structures; a computational study recently conducted by Fernández-Baca and Liu showed that, in this context, complex data structures perform worse than simpler ones with worse asymptotic bounds.
This motivates us to study the particular properties of the dynamic connectivity problem in this setting, so as to avoid using sophisticated data structures as a black box. We succeed in doing so and give a much simpler \(\mathcal {O}(nm\log n)\)-time algorithm for the IDPP problem; moreover, our insights into the specific structure of the problem lead to an asymptotically optimal \(\mathcal {O}(nm)\)-time algorithm.
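To make the problem statement concrete, the following minimal sketch illustrates what a valid completion is. It is not the algorithm of this paper or of Pe’er et al.: it tries every assignment of the unknown states, which takes exponential time and is only meant for tiny instances. It assumes the standard characterization that a complete binary matrix admits a directed perfect phylogeny exactly when the species sets of any two characters are either disjoint or nested. The `UNKNOWN` marker and both function names are hypothetical, chosen for illustration.

```python
from itertools import product

UNKNOWN = None  # hypothetical marker for a missing character state


def admits_directed_perfect_phylogeny(matrix):
    """Check a complete binary matrix (rows = species, columns = characters).

    Assumed characterization: the species sets of every pair of
    characters must be disjoint or nested (a laminar family).
    """
    if not matrix:
        return True
    n_chars = len(matrix[0])
    # For each character, collect the species that possess it.
    char_sets = [
        {i for i, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_chars)
    ]
    for a in range(n_chars):
        for b in range(a + 1, n_chars):
            s, t = char_sets[a], char_sets[b]
            if s & t and not (s <= t or t <= s):
                return False  # the two sets overlap without nesting
    return True


def complete_by_brute_force(matrix):
    """Return some valid completion of the unknown entries, or None.

    Exponential in the number of unknown entries; illustration only.
    """
    holes = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value is UNKNOWN
    ]
    for values in product((0, 1), repeat=len(holes)):
        candidate = [list(row) for row in matrix]
        for (i, j), value in zip(holes, values):
            candidate[i][j] = value
        if admits_directed_perfect_phylogeny(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Species 0 has character 0; its state for character 1 is unknown.
    incomplete = [[1, UNKNOWN], [1, 0], [0, 1]]
    print(complete_by_brute_force(incomplete))
```

In the usage example, setting the unknown state to 0 makes the two character sets disjoint, so a completion exists; setting it to 1 would make them overlap without nesting.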
References
Bashir, A., Ye, C., Price, A.L., Bafna, V.: Orthologous repeats and mammalian phylogenetic inference. Genome Res. 15(7), 998–1006 (2005)
Bodlaender, H.L., Fellows, M.R., Hallett, M.T., Wareham, H.T., Warnow, T.J.: The hardness of perfect phylogeny, feasible register assignment and other problems on thin colored graphs. Theoret. Comput. Sci. 244(1–2), 167–188 (2000)
Bonizzoni, P., Braghin, C., Dondi, R., Trucco, G.: The binary perfect phylogeny with persistent characters. Theoret. Comput. Sci. 454, 51–63 (2012)
Bonizzoni, P., Ciccolella, S., Della Vedova, G., Soto, M.: Beyond perfect phylogeny: Multisample phylogeny reconstruction via ILP. In: 8th ACM-BCB, pp. 1–10 (2017)
Camin, J.H., Sokal, R.R.: A method for deducing branching sequences in phylogeny. Evolution, pp. 311–326 (1965)
El-Kebir, M.: SPhyR: tumor phylogeny estimation from single-cell sequencing data under loss and error. Bioinformatics 34(17), i671–i679 (2018)
Eppstein, D., Galil, Z., Italiano, G.F., Nissenzweig, A.: Sparsification-a technique for speeding up dynamic graph algorithms. J. ACM 44(5), 669–696 (1997)
Even, S., Shiloach, Y.: An on-line edge-deletion problem. J. ACM 28(1), 1–4 (1981)
Fernández-Baca, D., Liu, L.: Tree compatibility, incomplete directed perfect phylogeny, and dynamic graph connectivity: An experimental study. Algorithms 12(3), 53 (2019)
Gibb, D., Kapron, B., King, V., Thorn, N.: Dynamic graph connectivity with improved worst case update time and sublinear space. arXiv:1509.06464 (2015)
Gusfield, D.: Efficient algorithms for inferring evolutionary trees. Networks 21(1), 19–28 (1991)
Gusfield, D.: Persistent phylogeny: a galled-tree and integer linear programming approach. In: 6th ACM-BCB, pp. 443–451 (2015)
Halperin, E., Karp, R.M.: Perfect phylogeny and haplotype assignment. In: Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology, pp. 10–19 (2004)
Henzinger, M.R., King, V., Warnow, T.: Constructing a tree from homeomorphic subtrees, with applications to computational evolutionary biology. Algorithmica 24(1), 1–13 (1999)
Holm, J., De Lichtenberg, K., Thorup, M.: Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. J. ACM 48(4), 723–760 (2001)
Huang, S.E., Huang, D., Kopelowitz, T., Pettie, S.: Fully dynamic connectivity in \(O(\log n(\log \log n)^2)\) amortized expected time. In: 28th SODA, pp. 510–520. SIAM (2017)
Kimmel, G., Shamir, R.: The incomplete perfect phylogeny haplotype problem. J. Bioinform. Comput. Biol. 3(02), 359–384 (2005)
Kirkpatrick, B., Stevens, K.: Perfect phylogeny problems with missing values. IEEE/ACM Trans. Comput. Biol. Bioinf. 11(5), 928–941 (2014)
Nikaido, M., Rooney, A.P., Okada, N.: Phylogenetic relationships among cetartiodactyls based on insertions of short and long interpersed elements: hippopotamuses are the closest extant relatives of whales. Proc. Natl. Acad. Sci. 96(18), 10261–10266 (1999)
Pe’er, I., Pupko, T., Shamir, R., Sharan, R.: Incomplete directed perfect phylogeny. SIAM J. Comput. 33(3), 590–607 (2004)
Satas, G., Zaccaria, S., Mon, G., Raphael, B.J.: Scarlet: Single-cell tumor phylogeny inference with copy-number constrained mutation losses. Cell Syst. 10(4), 323–332 (2020)
Satya, R.V., Mukherjee, A.: The undirected incomplete perfect phylogeny problem. IEEE/ACM Trans. Comput. Biol. Bioinf. 5(4), 618–629 (2008)
Shiloach, Y., Vishkin, U.: An \(O(\log n)\) parallel connectivity algorithm. J. Algorithms 3(1), 57–67 (1982)
Stevens, K., Gusfield, D.: Reducing multi-state to binary perfect phylogeny with applications to missing, removable, inserted, and deleted data. In: Moulton, V., Singh, M. (eds.) Algorithms in Bioinformatics. WABI 2010. Lecture Notes in Computer Science, vol. 6293. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15294-8_23
Thorup, M.: Decremental dynamic connectivity. J. Algorithms 33(2), 229–243 (1999)
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 872539. GB was supported by the Netherlands Organisation for Scientific Research (NWO) under project OCENW.GROOT.2019.015 “Optimization for and with Machine Learning (OPTIMAL)”.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bernardini, G., Bonizzoni, P., Gawrychowski, P. (2021). Incomplete Directed Perfect Phylogeny in Linear Time. In: Lubiw, A., Salavatipour, M., He, M. (eds) Algorithms and Data Structures. WADS 2021. Lecture Notes in Computer Science, vol 12808. Springer, Cham. https://doi.org/10.1007/978-3-030-83508-8_13
DOI: https://doi.org/10.1007/978-3-030-83508-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83507-1
Online ISBN: 978-3-030-83508-8
eBook Packages: Computer Science, Computer Science (R0)