ABSTRACT
In the Shortest Superstring problem, we are given a set of strings and seek a common superstring with the minimum number of characters. The problem is NP-hard, and several constant-factor approximation algorithms are known for it. Of particular interest is the GREEDY algorithm, which repeatedly merges two strings of maximum overlap until a single string remains. Being simpler than other well-performing approximation algorithms for this problem, GREEDY has attracted attention since the 1980s and is commonly used in practical applications.
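The merging rule described above can be sketched in a few lines of Python. This is a minimal quadratic-time illustration, not the implementation analyzed in the paper: the function names `overlap` and `greedy_superstring` are hypothetical, strings contained in other strings are dropped up front (a common preprocessing assumption), and ties between equally good pairs are broken by index order, which the abstract does not specify.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the ordered pair with maximum overlap (sketch)."""
    # Assumption: remove duplicates and strings that occur inside another string.
    uniq = list(dict.fromkeys(strings))
    pool = [s for s in uniq if not any(s != t and s in t for t in uniq)]
    while len(pool) > 1:
        # Choose the ordered pair (i, j) of distinct strings with the largest
        # overlap; ties go to the largest (i, j), an arbitrary choice.
        k, i, j = max(
            (overlap(a, b), i, j)
            for i, a in enumerate(pool)
            for j, b in enumerate(pool)
            if i != j
        )
        merged = pool[i] + pool[j][k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)] + [merged]
    return pool[0] if pool else ""
```

For instance, `greedy_superstring(["abc", "bcd", "cde"])` yields `"abcde"`. On the input `["cabab", "baba", "ababc"]`, this sketch first merges `cabab` and `ababc` (overlap 4) and ends with a superstring of length 10, whereas `cabababc` has length 8. Inputs of this kind illustrate why GREEDY cannot do better than the conjectured factor of 2.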
Tarhio and Ukkonen (TCS 1988) conjectured that GREEDY gives a 2-approximation. In a seminal work, Blum, Jiang, Li, Tromp, and Yannakakis (STOC 1991) proved that the superstring computed by GREEDY is a 4-approximation, and this upper bound was improved to 3.5 by Kaplan and Shafrir (IPL 2005).
We show that the approximation guarantee of GREEDY is at most (13+√57)/6 ≈ 3.425. Furthermore, we prove that the Shortest Superstring can be approximated within a factor of (37+√57)/18 ≈ 2.475, improving slightly upon the currently best (2 + 11/23)-approximation algorithm by Mucha (SODA 2013).
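As a quick sanity check on the constants quoted above, the following snippet evaluates them numerically. It only reproduces arithmetic from the abstract; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Bounds quoted in the abstract.
greedy_bound = (13 + sqrt(57)) / 6        # new guarantee for GREEDY
previous_greedy_bound = 3.5               # Kaplan and Shafrir (IPL 2005)
new_ratio = (37 + sqrt(57)) / 18          # new best approximation factor
mucha_ratio = float(2 + Fraction(11, 23)) # Mucha (SODA 2013)

print(f"GREEDY: {greedy_bound:.4f} (previously {previous_greedy_bound})")
print(f"Best known: {new_ratio:.4f} (previously {mucha_ratio:.4f})")
```

The improvement over Mucha's factor is about 0.003, which is why the abstract calls it slight.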
- Chris Armen and Clifford Stein. 1995. Improved Length Bounds for the Shortest Superstring Problem. In Proceedings of the 4th International Workshop on Algorithms and Data Structures (WADS). 494–505. https://doi.org/10.1007/3-540-60220-8_88
- Chris Armen and Clifford Stein. 1998. A 2 2/3 Superstring Approximation Algorithm. Discret. Appl. Math., 88, 1-3 (1998), 29–57. https://doi.org/10.1016/S0166-218X(98)00065-1
- Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis. 1994. Linear Approximation of Shortest Superstrings. J. ACM, 41, 4 (1994), 630–647. https://doi.org/10.1145/179812.179818
- Dany Breslauer, Tao Jiang, and Zhigen Jiang. 1997. Rotations of Periodic Strings and Short Superstrings. J. Algorithms, 24, 2 (1997), 340–353. https://doi.org/10.1006/jagm.1997.0861
- Artur Czumaj, Leszek Gasieniec, Marek Piotrów, and Wojciech Rytter. 1997. Sequential and Parallel Approximation of Shortest Superstrings. J. Algorithms, 23, 1 (1997), 74–100. https://doi.org/10.1006/jagm.1996.0823
- Alan M. Frieze and Wojciech Szpankowski. 1998. Greedy Algorithms for the Shortest Common Superstring That Are Asymptotically Optimal. Algorithmica, 21, 1 (1998), 21–36. https://doi.org/10.1007/PL00009207
- John Gallant, David Maier, and James A. Storer. 1980. On Finding Minimal Length Superstrings. J. Comput. Syst. Sci., 20 (1980), 50–58. https://doi.org/10.1016/0022-0000(80)90004-5
- Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.
- Theodoros P. Gevezes and Leonidas S. Pitsoulis. 2014. The Shortest Superstring Problem. Springer, New York, NY. 189–227. isbn:978-1-4939-0808-0 https://doi.org/10.1007/978-1-4939-0808-0_10
- Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press. https://doi.org/10.1017/CBO9780511574931
- Lucian Ilie and Cristian Popescu. 2006. The Shortest Common Superstring Problem and Viral Genome Compression. Fundamenta Informaticae, 73, 1-2 (2006), 153–164.
- Haim Kaplan, Moshe Lewenstein, Nira Shafrir, and Maxim Sviridenko. 2005. Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs. J. ACM, 52, 4 (2005), 602–626. https://doi.org/10.1145/1082036.1082041
- Haim Kaplan and Nira Shafrir. 2005. The Greedy Algorithm for Shortest Superstrings. Inf. Process. Lett., 93, 1 (2005), 13–17. https://doi.org/10.1016/j.ipl.2004.09.012
- Marek Karpinski and Richard Schmied. 2013. Improved Inapproximability Results for the Shortest Superstring and Related Problems. In Proceedings of the 19th Computing: The Australasian Theory Symposium (CATS). 27–36.
- S. Rao Kosaraju, James K. Park, and Clifford Stein. 1994. Long Tours and Short Superstrings. In Proceedings of the 35th IEEE Symposium on Foundations of Computer Science (FOCS). 166–177. https://doi.org/10.1109/SFCS.1994.365696
- Arthur M. Lesk. 1988. Computational Molecular Biology: Sources and Methods for Sequence Analysis. Oxford University Press.
- Ming Li. 1990. Towards a DNA Sequencing Theory (Learning a String). In Proceedings of the 31st IEEE Symposium on Foundations of Computer Science (FOCS). 125–134. https://doi.org/10.1109/FSCS.1990.89531
- Bin Ma. 2009. Why Greed Works for Shortest Common Superstring Problem. Theor. Comput. Sci., 410, 51 (2009), 5374–5381. https://doi.org/10.1016/j.tcs.2009.09.014
- Marcin Mucha. 2007. A Tutorial on Shortest Superstring Approximation. https://www.mimuw.edu.pl/~mucha/teaching/aa2008/ss.pdf [Accessed 28-October-2021]
- Marcin Mucha. 2013. Lyndon Words and Short Superstrings. In Proceedings of the 24th ACM-SIAM Symposium on Discrete Algorithms (SODA). 958–972. https://doi.org/10.1137/1.9781611973105.69
- Eugene W. Myers Jr. 2016. A history of DNA sequence assembly. It-Information Technology, 58, 3 (2016), 126–132. https://doi.org/10.1515/itit-2015-0047
- Katarzyna Paluch. 2020. New Approximation Algorithms for Maximum Asymmetric Traveling Salesman and Shortest Superstring. arXiv:2005.10800
- Katarzyna Paluch, Khaled Elbassioni, and Anke van Zuylen. 2012. Simpler Approximation of the Maximum Asymmetric Traveling Salesman Problem. In Proceedings of the 29th Symposium on Theoretical Aspects of Computer Science (STACS). 501–506. https://doi.org/10.4230/LIPIcs.STACS.2012.501
- Z. Sweedyk. 1999. A 2½-Approximation Algorithm for Shortest Superstring. SIAM J. Comput., 29, 3 (1999), 954–986. https://doi.org/10.1137/S0097539796324661
- Jorma Tarhio and Esko Ukkonen. 1988. A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings. Theor. Comput. Sci., 57 (1988), 131–145. https://doi.org/10.1016/0304-3975(88)90167-3
- Shang-Hua Teng and Frances Yao. 1997. Approximating Shortest Superstrings. SIAM J. Comput., 26, 2 (1997), 410–417. https://doi.org/10.1137/S0097539794286125
- Jonathan S. Turner. 1989. Approximation Algorithms for the Shortest Common Superstring Problem. Inf. Comput., 83, 1 (1989), 1–20. https://doi.org/10.1016/0890-5401(89)90044-8
Index Terms
- Improved approximation guarantees for shortest superstrings using cycle classification by overlap to length ratios