DOI: 10.1145/3519935.3520001
research-article

Improved approximation guarantees for shortest superstrings using cycle classification by overlap to length ratios

Published: 10 June 2022

ABSTRACT

In the Shortest Superstring problem, we are given a set of strings and ask for a common superstring with the minimum number of characters. The Shortest Superstring problem is NP-hard, and several constant-factor approximation algorithms are known for it. Of particular interest is the GREEDY algorithm, which repeatedly merges two strings of maximum overlap until a single string remains. Being simpler than other well-performing approximation algorithms for this problem, GREEDY has attracted attention since the 1980s and is commonly used in practical applications.
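
The merging rule described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the authors' implementation: the names `overlap` and `greedy_superstring` are hypothetical, ties between equal overlaps are broken by scan order, strings contained in another input string are discarded up front (a common preprocessing step), and the quadratic scan favors readability over speed.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Repeatedly merge the two strings with maximum overlap (illustrative sketch)."""
    # Assumption: duplicates and strings contained in another input are dropped,
    # since any superstring of the remaining strings already covers them.
    unique = sorted(set(strings))
    pool = [s for s in unique if not any(s != t and s in t for t in unique)]

    while len(pool) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Scan all ordered pairs; the first pair with the largest overlap wins.
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge: append the non-overlapping tail of the second string.
        merged = pool[best_i] + pool[best_j][best_k:]
        pool = [s for idx, s in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(merged)

    return pool[0] if pool else ""


if __name__ == "__main__":
    print(greedy_superstring(["abc", "bcd", "cde"]))  # prints abcde
```

On this small input GREEDY happens to find an optimal superstring; the approximation guarantees discussed in the abstract concern its worst-case behavior.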

Tarhio and Ukkonen (TCS 1988) conjectured that GREEDY gives a 2-approximation. In a seminal work, Blum, Jiang, Li, Tromp, and Yannakakis (STOC 1991) proved that the superstring computed by GREEDY is a 4-approximation, and this upper bound was improved to 3.5 by Kaplan and Shafrir (IPL 2005).

We show that the approximation guarantee of GREEDY is at most (13+√57)/6 ≈ 3.425. Furthermore, we prove that the Shortest Superstring can be approximated within a factor of (37+√57)/18 ≈ 2.475, improving slightly upon the currently best 2 11/23-approximation algorithm by Mucha (SODA 2013).
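
As a quick sanity check of the constants quoted above, the following Python snippet evaluates the closed-form expressions numerically and compares them with the earlier bounds named in the abstract. The variable names are illustrative only; nothing here reproduces the paper's analysis.

```python
from fractions import Fraction
from math import sqrt

# Bounds stated in the abstract.
greedy_bound = (13 + sqrt(57)) / 6          # new guarantee for GREEDY
new_bound = (37 + sqrt(57)) / 18            # new general approximation factor

# Earlier bounds named in the abstract.
kaplan_shafrir_bound = 3.5                   # previous GREEDY guarantee
mucha_bound = float(2 + Fraction(11, 23))    # previous best factor, 2 11/23

print(f"GREEDY:  {greedy_bound:.3f} vs. previous {kaplan_shafrir_bound:.3f}")
print(f"General: {new_bound:.3f} vs. previous {mucha_bound:.3f}")
```

Both new values lie below the earlier bounds; for the general factor the gap is only a few thousandths, which matches the abstract's "improving slightly".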

References

  1. Chris Armen and Clifford Stein. 1995. Improved Length Bounds for the Shortest Superstring Problem. In Proceedings of the 4th International Workshop on Algorithms and Data Structures (WADS). 494–505. https://doi.org/10.1007/3-540-60220-8_88
  2. Chris Armen and Clifford Stein. 1998. A 2 2/3 Superstring Approximation Algorithm. Discret. Appl. Math., 88, 1-3 (1998), 29–57. https://doi.org/10.1016/S0166-218X(98)00065-1
  3. Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis. 1994. Linear Approximation of Shortest Superstrings. J. ACM, 41, 4 (1994), 630–647. https://doi.org/10.1145/179812.179818
  4. Dany Breslauer, Tao Jiang, and Zhigen Jiang. 1997. Rotations of Periodic Strings and Short Superstrings. J. Algorithms, 24, 2 (1997), 340–353. https://doi.org/10.1006/jagm.1997.0861
  5. Artur Czumaj, Leszek Gasieniec, Marek Piotrów, and Wojciech Rytter. 1997. Sequential and Parallel Approximation of Shortest Superstrings. J. Algorithms, 23, 1 (1997), 74–100. https://doi.org/10.1006/jagm.1996.0823
  6. Alan M. Frieze and Wojciech Szpankowski. 1998. Greedy Algorithms for the Shortest Common Superstring That Are Asymptotically Optimal. Algorithmica, 21, 1 (1998), 21–36. https://doi.org/10.1007/PL00009207
  7. John Gallant, David Maier, and James A. Storer. 1980. On Finding Minimal Length Superstrings. J. Comput. Syst. Sci., 20 (1980), 50–58. https://doi.org/10.1016/0022-0000(80)90004-5
  8. Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.
  9. Theodoros P. Gevezes and Leonidas S. Pitsoulis. 2014. The Shortest Superstring Problem. Springer, New York, NY. 189–227. isbn:978-1-4939-0808-0 https://doi.org/10.1007/978-1-4939-0808-0_10
  10. Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press. https://doi.org/10.1017/CBO9780511574931
  11. Lucian Ilie and Cristian Popescu. 2006. The Shortest Common Superstring Problem and Viral Genome Compression. Fundamenta Informaticae, 73, 1-2 (2006), 153–164.
  12. Haim Kaplan, Moshe Lewenstein, Nira Shafrir, and Maxim Sviridenko. 2005. Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs. J. ACM, 52, 4 (2005), 602–626. https://doi.org/10.1145/1082036.1082041
  13. Haim Kaplan and Nira Shafrir. 2005. The Greedy Algorithm for Shortest Superstrings. Inf. Process. Lett., 93, 1 (2005), 13–17. https://doi.org/10.1016/j.ipl.2004.09.012
  14. Marek Karpinski and Richard Schmied. 2013. Improved Inapproximability Results for the Shortest Superstring and Related Problems. In Proceedings of the 19th Computing: The Australasian Theory Symposium (CATS). 27–36.
  15. S. Rao Kosaraju, James K. Park, and Clifford Stein. 1994. Long Tours and Short Superstrings. In Proceedings of the 35th IEEE Symposium on Foundations of Computer Science (FOCS). 166–177. https://doi.org/10.1109/SFCS.1994.365696
  16. Arthur M. Lesk. 1988. Computational Molecular Biology: Sources and Methods for Sequence Analysis. Oxford University Press.
  17. Ming Li. 1990. Towards a DNA Sequencing Theory (Learning a String). In Proceedings of the 31st IEEE Symposium on Foundations of Computer Science (FOCS). 125–134. https://doi.org/10.1109/FSCS.1990.89531
  18. Bin Ma. 2009. Why Greed Works for Shortest Common Superstring Problem. Theor. Comput. Sci., 410, 51 (2009), 5374–5381. https://doi.org/10.1016/j.tcs.2009.09.014
  19. Marcin Mucha. 2007. A Tutorial on Shortest Superstring Approximation. https://www.mimuw.edu.pl/ mucha/teaching/aa2008/ss.pdf [Accessed 28-October-2021]
  20. Marcin Mucha. 2013. Lyndon Words and Short Superstrings. In Proceedings of the 24th ACM-SIAM Symposium on Discrete Algorithms (SODA). 958–972. https://doi.org/10.1137/1.9781611973105.69
  21. Eugene W. Myers Jr. 2016. A history of DNA sequence assembly. It-Information Technology, 58, 3 (2016), 126–132. https://doi.org/10.1515/itit-2015-0047
  22. Katarzyna Paluch. 2020. New Approximation Algorithms for Maximum Asymmetric Traveling Salesman and Shortest Superstring. arxiv:2005.10800
  23. Katarzyna Paluch, Khaled Elbassioni, and Anke van Zuylen. 2012. Simpler Approximation of the Maximum Asymmetric Traveling Salesman Problem. In Proceedings of the 29th Symposium on Theoretical Aspects of Computer Science (STACS). 501–506. https://doi.org/10.4230/LIPIcs.STACS.2012.501
  24. Z. Sweedyk. 1999. A 2½-Approximation Algorithm for Shortest Superstring. SIAM J. Comput., 29, 3 (1999), 954–986. https://doi.org/10.1137/S0097539796324661
  25. Jorma Tarhio and Esko Ukkonen. 1988. A Greedy Approximation Algorithm for Constructing Shortest Common Superstrings. Theor. Comput. Sci., 57 (1988), 131–145. https://doi.org/10.1016/0304-3975(88)90167-3
  26. Shang-Hua Teng and Frances Yao. 1997. Approximating Shortest Superstrings. SIAM J. Comput., 26, 2 (1997), 410–417. https://doi.org/10.1137/S0097539794286125
  27. Jonathan S. Turner. 1989. Approximation Algorithms for the Shortest Common Superstring Problem. Inf. Comput., 83, 1 (1989), 1–20. https://doi.org/10.1016/0890-5401(89)90044-8

    • Published in

      STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
      June 2022
      1698 pages
      ISBN:9781450392648
      DOI:10.1145/3519935

      Copyright © 2022 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%
