Skip to main content

Faster Recovery of Approximate Periods over Edit Distance

  • Conference paper
  • First Online:
String Processing and Information Retrieval (SPIRE 2018)

Abstract

The approximate period recovery problem asks to compute all approximate word-periods of a given word S of length n: all primitive words P (\(|P|=p\)) which have a periodic extension at edit distance smaller than \(\tau _p\) from S, where \(\tau _p = \lfloor \frac{n}{(3.75+\epsilon )\cdot p} \rfloor \) for some \(\epsilon >0\). Here, the set of periodic extensions of P consists of all finite prefixes of \(P^\infty \).

We improve the time complexity of the fastest known algorithm for this problem of Amir et al. [Theor. Comput. Sci., 2018] from \({\mathcal {O}}(n^{4/3})\) to \({\mathcal {O}}(n \log n)\). Our tool is a fast algorithm for Approximate Pattern Matching in Periodic Text. We consider only verification for the period recovery problem when the candidate approximate word-period P is explicitly given up to cyclic rotation; the algorithm of Amir et al. reduces the general problem in \({\mathcal {O}}(n)\) time to a logarithmic number of such more specific instances.

J. Radoszewski and J. Straszyński—Supported by the “Algorithms for text processing with errors and uncertainties” project carried out within the HOMING programme of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.

W. Rytter—Supported by the Polish National Science Center, grant no. 2014/13/B/ST6/00770.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Also the APR problem under the Hamming distance was considered [2] for which an \({\mathcal {O}}(n \log n)\)-time algorithm was presented [1] for the threshold \(\lfloor \frac{n}{(2+\epsilon )\cdot p} \rfloor \) with \(\epsilon >0\).

References

  1. Amir, A., Amit, M., Landau, G.M., Sokol, D.: Period recovery of strings over the Hamming and edit distances. Theor. Comput. Sci. 710, 2–18 (2018). https://doi.org/10.1016/j.tcs.2017.10.026

    Article  MathSciNet  MATH  Google Scholar 

  2. Amir, A., Eisenberg, E., Levy, A., Porat, E., Shapira, N.: Cycle detection and correction. ACM Trans. Algorithms 9(1), 13:1–13:20 (2012). https://doi.org/10.1145/2390176.2390189

    Article  MathSciNet  MATH  Google Scholar 

  3. Amit, M., Crochemore, M., Landau, G.M., Sokol, D.: Locating maximal approximate runs in a string. Theor. Comput. Sci. 700, 45–62 (2017). https://doi.org/10.1016/j.tcs.2017.07.021

    Article  MathSciNet  MATH  Google Scholar 

  4. Apostolico, A., Ehrenfeucht, A.: Efficient detection of quasiperiodicities in strings. Theor. Comput. Sci. 119(2), 247–265 (1993). https://doi.org/10.1016/0304-3975(93)90159-Q

    Article  MathSciNet  MATH  Google Scholar 

  5. Apostolico, A., Farach, M., Iliopoulos, C.S.: Optimal superprimitivity testing for strings. Inf. Process. Lett. 39(1), 17–20 (1991). https://doi.org/10.1016/0020-0190(91)90056-N

    Article  MathSciNet  MATH  Google Scholar 

  6. Breslauer, D.: An on-line string superprimitivity test. Inf. Process. Lett. 44(6), 345–347 (1992). https://doi.org/10.1016/0020-0190(92)90111-8

    Article  MathSciNet  MATH  Google Scholar 

  7. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007). https://doi.org/10.1017/cbo9780511546853

    Book  MATH  Google Scholar 

  8. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2003). https://doi.org/10.1142/4838

    Book  MATH  Google Scholar 

  9. Fischetti, V.A., Landau, G.M., Sellers, P.H., Schmidt, J.P.: Identifying periodic occurrences of a template with applications to protein structure. Inf. Process. Lett. 45(1), 11–18 (1993). https://doi.org/10.1016/0020-0190(93)90245-5

    Article  MathSciNet  MATH  Google Scholar 

  10. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997). https://doi.org/10.1017/cbo9780511574931

    Book  MATH  Google Scholar 

  11. Kociumaka, T., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: A linear time algorithm for seeds computation. In: Rabani, Y. (ed.) 23rd Annual ACM-SIAM Symposium on Discrete Algorithms. SODA 2012, pp. 1095–1112. SIAM (2012). https://doi.org/10.1137/1.9781611973099

    MATH  Google Scholar 

  12. Kolpakov, R.M., Kucherov, G.: Finding approximate repetitions under Hamming distance. Theor. Comput. Sci. 303(1), 135–156 (2003). https://doi.org/10.1016/s0304-3975(02)00448-6

    Article  MathSciNet  MATH  Google Scholar 

  13. Landau, G.M., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989). https://doi.org/10.1016/0196-6774(89)90010-2

    Article  MathSciNet  MATH  Google Scholar 

  14. Li, Y., Smyth, W.F.: Computing the cover array in linear time. Algorithmica 32(1), 95–106 (2002). https://doi.org/10.1007/s00453-001-0062-2

    Article  MathSciNet  MATH  Google Scholar 

  15. Popov, V.Y.: The approximate period problem for DNA alphabet. Theor. Comput. Sci. 304(1–3), 443–447 (2003). https://doi.org/10.1016/S0304-3975(03)00211-1

    Article  MathSciNet  MATH  Google Scholar 

  16. Sim, J.S., Iliopoulos, C.S., Park, K., Smyth, W.F.: Approximate periods of strings. Theor. Comput. Sci. 262(1), 557–568 (2001). https://doi.org/10.1016/S0304-3975(00)00365-0

    Article  MathSciNet  MATH  Google Scholar 

  17. Sokol, D., Benson, G., Tojeira, J.: Tandem repeats over the edit distance. Bioinformatics 23(2), 30–35 (2007). https://doi.org/10.1093/bioinformatics/btl309

    Article  Google Scholar 

  18. Sokol, D., Tojeira, J.: Speeding up the detection of tandem repeats over the edit distance. Theor. Comput. Sci. 525, 103–110 (2014). https://doi.org/10.1016/j.tcs.2013.04.021

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Juliusz Straszyński .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Kociumaka, T., Radoszewski, J., Rytter, W., Straszyński, J., Waleń, T., Zuba, W. (2018). Faster Recovery of Approximate Periods over Edit Distance. In: Gagie, T., Moffat, A., Navarro, G., Cuadros-Vargas, E. (eds) String Processing and Information Retrieval. SPIRE 2018. Lecture Notes in Computer Science(), vol 11147. Springer, Cham. https://doi.org/10.1007/978-3-030-00479-8_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-00479-8_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00478-1

  • Online ISBN: 978-3-030-00479-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics