Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance

  • Conference paper
Combinatorial Algorithms (IWOCA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8986)

Included in the following conference series:

  • International Workshop on Combinatorial Algorithms

Abstract

In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.
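
The abstract does not spell out the definition. The sketch below assumes the usual reading of a prefix table under Hamming distance: for a string x of length n and a mismatch threshold k, entry pi[i] is the length of the longest prefix of x that matches the factor starting at x[i] with at most k mismatches. It is a naive, worst-case quadratic baseline. It is not either of the paper's two algorithms. The function names `prefix_table_hamming`, `hamming_occurrences` and `longest_overlap_hamming` are hypothetical. The sketch also shows how applications (a) and (b) can be read off such a table by concatenating strings.

```python
def prefix_table_hamming(x: str, k: int) -> list[int]:
    """Naive prefix table under Hamming distance (hypothetical helper).

    pi[i] is the length of the longest prefix of x that matches the factor
    starting at x[i] with at most k mismatches. Worst case O(n^2) time.
    """
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n  # x matches itself with zero mismatches
    for i in range(1, n):
        mismatches = 0
        length = 0
        # Extend the alignment of x[0..] against x[i..] until the
        # (k+1)-th mismatch or the end of the string.
        while i + length < n:
            if x[length] != x[i + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        pi[i] = length
    return pi


def hamming_occurrences(p: str, t: str, k: int) -> list[int]:
    """Start positions j where t[j:j+len(p)] differs from p in at most k places."""
    m = len(p)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    pi = prefix_table_hamming(p + t, k)
    # Entry m + j aligns p against t[j..]; a value >= m means the whole
    # pattern fits there with at most k mismatches.
    return [j for j in range(len(t) - m + 1) if pi[m + j] >= m]


def longest_overlap_hamming(x: str, y: str, k: int) -> int:
    """Length of the longest suffix of x matching a prefix of y with <= k mismatches."""
    z = y + x
    pi = prefix_table_hamming(z, k)
    off = len(y)
    # Try suffixes from longest to shortest. Only suffixes no longer than y
    # can be compared against a prefix of y.
    for i in range(max(0, len(x) - len(y)), len(x)):
        if pi[off + i] == len(x) - i:  # the entire suffix x[i:] matched
            return len(x) - i
    return 0


if __name__ == "__main__":
    print(prefix_table_hamming("abaab", 1))              # [5, 1, 2, 2, 1]
    print(hamming_occurrences("abc", "abdxbcabc", 1))    # [0, 3, 6]
    print(longest_overlap_hamming("xxabd", "abcyy", 1))  # 3
```

No separator symbol is needed between the concatenated strings. In the matching case only the first m characters of each alignment are tested. In the overlap case each entry is already capped by the length of the remaining suffix of x.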


Notes

  1. http://www.inf.kcl.ac.uk/research/projects/asmf/


Author information

Correspondence to Solon P. Pissis.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Barton, C., Iliopoulos, C.S., Pissis, S.P., Smyth, W.F. (2015). Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance. In: Jan, K., Miller, M., Froncek, D. (eds) Combinatorial Algorithms. IWOCA 2014. Lecture Notes in Computer Science, vol 8986. Springer, Cham. https://doi.org/10.1007/978-3-319-19315-1_5

  • DOI: https://doi.org/10.1007/978-3-319-19315-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19314-4

  • Online ISBN: 978-3-319-19315-1

  • eBook Packages: Computer Science, Computer Science (R0)