Abstract
In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abrahamson, K.: Generalized string matching. SIAM J. Comput. 16(6), 1039–1051 (1987)
Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with \(k\) mismatches. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pp. 794–803. Society for Industrial and Applied Mathematics, USA (2000)
Bland, W., Kucherov, G., Smyth, W.F.: Prefix table construction and conversion. In: Lecroq, T., Mouchard, L. (eds.) IWOCA 2013. LNCS, vol. 8288, pp. 41–53. Springer, Heidelberg (2013)
Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, New York (2007)
Dori, S., Landau, G.M.: Construction of Aho Corasick automaton in linear time for integer alphabets. Inf. Process. Lett. 98(2), 66–72 (2006)
Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011)
Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)
Fredriksson, K., Navarro, G.: Average-optimal single and multiple approximate string matching. J. Exp. Algorithmics 9, 1–47 (2004). http://doi.acm.org/10.1145/1005813.1041513
Galil, Z., Giancarlo, R.: Improved string matching with \(k\) mismatches. ACM SIGACT News 17(4), 52–54 (1986)
Hall, H.S., Knight, S.R.: Higher Algebra. MacMillan, London (1950)
Hsu, P.-H., Chen, K.-Y., Chao, K.-M.: Finding all approximate gapped palindromes. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 1084–1093. Springer, Heidelberg (2009). http://dx.doi.org/10.1007/3-540-12689-9_129
Ilie, L., Navarro, G., Tinta, L.: The longest common extension problem revisited and applications to approximate string searching. J. Discrete Algorithms 8(4), 418–428 (2010)
Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27–2, 557–582 (1998)
Landau, G.M., Vishkin, U.: Efficient string matching in the presence of errors. In: IEEE (ed.) Proceedings of the 26th Annual Symposium on Foundations of Computer Science (FOCS 1985), USA, pp. 126–136. IEEE Computer Society (1985)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Technical report 8 (1966)
Main, M.G., Lorentz, R.J.: An \(\cal O\)(n log n) algorithm for finding all repetitions in a string. J. Algs 5, 422–432 (1984)
Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: Proceedings of the 2009 Data Compression Conference, DCC 2009, pp. 193–202, IEEE Computer Society, Washington, DC (2009)
Pizza & Chili, April 2013. http://pizzachili.dcc.uchile.cl/
Smyth, B.: Computing Patterns in Strings. Pearson Addison-Wesley, London (2003)
Smyth, W.F., Wang, S.: New perspectives on the prefix array. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 133–143. Springer, Heidelberg (2008)
StringPedia, April 2013. http://stringpedia.bsmithers.co.uk
Ukkonen, E.: On approximate string matching. In: Karpinski, M. (ed.) Foundations of Computation Theory. Lecture Notes in Computer Science, vol. 158, pp. 487–495. Springer, Heidelberg (1983). http://dx.doi.org/10.1007/3-540-12689-9_129
Välimäki, N., Ladra, S., Mäkinen, V.: Approximate all-pairs suffix/prefix overlaps. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 76–87. Springer, Heidelberg (2010)
Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)
Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina paired-end reAd mergeR. Bioinformatics 30(5), 614–620 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Barton, C., Iliopoulos, C.S., Pissis, S.P., Smyth, W.F. (2015). Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance. In: Jan, K., Miller, M., Froncek, D. (eds) Combinatorial Algorithms. IWOCA 2014. Lecture Notes in Computer Science(), vol 8986. Springer, Cham. https://doi.org/10.1007/978-3-319-19315-1_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-19315-1_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19314-4
Online ISBN: 978-3-319-19315-1
eBook Packages: Computer ScienceComputer Science (R0)