Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance

Barton, Carl; Iliopoulos, Costas S.; Pissis, Solon P.; Smyth, William F.

doi:10.1007/978-3-319-19315-1_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8986)

Included in the following conference series:

International Workshop on Combinatorial Algorithms

Abstract

In this article, we introduce a new and simple data structure, the prefix table under Hamming distance, and present two algorithms to compute it efficiently: one asymptotically fast; the other very fast on average and in practice. Because the latter approach avoids the computation of global data structures, such as the suffix array and the longest common prefix array, it yields algorithms much faster in practice than existing methods. We show how this data structure can be used to solve two string problems of interest: (a) approximate string matching under Hamming distance; and (b) longest approximate overlap under Hamming distance. Analogously, we introduce the prefix table under edit distance, and present an efficient algorithm for its computation. In the process, we also define the border array under both distance measures, and provide an algorithm for conversion between prefix tables and border arrays.
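As a rough illustration of the data structure named in the abstract, the sketch below builds a prefix table under Hamming distance by direct comparison. It assumes the usual definition: for a string x and an error bound k, entry i holds the length of the longest substring starting at position i that matches the prefix of x of the same length with at most k mismatches. This is a quadratic-time baseline, not either of the two algorithms the paper presents, and the names `hamming_prefix_table` and `approx_occurrences` are hypothetical. The second function shows one way such a table could serve approximate string matching, again as an assumption: build the table of the pattern followed by the text and report text positions whose entry reaches the pattern length.

```python
from typing import List


def hamming_prefix_table(x: str, k: int) -> List[int]:
    """Naive O(n^2) sketch (assumed definition, not the paper's algorithm).

    table[i] is the length of the longest substring of x starting at i
    that matches the prefix of x of the same length with <= k mismatches.
    """
    n = len(x)
    table = [0] * n
    for i in range(n):
        mismatches = 0
        length = 0
        # Extend the match until the (k+1)-th mismatch or the end of x.
        while i + length < n:
            if x[i + length] != x[length]:
                if mismatches == k:
                    break
                mismatches += 1
            length += 1
        table[i] = length
    return table


def approx_occurrences(pattern: str, text: str, k: int) -> List[int]:
    """Positions in text where pattern occurs with <= k mismatches.

    Hypothetical helper: uses the table of pattern + text and keeps the
    text positions whose entry is at least len(pattern).
    """
    m = len(pattern)
    table = hamming_prefix_table(pattern + text, k)
    return [i - m for i in range(m, len(table)) if table[i] >= m]


if __name__ == "__main__":
    print(hamming_prefix_table("abaab", 0))          # [5, 0, 1, 2, 0]
    print(hamming_prefix_table("abaab", 1))          # [5, 1, 2, 2, 1]
    print(approx_occurrences("abc", "xbcabdabc", 1)) # [0, 3, 6]
```

With k = 0 the table reduces to the ordinary (exact) prefix table, which gives a convenient sanity check for any faster implementation.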

Notes

1. http://www.inf.kcl.ac.uk/research/projects/asmf/

Author information

Authors and Affiliations

King’s College London, London, UK
Carl Barton, Costas S. Iliopoulos & Solon P. Pissis
McMaster University, Hamilton, Canada
William F. Smyth
University of Western Australia, Crawley, Australia
Costas S. Iliopoulos

Corresponding author

Correspondence to Solon P. Pissis.

Editor information

Editors and Affiliations

Faculty of Mathematics and Physics, Charles University, Praha, Czech Republic
Jan Kratochvíl
University of Newcastle, Newcastle, New South Wales, Australia
Mirka Miller
University of Minnesota Duluth, Department of Mathematics, Duluth, Minnesota, USA
Dalibor Froncek

About this paper

Cite this paper

Barton, C., Iliopoulos, C.S., Pissis, S.P., Smyth, W.F. (2015). Fast and Simple Computations Using Prefix Tables Under Hamming and Edit Distance. In: Kratochvíl, J., Miller, M., Froncek, D. (eds) Combinatorial Algorithms. IWOCA 2014. Lecture Notes in Computer Science, vol 8986. Springer, Cham. https://doi.org/10.1007/978-3-319-19315-1_5

DOI: https://doi.org/10.1007/978-3-319-19315-1_5
Published: 07 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19314-4
Online ISBN: 978-3-319-19315-1
eBook Packages: Computer Science, Computer Science (R0)
