Abstract
In this paper, we discuss word sequence matching and adapt the common edit distance metric for approximate string matching to searching for words and sequences of words. We furthermore create a variant of the Sparse Suffix Tree [3] and adapt algorithms for approximate word and word sequence matching to this variant. The algorithms have been implemented and tested in a WWW information retrieval environment, and performance data is presented.
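To make the idea of applying edit distance to word sequences concrete, the following is a minimal sketch only, not the paper's actual metric or algorithm. It assumes the standard unit-cost Levenshtein dynamic program, applied once over characters and once over whitespace-separated word tokens. The function names `edit_distance` and `word_sequence_distance`, the lowercasing, and the whitespace tokenization are illustrative assumptions; no suffix tree is involved.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Unit-cost Levenshtein distance between two sequences.

    Works on any sequences whose items support ==, so the same routine
    handles strings (character level) and lists of words (word level).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def word_sequence_distance(query: str, text: str) -> int:
    """Edit distance counted in whole words (hypothetical helper).

    Assumption: words are split on whitespace and compared
    case-insensitively; each word insertion, deletion, or
    substitution costs 1.
    """
    return edit_distance(query.lower().split(), text.lower().split())


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(word_sequence_distance("approximate word matching",
                                 "approximate word sequence matching"))
```

In this sketch a substituted word always costs 1, however similar the two words are. A finer-grained variant could charge a word substitution according to the character-level distance between the two words, but that choice is an assumption and is not taken from the abstract.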
References
Cobbs A.L. (1995) “Fast Approximate Matching using Suffix Trees,” In Proceedings of the Sixth Symposium on Combinatorial Pattern Matching (CPM'95), Springer Verlag, pp. 41–54.
Gonnet G.H., Baeza-Yates R.A., Snider T. (1991) “Lexicographical indices for text: Inverted files vs. PAT trees,” Technical Report OED-91-10, Center for the new OED, University of Waterloo.
Kärkkäinen J., Ukkonen E. (1996) “Sparse Suffix Trees,” In Proceedings of the Second Annual International Computing and Combinatorics Conference (COCOON 96), Springer Verlag, pp. 219–230.
Levenshtein V.I. (1965) “Binary codes capable of correcting deletions, insertions, and reversals,” (Russian) Doklady Akademii nauk SSSR, Vol. 163, No. 4, pp. 845–8 (also Cybernetics and Control Theory, Vol. 10, No. 8, pp. 707–10, 1966).
Morrison D.R. (1968) “PATRICIA — Practical Algorithm To Retrieve Information Coded in Alphanumeric,” Journal of the ACM, 15, pp. 514–534.
Shang H., Merrett T.H. (1996) “Tries for Approximate String Matching,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, pp. 540–547.
Ukkonen E. (1985) “Finding Approximate Patterns in Strings,” Journal of Algorithms, vol. 6, pp. 132–137.
Weiner P. (1973) “Linear pattern matching algorithms,” In Proceedings of the IEEE 14th Annual Symposium on Switching and Automata Theory, pp. 1–11.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Risvik, K.M. (1998). Approximate word sequence matching over Sparse Suffix Trees. In: Farach-Colton, M. (eds) Combinatorial Pattern Matching. CPM 1998. Lecture Notes in Computer Science, vol 1448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030781
DOI: https://doi.org/10.1007/BFb0030781
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64739-3
Online ISBN: 978-3-540-69054-2
eBook Packages: Springer Book Archive