Abstract
A pattern \(\alpha \) is a string of variables and terminal letters. We say that \(\alpha \) matches a word w, consisting only of terminal letters, if w can be obtained by replacing the variables of \(\alpha \) by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, has been heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. For matching under Hamming distance, where one asks for the minimum Hamming distance between w and any word u obtained by replacing the variables of \(\alpha \) by terminal words, one can devise efficient algorithms and matching conditional lower bounds for the class of regular patterns (in which no variable occurs twice), as well as for classes of patterns where unbounded repetitions of variables are allowed but the structure of the pattern, i.e., the way the occurrences of different variables can be interleaved, is restricted. Moreover, under Hamming distance, the matching problem is intractable as soon as one variable occurs more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if each of the other variables occurs just once. In this paper, we consider the problem of matching patterns with variables under edit distance. We still obtain efficient algorithms and matching conditional lower bounds for the class of regular patterns, but show that the problem becomes intractable in this setting already for unary patterns, which consist of repeated occurrences of a single variable interleaved with terminals.
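To make the regular-pattern case concrete, the following is a minimal sketch of a textbook dynamic program that computes the minimum edit distance between a word and any word matched by a regular pattern. It is an illustration, not necessarily the algorithm of the paper. Several conventions are assumptions made only for this example: the function name `regular_pattern_edit_distance` is hypothetical, uppercase letters stand for variables and all other characters for terminals, unit costs are used for insertions, deletions, and substitutions, and a variable may be replaced by the empty word.

```python
def regular_pattern_edit_distance(pattern: str, word: str) -> int:
    """Minimum edit distance between `word` and any word matched by `pattern`.

    Assumed conventions (not from the source): uppercase letters are
    variables, everything else is a terminal, and a variable may be
    replaced by the empty word.
    """
    variables = [c for c in pattern if c.isupper()]
    if len(variables) != len(set(variables)):
        raise ValueError("not a regular pattern: a variable occurs twice")

    m = len(word)
    # prev[j] = best distance between word[:j] and any word matched by
    # the pattern prefix processed so far (initially the empty prefix).
    prev = list(range(m + 1))
    for symbol in pattern:
        cur = [0] * (m + 1)
        if symbol.isupper():
            # A variable that occurs once can absorb any factor of the
            # word at zero cost, so cur[j] = min(prev[0..j]).
            cur[0] = prev[0]
            for j in range(1, m + 1):
                cur[j] = min(prev[j], cur[j - 1])
        else:
            # A terminal behaves as in the classical edit-distance table.
            cur[0] = prev[0] + 1
            for j in range(1, m + 1):
                cur[j] = min(
                    prev[j - 1] + (symbol != word[j - 1]),  # match or substitute
                    prev[j] + 1,                            # drop the pattern letter
                    cur[j - 1] + 1,                         # drop the word letter
                )
        prev = cur
    return prev[m]
```

For a pattern of length n and a word of length m, this sketch fills a table with (n+1)(m+1) entries and keeps only two rows in memory. It relies on each variable occurring exactly once: a repeated variable would force its occurrences to take the same value, which this recurrence cannot express.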
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gawrychowski, P., Manea, F., Siemer, S. (2022). Matching Patterns with Variables Under Edit Distance. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_20
DOI: https://doi.org/10.1007/978-3-031-20643-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)