Matching Patterns with Variables Under Edit Distance

  • Conference paper
String Processing and Information Retrieval (SPIRE 2022)

Abstract

A pattern \(\alpha\) is a string of variables and terminal letters. We say that \(\alpha\) matches a word w, consisting only of terminal letters, if w can be obtained by replacing the variables of \(\alpha\) by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, has been heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. If we ask instead for the minimum Hamming distance between w and any word u obtained by replacing the variables of \(\alpha\) by terminal words (matching under Hamming distance), one can devise efficient algorithms and matching conditional lower bounds for the class of regular patterns (in which no variable occurs twice), as well as for classes of patterns where variables may repeat an unbounded number of times but the structure of the pattern, i.e., the way the occurrences of different variables can be interleaved, is restricted. Moreover, under Hamming distance, the matching problem is intractable as soon as one variable occurs more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if each of the other variables occurs just once. In this paper, we consider the problem of matching patterns with variables under edit distance. We still obtain efficient algorithms and matching conditional lower bounds for the class of regular patterns, but show that the problem becomes, in this case, intractable already for unary patterns, consisting of repeated occurrences of a single variable interleaved with terminals.
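To make the regular-pattern case concrete, the following is a minimal sketch of a textbook quadratic dynamic program for the minimum edit distance between a word and any word generated by a regular pattern. It is not the algorithm of the paper, whose running time and lower bounds are not reproduced here. Several things are assumptions made only for this illustration: the function name `min_edit_distance_regular`, the encoding of variables as uppercase letters and terminals as any other character, and the convention that a variable may be replaced by the empty word.

```python
def min_edit_distance_regular(pattern: str, word: str) -> int:
    """Minimum edit distance between `word` and any word obtained from a
    regular pattern by substituting its variables.

    Assumptions of this sketch (not taken from the paper):
      * uppercase letters are variables, every other character is a terminal;
      * a variable may be replaced by the empty word.
    """
    variables = [c for c in pattern if c.isupper()]
    if len(variables) != len(set(variables)):
        raise ValueError("pattern is not regular: a variable occurs twice")

    n = len(word)
    # prev[j] = cost of aligning the pattern prefix read so far with word[:j].
    # For the empty pattern prefix, all j letters of the word are insertions.
    prev = list(range(n + 1))

    for sym in pattern:
        cur = [0] * (n + 1)
        if sym.isupper():
            # A variable can take any factor of the word as its image at no
            # cost: either it is finished (prev[j]) or it absorbs word[j-1].
            cur[0] = prev[0]
            for j in range(1, n + 1):
                cur[j] = min(prev[j], cur[j - 1])
        else:
            # A terminal follows the usual edit-distance recurrence.
            cur[0] = prev[0] + 1  # delete the terminal
            for j in range(1, n + 1):
                cur[j] = min(
                    prev[j] + 1,                          # delete the terminal
                    cur[j - 1] + 1,                       # insert word[j-1]
                    prev[j - 1] + (sym != word[j - 1]),   # match or substitute
                )
        prev = cur

    return prev[n]
```

Under these assumptions, `min_edit_distance_regular("aXb", "acccb")` is 0, because the variable can take the image `ccc`, while `min_edit_distance_regular("aXa", "bbb")` is 2. The shortcut for variables is sound only because each variable occurs once; with repeated variables all occurrences must receive the same image, which is where the hardness result for unary patterns applies.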

Author information

Corresponding author

Correspondence to Stefan Siemer.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gawrychowski, P., Manea, F., Siemer, S. (2022). Matching Patterns with Variables Under Edit Distance. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_20

  • DOI: https://doi.org/10.1007/978-3-031-20643-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20642-9

  • Online ISBN: 978-3-031-20643-6

  • eBook Packages: Computer Science, Computer Science (R0)
