Serial and parallel algorithms for order-preserving pattern matching based on the duel-and-sweep paradigm

  • Original Article
Acta Informatica

Abstract

Given a text and a pattern over an alphabet, the classic exact matching problem searches for all occurrences of the pattern in the text. Unlike exact matching, order-preserving pattern matching (OPPM) considers the relative order of elements, rather than their exact values. In this paper, we propose efficient algorithms for the OPPM problem using the “duel-and-sweep” paradigm. For a pattern of length m and a text of length n, our serial algorithm runs in \(O(n + m\log m)\) time, and our parallel algorithm runs in \(O(\log ^2 m)\) time and \(O(n \log ^2 m)\) work with \(O(\log m)\) time and \(O(m \log m)\) work pattern preprocessing on the Priority Concurrent Read Concurrent Write Parallel Random-Access Machines (P-CRCW PRAM).
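To make the problem statement concrete, the following is a minimal brute-force sketch of order-preserving matching. It is a reference baseline only, not the duel-and-sweep algorithm of the paper; the function names `order_isomorphic` and `naive_oppm` are hypothetical, and reported offsets are 0-based.

```python
from typing import List, Sequence


def order_isomorphic(x: Sequence[float], y: Sequence[float]) -> bool:
    """True iff x and y have equal length and the same relative order."""
    if len(x) != len(y):
        return False
    n = len(x)
    # Compare every pair of positions; ties must be preserved as well.
    return all((x[i] <= x[j]) == (y[i] <= y[j])
               for i in range(n) for j in range(n))


def naive_oppm(text: Sequence[float], pattern: Sequence[float]) -> List[int]:
    """Return all 0-based offsets where a text window order-matches the pattern."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1)
            if order_isomorphic(text[s:s + m], pattern)]


# The pattern (1, 3, 2) means "low, high, middle".
print(naive_oppm([10, 20, 15, 30, 25, 40], [1, 3, 2]))  # prints [0, 2]
```

This sketch performs on the order of \(n m^2\) comparisons, which is the cost that the \(O(n + m\log m)\) serial algorithm avoids.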

Data availability

No datasets were generated or analysed during the current study.

Notes

  1. Our preliminary paper [18] on this topic, presented at SOFSEM 2020, is in error.

  2. The value of the last argument r in the function GetMismatchPos\(( Lmax _{X}, Lmin _{X}, Y, r)\) is usually 0, except in the call from Algorithm 12.

  3. Actually Lemma 29 holds when \(i > j\) and \(j-i + 1 \le w_1\), but we are concerned only with the case where \(i < j\).

References

  1. Amir, A., Kondratovsky, E.: Sufficient conditions for efficient indexing under different matchings. In: Proceedings of 30th annual symposium on combinatorial pattern matching (CPM 2019), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2019)

  2. Amir, A., Benson, G., Farach, M.: An alphabet independent approach to two-dimensional pattern matching. SIAM J. Comput. 23(2), 313–323 (1994)

  3. Berkman, O., Schieber, B., Vishkin, U.: Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. J. Algorithms 14(3), 344–370 (1993)

  4. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977). https://doi.org/10.1145/359842.359859

  5. Cantone, D., Faro, S., Külekci, M.O.: An efficient skip-search approach to the order-preserving pattern matching problem. In: PSC, pp 22–35 (2015)

  6. Chhabra, T., Tarhio, J.: A filtration method for order-preserving matching. Inf. Process. Lett. 116(2), 71–74 (2016). https://doi.org/10.1016/j.ipl.2015.10.005

  7. Chhabra, T., Külekci, M.O., Tarhio, J.: Alternative algorithms for order-preserving matching. In: PSC, pp 36–46 (2015)

  8. Cho, S., Na, J.C., Park, K., et al.: A fast algorithm for order-preserving pattern matching. Inf. Process. Lett. 115(2), 397–402 (2015)

  9. Cole, R.: Parallel merge sort. SIAM J. Comput. 17(4), 770–785 (1988)

  10. Cole, R., Hazay, C., Lewenstein, M., et al.: Two-dimensional parameterized matching. ACM Trans. Algorithms 11(2), 1–12 (2014). https://doi.org/10.1145/2650220

  11. Crochemore, M., Iliopoulos, C.S., Kociumaka, T., et al.: Order-preserving indexing. Theor. Comput. Sci. 638, 122–135 (2016). https://doi.org/10.1016/j.tcs.2015.06.050

  12. Faro, S., Külekci, M. O.: Efficient algorithms for the order preserving pattern matching problem. In: International Conference on Algorithmic Applications in Management, Springer, pp 185–196 (2016)

  13. Faro, S., Lecroq, T.: The exact online string matching problem: a review of the most recent results. ACM Comput. Surv. (CSUR) 45(2), 1–42 (2013)

  14. Hasan, M.M., Islam, A.S., Rahman, M.S., et al.: Order preserving pattern matching revisited. Pattern Recogn. Lett. 55, 15–21 (2015)

  15. Horspool, R.N.: Practical fast searching in strings. Softw. Pract. Exp. 10(6), 501–506 (1980). https://doi.org/10.1002/spe.4380100608

  16. JáJá, J.: An Introduction to Parallel Algorithms, vol. 17. Addison-Wesley, Reading (1992)

  17. Jargalsaikhan, D., Diptarama, Ueki, Y., et al.: Duel and sweep algorithm for order-preserving pattern matching. In: SOFSEM 2018: Theory and Practice of Computer Science, 44th International Conference on Current Trends in Theory and Practice of Computer Science, Krems, Austria, January 29–February 2, Proceedings, pp 624–635 (2018)

  18. Jargalsaikhan, D., Hendrian, D., Yoshinaka, R., et al.: Parallel duel-and-sweep algorithm for the order-preserving pattern matching. In: International conference on current trends in theory and practice of informatics, pp 211–222 (2020)

  19. Jargalsaikhan, D., Hendrian, D., Yoshinaka, R., et al.: Parallel algorithm for pattern matching problems under substring consistent equivalence relations. In: 33rd Annual symposium on combinatorial pattern matching, CPM 2022, June 27–29, 2022, Prague, Czech Republic, LIPIcs, vol 223, pp 28:1–28:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022a). https://doi.org/10.4230/LIPIcs.CPM.2022.28

  20. Jargalsaikhan, D., Hendrian, D., Yoshinaka, R., et al.: Parallel algorithm for pattern matching problems under substring consistent equivalence relations. CoRR abs/2202.13284 (2022b). https://arxiv.org/abs/2202.13284

  21. Kim, J., Eades, P., Fleischer, R., et al.: Order-preserving matching. Theoret. Comput. Sci. 525, 68–79 (2014)

  22. Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977). https://doi.org/10.1137/0206024

  23. Kubica, M., Kulczyński, T., Radoszewski, J., et al.: A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett. 113(12), 430–433 (2013)

  24. Matsuoka, Y., Aoki, T., Inenaga, S., et al.: Generalized pattern matching and periodicity under substring consistent equivalence relations. Theoret. Comput. Sci. 656, 225–233 (2016)

  25. Ueki, Y., Narisawa, K., Shinohara, A.: A fast order-preserving matching with \(q\)-neighborhood filtration using SIMD instructions. In: SOFSEM (Student Research Forum Papers/Posters), pp 108–115 (2016)

  26. Vishkin, U.: Optimal parallel pattern matching in strings. In: International Colloquium on Automata, Languages, and Programming, pp. 497–508. Springer, Berlin (1985)

  27. Vishkin, U.: Deterministic sampling: a new technique for fast pattern matching. SIAM J. Comput. 20(1), 22–40 (1991)

Acknowledgements

We would like to express our sincere gratitude to the anonymous reviewers for their constructive comments and valuable feedback. Their contributions have been helpful in refining the final version of this paper. This work was supported by JSPS KAKENHI Grant Numbers JP15H05706, JP18K11150, JP19K20208, JP20H05703, and JP21K11745. This work was also supported by the ImPACT Program of the Council for Science, Technology and Innovation (Cabinet Office, Government of Japan). Davaajav Jargalsaikhan was supported by a research grant from Tohoku University Division for International Advanced Research and Education.

Author information

Contributions

All the co-authors contributed equally to this work.

Corresponding authors

Correspondence to Diptarama Hendrian, Ryo Yoshinaka or Ayumi Shinohara.

Ethics declarations

Conflict of interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A. Comparison with the parallel SCER matching algorithms

Our proposed parallel algorithm is largely based on the general matching algorithm for arbitrary SCERs proposed by Jargalsaikhan et al. [19, 20]. An equivalence relation \(\cong \) over \(\Sigma ^*\) is called a substring consistent equivalence relation (SCER) if \(X \cong Y\) implies \(|X|=|Y|\) and \(X[i \mathbin {:} j] \cong Y[i \mathbin {:} j]\) for any \(1 \le i \le j \le |X|\). Clearly, the OPM relation \(\approx \) is an SCER. The algorithm in [19] uses an encoding that satisfies the following conditions.

Definition 34

(\(\cong \)-encoding, [19, Definition 3]) Let \(\Sigma \) and \(\Delta \) be (possibly infinite) alphabets. We say a function \(f:\Sigma ^* \rightarrow \Delta ^*\) is an \(\cong \)-encoding if

  (1) \(f(X) = f(Y)\) iff \(X \cong Y\),

  (2) \(f(X[1:i]) = f(X)[1:i]\) for any \(i \le |X|\), and

  (3) \(f(X)[i] = f(Y)[i]\) implies \(f(X[j+1:k])[i-j] = f(Y[j+1:k])[i-j]\) for any \(j < i \le k\).

If \(X \not \cong Y\), one can find a witness position i such that \(f(X)[i] \ne f(Y)[i]\). Conditions (2) and (3) imply that when a position witnesses a mismatch between substrings of X and Y, the corresponding position witnesses the mismatch between the whole strings X and Y as well. This property is what allows their algorithm to “transfer” witnesses found for one offset to other offsets. Accordingly, the efficiency of their algorithm depends on how efficiently one can compute the encoding of a string and recompute the encoding of a substring from the encoding of the whole string.
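Conditions (2) and (3) can be tested by brute force on small inputs. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, an encoding is modelled as a Python function returning a list with one code per position, and the identity encoding for exact matching serves as a simple example that satisfies both conditions.

```python
from itertools import product
from typing import Callable, Hashable, List, Sequence

Encoding = Callable[[Sequence], List[Hashable]]


def satisfies_prefix_condition(f: Encoding, x: Sequence) -> bool:
    """Condition (2): f(X[1:i]) equals f(X)[1:i] for every prefix length i."""
    fx = f(x)
    return all(f(x[:i]) == fx[:i] for i in range(len(x) + 1))


def satisfies_transfer_condition(f: Encoding, x: Sequence, y: Sequence) -> bool:
    """Condition (3) for equal-length x and y, with 1-based i, j, k as in the text."""
    fx, fy = f(x), f(y)
    n = len(x)
    for i in range(1, n + 1):
        if fx[i - 1] != fy[i - 1]:
            continue  # the premise f(X)[i] = f(Y)[i] does not hold
        for j in range(0, i):              # j < i
            for k in range(i, n + 1):      # i <= k
                # X[j+1:k] in the paper's notation is x[j:k] in Python.
                if f(x[j:k])[i - j - 1] != f(y[j:k])[i - j - 1]:
                    return False
    return True


def identity_encoding(x: Sequence) -> List[Hashable]:
    """Encoding for exact matching: every symbol encodes itself."""
    return list(x)


strings = list(product((0, 1), repeat=3))
print(all(satisfies_prefix_condition(identity_encoding, x) for x in strings))
print(all(satisfies_transfer_condition(identity_encoding, x, y)
          for x in strings for y in strings))
```

Both checks print `True` for the identity encoding. A passing brute-force check on small strings is evidence, not a proof, that an encoding satisfies the definition.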

Indeed, every SCER \(\cong \) admits an encoding satisfying the above: let \(\Delta \) be the power set of \(\Sigma ^*\), and \(f(\varepsilon )=\varepsilon \) and \(f(Xc) = f(X) \cdot [Xc]_{\cong }\) for \(c \in \Sigma \) where \([Y]_{\cong }\) is the equivalence class of \(Y \in \Sigma ^*\) under \(\cong \) (or, \(\Delta \) can be any set of symbols that can represent those equivalence classes). This construction guarantees that their algorithm works for every SCER in theory, but computing this encoding is apparently expensive. In fact, many SCERs, like exact match, parameterized match, and Cartesian match, admit computationally cheaper encodings satisfying Definition 34. However, we do not yet know if there is such a reasonable encoding for OPM.
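As an illustration of why this generic construction is costly, the sketch below instantiates it for OPM. It assumes that the equivalence class of a prefix can be represented by its tuple of dense ranks; this representation and the names `rank_tuple` and `universal_encoding` are choices made for the example, not taken from [19].

```python
from typing import List, Sequence, Tuple


def rank_tuple(x: Sequence[int]) -> Tuple[int, ...]:
    """Canonical representative of the OPM class of x: dense ranks of its entries."""
    rank = {v: r for r, v in enumerate(sorted(set(x)))}
    return tuple(rank[v] for v in x)


def universal_encoding(x: Sequence[int]) -> List[Tuple[int, ...]]:
    """f(X)[i] stands for the equivalence class of the length-i prefix of X."""
    return [rank_tuple(x[:i]) for i in range(1, len(x) + 1)]


X, Y = (4, 6, 1, 5), (4, 6, 9, 5)
fX, fY = universal_encoding(X), universal_encoding(Y)

# 1-based positions where the two encodings differ.
mismatches = [i + 1 for i in range(len(X)) if fX[i] != fY[i]]
# Total number of ranks stored for a string of length 4.
total_size = sum(len(code) for code in fX)
print(mismatches, total_size)  # prints [3, 4] 10
```

Each code stores a whole prefix, so the encoding of a string of length m has size m(m+1)/2 in this representation, and once a prefix mismatches, every longer prefix mismatches too.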

Concerning the OPM relation \(\approx \), recall that \(X \approx Y\) if and only if \( Lmax _{X} = Lmax _{Y}\) and \( Lmin _{X} = Lmin _{Y}\). The encoding \( Lmix _X\) of X defined as \( Lmix _X[i]= \langle Lmin _{X}[i], Lmax _{X}[i] \rangle \) fulfills conditions (1) and (2) of Definition 34, but not (3). For example, consider the mismatch between \(X=(4,6,1,5)\) and \(Y=(4,6,9,5)\). We have

$$\begin{aligned} Lmix _X&= (\langle 0,0 \rangle ,\ \langle 0,1 \rangle ,\ \langle 1,0 \rangle ,\ \langle 2,1 \rangle ) \,,\\ Lmix _Y&= (\langle 0,0 \rangle ,\ \langle 0,1 \rangle ,\ \langle 0,2 \rangle ,\ \langle 2,1 \rangle ) \,. \end{aligned}$$

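With this encoding, the mismatch is witnessed at position 3, but not at the last position, since \( Lmix _X[4] = Lmix _Y[4]\). On the other hand, the mismatch of their suffixes \(X'=(1,5) \not \approx Y'=(9,5)\), that is, of \(X[3:4]\) and \(Y[3:4]\), is witnessed at the last position, which violates condition (3) with \(j=2\) and \(i=k=4\).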

$$\begin{aligned} Lmix _{X'}&= (\langle 0,0 \rangle ,\ \langle 0,1 \rangle ) \,,\\ Lmix _{Y'}&= (\langle 0,0 \rangle ,\ \langle 1,0 \rangle ) \,. \end{aligned}$$
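The counterexample can be replayed with a short quadratic-time sketch. It assumes the usual nearest-neighbour definitions, which this appendix does not restate: \( Lmax _X[i]\) is the 1-based position \(j < i\) of the largest \(X[j] \le X[i]\) (rightmost on ties), \( Lmin _X[i]\) is the position of the smallest \(X[j] \ge X[i]\), and both are 0 when no such position exists. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def lmax_lmin(x: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (Lmax, Lmin) as lists of 1-based positions, with 0 meaning 'none'."""
    lmax, lmin = [], []
    for i in range(len(x)):
        best_le = best_ge = 0
        for k in range(i):
            # Largest earlier value not exceeding x[i]; '>=' keeps the rightmost.
            if x[k] <= x[i] and (best_le == 0 or x[k] >= x[best_le - 1]):
                best_le = k + 1
            # Smallest earlier value not below x[i]; '<=' keeps the rightmost.
            if x[k] >= x[i] and (best_ge == 0 or x[k] <= x[best_ge - 1]):
                best_ge = k + 1
        lmax.append(best_le)
        lmin.append(best_ge)
    return lmax, lmin


def lmix(x: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair up the two tables position by position as <Lmin[i], Lmax[i]>."""
    lmax, lmin = lmax_lmin(x)
    return list(zip(lmin, lmax))


X, Y = (4, 6, 1, 5), (4, 6, 9, 5)
# Premise of condition (3) at i = 4: the whole-string codes agree.
print(lmix(X)[3] == lmix(Y)[3])            # prints True
# Conclusion for j = 2, k = 4: the suffix codes at position i - j = 2 disagree.
print(lmix(X[2:])[1] == lmix(Y[2:])[1])    # prints False
```

The premise holds while the conclusion fails, which is exactly the violation of condition (3) described above.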

Therefore, the algorithm proposed in [19] does not work with this encoding \( Lmix \). Instead, we have designed an algorithm that uses two positions as a mismatch witness and does not compare encoded characters. This modification exempts us from the encoding costs. Furthermore, the OPM relation \(\approx \) is closed under reversal, i.e., \(X \approx Y\) iff \(\textrm{rev}(X) \approx \textrm{rev}(Y)\), which is not guaranteed for general SCERs. Thanks to this property, our proposed pattern preprocessing (more specifically, Algorithm 9) runs faster than the one for general SCERs in [19].
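Closure under reversal can be checked exhaustively on small inputs. The following sketch uses hypothetical helper names and a pairwise-comparison test of order-isomorphism; it is a sanity check of the stated property, not part of the proposed preprocessing.

```python
from itertools import product
from typing import Iterable, Sequence


def order_isomorphic(x: Sequence[int], y: Sequence[int]) -> bool:
    """True iff x and y have equal length and the same relative order."""
    if len(x) != len(y):
        return False
    n = len(x)
    return all((x[i] <= x[j]) == (y[i] <= y[j])
               for i in range(n) for j in range(n))


def reversal_closed_on(seqs: Iterable[Sequence[int]]) -> bool:
    """Check X ~ Y iff rev(X) ~ rev(Y) for every pair drawn from seqs."""
    seqs = list(seqs)
    return all(order_isomorphic(x, y) == order_isomorphic(x[::-1], y[::-1])
               for x in seqs for y in seqs)


# All 27 sequences of length 3 over {0, 1, 2}, including ones with ties.
samples = list(product(range(3), repeat=3))
print(reversal_closed_on(samples))  # prints True
```

The check succeeds because reversal only renames positions, so every pairwise comparison in one string still has its counterpart in the other.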

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jargalsaikhan, D., Hendrian, D., Ueki, Y. et al. Serial and parallel algorithms for order-preserving pattern matching based on the duel-and-sweep paradigm. Acta Informatica 61, 415–444 (2024). https://doi.org/10.1007/s00236-024-00464-w

  • DOI: https://doi.org/10.1007/s00236-024-00464-w