Balancing Run-Length Straight-Line Programs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13617)

Abstract

It was recently proved that any SLP generating a given string w can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We show that this result also holds for RLSLPs, which are SLPs extended with run-length rules of the form \(A \rightarrow B^t\) for \(t>2\), deriving \(\texttt {exp}(A) = \texttt {exp}(B)^t\). An immediate consequence is the simplification of the algorithm for extracting substrings of an RLSLP-compressed string. We also show that several problems like answering RMQs and computing Karp-Rabin fingerprints on substrings can be solved in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) time, \(g_{rl}\) being the size of the smallest RLSLP generating the string, of length n. We extend the result to solving more general operations on string ranges, in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) applications of the operation. In general, the smallest RLSLP can be asymptotically smaller than the smallest SLP by up to an \(\mathcal {O}(\log n)\) factor, so our results can make a difference in terms of the space needed for computing these operations efficiently for some string families.

Funded in part by Basal Funds FB0001, Fondecyt Grant 1-200038, and two Conicyt Doctoral Scholarships, ANID, Chile.

Notes

  1. Seen another way, \(\lambda (A) \ne \lambda (B)\) because \(\log _2 \pi (A,W) = \log _2 (t \cdot \pi (B,W)) > 1 + \log _2 \pi (B,W)\).

References

  1. Bille, P., Gørtz, I.L., Cording, P.H., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. J. Comput. Syst. Sci. 86, 171–180 (2017). https://doi.org/10.1016/j.jcss.2017.01.002, https://www.sciencedirect.com/science/article/pii/S0022000017300028

  2. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). https://doi.org/10.1137/130936889

  3. Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Tech. report, DIGITAL SRC RESEARCH REPORT (1994)

  4. Charikar, M., et al.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)

  5. Christiansen, A., Ettienne, M., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17, 1–39 (2020). https://doi.org/10.1145/3426473

  6. Fischer, J., Mäkinen, V., Navarro, G.: Faster entropy-bounded compressed suffix trees. Theoret. Comput. Sci. 410(51), 5354–5364 (2009)

  7. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011). https://doi.org/10.1137/090779759

  8. Gagie, T., Navarro, G., Prezza, N.: Optimal-time text indexing in BWT-runs bounded space. In: Proceedings 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1459–1477 (2018)

  9. Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM 67(1), 1–54 (2020). https://doi.org/10.1145/3375890

  10. Ganardi, M., Jeż, A., Lohrey, M.: Balancing straight-line programs. J. ACM 68(4), 1–40 (2021). https://doi.org/10.1145/3457389

  11. Jeż, A.: Approximation of grammar-based compression via recompression. Theoret. Comput. Sci. 592, 115–134 (2015)

  12. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987). https://doi.org/10.1147/rd.312.0249

  13. Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler transform conjecture. Commun. ACM 65(6), 91–98 (2022). https://doi.org/10.1145/3531445

  14. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (2018). https://doi.org/10.1145/3188745.3188814

  15. Kini, D., Mathur, U., Viswanathan, M.: Data race detection on compressed traces. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 26–37. ESEC/FSE 2018, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3236024.3236025

  16. Kreft, S., Navarro, G.: LZ77-like compression with fast random access. In: 2010 Data Compression Conference, pp. 239–248 (2010)

  17. Larsson, N., Moffat, A.: Offline dictionary-based compression. In: Proceedings DCC 1999 Data Compression Conference (Cat. No. PR00096), pp. 296–305 (1999)

  18. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976)

  19. Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), article 29 (2021)

  20. Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7(1), 67–82 (1997)

  21. Nishimoto, T., Inenaga, S., Bannai, H., Takeda, M.: Fully dynamic data structure for LCE queries in compressed space. In: 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), vol. 58, pp. 72:1–72:15 (2016)

  22. Przeworski, M., Hudson, R., Di Rienzo, A.: Adjusting the focus on human variation. Trends Genetics: TIG 16(7), 296–302 (2000)

  23. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1), 211–222 (2003)

  24. Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proceedings 24th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 247–258 (2013)

  25. Zhang, M., Mathur, U., Viswanathan, M.: Checking LTL[F, G, X] on compressed traces in polynomial time. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 131–143. ESEC/FSE 2021, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3468264.3468557

Corresponding author

Correspondence to Cristian Urbina.

A PSV and NSV Queries

Other relevant queries are previous smaller value (PSV) and next smaller value (NSV) [6, 9], defined as follows:

  • \(\texttt {psv}(i)= \texttt {max}(\{j \,|\, j< i, w[j] < w[i]\}\cup \{0\})\)

  • \(\texttt {nsv}(i) = \texttt {min}(\{j \,|\, j > i, w[j] < w[i]\}\cup \{n+1\})\)

  • \(\texttt {psv}'(i, d)= \texttt {max}(\{j \,|\, j< i, w[j] < d\}\cup \{0\})\)

  • \(\texttt {nsv}'(i, d) = \texttt {min}(\{j \,|\, j > i, w[j] < d\}\cup \{n+1\})\)

Note that the first two queries reduce to the latter two: access \(w[i]\) in \(\mathcal {O}(\log n)\) time and then call \(\texttt {psv}'(i, w[i])\) or \(\texttt {nsv}'(i, w[i])\), respectively. We show that the latter queries can be answered in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) time.
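To make the four definitions concrete, the following is a naive reference sketch in Python that scans an explicit, uncompressed sequence. It is not the compressed index of the paper: it takes linear time per query, and it reads \(w[i]\) directly rather than through the grammar. The function names are hypothetical, and positions are 1-based as in the definitions.

```python
def psv_prime_naive(w, i, d):
    """Largest j < i with w[j] < d (1-based), or 0 if there is none."""
    for j in range(min(i - 1, len(w)), 0, -1):
        if w[j - 1] < d:
            return j
    return 0


def nsv_prime_naive(w, i, d):
    """Smallest j > i with w[j] < d (1-based), or n + 1 if there is none."""
    n = len(w)
    for j in range(max(i + 1, 1), n + 1):
        if w[j - 1] < d:
            return j
    return n + 1


def psv_naive(w, i):
    # psv(i) is psv'(i, d) with the threshold d = w[i].
    return psv_prime_naive(w, i, w[i - 1])


def nsv_naive(w, i):
    # nsv(i) is nsv'(i, d) with the threshold d = w[i].
    return nsv_prime_naive(w, i, w[i - 1])


w = [3, 1, 4, 1, 5, 9, 2, 6]
print(psv_naive(w, 3), nsv_naive(w, 3))  # 2 4
print(psv_naive(w, 2), nsv_naive(w, 4))  # 0 9
```

A brute-force version like this is mainly useful as a correctness oracle when testing a grammar-based implementation.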

Theorem 5

It is possible to construct an index of size \(\mathcal {O}(g_{rl})\) supporting PSV and NSV queries in \(\mathcal {O}(\log n)\) time.

Proof

Let G be a balanced RLSLP of size \(\mathcal {O}(g_{rl})\) constructed as in Theorem 1. Store the values \(L[A] = |\texttt {exp}(A)|\) and \(M[A] = \texttt {min}(\{\texttt {exp}(A)[i]\,|\, i \in [1.. L[A]]\})\), for every variable A, as arrays. These arrays add only \(\mathcal {O}(g_{rl})\) extra space. To compute \(\texttt {psv}'(A, i, d)\), do as follows:

  1. If \(i=1\) or \(M[A] \ge d\), return 0.

  2. If \(A \rightarrow a\), return 1.

  3. If \(A \rightarrow BC\), then:

    (a) If \(i \le L[B]+1\), return \(\texttt {psv}'(B, i, d)\).

    (b) If \(L[B]+1 < i\), let \(k = \texttt {psv}'(C, i - L[B], d)\). If \(k > 0\), return \(L[B] + k\); otherwise, return \(\texttt {psv}'(B, i, d)\).

  4. If \(A \rightarrow B^t\) for \(t > 2\), then:

    (a) If \(i \le L[B]+1\), return \(\texttt {psv}'(B, i, d)\).

    (b) If \(i \in [t'L[B] +1..(t'+1)L[B]]\) for some \(1 \le t' < t\), let \(k = \texttt {psv}'(B, i - t'L[B], d)\). If \(k > 0\), return \(t'L[B] + k\). Otherwise, return \((t'-1)L[B] + \texttt {psv}'(B,i,d)\).

    (c) If \(L[A]<i\), return \((t-1)L[B]+\texttt {psv}'(B,i,d)\).

The guard in point 1 guarantees that, in the simple case where i is beyond \(|\texttt {exp}(A)|\), at most one recursive call needs more than \(\mathcal {O}(1)\) time. In general, we can make two calls in case 3(b), but then the second call (inside B) is of the simple type from there on. The case of run-length rules is similar. Thus, we obtain \(\mathcal {O}(\log n)\) time. The query \(\texttt {nsv}'\) is handled similarly.    \(\square \)
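The recursion in the proof can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the grammar encoding (a dict mapping each variable to `("term", a)`, `("pair", B, C)` or `("run", B, t)`), the function names, and the toy grammar are all hypothetical, and no balancing is performed, so the \(\mathcal {O}(\log n)\) bound applies only if the input grammar is already balanced as in Theorem 1.

```python
def build_index(rules):
    """Compute L[A] = |exp(A)| and M[A] = min symbol of exp(A) for every variable."""
    L, M = {}, {}

    def visit(A):
        if A in L:
            return
        rule = rules[A]
        if rule[0] == "term":            # A -> a
            L[A], M[A] = 1, rule[1]
        elif rule[0] == "pair":          # A -> B C
            _, B, C = rule
            visit(B)
            visit(C)
            L[A], M[A] = L[B] + L[C], min(M[B], M[C])
        else:                            # A -> B^t
            _, B, t = rule
            visit(B)
            L[A], M[A] = t * L[B], M[B]

    for A in rules:
        visit(A)
    return L, M


def psv_prime(rules, L, M, A, i, d):
    """Largest j < i with exp(A)[j] < d (1-based), or 0; i may exceed L[A]."""
    if i <= 1 or M[A] >= d:                              # step 1
        return 0
    rule = rules[A]
    if rule[0] == "term":                                # step 2
        return 1
    if rule[0] == "pair":                                # step 3
        _, B, C = rule
        if i <= L[B] + 1:                                # 3(a)
            return psv_prime(rules, L, M, B, i, d)
        k = psv_prime(rules, L, M, C, i - L[B], d)       # 3(b)
        return L[B] + k if k > 0 else psv_prime(rules, L, M, B, i, d)
    _, B, t = rule                                       # step 4
    if i <= L[B] + 1:                                    # 4(a)
        return psv_prime(rules, L, M, B, i, d)
    if i > L[A]:                                         # 4(c)
        return (t - 1) * L[B] + psv_prime(rules, L, M, B, i, d)
    tp = (i - 1) // L[B]                                 # 4(b): copy containing i
    k = psv_prime(rules, L, M, B, i - tp * L[B], d)
    if k > 0:
        return tp * L[B] + k
    # Nothing before i inside its own copy: use the last hit in the previous copy.
    return (tp - 1) * L[B] + psv_prime(rules, L, M, B, i, d)


def expand(rules, A):
    """Decompress exp(A); only for checking small examples."""
    rule = rules[A]
    if rule[0] == "term":
        return rule[1]
    if rule[0] == "pair":
        return expand(rules, rule[1]) + expand(rules, rule[2])
    return expand(rules, rule[1]) * rule[2]


# Toy RLSLP for "bababac": S -> R Xc, R -> P^3, P -> Xb Xa.
rules = {
    "Xa": ("term", "a"), "Xb": ("term", "b"), "Xc": ("term", "c"),
    "P": ("pair", "Xb", "Xa"),
    "R": ("run", "P", 3),
    "S": ("pair", "R", "Xc"),
}
L, M = build_index(rules)
print(expand(rules, "S"))                  # bababac
print(psv_prime(rules, L, M, "S", 8, "b"))  # 6
```

In the fallback calls the position \(i\) is passed unchanged and lies beyond the end of \(\texttt {exp}(B)\), which is the "simple case" mentioned in the proof. A sketch for \(\texttt {nsv}'\) would mirror this one, scanning to the right and returning \(L[A]+1\) when no position qualifies.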

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Navarro, G., Olivares, F., Urbina, C. (2022). Balancing Run-Length Straight-Line Programs. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_9

  • DOI: https://doi.org/10.1007/978-3-031-20643-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20642-9

  • Online ISBN: 978-3-031-20643-6

  • eBook Packages: Computer Science (R0)
