Balancing Run-Length Straight-Line Programs

Navarro, Gonzalo; Olivares, Francisco; Urbina, Cristian

doi:10.1007/978-3-031-20643-6_9

Conference paper
First Online: 01 November 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13617)

Abstract

It was recently proved that any SLP generating a given string w can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We show that this result also holds for RLSLPs, which are SLPs extended with run-length rules of the form \(A \rightarrow B^t\) for \(t>2\), deriving \(\texttt {exp}(A) = \texttt {exp}(B)^t\). An immediate consequence is the simplification of the algorithm for extracting substrings of an RLSLP-compressed string. We also show that several problems like answering RMQs and computing Karp-Rabin fingerprints on substrings can be solved in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) time, \(g_{rl}\) being the size of the smallest RLSLP generating the string, of length n. We extend the result to solving more general operations on string ranges, in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) applications of the operation. In general, the smallest RLSLP can be asymptotically smaller than the smallest SLP by up to an \(\mathcal {O}(\log n)\) factor, so our results can make a difference in terms of the space needed for computing these operations efficiently for some string families.

Funded in part by Basal Funds FB0001, Fondecyt Grant 1-200038, and two Conicyt Doctoral Scholarships, ANID, Chile.

Notes

1. Seen another way, \(\lambda (A) \not = \lambda (B)\) because \(\log _2 \pi (A,W) = \log _2 (t \cdot \pi (B,W)) > 1 + \log _2 \pi (B,W)\).

References

1. Bille, P., Gørtz, I.L., Cording, P.H., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. J. Comput. Syst. Sci. 86, 171–180 (2017). https://doi.org/10.1016/j.jcss.2017.01.002
2. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015). https://doi.org/10.1137/130936889
3. Burrows, M., Wheeler, D.: A block-sorting lossless data compression algorithm. Tech. report, DIGITAL SRC RESEARCH REPORT (1994)
4. Charikar, M., et al.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
5. Christiansen, A., Ettienne, M., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17, 1–39 (2020). https://doi.org/10.1145/3426473
6. Fischer, J., Mäkinen, V., Navarro, G.: Faster entropy-bounded compressed suffix trees. Theoret. Comput. Sci. 410(51), 5354–5364 (2009)
7. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011). https://doi.org/10.1137/090779759
8. Gagie, T., Navarro, G., Prezza, N.: Optimal-time text indexing in BWT-runs bounded space. In: Proceedings 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1459–1477 (2018)
9. Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM 67(1), 1–54 (2020). https://doi.org/10.1145/3375890
10. Ganardi, M., Jeż, A., Lohrey, M.: Balancing straight-line programs. J. ACM 68(4), 1–40 (2021). https://doi.org/10.1145/3457389
11. Jeż, A.: Approximation of grammar-based compression via recompression. Theoret. Comput. Sci. 592, 115–134 (2015)
12. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987). https://doi.org/10.1147/rd.312.0249
13. Kempa, D., Kociumaka, T.: Resolution of the Burrows-Wheeler transform conjecture. Commun. ACM 65(6), 91–98 (2022). https://doi.org/10.1145/3531445
14. Kempa, D., Prezza, N.: At the roots of dictionary compression: string attractors. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (2018). https://doi.org/10.1145/3188745.3188814
15. Kini, D., Mathur, U., Viswanathan, M.: Data race detection on compressed traces. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 26–37. ESEC/FSE 2018, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3236024.3236025
16. Kreft, S., Navarro, G.: LZ77-like compression with fast random access. In: 2010 Data Compression Conference, pp. 239–248 (2010)
17. Larsson, N., Moffat, A.: Offline dictionary-based compression. In: Proceedings DCC 1999 Data Compression Conference (Cat. No. PR00096), pp. 296–305 (1999)
18. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theory 22(1), 75–81 (1976)
19. Navarro, G.: Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv. 54(2), article 29 (2021)
20. Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. 7(1), 67–82 (1997)
21. Nishimoto, T., Inenaga, S., Bannai, H., Takeda, M.: Fully dynamic data structure for LCE queries in compressed space. In: 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016). Leibniz International Proceedings in Informatics (LIPIcs), vol. 58, pp. 72:1–72:15 (2016)
22. Przeworski, M., Hudson, R., Di Rienzo, A.: Adjusting the focus on human variation. Trends Genetics: TIG 16(7), 296–302 (2000)
23. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1), 211–222 (2003)
24. Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Proceedings 24th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 247–258 (2013)
25. Zhang, M., Mathur, U., Viswanathan, M.: Checking LTL[F, G, X] on compressed traces in polynomial time. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 131–143. ESEC/FSE 2021, Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3468264.3468557

Author information

Authors and Affiliations

CeBiB — Center for Biotechnology and Bioengineering, Department of Computer Science, University of Chile, Santiago, Chile
Gonzalo Navarro, Francisco Olivares & Cristian Urbina

Corresponding author

Correspondence to Cristian Urbina.

Editor information

Editors and Affiliations

Universidad Técnica Federico Santa María, Valparaíso, Chile
Diego Arroyuelo
Universidad de Chile, Santiago, Chile
Barbara Poblete

A PSV and NSV Queries

Other relevant queries are previous smaller value (PSV) and next smaller value (NSV) [6, 9], defined as follows:

\(\texttt {psv}(i)= \texttt {max}(\{j \,|\, j< i, w[j] < w[i]\}\cup \{0\})\)
\(\texttt {nsv}(i) = \texttt {min}(\{j \,|\, j > i, w[j] < w[i]\}\cup \{n+1\})\)
\(\texttt {psv}'(i, d)= \texttt {max}(\{j \,|\, j< i, w[j] < d\}\cup \{0\})\)
\(\texttt {nsv}'(i, d) = \texttt {min}(\{j \,|\, j > i, w[j] < d\}\cup \{n+1\})\)

Note that the first two queries reduce to the latter two: access w[i] in \(\mathcal {O}(\log n)\) time and then compute \(\texttt {psv}(i) = \texttt {psv}'(i, w[i])\) or \(\texttt {nsv}(i) = \texttt {nsv}'(i, w[i])\), respectively. We show that the latter queries can be answered in \(\mathcal {O}(g_{rl})\) space and \(\mathcal {O}(\log n)\) time.
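To make the four definitions concrete, the following is a minimal brute-force sketch that works on an uncompressed list, using 1-based positions as in the text. The function names (`psv_prime_naive`, `nsv_prime_naive`, `psv`, `nsv`) and the sample array are illustrative assumptions and do not come from the paper. The sketch takes linear time per query and only serves as a reference against which a compressed-space solution could be checked.

```python
def psv_prime_naive(w, i, d):
    """Largest position j < i with w[j] < d (1-based), or 0 if none exists."""
    n = len(w)
    for j in range(min(i, n + 1) - 1, 0, -1):
        if w[j - 1] < d:
            return j
    return 0


def nsv_prime_naive(w, i, d):
    """Smallest position j > i with w[j] < d (1-based), or n + 1 if none exists."""
    n = len(w)
    for j in range(max(i, 0) + 1, n + 1):
        if w[j - 1] < d:
            return j
    return n + 1


def psv(w, i):
    """psv(i) = psv'(i, w[i]): access w[i], then call the generalized query."""
    return psv_prime_naive(w, i, w[i - 1])


def nsv(w, i):
    """nsv(i) = nsv'(i, w[i])."""
    return nsv_prime_naive(w, i, w[i - 1])


if __name__ == "__main__":
    w = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical sample input
    print(psv(w, 5))  # 4: w[4] = 1 is the closest smaller value to the left of w[5] = 5
    print(nsv(w, 5))  # 7: w[7] = 2 is the closest smaller value to the right
    print(psv(w, 2))  # 0: nothing to the left of w[2] = 1 is smaller
    print(nsv(w, 4))  # 9: nothing to the right of w[4] = 1 is smaller, so n + 1
```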

Theorem 5

It is possible to construct an index of size \(\mathcal {O}(g_{rl})\) supporting PSV and NSV queries in \(\mathcal {O}(\log n)\) time.

Proof

Let G be a balanced RLSLP of size \(\mathcal {O}(g_{rl})\) constructed as in Theorem 1. For every variable A, store the values \(L[A] = |\texttt {exp}(A)|\) and \(M[A] = \texttt {min}(\{\texttt {exp}(A)[i]\,|\, i \in [1.. L[A]]\})\) as arrays. These arrays add only \(\mathcal {O}(g_{rl})\) extra space. Let \(\texttt {psv}'(A, i, d)\) denote the query \(\texttt {psv}'(i, d)\) evaluated on \(\texttt {exp}(A)\), where i may exceed \(L[A]\). To compute \(\texttt {psv}'(A, i, d)\), do as follows:

1. If \(i=1\) or \(M[A] \ge d\), return 0.
2. If \(A \rightarrow a\), return 1.
3. If \(A \rightarrow BC\), then:
  (a) If \(i \le L[B]+1\), return \(\texttt {psv}'(B, i, d)\).
  (b) If \(L[B]+1 < i\), let \(k = \texttt {psv}'(C, i - L[B], d)\). If \(k > 0\), return \(L[B] + k\). Otherwise, return \(\texttt {psv}'(B, i, d)\).
4. If \(A \rightarrow B^t\) for \(t > 2\), then:
  (a) If \(i \le L[B]+1\), return \(\texttt {psv}'(B, i, d)\).
  (b) If \(i \in [t'L[B] +1..(t'+1)L[B]]\) for some \(1 \le t' < t\), let \(k = \texttt {psv}'(B, i - t'L[B], d)\). If \(k > 0\), return \(t'L[B] + k\). Otherwise, return \((t'-1)L[B] + \texttt {psv}'(B,i,d)\).
  (c) If \(L[A]<i\), return \((t-1)L[B]+\texttt {psv}'(B,i,d)\).

The guard in point 1 guarantees that, in the simple case where i is beyond \(|\texttt {exp}(A)|\), at most one recursive call needs more than \(\mathcal {O}(1)\) time. In general, we can make two calls in case 3(b), but then the second call (inside B) is of the simple type from there on. The case of run-length rules is similar. Since G is balanced, its derivation tree has height \(\mathcal {O}(\log n)\), and thus we obtain \(\mathcal {O}(\log n)\) time. The query \(\texttt {nsv}'\) is handled similarly. \(\square \)
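The procedure in the proof can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the grammar is assumed to be a Python dictionary mapping each variable to a tuple `("term", a)`, `("cat", B, C)`, or `("run", B, t)`, and the names `build_tables`, `psv_prime`, and the toy grammar are hypothetical. The sketch does not balance the grammar, so the \(\mathcal {O}(\log n)\) bound only applies when the input is already balanced as in Theorem 1.

```python
def build_tables(rules):
    """Compute L[A] = |exp(A)| and M[A] = min symbol of exp(A) for every variable."""
    L, M = {}, {}

    def visit(A):
        if A in L:
            return
        rule = rules[A]
        if rule[0] == "term":            # A -> a
            L[A], M[A] = 1, rule[1]
        elif rule[0] == "cat":           # A -> BC
            _, B, C = rule
            visit(B)
            visit(C)
            L[A], M[A] = L[B] + L[C], min(M[B], M[C])
        else:                            # A -> B^t
            _, B, t = rule
            visit(B)
            L[A], M[A] = t * L[B], M[B]

    for A in rules:
        visit(A)
    return L, M


def psv_prime(rules, L, M, A, i, d):
    """Largest j < i with exp(A)[j] < d (1-based), or 0; i may exceed L[A]."""
    if i <= 1 or M[A] >= d:              # step 1 (i >= 1 always holds for valid input)
        return 0
    rule = rules[A]
    if rule[0] == "term":                # step 2
        return 1
    if rule[0] == "cat":                 # step 3
        _, B, C = rule
        if i <= L[B] + 1:                # 3(a): the answer lies inside exp(B)
            return psv_prime(rules, L, M, B, i, d)
        k = psv_prime(rules, L, M, C, i - L[B], d)   # 3(b)
        return L[B] + k if k > 0 else psv_prime(rules, L, M, B, i, d)
    _, B, t = rule                       # step 4
    if i <= L[B] + 1:                    # 4(a): inside the first copy of exp(B)
        return psv_prime(rules, L, M, B, i, d)
    if i > L[A]:                         # 4(c): i is beyond exp(A), use the last copy
        return (t - 1) * L[B] + psv_prime(rules, L, M, B, i, d)
    tp = (i - 1) // L[B]                 # 4(b): t' with i in [t'L[B]+1 .. (t'+1)L[B]]
    k = psv_prime(rules, L, M, B, i - tp * L[B], d)
    if k > 0:
        return tp * L[B] + k
    return (tp - 1) * L[B] + psv_prime(rules, L, M, B, i, d)


# Toy grammar (an assumption for illustration): exp(T) = 3 2 1 2 1 2 1 3
rules = {
    "a": ("term", 2),
    "b": ("term", 1),
    "c": ("term", 3),
    "AB": ("cat", "a", "b"),
    "R": ("run", "AB", 3),
    "S": ("cat", "R", "c"),
    "T": ("cat", "c", "S"),
}
L, M = build_tables(rules)
```

With this toy grammar, `psv_prime(rules, L, M, "T", 8, 2)` walks T, S, R, and AB and returns 7, the position of the last 1 before position 8. The run-length case computes t' by integer division instead of scanning the copies, which keeps the work per rule constant.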

Cite this paper

Navarro, G., Olivares, F., Urbina, C. (2022). Balancing Run-Length Straight-Line Programs. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_9

DOI: https://doi.org/10.1007/978-3-031-20643-6_9
Published: 01 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)
