A Succinct Grammar Compression

Tabei, Yasuo; Takabatake, Yoshimasa; Sakamoto, Hiroshi

doi:10.1007/978-3-642-38905-4_23

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7922)

Included in the following conference series: Annual Symposium on Combinatorial Pattern Matching

Abstract

We solve an open problem concerning the optimal encoding of a straight-line program (SLP), a canonical form of grammar compression that deterministically derives a single string. We show that the information-theoretic lower bound for representing an SLP with \(n\) symbols is \(2n + \log n! + o(n)\) bits. We then present a succinct representation of an SLP that asymptotically matches this lower bound. Its space is at most \(2n\log\rho\,(1 + o(1))\) bits for \(\rho \leq 2\sqrt{n}\), and it supports random access to any production rule of the SLP in \(O(\log\log n)\) time. In addition, we present a novel dynamic data structure that associates each digram with a unique symbol. Such a data structure is called a naming function and has so far been implemented with a hash table, which incurs a space-time tradeoff; as a result, the hash table occupies most of the memory while the production rules are being built. As an alternative, we build a dynamic data structure for the naming function by leveraging the idea behind the wavelet tree. Its space is strictly bounded by \(2n\log n\,(1 + o(1))\) bits, and it supports queries and updates in \(O(\log n)\) time.
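
To make the terms in the abstract concrete, the following is a minimal Python sketch of the hash-table style of naming function that the abstract describes as the existing approach, not the wavelet-tree structure proposed in the paper. The names `NamingFunction`, `build_slp`, `expand`, and `lower_bound_bits` are hypothetical, and the level-by-level pairing used to build the grammar is only an illustrative assumption, not the authors' construction. The bound function evaluates only the leading terms \(2n + \log n!\) and omits the \(o(n)\) term.

```python
import math


class NamingFunction:
    """Hash-table naming function: maps each digram to a unique symbol.

    Terminals are the integers 0 .. alphabet_size - 1; nonterminals are
    numbered from alphabet_size upward in order of creation.
    """

    def __init__(self, alphabet_size):
        self.alphabet_size = alphabet_size
        self.table = {}   # (left, right) -> nonterminal symbol
        self.rules = []   # rules[i] is the digram of symbol alphabet_size + i

    def name(self, left, right):
        """Return the symbol for the digram, creating a new rule if needed."""
        key = (left, right)
        sym = self.table.get(key)
        if sym is None:
            sym = self.alphabet_size + len(self.rules)
            self.table[key] = sym
            self.rules.append(key)
        return sym

    def rule(self, sym):
        """Random access to the production rule of a nonterminal."""
        return self.rules[sym - self.alphabet_size]


def build_slp(symbols, nf):
    """Pair adjacent symbols level by level until one root symbol remains.

    Assumes a non-empty input. This pairing scheme is an illustration only.
    """
    seq = list(symbols)
    while len(seq) > 1:
        nxt = [nf.name(seq[i], seq[i + 1]) for i in range(0, len(seq) - 1, 2)]
        if len(seq) % 2 == 1:
            nxt.append(seq[-1])  # carry an unpaired last symbol upward
        seq = nxt
    return seq[0]


def expand(nf, sym):
    """Derive the single string (as a list of terminals) that sym produces."""
    if sym < nf.alphabet_size:
        return [sym]
    left, right = nf.rule(sym)
    return expand(nf, left) + expand(nf, right)


def lower_bound_bits(n):
    """Leading terms 2n + log2(n!) of the lower bound; o(n) is omitted."""
    return 2 * n + math.lgamma(n + 1) / math.log(2)
```

For example, with `nf = NamingFunction(2)`, calling `build_slp([0, 1] * 4, nf)` creates three rules, and `expand` applied to the returned root reproduces the input sequence. The Python dictionary here plays the role of the hash table whose memory footprint the abstract identifies as the bottleneck during rule construction.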

This study was supported by KAKENHI (23680016, 20589824) and the JST PRESTO program.

Author information

Authors and Affiliations

JST, ERATO Minato Project, Japan
Yasuo Tabei
Kyushu Institute of Technology, Japan
Yoshimasa Takabatake & Hiroshi Sakamoto
JST, PRESTO, Japan
Hiroshi Sakamoto

Editor information

Editors and Affiliations

Fakultät für Informatik, Institut für Theoretische Informatik, Karlsruher Institut für Technologie, Am Fasanengarten 5, 76131, Karlsruhe, Germany
Johannes Fischer
Karlsruhe Institute of Technology, Germany
Peter Sanders

About this paper

Cite this paper

Tabei, Y., Takabatake, Y., Sakamoto, H. (2013). A Succinct Grammar Compression. In: Fischer, J., Sanders, P. (eds) Combinatorial Pattern Matching. CPM 2013. Lecture Notes in Computer Science, vol 7922. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38905-4_23

DOI: https://doi.org/10.1007/978-3-642-38905-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38904-7
Online ISBN: 978-3-642-38905-4
eBook Packages: Computer Science, Computer Science (R0)
