A Succinct Grammar Compression

  • Conference paper
Combinatorial Pattern Matching (CPM 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7922)

Abstract

We solve an open problem concerning the optimal encoding of a straight-line program (SLP), a canonical form of grammar compression that deterministically derives a single string. We show that, information-theoretically, representing an SLP with \(n\) symbols requires at least \(2n + \log n! + o(n)\) bits. We then present a succinct representation of an SLP that is asymptotically equivalent to this lower bound. It occupies at most \(2n\log\rho\,(1 + o(1))\) bits for \(\rho \leq 2\sqrt{n}\) and supports random access to any production rule of the SLP in \(O(\log\log n)\) time. In addition, we present a novel dynamic data structure that associates each digram with a unique symbol. Such a data structure is called a naming function and has been implemented using a hash table, which has a space-time tradeoff; as a result, memory is occupied mainly by the hash table while the production rules are being built. As an alternative, we build a dynamic data structure for the naming function by leveraging the idea behind the wavelet tree. Its space is strictly bounded by \(2n\log n\,(1 + o(1))\) bits, and it supports queries and updates in \(O(\log n)\) time.
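
To make the abstract's terms concrete, here is a minimal Python sketch of the hash-table naming function that the abstract describes as the existing approach. It is not the wavelet-tree structure the paper proposes. The class name `NamingFunction`, its methods, and the integer symbol numbering are hypothetical and chosen only for illustration. The helper `lower_bound_bits` evaluates the leading terms \(2n + \log n!\) of the lower bound, assuming base-2 logarithms and ignoring the \(o(n)\) term.

```python
import math


class NamingFunction:
    """Hash-table naming function: maps each digram (left, right) to a unique symbol.

    Illustrative baseline only; names and symbol numbering are assumptions.
    """

    def __init__(self, alphabet_size):
        # Symbols 0 .. alphabet_size-1 are terminals; nonterminals are numbered after them.
        self.alphabet_size = alphabet_size
        self.table = {}  # digram -> nonterminal symbol
        self.rules = []  # rules[i] is the digram of nonterminal alphabet_size + i

    def name(self, left, right):
        """Return the symbol for digram (left, right), creating a new rule if needed."""
        digram = (left, right)
        symbol = self.table.get(digram)
        if symbol is None:
            symbol = self.alphabet_size + len(self.rules)
            self.table[digram] = symbol
            self.rules.append(digram)
        return symbol

    def rule(self, symbol):
        """Random access to the production rule (digram) of a nonterminal symbol."""
        return self.rules[symbol - self.alphabet_size]


def lower_bound_bits(n):
    """Leading terms 2n + log2(n!) of the lower bound; the o(n) term is ignored."""
    # lgamma(n + 1) is the natural log of n!, so dividing by ln 2 gives log2(n!).
    return 2 * n + math.lgamma(n + 1) / math.log(2)


if __name__ == "__main__":
    nf = NamingFunction(alphabet_size=2)  # terminals are 0 and 1
    x = nf.name(0, 1)                     # first new nonterminal for digram (0, 1)
    y = nf.name(x, x)                     # a rule built on top of an earlier rule
    print(x, y, nf.rule(y))
    print(lower_bound_bits(1_000_000))
```

In this sketch the dictionary and the rule list both store every digram, which illustrates why the hash table dominates memory while rules are being built.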

This study was supported by KAKENHI(23680016,20589824) and JST PRESTO program.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tabei, Y., Takabatake, Y., Sakamoto, H. (2013). A Succinct Grammar Compression. In: Fischer, J., Sanders, P. (eds) Combinatorial Pattern Matching. CPM 2013. Lecture Notes in Computer Science, vol 7922. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38905-4_23

  • DOI: https://doi.org/10.1007/978-3-642-38905-4_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38904-7

  • Online ISBN: 978-3-642-38905-4

  • eBook Packages: Computer Science (R0)
