Automatic inference of models for statistical code compression

Published: 01 May 1999

Abstract

This paper describes experiments that apply machine learning to compress computer programs, formalizing and automating decisions about instruction encoding that have traditionally been made by humans in a more ad hoc manner. A program accepts a large training set of program material in a conventional compiler intermediate representation (IR) and automatically infers a decision tree that separates IR code into streams that compress much better than the undifferentiated whole. Driving a conventional arithmetic compressor with this model yields code 30% smaller than the previous record for IR code compression, and 24% smaller than an ambitious optimizing compiler feeding an ambitious general-purpose data compressor.
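The abstract's central claim is that splitting IR symbols into context-selected streams lowers the total coded size. The Python sketch below illustrates that idea under stated assumptions. It scores each stream by its zero-order empirical entropy, used here as a stand-in for what an arithmetic coder driven by the stream's own frequencies would emit. It then greedily picks the one context feature whose split saves the most bits. The feature names (`prev_op`, `depth`), the opcode strings, and the single greedy step are all hypothetical. This is not the paper's learner, which the abstract says infers a full decision tree. The sketch also ignores the cost of transmitting each stream's model.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# A training example is (context, symbol). The context is a dict of features
# describing where the symbol occurs. Feature names are hypothetical.
Example = Tuple[Dict[str, Hashable], Hashable]


def stream_cost_bits(symbols: Sequence[Hashable]) -> float:
    """Zero-order empirical entropy of a stream in bits (n * H).

    Approximates the output of an ideal arithmetic coder that uses the
    stream's own static symbol frequencies. The cost of sending the
    frequency table itself is ignored.
    """
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def split_by(examples: Sequence[Example], feature: str) -> Dict[Hashable, List[Hashable]]:
    """Partition the symbols into separate streams keyed by one context feature."""
    streams: Dict[Hashable, List[Hashable]] = {}
    for context, symbol in examples:
        streams.setdefault(context[feature], []).append(symbol)
    return streams


def best_split(examples: Sequence[Example], features: Sequence[str]) -> Tuple[str, float]:
    """Greedily choose the feature whose split most reduces the total coded size.

    Returns (feature, bits_saved). It returns ("", 0.0) if no split helps.
    A full tree learner would recurse into each resulting stream and stop
    once a split no longer pays for its own model cost. This sketch shows
    only one greedy step.
    """
    whole = stream_cost_bits([s for _, s in examples])
    best_feature, best_saving = "", 0.0
    for f in features:
        split_cost = sum(stream_cost_bits(s) for s in split_by(examples, f).values())
        saving = whole - split_cost
        if saving > best_saving:
            best_feature, best_saving = f, saving
    return best_feature, best_saving


if __name__ == "__main__":
    # Hypothetical toy data: the previous opcode fully predicts the next one,
    # while "depth" carries no information.
    examples = [
        ({"prev_op": "ADDRLP", "depth": 0}, "INDIRI"),
        ({"prev_op": "ADDRLP", "depth": 1}, "INDIRI"),
        ({"prev_op": "CNSTI", "depth": 0}, "ADDI"),
        ({"prev_op": "CNSTI", "depth": 1}, "ADDI"),
    ]
    print(best_split(examples, ["prev_op", "depth"]))
```

On the toy data, the undifferentiated stream costs 4 bits. Splitting on `prev_op` yields two single-symbol streams that cost nothing under this measure. Splitting on `depth` saves nothing. This is the kind of separation the abstract describes.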
