Abstract
This paper describes experiments that apply machine learning to compress computer programs, formalizing and automating decisions about instruction encoding that have traditionally been made by humans in a more ad hoc manner. A program accepts a large training set of program material in a conventional compiler intermediate representation (IR) and automatically infers a decision tree that separates IR code into streams that compress much better than the undifferentiated whole. Driving a conventional arithmetic compressor with this model yields code 30% smaller than the previous record for IR code compression, and 24% smaller than an ambitious optimizing compiler feeding an ambitious general-purpose data compressor.
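The claim that separated streams compress better than the undifferentiated whole can be illustrated with a small calculation. The sketch below is not the paper's inference procedure. It assumes a made-up operator sequence (`toy_ir`) and a single hand-picked predictor (the previous operator) standing in for an inferred decision tree, and it measures only the ideal code length an arithmetic coder would approach under each stream's empirical distribution. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def stream_bits(symbols):
    """Ideal code length in bits for one stream, using its own empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())


def split_by_context(symbols, context_of):
    """Partition a symbol sequence into streams keyed by context_of(index, symbols)."""
    streams = defaultdict(list)
    for i, sym in enumerate(symbols):
        streams[context_of(i, symbols)].append(sym)
    return streams


def previous_symbol(i, symbols):
    """Hypothetical predictor: the operator immediately before position i."""
    return symbols[i - 1] if i > 0 else None


# Assumed toy data: a repeating pattern of IR-like operator names.
toy_ir = ["ADDRL", "INDIR", "CNST", "ADD", "ADDRL", "ASGN"] * 50

whole_bits = stream_bits(toy_ir)
split_bits = sum(
    stream_bits(stream)
    for stream in split_by_context(toy_ir, previous_symbol).values()
)

print(f"one stream: {whole_bits:.1f} bits")
print(f"split by previous operator: {split_bits:.1f} bits")
```

On this toy input the split streams need far fewer bits, because most operators are fully determined by their predecessor. The comparison ignores the cost of describing each stream's distribution to the decoder, which is one reason a practical system has to decide from training data which splits are worth making.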
Automatic inference of models for statistical code compression
PLDI '99: Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation