Abstract
Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory behavior is its imperfect knowledge about the run-time behavior of the program. The compiler cannot completely predict run-time access patterns.

There is an exception to this rule. During the register allocation phase, the compiler often must insert substantial amounts of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than about other memory operations in the program.

Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs: a small compiler-controlled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierarchy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traffic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such memories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips.

This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocator's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.
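The idea of coloring spilled values into memory locations can be illustrated with a minimal sketch. It is not the paper's allocator: it assumes each spilled value's live range is approximated by a single half-open interval of instruction indices, and it uses a simple greedy coloring. The function name `assign_spill_slots` and the sample data are hypothetical, chosen only for illustration.

```python
def assign_spill_slots(live_ranges):
    """Map each spilled value to a slot index (hypothetical helper).

    live_ranges: dict of name -> (start, end), a half-open interval.
    This interval model is an assumption made for illustration only.
    Two values may share a slot only if their intervals do not overlap.
    """
    names = sorted(live_ranges)

    # Build the interference graph: an edge joins two spilled values
    # whose live ranges overlap, so they cannot share a memory slot.
    interferes = {n: set() for n in names}
    for i, a in enumerate(names):
        a_start, a_end = live_ranges[a]
        for b in names[i + 1:]:
            b_start, b_end = live_ranges[b]
            if a_start < b_end and b_start < a_end:
                interferes[a].add(b)
                interferes[b].add(a)

    # Greedy coloring: visit the most constrained values first and give
    # each the lowest slot not already used by an interfering neighbor.
    slots = {}
    for n in sorted(names, key=lambda n: (-len(interferes[n]), n)):
        taken = {slots[m] for m in interferes[n] if m in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[n] = slot
    return slots


# Four spilled values; a naive scheme would reserve four slots.
spilled = {"a": (0, 4), "b": (2, 6), "c": (5, 9), "d": (7, 10)}
slots = assign_spill_slots(spilled)
slots_needed = max(slots.values()) + 1
print(slots, slots_needed)
```

In this example the four values fit in two slots, because `a` and `c` are never live at the same time, and neither are `b` and `d`. A small memory such as the CCM benefits directly from this kind of sharing, since fewer slots mean more spilled values fit in it.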
Index Terms
- Compiler-controlled memory
ASPLOS VIII: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems