Compiler-controlled memory

Published: 01 October 1998

Abstract

Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory behavior is its imperfect knowledge about the run-time behavior of the program. The compiler cannot completely predict run-time access patterns.

There is an exception to this rule. During the register allocation phase, the compiler often must insert substantial amounts of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than about other memory operations in the program.

Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs: a small compiler-controlled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierarchy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traffic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such memories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips.

This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocation's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.
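The abstract does not spell out the paper's two methods, so the following is only a minimal sketch of the general idea behind coloring spill locations, not the authors' algorithm. It assumes each spilled value has a single live range given as a half-open interval of instruction indices (a simplification), builds an interference graph over those ranges, and greedily assigns each value the lowest-numbered slot not used by an interfering neighbor. The names `build_interference`, `color_spill_slots`, and the sample live ranges are hypothetical.

```python
from itertools import combinations


def build_interference(live_ranges):
    """Build an interference graph over spilled values.

    live_ranges maps a value name to a half-open interval (start, end).
    Modeling a live range as one interval is an assumption of this sketch.
    """
    graph = {name: set() for name in live_ranges}
    for a, b in combinations(live_ranges, 2):
        (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
        if s1 < e2 and s2 < e1:  # the two intervals overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph


def color_spill_slots(live_ranges):
    """Greedily assign each spilled value a slot index (its "color").

    Values that interfere never share a slot; values that do not
    interfere may reuse the same slot, which shrinks the memory needed.
    """
    graph = build_interference(live_ranges)
    slots = {}
    # Visit high-degree nodes first; break ties by name for determinism.
    for name in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        taken = {slots[m] for m in graph[name] if m in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[name] = slot
    return slots


# Hypothetical live ranges for four spilled values.
spilled = {"a": (0, 4), "b": (2, 6), "c": (5, 9), "d": (7, 10)}
slots = color_spill_slots(spilled)
slots_needed = max(slots.values()) + 1  # 2 slots here, versus 4 without sharing
```

In this example four spilled values fit in two slots because only adjacent live ranges overlap. A greedy ordering like this one is a heuristic and does not guarantee the minimum number of slots in general.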

Published in

ACM SIGPLAN Notices, Volume 33, Issue 11
Nov. 1998
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/291006

ASPLOS VIII: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems
October 1998
326 pages
ISBN: 1581131070
DOI: 10.1145/291069

Copyright © 1998 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
