Abstract
The cost of accessing main memory is increasing. Machine designers have tried to mitigate the consequences of the processor and memory technology trends underlying this increasing gap with a variety of techniques to reduce or tolerate memory latency. These techniques, unfortunately, are only occasionally successful for pointer-manipulating programs. Recent research has demonstrated the value of a complementary approach, in which pointer-based data structures are reorganized to improve cache locality.

This paper studies a technique for using a generational garbage collector to reorganize data structures to produce a cache-conscious data layout, in which objects with high temporal affinity are placed next to each other, so that they are likely to reside in the same cache block. The paper explains how to collect, with low overhead, real-time profiling information about data access patterns in object-oriented languages, and describes a new copying algorithm that utilizes this information to produce a cache-conscious object layout.

Preliminary results show that this technique reduces cache miss rates by 21-42% and improves program performance by 14-37% over Cheney's algorithm. We also compare our layouts against those produced by the Wilson-Lam-Moher algorithm, which attempts to improve program locality at the page level. Our cache-conscious object layouts reduce cache miss rates by 20-41% and improve program performance by 18-31% over their algorithm, indicating that improving locality at the page level is not necessarily beneficial at the cache level.
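To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's copying algorithm. It compares the breadth-first placement order that a Cheney-style copy produces with a simple greedy order that follows the strongest profiled affinity edge first. The names `cheney_order`, `affinity_order`, `colocated_weight`, the toy object graph, and the affinity weights are all hypothetical, and the sketch assumes equal-sized objects and that the profile only mentions live objects.

```python
from collections import deque

def cheney_order(roots, children):
    """Order in which a Cheney-style breadth-first copy places objects."""
    order, seen, queue = [], set(), deque()
    for obj in roots:
        if obj not in seen:
            seen.add(obj); order.append(obj); queue.append(obj)
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child); order.append(child); queue.append(child)
    return order

def affinity_order(roots, children, pairs):
    """Hypothetical greedy variant: after placing an object, place its
    strongest not-yet-placed affinity partner immediately after it."""
    adj = {}
    for (a, b), weight in pairs.items():  # build a symmetric affinity map
        adj.setdefault(a, {})[b] = weight
        adj.setdefault(b, {})[a] = weight
    order, seen, queue = [], set(), deque()

    def place(obj):
        while obj is not None and obj not in seen:
            seen.add(obj); order.append(obj); queue.append(obj)
            partners = [(w, p) for p, w in adj.get(obj, {}).items() if p not in seen]
            obj = max(partners)[1] if partners else None

    for obj in roots:
        place(obj)
    while queue:  # remaining objects are still reached breadth-first
        for child in children.get(queue.popleft(), ()):
            place(child)
    return order

def colocated_weight(order, pairs, objs_per_block):
    """Total affinity weight of pairs that land in the same cache block."""
    block = {obj: i // objs_per_block for i, obj in enumerate(order)}
    return sum(w for (a, b), w in pairs.items() if block[a] == block[b])

# Toy binary tree; the profile says a, b, and d are accessed together.
children = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"]}
pairs = {("a", "b"): 10, ("b", "d"): 10}

bfs = cheney_order(["a"], children)
greedy = affinity_order(["a"], children, pairs)
print(bfs, colocated_weight(bfs, pairs, 3))
print(greedy, colocated_weight(greedy, pairs, 3))
```

With three objects per block, the breadth-first order separates `b` from `d`, while the greedy order keeps the hot path `a`, `b`, `d` in one block. A real collector would also have to account for object sizes, block alignment, and the cost of gathering the profile.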
- 1 Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. "Cache-conscious data placement." To appear in Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VIII), Oct. 1998.
- 2 David Callahan, Ken Kennedy, and Allan Porterfield. "Software prefetching." In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), pages 40-52, April 1991.
- 3 Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng. "Compiler optimizations for improving data locality." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 252-262, Oct. 1994.
- 4 Craig Chambers. "Object-oriented multi-methods in Cecil." In Proceedings ECOOP'92, LNCS 615, Springer-Verlag, pages 33-56, June 1992.
- 5 Craig Chambers. "The Cecil language: Specification and rationale." University of Washington Seattle, Technical Report TR-93-03-05, Mar. 1993.
- 6 Craig Chambers. "Personal communication." March 1998.
- 7 Craig Chambers, Jeffrey Dean, and David Grove. "Whole-program optimization of object-oriented languages." University of Washington Seattle, Technical Report 96-06-02, June 1996.
- 8 C.J. Cheney. "A nonrecursive list compacting algorithm." Communications of the ACM, 13(11):677-678, 1970.
- 9 Trishul M. Chilimbi, James R. Larus, and Mark D. Hill. "Improving pointer-based codes through cache-conscious data placement." University of Wisconsin-Madison, Technical Report CS-TR-98-1365, Mar. 1998.
- 10 R. Courts. "Improving locality of reference in a garbage-collecting memory management system." Communications of the ACM, 31(9):1128-1138, 1988.
- 11 Robert Fenichel and Jerome Yochelson. "A LISP garbage-collector for virtual-memory computer systems." Communications of the ACM, 12(11):611-612, 1969.
- 12 Dennis Gannon, William Jalby, and K. Gallivan. "Strategies for cache and local memory management by global program transformation." Journal of Parallel and Distributed Computing, 5:587-616, 1988.
- 13 Richard Hudson, Eliot Moss, Amer Diwan, and Christopher Weight. "A language-independent garbage collector toolkit." University of Massachusetts at Amherst, Technical Report TR 91-47, Sept. 1991.
- 14 David Kroft. "Lockup-free instruction fetch/prefetch cache organization." In The 8th Annual International Symposium on Computer Architecture, pages 81-87, May 1981.
- 15 M. S. Lam, P. R. Wilson, and T. G. Moher. "Object type directed garbage collection to improve locality." In Proceedings of the International Workshop on Memory Management, pages 16-18, Sept. 1992.
- 16 James Laudon, Anoop Gupta, and Mark Horowitz. "Interleaving: A multithreading technique targeting multiprocessors and workstations." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308-318, San Jose, California, 1994.
- 17 Henry Lieberman and Carl Hewitt. "A real-time garbage collector based on lifetimes of objects." Communications of the ACM, 26(6):419-429, 1983.
- 18 D. A. Moon. "Garbage collection in a large LISP system." In Conference Record of the 1984 Symposium on LISP and Functional Programming, pages 235-246, Aug. 1984.
- 19 Todd C. Mowry, Monica S. Lam, and Anoop Gupta. "Design and evaluation of a compiler algorithm for prefetching." In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 62-73, October 1992.
- 20 David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. "A case for intelligent RAM." In IEEE Micro, pages 34-44, Apr. 1997.
- 21 Sharon E. Perl and Richard L. Sites. "Studies of Windows NT performance using dynamic execution traces." In Second Symposium on Operating Systems Design and Implementation, Oct. 1996.
- 22 Shai Rubin, David Bernstein, and Michael Rodeh. "Virtual cache line: A new technique to improve cache exploitation for recursive data structures." Submitted for publication, Apr. 1998.
- 23 Alan J. Smith. "Cache memories." ACM Computing Surveys, 14(3):473-530, 1982.
- 24 Burton J. Smith. "Architecture and applications of the HEP multiprocessor computer system." In Real-Time Signal Processing IV, pages 241-248, 1981.
- 25 Sun Microelectronics. UltraSPARC User's Manual, 1996.
- 26 David Ungar. "Generation scavenging: A non-disruptive high performance storage reclamation algorithm." In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, pages 157-167, Apr. 1984.
- 27 David Ungar and Frank Jackson. "An adaptive tenuring policy for generation scavengers." ACM Transactions on Programming Languages and Systems, 14(1):1-27, January 1992.
- 28 J. L. White. "Address/memory management for a gigantic LISP environment, or, GC considered harmful." In Conference Record of the 1980 LISP Conference, pages 119-127, 1980.
- 29 M. V. Wilkes. "Slave memories and dynamic storage allocation." In IEEE Trans. on Electronic Computers, pages 270-271, April 1965.
- 30 Paul R. Wilson, Michael S. Lam, and Thomas G. Moher. "Effective 'static-graph' reorganization to improve locality in garbage-collected systems." SIGPLAN Notices, 26(6):177-191, June 1991. Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation.
- 31 Michael E. Wolf and Monica S. Lam. "A data locality optimizing algorithm." SIGPLAN Notices, 26(6):30-44, June 1991. Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation.
Index Terms
- Using generational garbage collection to implement cache-conscious data placement
Using generational garbage collection to implement cache-conscious data placement
ISMM '98: Proceedings of the 1st International Symposium on Memory Management