Abstract
Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform badly under LRU. Such applications usually have a working-set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP) that is scan-resistant and Dynamic RRIP (DRRIP) that is both scan-resistant and thrash-resistant. Both RRIP policies require only 2-bits per cache block and easily integrate into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that both SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 4% and 10% respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 7% and 9% respectively. We also show that RRIP outperforms LFU, the state-of the art scan-resistant replacement algorithm to-date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.
- "Inside the Intel Itanium 2 Processor", HP Technical White Paper, July 2002.Google Scholar
- "UltraSPARC T2 supplement to the UltraSPARC architecture 2007", Draft D1.4.3. 2007.Google Scholar
- Intel. Intel Core i7 Processor. http://www.intel.com/products/processor/corei7/specifications.htmGoogle Scholar
- H. Al-Zoubi, A. Milenkovic, M. Milenkovic. "Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite." In ACMSE, 2004. Google ScholarDigital Library
- S. Bansal and D. S. Modha. "CAR: Clock with Adaptive Replacement", In FAST, 2004. Google ScholarDigital Library
- A. Basu, N. Kirman, M. Kirman, M. Chaudhuri, J. Martinez. "Scavenger: A New Last Level Cache Architecture with Global Block Priority". In Micro-40, 2007. Google ScholarDigital Library
- L. A. Belady. A study of replacement algorithms for a virtual-storage computer. In IBM Systems journal, pages 78--101, 1966. Google ScholarDigital Library
- M. Chaudhuri. "Pseudo-LIFO: The Foundation of a New Family of Replacement Policies for Last-level Caches". In Micro, 2009. Google ScholarDigital Library
- F. J. Corbató, "A paging experiment with the multics system," In Honor of P. M. Morse, pp. 217--228, MIT Press, 1969.Google Scholar
- A. Jaleel, R. Cohn, C. K. Luk, B. Jacob. CMP$im: A Pin-Based On-The-Fly MultiCore Cache Simulator. In MoBS, 2008.Google Scholar
- A. Jaleel, W. Hasenplaugh, M. K. Qureshi, S. C. Steely Jr., J. Emer. "Adaptive Insertion Policies for Managing Shared Caches". In PACT, 2008. Google ScholarDigital Library
- S. Jiang and X. Zhang, "LIRS: An efficient low inter-reference recency set replacement policy to improve buffer cache performance," In Proc. ACM SIGMETRICS Conf., 2002. Google ScholarDigital Library
- T. Johnson and D. Shasha, "2Q: A low overhead high performance buffer management replacement algorithm," In VLDB Conf., 1994. Google ScholarDigital Library
- S. Kaxiras, Z. Hu, M. Martonosi. "Cache decay: exploiting generational behavior to reduce cache leakage power." In ISCA--28. Google ScholarDigital Library
- G. Keramidas, P. Petoumenos, S. Kaxiras. "Cache replacement based on reuse-distance prediction'. In ICCD, 2007Google ScholarCross Ref
- A. Lai, C. Fide, B. Falsafi. Dead-block prediction & dead-block correlating prefetchers. In ISCA-28, 2001 Google ScholarDigital Library
- D. Lee, J. Choi, J. Kim, S. H. Noh, S. Lyul Min, Y. Cho, C. Sang Kim. "LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies," IEEE Trans. Computers, vol. 50, no. 12, pp. 1352--1360, 2001. Google ScholarDigital Library
- W. Lin and S. K. Reinhardt. "Predicting last-touch references under optimal replacement." Technical Report CSE-TR-447-02, U. of Michigan, 2002.Google Scholar
- H. Liu, M. Ferdman, J. Huh, D. Burger. "Cache Bursts: A New Approach for Eliminating Dead Blocks and Increasing Cache Efficiency." In Micro-41, 2008. Google ScholarDigital Library
- G. Loh. "Extending the Effectiveness of 3D-Stacked DRAM Caches with an Adaptive Multi-Queue Policy". In Micro, 2009. Google ScholarDigital Library
- C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, K. Hazelwood. "Pin: building customized program analysis tools with dynamic instrumentation". In PLDI, pages 190--200, 2005. Google ScholarDigital Library
- N. Megiddo and D. S. Modha, "ARC: A self-tuning, low overhead replacement cache,' in FAST, 2003. Google ScholarDigital Library
- E. J. O'Neil, P. E. O'Neil, G. Weikum. "The LRU-K page replacement algorithm for database disk buffering," in Proc. ACM SIGMOD Conf., pp. 297--306, 1993. Google ScholarDigital Library
- H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, A. Karunanidhi. "Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation". In MICRO--37, 2004. Google ScholarDigital Library
- M. Qureshi, A. Jaleel, Y. Patt, S. Steely, J. Emer. "Adaptive Insertion Policies for High Performance Caching". In ISCA--34, 2007. Google ScholarDigital Library
- K. Rajan and G. Ramaswamy. "Emulating Optimal Replacement with a Shepherd Cache". In Micro--40, 2007. Google ScholarDigital Library
- J. T. Robinson and M. V. Devarakonda, "Data cache management using frequency-based replacement," in SIGMETRICS Conf, 1990. Google ScholarDigital Library
- S. Srinath, O. Mutlu, H. Kim, Y. Patt. "Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetcher". In HPCA-13, 2007. Google ScholarDigital Library
- R. Subramanian, Y. Smaragdakis, G. Loh. "Adaptive caches: Effective shaping of cache behavior to workloads." In MICRO-39, 2006. Google ScholarDigital Library
- Y. Xie and G. Loh. "PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches." In ISCA-36, 2009 Google ScholarDigital Library
- Y. Zhou and J. F. Philbin, "The multi-queue replacement algorithm for second level buffer caches," in USENIX Annual Tech. Conf, 2001. Google ScholarDigital Library
Index Terms
- High performance cache replacement using re-reference interval prediction (RRIP)
Recommendations
SHiP: signature-based hit predictor for high performance caching
MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on MicroarchitectureThe shared last-level caches in CMPs play an important role in improving application performance and reducing off-chip memory bandwidth requirements. In order to use LLCs more efficiently, recent research has shown that changing the re-reference ...
Adaptive insertion policies for high performance caching
ISCA '07: Proceedings of the 34th annual international symposium on Computer architectureThe commonly used LRU replacement policy is susceptible to thrashing for memory-intensive workloads that have a working set greater than the available cache size. For such applications, the majority of lines traverse from the MRU position to the LRU ...
Adaptive insertion policies for high performance caching
The commonly used LRU replacement policy is susceptible to thrashing for memory-intensive workloads that have a working set greater than the available cache size. For such applications, the majority of lines traverse from the MRU position to the LRU ...
Comments