research-article

High performance cache replacement using re-reference interval prediction (RRIP)

Published: 19 June 2010

Abstract

Practical cache replacement policies attempt to emulate optimal replacement by predicting the re-reference interval of a cache block. The commonly used LRU replacement policy always predicts a near-immediate re-reference interval on cache hits and misses. Applications that exhibit a distant re-reference interval perform poorly under LRU. Such applications usually have a working set larger than the cache or have frequent bursts of references to non-temporal data (called scans). To improve the performance of such workloads, this paper proposes cache replacement using Re-reference Interval Prediction (RRIP). We propose Static RRIP (SRRIP), which is scan-resistant, and Dynamic RRIP (DRRIP), which is both scan-resistant and thrash-resistant. Both RRIP policies require only 2 bits per cache block and integrate easily into existing LRU approximations found in modern processors. Our evaluations using PC games, multimedia, server, and SPEC CPU2006 workloads on a single-core processor with a 2MB last-level cache (LLC) show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 4% and 10%, respectively. Our evaluations with over 1000 multi-programmed workloads on a 4-core CMP with an 8MB shared LLC show that SRRIP and DRRIP outperform LRU replacement on the throughput metric by an average of 7% and 9%, respectively. We also show that RRIP outperforms LFU, the state-of-the-art scan-resistant replacement algorithm to date. For the cache configurations under study, RRIP requires 2X less hardware than LRU and 2.5X less hardware than LFU.
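
The abstract does not spell out the replacement mechanism, so the following is only a minimal sketch of how a 2-bit-per-block scheme in the spirit of SRRIP could work. It assumes the hit-priority variant as it is commonly described: a block is inserted with a "long" re-reference prediction, promoted to "near-immediate" on a hit, and the victim is any block predicted "distant", with all blocks aged until one qualifies. The class names, the 4-way set, and the access trace are illustrative assumptions, not taken from the abstract.

```python
class SRRIPSet:
    """One cache set with a 2-bit re-reference prediction value (RRPV) per block."""

    def __init__(self, ways, rrpv_bits=2):
        self.max_rrpv = (1 << rrpv_bits) - 1      # 3 = predicted distant re-reference
        self.tags = [None] * ways                 # empty ways start as eviction candidates
        self.rrpv = [self.max_rrpv] * ways

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0   # hit: predict near-immediate reuse
            return True
        # Miss: age all blocks until at least one is predicted distant.
        while self.max_rrpv not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]
        victim = self.rrpv.index(self.max_rrpv)
        self.tags[victim] = tag
        self.rrpv[victim] = self.max_rrpv - 1     # insert with a long (not distant) prediction
        return False


class LRUSet:
    """Baseline: true LRU for one cache set."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []                           # most recently used block is last

    def access(self, tag):
        hit = tag in self.stack
        if hit:
            self.stack.remove(tag)
        elif len(self.stack) == self.ways:
            self.stack.pop(0)                     # evict the least recently used block
        self.stack.append(tag)
        return hit


def count_hits(cache, trace):
    """Replay a trace of block tags and count the hits."""
    return sum(cache.access(tag) for tag in trace)


# A small reused working set (A, B) interrupted by a scan of four one-time blocks.
trace = ["A", "B", "A", "B", "S1", "S2", "S3", "S4", "A", "B"]

print("SRRIP hits:", count_hits(SRRIPSet(4), trace))  # SRRIP hits: 4
print("LRU hits:  ", count_hits(LRUSet(4), trace))    # LRU hits:   2
```

In this toy trace the scan blocks enter with a long prediction and evict one another, so A and B survive under the sketch, whereas LRU inserts every scan block as most recently used and pushes A and B out. DRRIP's thrash resistance is not modeled here.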


Published in

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
ISCA '10
June 2010
508 pages
ISSN: 0163-5964
DOI: 10.1145/1816038

ISCA '10: Proceedings of the 37th Annual International Symposium on Computer Architecture
June 2010
520 pages
ISBN: 9781450300537
DOI: 10.1145/1815961

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
