Exploiting reuse locality on inclusive shared last-level caches

Authors:
Jorge Albericio, University of Zaragoza
Pablo Ibáñez, University of Zaragoza
Víctor Viñals, University of Zaragoza
Jose María Llabería, UPC Barcelona Tech

ACM Transactions on Architecture and Code Optimization, Volume 9, Issue 4, Article No. 38, pp. 1–19. https://doi.org/10.1145/2400682.2400697

Published: 20 January 2013

Abstract

Optimizing the replacement policy of the Shared Last-Level Cache (SLLC) in a Chip-MultiProcessor (CMP) is critical for avoiding off-chip accesses. Temporal locality is largely exploited by the first levels of private cache memories, so the stream of references arriving at the SLLC exhibits only a small amount of it. Thus, traditional recency-based replacement algorithms are poor choices for governing SLLC replacement. Recent proposals involve SLLC replacement policies that attempt to exploit reuse either by segmenting the replacement list or by improving re-reference interval prediction.

On the other hand, inclusive SLLCs are commonplace in the CMP market, but the interaction between the replacement policy and the enforcement of inclusion has barely been discussed. After analyzing that interaction, this article introduces two simple replacement policies that exploit reuse locality and target inclusive SLLCs: Least Recently Reused (LRR) and Not Recently Reused (NRR). NRR has the same implementation cost as Not Recently Used (NRU), and LRR adds only one bit per line to the cost of LRU.

After considering reuse locality and its interaction with the invalidations induced by inclusion, the proposals are evaluated by simulating multiprogrammed workloads on an 8-core system with two private cache levels and an SLLC. LRR outperforms LRU by 4.5% (performing better in 97 out of 100 mixes), and NRR outperforms NRU by 4.2% (performing better in 99 out of 100 mixes). We also show that our mechanisms outperform re-reference interval prediction, a recently proposed SLLC replacement policy, and that similar conclusions can be drawn when varying the associativity or the SLLC size.
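The abstract names LRR and NRR and their hardware cost but does not describe them, so the Python sketch below is a hypothetical illustration of reuse-based versus recency-based replacement in a single cache set, not the authors' design. It assumes that a hit at the SLLC counts as a reuse; that an LRR-like set keeps an LRU stack plus one reuse bit per line and evicts the least recently used never-reused line first; and that an NRR-like set keeps one bit per line as NRU does, but leaves freshly filled lines unprotected until their first hit. It ignores inclusion and back-invalidations entirely. The class names, `make_trace`, `count_hits`, and the synthetic trace are all hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass


class LRUSet:
    """Baseline: plain LRU; every access (fill or hit) moves a line to MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # ordered from LRU to MRU

    def access(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)  # evict the LRU line
        self.lines[tag] = None
        return False


@dataclass
class Line:
    tag: int
    reused: bool = False  # assumed extra bit: set on the first hit after the fill


class LRRSet:
    """Assumed LRR-like policy: LRU stack plus one reuse bit per line.
    Never-reused lines are evicted first (LRU-most first); otherwise the LRU line."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []  # ordered from LRU to MRU

    def access(self, tag):
        for i, line in enumerate(self.lines):
            if line.tag == tag:
                line.reused = True
                self.lines.append(self.lines.pop(i))  # move to MRU on reuse
                return True
        if len(self.lines) == self.ways:
            victim = next((i for i, line in enumerate(self.lines) if not line.reused), 0)
            del self.lines[victim]
        self.lines.append(Line(tag))
        return False


class NRRSet:
    """Assumed NRR-like policy: one bit per line, like NRU, but a fill
    leaves the line marked 'not recently reused'; only a hit protects it."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = [None] * ways
        self.not_reused = [True] * ways

    def access(self, tag):
        if tag in self.tags:
            self.not_reused[self.tags.index(tag)] = False  # a hit counts as a reuse
            return True
        if None in self.tags:
            way = self.tags.index(None)  # fill an empty way first
        else:
            if not any(self.not_reused):
                self.not_reused = [True] * self.ways  # all reused: reset, as NRU does
            way = self.not_reused.index(True)  # first not-recently-reused way
        self.tags[way] = tag
        self.not_reused[way] = True
        return False


def make_trace(rounds, scan_len=4):
    """Two hot lines reused once, then re-touched after each scan of single-use lines."""
    trace = [0, 1, 0, 1]
    next_tag = 100
    for _ in range(rounds):
        trace.extend(range(next_tag, next_tag + scan_len))
        next_tag += scan_len
        trace.extend([0, 1])
    return trace


def count_hits(policy_cls, trace, ways=4):
    cache_set = policy_cls(ways)
    return sum(cache_set.access(tag) for tag in trace)


if __name__ == "__main__":
    trace = make_trace(rounds=10)
    for cls in (LRUSet, LRRSet, NRRSet):
        print(cls.__name__, count_hits(cls, trace))
```

On this 64-access synthetic trace with a 4-way set, LRU scores 2 hits because each scan pushes the two hot lines out, while both reuse-based sketches score 22 because single-use scan lines are always chosen as victims before lines that have already been reused. This only illustrates the intuition of preferring reuse over recency; it says nothing about how inclusion-induced invalidations interact with either policy.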

References

Baer, J. and Wang, W.-H. 1988. On the inclusion properties for multi-level cache hierarchies. In Proceedings of the 15th Annual International Computer Architecture Symposium. 73--80.
Chen, X., Yang, Y., Gopalakrishnan, G., and Chou, C. T. 2006. Reducing verification complexity of a multicore coherence protocol using assume/guarantee. In Proceedings of the International Conference on Formal Methods in Computer Aided Design (FMCAD'06). 81--88.
Gao, H. and Wilkerson, C. 2010. A dueling segmented LRU replacement algorithm with adaptive bypassing. In Proceedings of the 1st JILP Workshop on Computer Architecture Competitions.
Intel. 2011. Intel Core i7 processor. http://www.intel.com/products/processor/corei7/specifications.htm
Jaleel, A., Borch, E., Bhandaru, M., Steely Jr., S., and Emer, J. 2010a. Achieving non-inclusive cache performance with inclusive caches: Temporal locality aware (TLA) cache management policies. In Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'10). IEEE Computer Society, 151--162.
Jaleel, A., Hasenplaugh, W., Qureshi, M., Sebot, J., Steely, S., and Emer, J. 2008. Adaptive insertion policies for managing shared caches. In Proceedings of the 17th Conference on Parallel Architectures and Compilation Techniques (PACT'08). ACM Press, New York, 208--219.
Jaleel, A., Theobald, K., Steely, S., and Emer, J. 2010b. High performance cache replacement using re-reference interval prediction (RRIP). In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA'10). ACM Press, New York, 60--71.
Karedla, R., Love, J., and Wherry, B. 1994. Caching strategies to improve disk system performance. Comput. 27, 3, 38--46.
Kaxiras, S., Hu, Z., and Martonosi, M. 2001. Cache decay: Exploiting generational behavior to reduce cache leakage power. In Proceedings of the 28th Annual International Computer Architecture Symposium. 240--251.
Khan, S., Wang, Z., and Jimenez, D. A. 2012. Decoupled dynamic cache segmentation. In Proceedings of the IEEE 18th International Symposium on High Performance Computer Architecture (HPCA'12).
Khan, S. M., Tian, Y., and Jimenez, D. A. 2010. Sampling dead block prediction for last-level caches. In Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'10). 175--186.
Lai, A.-C., Fide, C., and Falsafi, B. 2001. Dead-Block prediction and dead-block correlating prefetchers. In Proceedings of the 28th Annual International Computer Architecture Symposium. 144--154.
Lee, D., Choi, J., Kim, J.-H., Noh, S., Min, S. L., Cho, Y., and Kim, C. S. 2001. LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies. IEEE Trans. Comput. 50, 12, 1352--1361.
Lin, W. and Reinhardt, S. K. 2002. Predicting last-touch references under optimal replacement. Tech. rep. CSE-TR-447-02, University of Michigan.
Liu, H., Ferdman, M., Huh, J., and Burger, D. 2008. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In Proceedings of the 41st IEEE/ACM International Symposium on Microarchitecture (MICRO'08). 222--233.
Magnusson, P. S., Christensson, M., Eskilson, J., Forsgren, D., Hallberg, G., Hogberg, J., Larsson, F., Moestedt, A., and Werner, B. 2002. Simics: A full system simulation platform. Comput. 35, 2, 50--58.
Martin, M., Sorin, D., Beckmann, B., Marty, M., Xu, M., Alameldeen, A., Moore, K., Hill, M., and Wood, D. 2005. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News 33.
Qureshi, M., Jaleel, A., Patt, Y., Steely, S., and Emer, J. 2007. Adaptive insertion policies for high performance caching. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA'07). ACM Press, New York, 381--391.
Subramanian, R., Smaragdakis, Y., and Loh, G. H. 2006. Adaptive caches: Effective shaping of cache behavior to workloads. In Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06). 385--396.
Sun Microsystems. 2007. UltraSPARC T2 supplement to the UltraSPARC architecture 2007. Draft D1.4.3, 19 Sep 2007.
Valero, A., Sahuquillo, J., Petit, S., Lopez, P., and Duato, J. 2012. Combining recency of information with selective random and a victim cache in last-level caches. ACM Trans. Archit. Code Optim. 9, 3, 16:1--16:20.
Wu, C.-J., Jaleel, A., Hasenplaugh, W., Martonosi, M., Steely, S. C., and Emer, J. 2011. SHiP: Signature-based hit predictor for high performance caching. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'11). ACM Press, New York, 430--441.
Xie, Y. and Loh, G. 2009. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA'09). ACM Press, New York, 174--183.

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Recommendations

Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

The replacement policies for the last-level caches (LLCs) are usually designed based on the access information available locally at the LLC. These policies are inherently sub-optimal due to lack of information about the activities in the inner-levels of ...
A new cache replacement algorithm for last-level caches by exploiting tag-distance correlation of cache lines

Cache memory plays a crucial role in determining the performance of processors, especially for embedded processors where area and power are tightly constrained. It is necessary to have effective management mechanisms, such as cache replacement policies, ...
Adaptive Cache Bypassing for Inclusive Last Level Caches
IPDPS '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing

Cache hierarchy designs, including bypassing, replacement, and the inclusion property, have significant performance impact. Recent works on high performance caches have shown that cache bypassing is an effective technique to enhance the last level cache ...

Published in

ACM Transactions on Architecture and Code Optimization Volume 9, Issue 4
Special Issue on High-Performance Embedded Architectures and Compilers
January 2013
876 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2400682

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 January 2013
- Accepted: 1 November 2012
- Revised: 1 September 2012
- Received: 1 June 2012
Author Tags
Replacement policy
shared resources management
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- 16 Total Citations
- 790 Total Downloads
- Downloads (Last 12 months): 57
- Downloads (Last 6 weeks): 12
