ABSTRACT
As chip multiprocessors (CMPs) become increasingly mainstream, architects have likewise become more interested in how best to share a cache hierarchy among multiple simultaneous threads of execution. The complexity of this problem is exacerbated as the number of simultaneous threads grows from two or four to the tens or hundreds. However, there is no consensus in the architectural community on what "best" means in this context. Some papers in the literature seek to equalize each thread's performance loss due to sharing, while others emphasize maximizing overall system performance. Furthermore, the specific effect of these goals varies depending on the metric used to define "performance". In this paper we label equal-performance targets as Communist cache policies and overall-performance targets as Utilitarian cache policies. We compare both of these models to the most common current model of a free-for-all cache (a Capitalist policy). We consider various performance metrics, including miss rates, bandwidth usage, and IPC, using both absolute and relative values of each metric. Using analytical models and behavioral cache simulation, we find that the optimal partitioning of a shared cache can vary greatly as different but reasonable definitions of optimality are applied. We also find that, although Communist and Utilitarian targets are generally compatible, each policy has workloads for which it provides poor overall performance or poor fairness, respectively. Finally, we find that simple policies like LRU replacement and static uniform partitioning are not sufficient to provide near-optimal performance under any reasonable definition, indicating that some thread-aware cache resource allocation mechanism is required.
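The distinction between the two partitioning targets can be made concrete with a small sketch, not taken from the paper: given hypothetical per-thread IPC-versus-allocation curves for two threads sharing an 8-way cache, a Utilitarian policy picks the static split that maximizes aggregate IPC, while a Communist policy picks the split that equalizes each thread's slowdown relative to running with the whole cache alone. The curve values and thread names below are illustrative assumptions.

```python
# Hypothetical IPC for each thread as a function of cache ways allocated (0..8).
# Thread A is cache-sensitive; thread B is streaming-like and nearly insensitive.
ipc = {
    "A": [0.20, 0.50, 0.70, 0.80, 0.85, 0.88, 0.90, 0.91, 0.92],
    "B": [0.60, 0.62, 0.64, 0.65, 0.66, 0.66, 0.67, 0.67, 0.67],
}
TOTAL_WAYS = 8
# Baseline: each thread's IPC when it owns the entire cache.
alone = {t: curve[TOTAL_WAYS] for t, curve in ipc.items()}

def utilitarian(ipc, total):
    """Maximize total IPC over all static two-thread splits of the ways."""
    best = max(range(total + 1), key=lambda a: ipc["A"][a] + ipc["B"][total - a])
    return best, total - best

def communist(ipc, total):
    """Minimize the gap between the two threads' relative slowdowns."""
    def gap(a):
        rel_a = ipc["A"][a] / alone["A"]           # A's fraction of its solo IPC
        rel_b = ipc["B"][total - a] / alone["B"]   # B's fraction of its solo IPC
        return abs(rel_a - rel_b)
    best = min(range(total + 1), key=gap)
    return best, total - best

print("Utilitarian split (ways A, ways B):", utilitarian(ipc, TOTAL_WAYS))  # (6, 2)
print("Communist split   (ways A, ways B):", communist(ipc, TOTAL_WAYS))    # (5, 3)
```

Even on these toy curves the two definitions of "best" disagree (a 6/2 split versus a 5/3 split), which is the abstract's point that the optimal partition depends on the chosen notion of optimality.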
Index Terms
- Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource