research-article

A light-weight fairness mechanism for chip multiprocessor memory systems

Authors:
Magnus Jahre, Norwegian University of Science and Technology, Trondheim, Norway
Lasse Natvig, Norwegian University of Science and Technology, Trondheim, Norway

CF '09: Proceedings of the 6th ACM conference on Computing frontiers, May 2009, Pages 1–10
https://doi.org/10.1145/1531743.1531747

Published: 18 May 2009

ABSTRACT

Chip Multiprocessor (CMP) memory systems suffer from destructive thread interference. This interference reduces performance predictability because it depends heavily on the memory access patterns and intensity of the co-scheduled threads. In this work, we confirm that all shared units must be thread-aware in order to provide memory system fairness. However, current proposals for fair memory systems are complex because they require an interference measurement mechanism and a fairness enforcement policy for every hardware-controlled shared unit. Furthermore, they often sacrifice system throughput to reach their fairness goals, which is not desirable in all systems.

We show that our novel fairness mechanism, called the Dynamic Miss Handling Architecture (DMHA), reduces implementation complexity by using a single fairness enforcement policy for the complete hardware-managed shared memory system. Specifically, it controls the total miss bandwidth available to each thread by dynamically adjusting the number of Miss Status Holding Registers (MSHRs) available in each private data cache. When fairness is chosen as the metric of interest and we compare to a state-of-the-art fairness-aware memory system, DMHA improves fairness by 26% on average with the single program baseline. With a different configuration, DMHA improves throughput by 13% on average compared to a conventional memory system.
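To illustrate why limiting MSHRs limits miss bandwidth, the following is a minimal Python sketch, not the DMHA implementation from the paper. It assumes a fixed miss latency and a thread that tries to issue one miss every cycle; the names `MSHRFile`, `set_limit`, `try_issue`, and `misses_issued`, as well as the capacity of 16 and latency of 100 cycles, are hypothetical choices made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MSHRFile:
    """Toy model of the MSHRs in one private data cache (hypothetical names)."""
    capacity: int  # physical MSHRs built into the cache
    limit: int     # MSHRs the thread is currently allowed to use
    busy: list = field(default_factory=list)  # completion cycle of each in-flight miss

    def set_limit(self, new_limit: int) -> None:
        # Clamp to [1, capacity]; misses already in flight are left to drain.
        self.limit = max(1, min(new_limit, self.capacity))

    def retire(self, now: int) -> None:
        # Free every MSHR whose miss has completed by cycle `now`.
        self.busy = [done for done in self.busy if done > now]

    def try_issue(self, now: int, latency: int) -> bool:
        # A new miss can only be sent if an allowed MSHR is free.
        self.retire(now)
        if len(self.busy) >= self.limit:
            return False  # the cache blocks and the thread is throttled
        self.busy.append(now + latency)
        return True


def misses_issued(limit: int, cycles: int = 1000, latency: int = 100) -> int:
    """Count the misses issued by a thread that tries to miss every cycle."""
    mshrs = MSHRFile(capacity=16, limit=limit)
    return sum(mshrs.try_issue(now, latency) for now in range(cycles))


if __name__ == "__main__":
    for limit in (16, 4, 1):
        print(limit, misses_issued(limit))
```

In this toy model the number of misses a thread can issue is bounded by roughly `limit * cycles / latency`, so lowering one thread's limit reduces the memory bandwidth it can consume. How the limit should be chosen for each thread is the fairness policy itself, which the abstract does not describe and this sketch does not attempt to reproduce.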

Index Terms

1. Computer systems organization

Published in
CF '09: Proceedings of the 6th ACM conference on Computing frontiers
May 2009
238 pages
ISBN:9781605584133
DOI:10.1145/1531743
General Chairs:
Gearold Johnson, Colorado State University, USA
Carsten Trinitis, TU München, Germany
Program Chairs:
Georgi N. Gaydadjiev, TU Delft, The Netherlands
Alex Veidenbaum, University of California, USA
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chip multiprocessor
dynamic miss handling architecture
fairness
interference
mechanism
miss status holding register
Acceptance Rates
CF '09 Paper Acceptance Rate: 26 of 113 submissions, 23%
Overall Acceptance Rate: 240 of 680 submissions, 35%