ABSTRACT
Chip Multiprocessor (CMP) memory systems suffer from destructive thread interference. This interference reduces performance predictability because it depends heavily on the memory access patterns and access intensity of the co-scheduled threads. We confirm that all shared units must be thread-aware in order to provide memory system fairness. However, current proposals for fair memory systems are complex because they require an interference measurement mechanism and a fairness enforcement policy for every hardware-controlled shared unit. Furthermore, they often sacrifice system throughput to reach their fairness goals, which is not desirable in all systems.
We show that our novel fairness mechanism, called the Dynamic Miss Handling Architecture (DMHA), reduces implementation complexity by using a single fairness enforcement policy for the complete hardware-managed shared memory system. Specifically, it controls the total miss bandwidth available to each thread by dynamically adjusting the number of Miss Status Holding Registers (MSHRs) available in each private data cache. When fairness is the metric of interest, DMHA improves fairness by 26% on average over a state-of-the-art fairness-aware memory system, using the single-program baseline. With a different configuration, DMHA improves throughput by 13% on average compared to a conventional memory system.
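To make the MSHR-based throttling idea concrete, here is a minimal sketch of a private cache's MSHR file whose usable size can be changed at run time. It is an illustration under stated assumptions, not the DMHA design: the class name `MSHRFile`, its methods, the clamping rule, and the unlimited merging of misses to the same block are all hypothetical, and the policy that decides which limit each thread gets is not modeled because the abstract does not describe it.

```python
class MSHRFile:
    """Toy model of the MSHRs in one private data cache (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity  # physical number of MSHRs
        self.limit = capacity     # MSHRs the thread may currently use
        self.pending = {}         # block address -> number of merged misses

    def set_limit(self, limit):
        # Assumption: the limit is clamped to [1, capacity] and misses
        # already in flight are never cancelled when it is lowered.
        self.limit = max(1, min(limit, self.capacity))

    def allocate(self, block_addr):
        """Return True if the miss is accepted, False if the cache must block."""
        if block_addr in self.pending:
            # Miss to a block that is already outstanding: merge it.
            self.pending[block_addr] += 1
            return True
        if len(self.pending) >= self.limit:
            # No usable MSHR left: the thread's miss bandwidth is capped here.
            return False
        self.pending[block_addr] = 1
        return True

    def release(self, block_addr):
        # Called when the fill for block_addr returns from the shared system.
        self.pending.pop(block_addr, None)


def demo():
    mshrs = MSHRFile(capacity=4)
    mshrs.set_limit(2)  # throttle this thread to two outstanding misses
    accepted = [mshrs.allocate(a) for a in (0x100, 0x140, 0x180)]
    merged = mshrs.allocate(0x100)  # same block as an in-flight miss
    mshrs.release(0x100)            # one fill returns, freeing an entry
    retry = mshrs.allocate(0x180)
    return accepted, merged, retry
```

In `demo`, lowering the limit to two makes the third distinct miss stall until an earlier fill returns, which is how a smaller limit caps the number of requests a thread can have outstanding in the shared memory system.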