DOI: 10.1145/1531743.1531747
research-article

A light-weight fairness mechanism for chip multiprocessor memory systems

Published: 18 May 2009

ABSTRACT

Chip Multiprocessor (CMP) memory systems suffer from the effects of destructive thread interference. This interference reduces performance predictability because it depends heavily on the memory access patterns and intensity of the co-scheduled threads. In this work, we confirm that all shared units must be thread-aware in order to provide memory system fairness. However, current proposals for fair memory systems are complex, as they require an interference measurement mechanism and a fairness enforcement policy for every hardware-controlled shared unit. Furthermore, they often sacrifice system throughput to reach their fairness goals, which is not desirable in all systems.

We show that our novel fairness mechanism, called the Dynamic Miss Handling Architecture (DMHA), reduces implementation complexity by using a single fairness enforcement policy for the complete hardware-managed shared memory system. Specifically, it controls the total miss bandwidth available to each thread by dynamically manipulating the number of Miss Status Holding Registers (MSHRs) available in each private data cache. When fairness is chosen as the metric of interest and we compare to a state-of-the-art fairness-aware memory system, DMHA improves fairness by 26% on average with the single-program baseline. With a different configuration, DMHA improves throughput by 13% on average compared to a conventional memory system.
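The central idea, bounding a thread's miss bandwidth by limiting how many MSHRs its private cache may use, can be illustrated with a toy model. The sketch below is not the DMHA policy from the paper: the class `PrivateCache`, the functions `peak_miss_bandwidth` and `rebalance`, the halving rule, and the `threshold` value are all hypothetical, and the per-thread slowdown estimates are assumed to come from some interference measurement mechanism that is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class PrivateCache:
    """Toy model of one core's private data cache (hypothetical)."""
    max_mshrs: int     # MSHRs physically present in the cache
    mshr_limit: int    # MSHRs the thread is currently allowed to use
    in_flight: int = 0 # misses currently outstanding

    def try_issue_miss(self) -> bool:
        """Allocate an MSHR for a new miss; refuse once the limit is reached."""
        if self.in_flight >= self.mshr_limit:
            return False  # the cache blocks, throttling the thread
        self.in_flight += 1
        return True

    def complete_miss(self) -> None:
        """Free an MSHR when a miss returns from the shared memory system."""
        if self.in_flight > 0:
            self.in_flight -= 1


def peak_miss_bandwidth(mshr_limit: int, miss_latency_cycles: float) -> float:
    """Upper bound on misses per cycle, by Little's law:
    outstanding misses = bandwidth * latency."""
    return mshr_limit / miss_latency_cycles


def rebalance(caches, slowdowns, threshold=1.5):
    """Assumed policy, for illustration only.

    If the most-slowed thread is slowed more than `threshold` times as much
    as the least-slowed one, halve the MSHR limit of the least-slowed thread.
    Otherwise restore every limit to the physical maximum.
    """
    victim = max(range(len(caches)), key=lambda i: slowdowns[i])
    aggressor = min(range(len(caches)), key=lambda i: slowdowns[i])
    if slowdowns[victim] / slowdowns[aggressor] > threshold:
        cache = caches[aggressor]
        cache.mshr_limit = max(1, cache.mshr_limit // 2)
    else:
        for cache in caches:
            cache.mshr_limit = cache.max_mshrs
```

Because every miss from a private cache must hold an MSHR until it completes, lowering `mshr_limit` caps the load that thread can place on all shared units downstream, which is why a single knob per core can stand in for a separate enforcement policy in each shared unit.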

Published in

CF '09: Proceedings of the 6th ACM conference on Computing frontiers
May 2009
238 pages
ISBN: 9781605584133
DOI: 10.1145/1531743

Copyright © 2009 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CF '09 paper acceptance rate: 26 of 113 submissions, 23%. Overall acceptance rate: 240 of 680 submissions, 35%.
