Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor

Published: 01 May 2003

Abstract

A Multiple Clock Domain (MCD) processor addresses the challenges of clock distribution and power dissipation by dividing a chip into several (coarse-grained) clock domains, allowing frequency and voltage to be reduced in domains that are not currently on the application's critical path. Given a reconfiguration mechanism capable of choosing appropriate times and values for voltage/frequency scaling, an MCD processor has the potential to achieve significant energy savings with low performance degradation.

Early work on MCD processors evaluated the potential for energy savings by manually inserting reconfiguration instructions into applications, or by employing an oracle driven by off-line analysis of (identical) prior program runs. Subsequent work developed a hardware-based on-line mechanism that averages 75-85% of the energy-delay improvement achieved via off-line analysis.

In this paper we consider the automatic insertion of reconfiguration instructions into applications, using profile-driven binary rewriting. Profile-based reconfiguration introduces the need for "training runs" prior to production use of a given application, but avoids the hardware complexity of on-line reconfiguration. It also has the potential to yield significantly greater energy savings. Experimental results (training on small data sets and then running on larger, alternative data sets) indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via off-line analysis.
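As a rough illustration of the energy-delay metric mentioned above, the following sketch computes the relative improvement in energy-delay product (energy multiplied by execution time) of a scaled run over a baseline run. The function name, the definition of improvement as one minus the ratio of the two products, and the sample numbers are assumptions made for illustration; they are not taken from the paper.

```python
def energy_delay_improvement(base_energy, base_time, new_energy, new_time):
    """Fractional improvement in energy-delay product (EDP) over a baseline.

    EDP is energy multiplied by execution time. The improvement is defined
    here (an assumption, not the paper's stated formula) as
    1 - new_EDP / base_EDP, so positive values mean the new run is better.
    """
    base_edp = base_energy * base_time
    new_edp = new_energy * new_time
    return 1.0 - new_edp / base_edp


# Hypothetical numbers: 20% less energy at the cost of a 5% slowdown,
# with the baseline normalized to 1.0 for both energy and time.
improvement = energy_delay_improvement(1.0, 1.0, 0.80, 1.05)
print(f"energy-delay improvement: {improvement:.1%}")
```

Under this definition, a configuration that saves energy but slows the program down is rewarded only if the energy savings outweigh the added delay.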


  • Published in

    ACM SIGARCH Computer Architecture News, Volume 31, Issue 2
    ISCA 2003
    May 2003
    422 pages
    ISSN: 0163-5964
    DOI: 10.1145/871656

    ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture
    June 2003
    432 pages
    ISBN: 0769519458
    DOI: 10.1145/859618
    Conference Chair: Allan Gottlieb
    Program Chair: Kai Li

    Copyright © 2003 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States
