Abstract
A Multiple Clock Domain (MCD) processor addresses the challenges of clock distribution and power dissipation by dividing a chip into several (coarse-grained) clock domains, allowing frequency and voltage to be reduced in domains that are not currently on the application's critical path. Given a reconfiguration mechanism capable of choosing appropriate times and values for voltage/frequency scaling, an MCD processor has the potential to achieve significant energy savings with low performance degradation.

Early work on MCD processors evaluated the potential for energy savings by manually inserting reconfiguration instructions into applications, or by employing an oracle driven by off-line analysis of (identical) prior program runs. Subsequent work developed a hardware-based on-line mechanism that averages 75–85% of the energy-delay improvement achieved via off-line analysis.

In this paper we consider the automatic insertion of reconfiguration instructions into applications, using profile-driven binary rewriting. Profile-based reconfiguration introduces the need for "training runs" prior to production use of a given application, but avoids the hardware complexity of on-line reconfiguration. It also has the potential to yield significantly greater energy savings. Experimental results (training on small data sets and then running on larger, alternative data sets) indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via off-line analysis.
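A toy first-order model can make the energy-delay trade-off concrete. The sketch below is an illustration under stated assumptions and is not the authors' method:

- It counts dynamic energy only.
- It assumes supply voltage scales linearly with frequency, so energy per event scales with the square of a domain's frequency scale `s`.
- It introduces a hypothetical per-domain `criticality` parameter: the fraction of full-speed run time bound by that domain.

The exhaustive `best_static_setting` search stands in for a choice made from a training-run profile of a single program region. `fraction_of_offline_gain` assumes the "fraction of the off-line energy-delay improvement" is measured against a full-speed baseline normalized to 1.0. All function names, domain names and numbers are illustrative.

```python
# Toy first-order model of per-domain DVFS in an MCD processor.
# Assumptions (hypothetical, not taken from the paper): dynamic energy only,
# supply voltage proportional to frequency, and a per-domain "criticality".
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Domain:
    name: str
    energy_share: float  # fraction of full-speed energy spent in this domain
    criticality: float   # fraction of full-speed run time bound by this domain
    # (criticality values across all domains should sum to at most 1)


def relative_energy(domains, scales):
    """Energy relative to full speed; energy per event ~ V^2 ~ s^2 when V ~ f."""
    return sum(d.energy_share * s * s for d, s in zip(domains, scales))


def relative_delay(domains, scales):
    """Run time relative to full speed.

    Time bound by a domain running at frequency scale s stretches by 1/s;
    time not bound by any slowed domain is unchanged.
    """
    return 1.0 + sum(d.criticality * (1.0 / s - 1.0) for d, s in zip(domains, scales))


def relative_energy_delay(domains, scales):
    return relative_energy(domains, scales) * relative_delay(domains, scales)


def best_static_setting(domains, levels):
    """Exhaustively pick one frequency scale per domain minimizing energy-delay.

    Hypothetical stand-in for a choice made from a training-run profile of a
    single program region; not the paper's algorithm.
    """
    return min(product(levels, repeat=len(domains)),
               key=lambda scales: relative_energy_delay(domains, scales))


def fraction_of_offline_gain(ed_method, ed_offline):
    """Share of the off-line energy-delay improvement that a method achieves,
    with the full-speed baseline normalized to 1.0 (assumed definition)."""
    return (1.0 - ed_method) / (1.0 - ed_offline)


if __name__ == "__main__":
    # Illustrative numbers only.
    domains = [
        Domain("front_end", energy_share=0.35, criticality=0.6),
        Domain("integer", energy_share=0.25, criticality=0.3),
        Domain("floating_point", energy_share=0.15, criticality=0.0),
        Domain("load_store", energy_share=0.25, criticality=0.1),
    ]
    levels = (0.5, 0.75, 1.0)
    best = best_static_setting(domains, levels)
    print({d.name: s for d, s in zip(domains, best)})
    print("relative E*D:", round(relative_energy_delay(domains, best), 3))
```

In this model, a domain with zero criticality, like the illustrative floating-point domain, always drops to the lowest available level. Slowing it reduces energy without lengthening run time. Domains closer to the critical path trade energy against delay, and that trade-off is the choice a reconfiguration mechanism has to make.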
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture