Heterogeneous microarchitectures trump voltage scaling for low-power cores

Authors:
Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Ronald Dreslinski, Thomas F. Wenisch, Scott Mahlke

University of Michigan, Ann Arbor, MI, USA

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation, August 2014, Pages 237–250. https://doi.org/10.1145/2628071.2628078

Published: 24 August 2014

ABSTRACT

Heterogeneous architectures offer many potential avenues for improving energy efficiency in today's low-power cores. Two common approaches are dynamic voltage/frequency scaling (DVFS) and heterogeneous microarchitectures (HMs). Traditionally, both approaches have incurred large switching overheads, which limit their applicability to coarse-grained program phases. However, recent research has demonstrated low-overhead mechanisms that enable switching at granularities as low as 1K instructions. The question remains: in this fine-grained switching regime, which form of heterogeneity offers better energy efficiency for a given level of performance?

The effectiveness of these techniques depends critically on both efficient architectural implementation and accurate scheduling to maximize energy efficiency for a given level of performance. Therefore, we develop PaTH, an offline analysis tool, to compute (near-)optimal schedules, allowing us to determine Pareto-optimal energy savings for a given architecture. We leverage PaTH to study the potential energy efficiency of fine-grained DVFS and HMs, as well as a hybrid approach. We show that HMs achieve higher energy savings than DVFS for a given level of performance. While at a coarse granularity the combination of DVFS and HMs still proves beneficial, for fine-grained scheduling their combination makes little sense, as HMs alone provide the bulk of the energy efficiency.
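
To make the scheduling problem concrete, the following is a minimal sketch of how an offline tool might trade energy against performance. It is not PaTH's actual algorithm, which the abstract does not describe. Everything here is an assumption for illustration: the `PROFILE` numbers, the mode names `big` and `little`, the functions `best_schedule` and `pareto_frontier`, and the choice of a Lagrangian weight `lam` combined with dynamic programming over per-interval (time, energy) costs.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: one dict per program interval, mapping each
# execution mode to its (time, energy) cost. All numbers are invented.
Profile = List[Dict[str, Tuple[float, float]]]

PROFILE: Profile = [
    {"big": (1.0, 4.0), "little": (2.0, 1.5)},   # compute-bound interval
    {"big": (1.0, 4.0), "little": (1.25, 1.5)},  # memory-bound interval
]


def best_schedule(profile: Profile, lam: float,
                  switch_time: float = 0.0, switch_energy: float = 0.0):
    """Minimize total (energy + lam * time), switch overheads included.

    Dynamic programming over intervals: for every mode, keep the cheapest
    path that ends in that mode. Returns a tuple of
    (weighted_cost, total_time, total_energy, schedule).
    """
    modes = list(profile[0])
    best = {}
    for m in modes:
        t, e = profile[0][m]
        best[m] = (e + lam * t, t, e, [m])
    for interval in profile[1:]:
        nxt = {}
        for m in modes:
            t, e = interval[m]
            candidates = []
            for p in modes:
                cost, pt, pe, sched = best[p]
                # Overhead is paid only when the mode changes.
                st = switch_time if p != m else 0.0
                se = switch_energy if p != m else 0.0
                candidates.append((cost + e + se + lam * (t + st),
                                   pt + t + st, pe + e + se, sched + [m]))
            nxt[m] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])


def pareto_frontier(profile: Profile, lambdas: List[float], **switch_costs):
    """Sweep the weight and keep only non-dominated (time, energy) points."""
    points = {}
    for lam in lambdas:
        _, t, e, sched = best_schedule(profile, lam, **switch_costs)
        points[(t, e)] = sched
    frontier = []
    for t, e in sorted(points):
        # Walking from fastest to slowest, a point survives only if it
        # uses strictly less energy than every faster point kept so far.
        if not frontier or e < frontier[-1][1]:
            frontier.append((t, e, points[(t, e)]))
    return frontier


if __name__ == "__main__":
    for t, e, sched in pareto_frontier(PROFILE, [0.0, 3.0, 100.0]):
        print(f"time={t:.2f} energy={e:.2f} schedule={sched}")
```

With zero switching cost, the sketch returns three points for the toy profile, and at `lam = 3.0` it runs only the memory-bound interval on the low-power mode. Setting a nonzero `switch_time` makes it stay in a single mode, which mirrors why switching overheads have limited these techniques to coarse-grained phases. Sweeping a single weight only recovers points on the convex hull of the frontier, so this should be read as an illustration rather than an exact method.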

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures


Published in
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
August 2014
514 pages
ISBN: 9781450328098
DOI: 10.1145/2628071
General Chair: J. Nelson Amaral, University of Alberta, Canada
Program Chair: Josep Torrellas, University of Illinois, USA
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dvfs
energy efficiency
fine-grained architectures
heterogeneous multicores
Qualifiers
- research-article
Acceptance Rates
PACT '14 Paper Acceptance Rate: 54 of 144 submissions, 38%
Overall Acceptance Rate: 121 of 471 submissions, 26%