ABSTRACT
Heterogeneous architectures offer many potential avenues for improving energy efficiency in today's low-power cores. Two common approaches are dynamic voltage/frequency scaling (DVFS) and heterogeneous microarchitectures (HMs). Traditionally, both approaches have incurred large switching overheads, which limit their applicability to coarse-grained program phases. However, recent research has demonstrated low-overhead mechanisms that enable switching at granularities as low as 1K instructions. The question remains: in this fine-grained switching regime, which form of heterogeneity offers better energy efficiency for a given level of performance?
The effectiveness of these techniques depends critically on both efficient architectural implementation and accurate scheduling to maximize energy efficiency for a given level of performance. Therefore, we develop PaTH, an offline analysis tool, to compute (near-)optimal schedules, allowing us to determine Pareto-optimal energy savings for a given architecture. We leverage PaTH to study the potential energy efficiency of fine-grained DVFS and HMs, as well as a hybrid approach. We show that HMs achieve higher energy savings than DVFS for a given level of performance. While at a coarse granularity the combination of DVFS and HMs still proves beneficial, for fine-grained scheduling their combination makes little sense, as HMs alone provide the bulk of the energy efficiency.
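To make the idea of a Pareto-optimal schedule concrete, the sketch below enumerates the energy/time trade-offs available when each scheduling interval can run in one of several modes (for example, a big or little microarchitecture, or a DVFS operating point). It is only an illustration and not PaTH's algorithm, which the abstract does not describe. The function names, the per-interval `(time, energy)` numbers, and the assumption that switching between modes costs nothing are all hypothetical.

```python
from typing import List, Optional, Tuple

# A point is (total_time, total_energy); units are arbitrary but consistent.
Point = Tuple[float, float]


def pareto_prune(points: List[Point]) -> List[Point]:
    """Keep only points that no other point beats in both time and energy."""
    frontier: List[Point] = []
    best_energy = float("inf")
    # Sorted by time, then energy: a later point survives only if it
    # uses strictly less energy than every faster point seen so far.
    for t, e in sorted(set(points)):
        if e < best_energy:
            frontier.append((t, e))
            best_energy = e
    return frontier


def pareto_schedules(intervals: List[List[Point]]) -> List[Point]:
    """Build the time/energy Pareto frontier over all per-interval mode choices.

    intervals[i] lists the (time, energy) cost of each mode for interval i.
    Assumption: costs are additive and mode switches are free, so a dominated
    partial schedule can never become part of a non-dominated full schedule.
    """
    frontier: List[Point] = [(0.0, 0.0)]
    for options in intervals:
        candidates = [(t + dt, e + de) for t, e in frontier for dt, de in options]
        frontier = pareto_prune(candidates)
    return frontier


def min_energy(frontier: List[Point], time_budget: float) -> Optional[float]:
    """Lowest energy among schedules that finish within time_budget, if any."""
    feasible = [e for t, e in frontier if t <= time_budget]
    return min(feasible) if feasible else None


# Hypothetical two-interval program; each interval offers (big, little) modes.
example_intervals = [
    [(10, 40), (20, 20)],  # compute-bound: little core is much slower
    [(20, 60), (22, 25)],  # memory-bound: little core is almost as fast
]
example_frontier = pareto_schedules(example_intervals)
print(example_frontier)                  # [(30.0, 100.0), (32.0, 65.0), (42.0, 45.0)]
print(min_energy(example_frontier, 33))  # 65.0
```

In this toy case, allowing about 7% more time than the all-big schedule (32 versus 30 time units) cuts energy from 100 to 65 units by running only the memory-bound interval on the little mode. A realistic study would also have to charge a time and energy cost for every mode switch, which this sketch deliberately ignores.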
Index Terms
- Heterogeneous microarchitectures trump voltage scaling for low-power cores