DOI: 10.1145/2628071.2628078
research-article

Heterogeneous microarchitectures trump voltage scaling for low-power cores

Published: 24 August 2014

ABSTRACT

Heterogeneous architectures offer many potential avenues for improving energy efficiency in today's low-power cores. Two common approaches are dynamic voltage/frequency scaling (DVFS) and heterogeneous microarchitectures (HMs). Traditionally, both approaches have incurred large switching overheads, which limit their applicability to coarse-grained program phases. However, recent research has demonstrated low-overhead mechanisms that enable switching at granularities as low as 1K instructions. The question remains: in this fine-grained switching regime, which form of heterogeneity offers better energy efficiency for a given level of performance?

The effectiveness of these techniques depends critically on both efficient architectural implementation and accurate scheduling to maximize energy efficiency for a given level of performance. Therefore, we develop PaTH, an offline analysis tool, to compute (near-)optimal schedules, allowing us to determine Pareto-optimal energy savings for a given architecture. We leverage PaTH to study the potential energy efficiency of fine-grained DVFS and HMs, as well as a hybrid approach. We show that HMs achieve higher energy savings than DVFS for a given level of performance. While at a coarse granularity the combination of DVFS and HMs still proves beneficial, for fine-grained scheduling their combination makes little sense, as HMs alone provide the bulk of the energy efficiency.
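To make the scheduling problem concrete, the following is a minimal sketch of an offline scheduler that picks one execution mode per program interval so that total energy is minimized under a total-time (performance) budget. It is not the PaTH algorithm, which the abstract does not describe; it is a small dynamic program written for illustration. The function name `min_energy_schedule`, the two-mode setup ("big" = mode 0, "little" = mode 1), the switching-cost parameters, and all time and energy numbers are hypothetical.

```python
from typing import List, Optional, Tuple

# One interval = a list of (time, energy) pairs, one pair per mode.
# All numbers in this sketch are invented integers, not measured data.
Interval = List[Tuple[int, int]]


def min_energy_schedule(
    intervals: List[Interval],
    time_budget: int,
    switch_time: int = 0,
    switch_energy: int = 0,
) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Return (energy, modes) with minimal energy and total time <= budget.

    Returns None when no schedule fits in the budget.
    """
    # State: (last mode, elapsed time) -> (energy so far, modes chosen so far).
    states = {(None, 0): (0, ())}
    for options in intervals:
        nxt = {}
        for (last, elapsed), (energy, sched) in states.items():
            for mode, (t, e) in enumerate(options):
                switched = last is not None and mode != last
                new_t = elapsed + t + (switch_time if switched else 0)
                if new_t > time_budget:
                    continue  # this choice would violate the performance target
                new_e = energy + e + (switch_energy if switched else 0)
                key = (mode, new_t)
                # Keep only the cheapest way to reach each (mode, time) state.
                if key not in nxt or new_e < nxt[key][0]:
                    nxt[key] = (new_e, sched + (mode,))
        states = nxt
    if not states:
        return None
    return min(states.values(), key=lambda s: s[0])


# Hypothetical trace: mode 0 is fast but costly, mode 1 is slow but frugal.
intervals = [
    [(10, 30), (12, 12)],  # mode 1 is only slightly slower here
    [(10, 30), (25, 15)],  # mode 1 is much slower here
    [(10, 30), (11, 10)],  # mode 1 is only slightly slower here
]

# Free switching: use the frugal mode wherever it costs little time.
print(min_energy_schedule(intervals, time_budget=35))
# -> (52, (1, 0, 1))

# With a switching overhead, the two-switch schedule no longer fits the budget.
print(min_energy_schedule(intervals, 35, switch_time=2, switch_energy=1))
# -> (71, (0, 0, 1))
```

Sweeping `time_budget` and recording the resulting energy at each value would trace out an energy/performance trade-off curve for this toy trace. The second call also illustrates why switching overhead matters: a schedule that is optimal with free switching can become infeasible once each mode change costs time.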

    • Published in

      PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
      August 2014
      514 pages
      ISBN: 9781450328098
      DOI: 10.1145/2628071

      Copyright © 2014 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      PACT '14 Paper Acceptance Rate: 54 of 144 submissions, 38%
      Overall Acceptance Rate: 121 of 471 submissions, 26%
