ABSTRACT
The rise of the utilization wall limits the number of transistors that can be powered on in a single chip and leaves a large region of dark silicon. While this phenomenon has led to disruptive innovation in computation, little work has addressed Network-on-Chip (NoC) design. The NoC not only directly influences overall multi-core performance but also consumes a significant portion of the total chip power. In this paper, we first reveal the challenges and opportunities of designing a power-efficient NoC in the dark silicon era. We then propose NoC-Sprinting: based on workload characteristics, it explores fine-grained sprinting, which allows a chip to flexibly activate dark cores for instantaneous throughput improvement. In addition, it investigates topological and routing support and thermal-aware floorplanning for the sprinting process. Moreover, it builds an efficient network power-management scheme that can mitigate the dark silicon problem. Experiments on performance, power, and thermal behavior show that NoC-Sprinting can provide substantial speedup and longer sprinting duration while significantly reducing chip power.
Index Terms
- NoC-Sprinting: Interconnect for Fine-Grained Sprinting in the Dark Silicon Era