ABSTRACT
Thermal management is a crucial aspect of the design and operation of safety-critical multi-core architectures, as their high power density can cause significant heat generation and a risk of thermal overload. If not properly managed, thermal overload can lead to system failures and performance degradation, which is a major challenge for system designers. To address this challenge, advanced core mapping solutions have become increasingly popular in both industry and academia. In this paper, we present key insights, techniques and results on thermal management in multi-core architectures. We propose a new per-core power budget strategy called B-TSP that is scalable and enables system performance optimization while abstracting from mapping concerns. In addition, we present a second strategy that derives worst-case mappings, from a power consumption perspective, as a function of the number of active cores in a thermal-aware design. We demonstrate the effectiveness of our solution through extensive simulations of the homogeneous 16-core AMD EPYC 7351 platform.
Index Terms
- B-TSP: An Advanced Power Safe Management Strategy for modern Multi-core Platforms under Thermal-Aware Design