ABSTRACT
Current processor trends show an increasing number of cores and a growing diversity of characteristics among them. Such processors offer great potential for achieving high performance across different applications. Nevertheless, exploiting their characteristics is a challenge. In particular, treating all cores as identical when scheduling tasks is no longer valid. In this work we address three important characteristics of future many-core processors: (1) a many-core processor will include groups of different cores, (2) the latency to access off-chip memory will be larger for cores farther from the on-chip memory controller, and (3) as the number of cores per memory controller increases, so does the pressure on off-chip access bandwidth. To address these issues we propose a task assignment policy that monitors the demands of each application task and assigns the task to a better-matching core, if one is available. The assignment policy triggers task migration when needed in order to optimize both execution time and power consumption. In this paper we describe the assignment algorithm and how we will implement it on a many-core system.
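The abstract does not give the assignment algorithm itself, so the following is only a minimal sketch of how a demand-driven assignment policy of this kind might look. Everything in it is an assumption made for illustration: the `Core` and `Task` types, the `cost` function and its equal weighting of the three terms, the `mc_load` dictionary (assumed to hold the bandwidth pressure on each memory controller, excluding the task being placed), and the `min_gain` migration threshold. Power consumption, which the policy also targets, is not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Core:
    core_id: int
    speed: float       # relative compute capability (hypothetical metric)
    hops_to_mc: int    # distance to the on-chip memory controller
    mc_id: int         # memory controller serving this core


@dataclass
class Task:
    name: str
    compute_demand: float          # monitored demand, assumed in [0, 1]
    memory_demand: float           # monitored demand, assumed in [0, 1]
    core: Optional[Core] = None    # core the task currently runs on


def cost(task: Task, core: Core, mc_load: Dict[int, float]) -> float:
    """Estimated cost of running `task` on `core`; lower is better."""
    compute_cost = task.compute_demand / core.speed          # (1) core type
    latency_cost = task.memory_demand * core.hops_to_mc      # (2) distance
    bandwidth_cost = task.memory_demand * mc_load.get(core.mc_id, 0.0)  # (3)
    return compute_cost + latency_cost + bandwidth_cost


def reassign(task: Task, free_cores: List[Core],
             mc_load: Dict[int, float], min_gain: float = 0.1) -> Optional[Core]:
    """Place the task, or migrate it if a free core is better by > min_gain."""
    if not free_cores:
        return task.core
    best = min(free_cores, key=lambda c: cost(task, c, mc_load))
    if task.core is None:
        task.core = best   # initial placement
    elif cost(task, task.core, mc_load) - cost(task, best, mc_load) > min_gain:
        task.core = best   # migration to a better-matching core
    return task.core


# Hypothetical two-core system: a slow core near the memory controller
# and a fast core far from it.
near_small = Core(core_id=0, speed=1.0, hops_to_mc=1, mc_id=0)
far_big = Core(core_id=1, speed=2.0, hops_to_mc=4, mc_id=0)

memory_bound = Task("stream", compute_demand=0.2, memory_demand=0.9)
compute_bound = Task("kernel", compute_demand=0.9, memory_demand=0.05)

reassign(memory_bound, [near_small, far_big], mc_load={0: 0.0})
reassign(compute_bound, [near_small, far_big], mc_load={0: 0.0})
```

Under these assumed weights, the memory-bound task is placed on the core nearest the memory controller and the compute-bound task on the faster core. The `min_gain` threshold is one simple way to avoid migrating a task when the expected benefit is too small to justify the move.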
Index Terms
- Addressing the challenges of future large-scale many-core architectures