Islands-of-Cores Approach for Harnessing SMP/NUMA Architectures in Heterogeneous Stencil Computations

Szustak, Lukasz; Wyrzykowski, Roman; Jakl, Ondřej

doi:10.1007/978-3-319-62932-2_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10421)

Included in the following conference series:

International Conference on Parallel Computing Technologies

Abstract

SMP/NUMA systems are powerful HPC platforms that can be applied to a wide range of real-life applications. These systems provide a large capacity of shared memory and support the shared-variable programming model, which takes advantage of shared memory for inter-process communication and synchronization. However, because data can be physically dispersed over many nodes, access times to different data items may differ significantly. In this paper, we face the challenge of harnessing the heterogeneous nature of SMP/NUMA communications for a complex scientific application that implements the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), which consists of a set of heterogeneous stencil computations.

Our previous method of MPDATA workload distribution was successfully applied to small-scale shared-memory systems with several CPUs and/or accelerators, but it shows significant performance losses on larger SMP/NUMA systems, such as the SGI UV 2000 server used in this work. To overcome this shortcoming, we propose a new islands-of-cores approach. It exposes a correlation between computation and communication for heterogeneous stencils, and enables efficient management of the trade-off between computation and communication costs in accordance with the features of SMP/NUMA systems. As a consequence, when using the maximum configuration with 112 cores of 14 Intel Xeon E5-4627v2 3.3 GHz processors, the proposed approach accelerates the previous method more than 10 times, achieving about 390 Gflop/s, or approximately 30% of the theoretical peak performance.
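The abstract does not spell out how the trade-off between computation and communication is managed, so the following is only a minimal sketch of one generic way such a trade-off can arise, not the authors' MPDATA implementation. It assumes a chain of identical 1D three-point stencils (a stand-in for the heterogeneous MPDATA stages) and hypothetical helper names `stage`, `run_global`, and `run_islands`. Each "island" reads a slightly larger input chunk and recomputes the overlapping halo points itself, so it never needs intermediate results from another island; the price is redundant point updates.

```python
def stage(values):
    """Apply one 3-point averaging stencil; the output is 2 elements shorter."""
    return [(values[i - 1] + values[i] + values[i + 1]) / 3.0
            for i in range(1, len(values) - 1)]


def run_global(data, stages):
    """Reference version: every stage sweeps the whole domain, which on a
    real machine implies synchronization and data exchange after each stage."""
    out, work = list(data), 0
    for _ in range(stages):
        out = stage(out)
        work += len(out)          # count point updates
    return out, work


def run_islands(data, stages, islands):
    """Island version (illustrative assumption): each island computes its
    slice of the final output independently, recomputing halo points
    instead of fetching them from a neighbouring island."""
    n_out = len(data) - 2 * stages
    result, work = [], 0
    for k in range(islands):
        lo = k * n_out // islands
        hi = (k + 1) * n_out // islands
        # Final outputs [lo, hi) depend on inputs [lo, hi + 2 * stages).
        chunk = list(data[lo:hi + 2 * stages])
        for _ in range(stages):
            chunk = stage(chunk)
            work += len(chunk)    # includes redundant halo updates
        result.extend(chunk)
    return result, work


if __name__ == "__main__":
    data = [float((i * i) % 7) for i in range(26)]
    ref, ref_work = run_global(data, stages=3)
    out, out_work = run_islands(data, stages=3, islands=4)
    print(out == ref, ref_work, out_work)
```

With 26 input points, 3 stages, and 4 islands, both versions produce the same result, but the island version performs 84 point updates instead of 66. The 18 extra updates are the computation spent to avoid inter-island data exchange between stages; whether that exchange is worth avoiding depends on how expensive remote memory access is on the target system.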

Acknowledgments

This work was supported by the National Science Centre (Poland) under grant UMO-2015/17/D/ST6/04059, and partially supported by the Ministry of Education, Youth and Sports of the Czech Republic through the project “IT4Innovations National Supercomputing Center LM2015070”, and by the EU under the COST Program Action IC1305 “Network for Sustainable Ultrascale Computing (NESUS)” and its Czech supporting project LD15105 “Ultrascale Computing in Geosciences”.

Author information

Authors and Affiliations

Czestochowa University of Technology, Dabrowskiego 69, 42-201, Czestochowa, Poland
Lukasz Szustak & Roman Wyrzykowski
Institute of Geonics of the Czech Academy of Sciences, Studentská 1768, 708 00, Ostrava-Poruba, Czech Republic
Ondřej Jakl

Corresponding author

Correspondence to Lukasz Szustak.

Editor information

Editors and Affiliations

Russian Academy of Sciences, Novosibirsk, Russia
Victor Malyshkin

About this paper

Cite this paper

Szustak, L., Wyrzykowski, R., Jakl, O. (2017). Islands-of-Cores Approach for Harnessing SMP/NUMA Architectures in Heterogeneous Stencil Computations. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2017. Lecture Notes in Computer Science, vol 10421. Springer, Cham. https://doi.org/10.1007/978-3-319-62932-2_34

DOI: https://doi.org/10.1007/978-3-319-62932-2_34
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62931-5
Online ISBN: 978-3-319-62932-2
eBook Packages: Computer Science, Computer Science (R0)
