Abstract
Ensuring that applications use resources efficiently and execute quickly on today's complex heterogeneous high-performance computing platforms is a paramount problem. Two essential steps toward this goal are the optimal partitioning of the data space among the processes that compose a typical task- or data-parallel application, and the appropriate mapping and deployment of those processes on the platform. Modeling the computational and communication performance of both the platform and the application is an increasingly recognized approach. This paper discusses the utility of the \(\uptau\)–Lop analytic communication performance model in addressing these issues and contributes a practical symbolic computation tool that represents, manipulates and accurately evaluates the formal communication cost expression derived from a hybrid kernel. We identify a set of scenarios where the tool can be applied, provide both basic and advanced usage examples, and evaluate the tool on real-life kernels.
Notes
The Ring algorithm executes in \(P-1\) steps. In each step, the process with rank p sends a message of size m to the process with rank \(p+1\) and receives a message of the same size from the process with rank \(p-1\). The Recursive Doubling algorithm executes in \(\log _2 P\) steps, doubling the size of the exchanged message in each step. In step s, process p communicates with process \(p \oplus 2^s\).
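The two communication patterns in this note can be sketched in a few lines of Python. This is only an illustration, not part of the tool described in the paper: the function names are hypothetical, the ring is assumed to wrap around (ranks taken modulo P), P is assumed to be a power of two for Recursive Doubling, and the message size is assumed to start at m in step 0.

```python
def ring_steps(P):
    """Return P-1 steps; each step is a list of (sender, receiver) pairs.

    Assumption: ranks wrap around, so rank P-1 sends to rank 0.
    """
    return [[(p, (p + 1) % P) for p in range(P)] for _ in range(P - 1)]


def recursive_doubling_steps(P, m):
    """Return log2(P) steps as (message_size, pairs) tuples.

    Assumptions: P is a power of two, and the exchanged message
    has size m in step 0 and doubles in every later step.
    """
    if P < 1 or P & (P - 1):
        raise ValueError("P must be a power of two")
    steps = []
    s = 0
    while (1 << s) < P:
        # In step s, process p exchanges data with process p XOR 2^s.
        pairs = [(p, p ^ (1 << s)) for p in range(P)]
        steps.append((m * (1 << s), pairs))
        s += 1
    return steps
```

For example, `ring_steps(4)` yields three steps, each with the pairs (0,1), (1,2), (2,3), (3,0), while `recursive_doubling_steps(8, 10)` yields three steps with message sizes 10, 20 and 40.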
Acknowledgements
This work was supported by the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118). This work was also partially supported by the Advanced Computing and Technologies Foundation of Extremadura (CenitS/COMPUTAEX).
Cite this article
Rico-Gallego, J.A., Moreno-Álvarez, S., Díaz-Martín, J.C. et al. A tool to assess the communication cost of parallel kernels on heterogeneous platforms. J Supercomput 76, 4629–4644 (2020). https://doi.org/10.1007/s11227-019-02919-1
DOI: https://doi.org/10.1007/s11227-019-02919-1