A tool to assess the communication cost of parallel kernels on heterogeneous platforms

Rico-Gallego, Juan A.; Moreno-Álvarez, Sergio; Díaz-Martín, Juan C.; Lastovetsky, Alexey L.

doi:10.1007/s11227-019-02919-1

Published: 05 June 2019

Volume 76, pages 4629–4644 (2020)

The Journal of Supercomputing

Juan A. Rico-Gallego ORCID: orcid.org/0000-0002-4264-7473¹,
Sergio Moreno-Álvarez¹,
Juan C. Díaz-Martín² &
Alexey L. Lastovetsky³

Abstract

Ensuring that applications achieve efficient resource usage and fast execution times on today's complex heterogeneous high-performance computing platforms is a paramount problem. Two essential steps toward this goal are the optimal partitioning of the data space among the processes of a typical task/data-parallel application, and the proper mapping and deployment of those processes on the platform. Computational and communication performance modeling, which describes the behavior of both the platform and the application, is an increasingly recognized approach. This paper discusses the utility of the \(\uptau\)–Lop analytic communication performance model in addressing these issues and contributes a practical symbolic computation tool that represents, manipulates and accurately evaluates the formal communication cost expression derived from a hybrid kernel. We identify a set of scenarios where the tool can be applied, provide both basic and advanced usage examples, and evaluate the tool on real-life kernels.

Notes

The Ring algorithm executes in \(P-1\) steps. In each step, the process with rank p sends a message of size m to the process with rank \(p+1\) and receives a message of the same size from the process with rank \(p-1\). The Recursive Doubling algorithm executes in \(\log _2 P\) steps, doubling the size of the exchanged message in each step. In step s, process p communicates with process \(p \oplus 2^s\).
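The two communication patterns in this note can be sketched as plain schedules of point-to-point transfers. The following Python sketch is illustrative only and is not the tool described in the article. The function names are hypothetical. It also assumes that ranks wrap around modulo \(P\) in the ring, that \(\oplus\) denotes bitwise XOR, that steps are numbered from \(s = 0\), that the first Recursive Doubling message has size m, and that \(P\) is a power of two for Recursive Doubling.

```python
def ring_schedule(P, m):
    """Ring: P-1 steps; in each step rank p sends m units to rank p+1.

    Assumption: ranks wrap around, so rank P-1 sends to rank 0.
    Returns one list of (sender, receiver, size) tuples per step.
    """
    return [[(p, (p + 1) % P, m) for p in range(P)] for _ in range(P - 1)]


def recursive_doubling_schedule(P, m):
    """Recursive Doubling: log2(P) steps; partner of p in step s is p XOR 2^s.

    Assumptions: P is a power of two, steps start at s = 0, and the
    message size starts at m and doubles after every step.
    """
    if P < 1 or P & (P - 1):
        raise ValueError("this sketch assumes P is a power of two")
    steps, size, s = [], m, 0
    while (1 << s) < P:
        steps.append([(p, p ^ (1 << s), size) for p in range(P)])
        size *= 2  # the exchanged message doubles in each step
        s += 1
    return steps


def units_sent_per_process(schedule):
    """Total message units sent by rank 0 across all steps."""
    return sum(size for step in schedule for (src, _, size) in step if src == 0)
```

Under these assumptions both schedules make each process send \((P-1)\,m\) units in total. The ring spreads them over \(P-1\) steps of size m, while Recursive Doubling uses \(\log _2 P\) steps of growing size.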

Acknowledgements

This work was supported by the European Regional Development Fund 'A way to achieve Europe' (ERDF) and the Extremadura Local Government (Ref. IB16118). This work was also partially supported by the Advanced Computing and Technologies Foundation of Extremadura (CenitS/COMPUTAEX).

Author information

Authors and Affiliations

Department of Computer Systems Engineering and Telematics, School of Technology, University of Extremadura, Avd. Universidad s/n, 10003, Cáceres, Spain
Juan A. Rico-Gallego & Sergio Moreno-Álvarez
Department of Computer and Communication Technology, University of Extremadura, Cáceres, Spain
Juan C. Díaz-Martín
School of Computer Science, University College Dublin, Dublin, Ireland
Alexey L. Lastovetsky

Corresponding author

Correspondence to Juan A. Rico-Gallego.

About this article

Cite this article

Rico-Gallego, J.A., Moreno-Álvarez, S., Díaz-Martín, J.C. et al. A tool to assess the communication cost of parallel kernels on heterogeneous platforms. J Supercomput 76, 4629–4644 (2020). https://doi.org/10.1007/s11227-019-02919-1

Published: 05 June 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11227-019-02919-1
