
A tool to assess the communication cost of parallel kernels on heterogeneous platforms

The Journal of Supercomputing

Abstract

Ensuring that applications use resources efficiently and execute quickly on today's complex heterogeneous high-performance computing platforms is a paramount problem. Two essential steps toward that goal are the optimal partitioning of the data space among the processes that compose a typical task/data-parallel application, and the appropriate mapping and deployment of those processes on the platform. Modeling the computational and communication performance of both the platform and the application is an increasingly recognized approach. This paper discusses the utility of the \(\uptau\)–Lop analytic communication performance model in addressing these issues and contributes a practical symbolic computation tool that represents, manipulates and accurately evaluates the formal communication cost expression derived from a hybrid kernel. We identify a set of scenarios where the tool could be applied, provide both basic and advanced usage examples, and evaluate the tool on real-life kernels.

Notes

  1. The Ring algorithm executes in \(P-1\) steps. In each step, the process with rank p sends a message of size m to the process with rank \(p+1\) and receives a message of the same size from the process with rank \(p-1\). The Recursive Doubling algorithm executes in \(\log _2 P\) steps, doubling the size of the message exchanged in each step. In step s, process p communicates with process \(p \oplus 2^s\).
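
The two communication patterns described in this note can be sketched as explicit schedules. The following is a minimal illustration, not the tool presented in the article. The function names are hypothetical, and the sketch assumes that ring neighbours wrap around modulo \(P\), that \(\oplus\) denotes bitwise XOR, that steps are numbered from 0, that \(P\) is a power of two for Recursive Doubling, and that the first Recursive Doubling step exchanges a message of size m.

```python
def ring_schedule(P, m):
    """List the sends of the Ring algorithm as (source, destination, size).

    Assumption: ranks wrap around, so rank P-1 sends to rank 0.
    """
    steps = []
    for _ in range(P - 1):  # P-1 steps in total
        # Every rank p sends m bytes to p+1 (and so receives from p-1).
        steps.append([(p, (p + 1) % P, m) for p in range(P)])
    return steps


def recursive_doubling_schedule(P, m):
    """List the exchanges of Recursive Doubling as (source, partner, size).

    Assumptions: P is a power of two, the first step exchanges m bytes,
    and steps are numbered s = 0, 1, ..., log2(P) - 1.
    """
    if P < 1 or P & (P - 1):
        raise ValueError("this sketch requires P to be a power of two")
    steps = []
    size = m
    s = 0
    while (1 << s) < P:  # log2(P) steps in total
        # In step s, rank p exchanges data with rank p XOR 2^s.
        steps.append([(p, p ^ (1 << s), size) for p in range(P)])
        size *= 2  # the exchanged message doubles in each step
        s += 1
    return steps


if __name__ == "__main__":
    for step, sends in enumerate(recursive_doubling_schedule(8, 1)):
        print(step, sends)
```

With \(P = 8\), the Ring schedule has 7 steps with constant message size, while the Recursive Doubling schedule has 3 steps whose message sizes are m, 2m and 4m.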

Acknowledgements

This work was supported by the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118). This work was also partially supported by the Advanced Computing and Technologies Foundation of Extremadura (CenitS/COMPUTAEX).

Author information

Corresponding author

Correspondence to Juan A. Rico-Gallego.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Rico-Gallego, J.A., Moreno-Álvarez, S., Díaz-Martín, J.C. et al. A tool to assess the communication cost of parallel kernels on heterogeneous platforms. J Supercomput 76, 4629–4644 (2020). https://doi.org/10.1007/s11227-019-02919-1

  • DOI: https://doi.org/10.1007/s11227-019-02919-1
