
Using Dynamic Broadcasts to Improve Task-Based Runtime Performances

  • Conference paper

Published in Euro-Par 2020: Parallel Processing (Euro-Par 2020), part of the book series Lecture Notes in Computer Science (LNTCS, volume 12247)

Abstract

Task-based runtimes have emerged in the HPC world to exploit the computation power of heterogeneous supercomputers and to achieve scalability. One of the main bottlenecks for scalability is the communication layer. Some task-based algorithms need to send the same data to multiple nodes. To optimize this communication pattern, communication libraries propose dedicated routines, such as MPI_Bcast. However, the requirements of MPI_Bcast do not fit well with the constraints of task-based runtime systems: it must be performed simultaneously by all involved nodes, and these nodes must know each other, which is not possible when each node runs a task scheduler that is not synchronized with the others. In this paper, we propose a new approach, called dynamic broadcasts, to overcome these constraints. The broadcast communication pattern required by the task-based algorithm is detected automatically; the broadcasting algorithm then relies on active messages and source routing, so that participating nodes do not need to know each other and do not need to synchronize. The receiver obtains the data the same way as it receives a point-to-point communication, without having to know that it arrived through a broadcast. We have implemented this algorithm in the StarPU runtime system using the NewMadeleine communication library. We performed benchmarks with the Cholesky factorization, which is known to use broadcasts, and observed up to a 30% improvement of its total execution time.


Notes

  1. It is important to note that the improvement is measured on the total performance and not on the communication part only.


Acknowledgements

This work is supported by the Agence Nationale de la Recherche, under grant ANR-19-CE46-0009.

This work is supported by the Région Nouvelle-Aquitaine, under grant 2018-1R50119 HPC scalable ecosystem.

Experiments presented in this paper were carried out using the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and Conseil Régional d’Aquitaine (see https://www.plafrim.fr/).

This work was granted access to the HPC resources of CINES under the allocation 2019- A0060601567 attributed by GENCI (Grand Equipement National de Calcul Intensif).

The authors furthermore thank Olivier Aumage and Nathalie Furmento for their help and advice regarding this work.

Author information

Correspondence to Philippe Swartvagher.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Denis, A., Jeannot, E., Swartvagher, P., Thibault, S. (2020). Using Dynamic Broadcasts to Improve Task-Based Runtime Performances. In: Malawski, M., Rzadca, K. (eds) Euro-Par 2020: Parallel Processing. Euro-Par 2020. Lecture Notes in Computer Science, vol 12247. Springer, Cham. https://doi.org/10.1007/978-3-030-57675-2_28


  • DOI: https://doi.org/10.1007/978-3-030-57675-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57674-5

  • Online ISBN: 978-3-030-57675-2

