Improving Simulations of Task-Based Applications on Complex NUMA Architectures

Daoudi, Idriss; Gautier, Thierry; Thibault, Samuel; Perarnau, Swann

doi:10.1007/978-3-031-40744-4_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14114)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Modeling and simulation are crucial in high-performance computing (HPC), and numerous frameworks have been developed for distributed computing infrastructures and their applications. Although some works simulate shared-memory systems and task-based parallel applications at the node level, they overlook non-uniform memory access (NUMA) effects, a critical characteristic of current HPC platforms. In this work, we introduce a model of complex NUMA architectures and enhance a simulator for dependency-based task-parallel applications. This makes it possible to experiment with different data locality models: we refine a communication-oriented model that uses topology information for data transfers, and we devise a more elaborate model that adds a cache mechanism to account for data stored in the last-level cache. We validate both models on dense linear algebra test cases and show that our simulator reliably predicts execution time with a small relative error.
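The two kinds of locality model named in the abstract, and the relative-error metric used for validation, can be illustrated with a minimal sketch. This is not the authors' simulator: the two-node topology, the latency and bandwidth values, the LRU eviction policy, and the names `transfer_time`, `LRUCache`, and `relative_error` are all assumptions chosen for illustration.

```python
from collections import OrderedDict

# Hypothetical 2-node NUMA topology (assumed values, not from the paper):
# latency in seconds, bandwidth in bytes per second. Diagonal entries are unused.
LATENCY = [[0.0, 1e-6], [1e-6, 0.0]]
BANDWIDTH = [[0.0, 1e9], [1e9, 0.0]]


def transfer_time(size_bytes, src, dst):
    """Communication-oriented estimate: latency plus size over bandwidth."""
    if src == dst:
        return 0.0  # assumption: local accesses are part of the task's compute time
    return LATENCY[src][dst] + size_bytes / BANDWIDTH[src][dst]


class LRUCache:
    """Toy last-level cache that tracks which data blocks are resident.

    LRU eviction is an assumption; the abstract does not state the policy.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = OrderedDict()  # block id -> size in bytes, oldest first

    def access(self, block, size_bytes):
        """Return True on a hit. On a miss, insert the block, evicting LRU blocks."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        while self.blocks and self.used + size_bytes > self.capacity:
            _, evicted_size = self.blocks.popitem(last=False)
            self.used -= evicted_size
        if size_bytes <= self.capacity:
            self.blocks[block] = size_bytes
            self.used += size_bytes
        return False


def relative_error(simulated, measured):
    """Relative error of a simulated execution time against a measured one."""
    return abs(simulated - measured) / measured
```

In a sketch like this, a task whose input block hits in the cache would be charged no transfer, while a miss would be charged `transfer_time` from the node that holds the data. Comparing the summed simulated time with a measured run through `relative_error` gives the kind of accuracy figure the abstract refers to.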

Acknowledgements

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative, and the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.

Author information

Authors and Affiliations

Argonne National Laboratory, Lemont, USA
Idriss Daoudi & Swann Perarnau
INRIA Grenoble - LIP - ENS Lyon, Lyon, France
Thierry Gautier
INRIA Bordeaux - Université de Bordeaux, Bordeaux, France
Samuel Thibault

Corresponding author

Correspondence to Idriss Daoudi.

Editor information

Editors and Affiliations

University of Bristol, Bristol, UK
Simon McIntosh-Smith
OpenMP ARB, Beaverton, OR, USA
Michael Klemm
Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
University of Bristol, Bristol, UK
Tom Deakin
RWTH Aachen University, Aachen, Germany
Jannis Klinkenberg

About this paper

Cite this paper

Daoudi, I., Gautier, T., Thibault, S., Perarnau, S. (2023). Improving Simulations of Task-Based Applications on Complex NUMA Architectures. In: McIntosh-Smith, S., Klemm, M., de Supinski, B.R., Deakin, T., Klinkenberg, J. (eds) OpenMP: Advanced Task-Based, Device and Compiler Programming. IWOMP 2023. Lecture Notes in Computer Science, vol 14114. Springer, Cham. https://doi.org/10.1007/978-3-031-40744-4_13

DOI: https://doi.org/10.1007/978-3-031-40744-4_13
Published: 01 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40743-7
Online ISBN: 978-3-031-40744-4
eBook Packages: Computer Science, Computer Science (R0)
