research-article

Performance Analysis Tools for MPI Applications and their Use in Programming Education

Authors:
Anna-Lena Roth, University of Applied Sciences, Fulda, Germany (ORCID: 0000-0002-6463-3486)
Tim Süß, University of Applied Sciences, Fulda, Germany (ORCID: 0000-0001-9935-798X)

ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, April 2023, Pages 361–368. https://doi.org/10.1145/3578245.3584358

Published: 15 April 2023

ABSTRACT

Performance analysis tools are frequently used to support the development of parallel MPI applications. They facilitate the detection of errors, bottlenecks, and inefficiencies, but differ substantially in their instrumentation, measurement, and type of feedback. Tools that provide visual feedback are especially helpful for educational purposes: they offer a visual abstraction of program behavior that helps learners identify and understand performance issues and write more efficient code. However, existing professional performance analysis tools are very complex, and using them in beginner courses can be demanding. Above all, their instrumentation and measurement steps require deep knowledge and take a long time, whereas immediate and straightforward feedback is essential to motivate learners. This paper provides an extensive overview of the performance analysis tools for parallel MPI applications that experienced developers widely use today. It also gives an overview of existing educational tools for parallel programming with MPI and shows their shortcomings compared to professional tools. Using performance analysis tools for MPI programs in educational scenarios can promote an understanding of program behavior on large HPC systems and support the learning of parallel programming. At the same time, the complexity of the tools and the lack of infrastructure in educational institutions are barriers. These aspects are considered and discussed in detail.
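
The abstract distinguishes tools by their instrumentation, measurement, and type of feedback. As a rough illustration only, the following pure-Python sketch wraps functions so that each call feeds both an aggregated profile and an ordered event trace. It uses neither MPI nor any of the tools surveyed in the paper; `instrument`, `compute`, `exchange`, and `run` are hypothetical names chosen for this example, and `exchange` merely sleeps as a stand-in for a communication step.

```python
import time
from collections import defaultdict
from functools import wraps

# Profile: one aggregated summary per function (call count, total time).
profile = defaultdict(lambda: {"calls": 0, "seconds": 0.0})
# Trace: ordered list of (timestamp, event, function name) records.
trace = []


def instrument(func):
    """Wrap func so every call is measured (hypothetical helper)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        trace.append((start, "enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            end = time.perf_counter()
            trace.append((end, "exit", func.__name__))
            entry = profile[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += end - start
    return wrapper


@instrument
def compute(n):
    # Stand-in for a local computation phase.
    return sum(i * i for i in range(n))


@instrument
def exchange(data):
    # Stand-in for a communication step (assumption: no real MPI here).
    time.sleep(0.001)
    return data


def run(steps=3):
    result = 0
    for _ in range(steps):
        result = exchange(compute(1000))
    return result


if __name__ == "__main__":
    run()
    for name, entry in profile.items():
        print(f"{name}: {entry['calls']} calls, {entry['seconds']:.6f} s")
    print(f"{len(trace)} trace events")
```

The profile stays the same size no matter how long the program runs, while the trace grows with every call. That difference is one reason measurement cost and feedback detail vary between tools.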



Published in
ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering
April 2023
421 pages
ISBN:9798400700729
DOI:10.1145/3578245
General Chairs: Marco Vieira (University of Coimbra, Portugal), Valeria Cardellini (University of Rome Tor Vergata, Italy)
Program Chairs: Antinisca Di Marco (University of L'Aquila, Italy), Petr Tuma (Charles University, Czechia)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HPC
MPI
education
parallel programming
performance analysis

Acceptance Rates
Overall Acceptance Rate: 252 of 851 submissions, 30%