ABSTRACT
This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems, each one equipped with a server-class Arm CPU from Ampere Computing and two data center GPUs from NVIDIA Corp. The systems are connected together using InfiniBand interconnect. The selected applications and mini-apps are written using several programming languages and use multiple accelerator-based programming models for GPUs such as CUDA, OpenACC, and OpenMP offloading. Working on application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.
- Bilge Acun, David J. Hardy, Laxmikant Kale, Ke Li, James C. Phillips, and John E. Stone. 2018. Scalable Molecular Dynamics with NAMD on the Summit System. IBM Journal of Research and Development 62, 6 (2018), 4:1–4:9. https://doi.org/10.1147/JRD.2018.2888986Google ScholarDigital Library
- Holger Brunst, Sunita Chandrasekaran, Florina Ciorba, Nick Hagerty, Robert Henschel, Guido Juckeland, Junjie Li, Veronica G. Melesse Vergara, Sandra Wienke, and Miguel Zavala. 2022. First Experiences in Performance Benchmarking with the New SPEChpc 2021 Suites. https://doi.org/10.48550/ARXIV.2203.06751Google ScholarCross Ref
- S. H. Bryngelson, K. Schmidmayer, and T. Colonius. 2019. A quantitative comparison of phase-averaged models for bubbly, cavitating flows. International Journal of Multiphase Flow 115 (2019), 137–143. https://doi.org/10.1016/j.ijmultiphaseflow.2019.03.028Google ScholarCross Ref
- Spencer H Bryngelson, Kevin Schmidmayer, Vedran Coralic, Jomela C Meng, Kazuki Maeda, and Tim Colonius. 2021. MFC: An open-source high-order multi-component, multi-phase, and multi-scale compressible flow solver. Computer Physics Communications 266 (2021), 107396.Google ScholarCross Ref
- M. Bussmann, H. Burau, T. E. Cowan, A. Debus, A. Huebl, G. Juckeland, T. Kluge, W. E. Nagel, R. Pausch, F. Schmitt, U. Schramm, J. Schuchart, and R. Widera. 2013. Radiative Signatures of the Relativistic Kelvin–Helmholtz Instability. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Denver, Colorado) (SC ’13). ACM, New York, NY, USA, Article 5, 12 pages. https://doi.org/10.1145/2503210.2504564Google ScholarDigital Library
- Aurélien Cavelan, Rubén M. Cabezón, Michal Grabarczyk, and Florina M. Ciorba. 2020. A Smoothed Particle Hydrodynamics Mini-App for Exascale. In Proceedings of the Platform for Advanced Scientific Computing Conference (Geneva, Switzerland) (PASC ’20). Association for Computing Machinery, New York, NY, USA, Article 11, 11 pages. https://doi.org/10.1145/3394277.3401855Google ScholarDigital Library
- A. Charalampopoulos, S. H. Bryngelson, T. Colonius, and T. P. Sapsis. 2022. Hybrid quadrature moment method for accurate and stable representation of non-Gaussian processes applied to bubble dynamics. Philosophical Transactions of the Royal Society A (2022).Google Scholar
- M. A. Clark and A. D. Kennedy. 2007. Accelerating staggered-Fermion dynamics with the rational hybrid Monte Carlo algorithm. Physical Review D 75, 1 (2007). https://doi.org/10.1103/physrevd.75.011502Google ScholarCross Ref
- Tom Deakin, Simon McIntosh-Smith, James Price, Andrei Poenaru, Patrick Atkinson, Codrin Popa, and Justin Salmon. 2019. Performance Portability across Diverse Computer Architectures. In 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 1–13. https://doi.org/10.1109/P3HPC49587.2019.00006Google ScholarCross Ref
- Tom Deakin, Andrei Poenaru, Tom Lin, and Simon McIntosh-Smith. 2020. Tracking Performance Portability on the Yellow Brick Road to Exascale. In 2020 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 1–13. https://doi.org/10.1109/P3HPC51967.2020.00006Google ScholarCross Ref
- Wael Elwasif, William Godoy, Nick Hagerty, J. Austin Harris, Oscar Hernandez, Balint Joo, Paul Kent, Damien Lebrun-Grandie, Elijah Maccarthy, Veronica G. Melesse Vergara, Bronson Messer, Ross Miller, Sarp Opal, Sergei Bastrakov, Michael Bussmann, Alexander Debus, Klaus Steinger, Jan Stephan, Rene Widera, Spencer H. Bryngelson, Henry Le Berre, Anand Radhakrishnan, Jefferey Young, Sunita Chandrasekaran, Florina Ciorba, Osman Simsek, Kate Clark Filippo Spiga, Jeff Hammond, John E. Stone. David Hardy, Sebastian Keller, and Jean-Guillaume Piccinali. Christian Trott. 2022. Application Experiences on a GPU-Accelerated Arm-based HPC Testbed. https://doi.org/10.48550/ARXIV.2209.09731Google ScholarCross Ref
- Catherine Feldman, Benjamin Michalowicz, Eva Siegmann, Tony Curtis, Alan Calder, and Robert Harrison. 2022. Experiences with Porting the FLASH Code to Ookami, an HPE Apollo 80 A64FX Platform. HPCAsia 2022 (to appear)(2022).Google Scholar
- E. Follana, Q. Mason, C. Davies, K. Hornbostel, G. P. Lepage, J. Shigemitsu, H. Trottier, and K. Wong. 2007. Highly improved staggered quarks on the lattice with applications to charm physics. Physical Review D 75, 5 (mar 2007). https://doi.org/10.1103/physrevd.75.054502Google ScholarCross Ref
- Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Joseph Adamo, Salman Habib, Katrin Heitmann, and Claude-André Faucher-Giguère. 2022. Simulating Hydrodynamics in Cosmology with CRK-HACC. https://doi.org/10.48550/ARXIV.2202.02840Google ScholarCross Ref
- Todd Gamblin, Matthew P. LeGendre, Michael R. Collette, Gregory L. Lee, Adam Moody, Bronis R. de Supinski, and W. Scott Futral. 2015. The Spack Package Manager: Bringing order to HPC software chaos. In Supercomputing 2015 (SC’15). Austin, Texas.Google Scholar
- J. Austin Harris, Ran Chu, Sean M Couch, Anshu Dubey, Eirik Endeve, Antigoni Georgiadou, Rajeev Jain, Daniel Kasen, M P Laiu, OE B Messer, Jared O’Neal, Michael A Sandoval, and Klaus Weide. 2022. Exascale models of stellar explosions: Quintessential multi-physics simulation. The International Journal of High Performance Computing Applications 36, 1(2022), 59–77. https://doi.org/10.1177/10943420211027937 arXiv:https://doi.org/10.1177/10943420211027937Google ScholarDigital Library
- William Humphrey, Andrew Dalke, and Klaus Schulten. 1996. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics 14, 1 (1996), 33–38. https://doi.org/10.1016/0263-7855(96)00018-5Google ScholarCross Ref
- Laxmikant V. Kalé and Gengbin Zheng. 2013. Chapter 1: The Charm++ Programming Model. In Parallel Science and Engineering Applications: The Charm++ Approach (1st ed.), Laxmikant V. Kale and Abhinav Bhatele (Eds.). CRC Press, Inc., Boca Raton, FL, USA, Chapter 1, 1–16. https://doi.org/10.1201/b16251Google ScholarCross Ref
- Jeffrey Kelling, Sergei Bastrakov, Alexander Debus, Thomas Kluge, Matt Leinhauser, Richard Pausch, Klaus Steiniger, Jan Stephan, René Widera, Jeff Young, 2021. Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading. arXiv preprint arXiv:2110.08650(2021).Google Scholar
- P. R. C. Kent, Abdulgani Annaberdiyev, Anouar Benali, M. Chandler Bennett, Edgar Josué Landinez Borda, Peter Doak, Hongxia Hao, Kenneth D. Jordan, Jaron T. Krogel, Ilkka Kylänpää, Joonho Lee, Ye Luo, Fionn D. Malone, Cody A. Melton, Lubos Mitas, Miguel A. Morales, Eric Neuscamman, Fernando A. Reboredo, Brenda Rubenstein, Kayahan Saritas, Shiv Upadhyay, Guangming Wang, Shuai Zhang, and Luning Zhao. 2020. QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion quantum Monte Carlo. The Journal of Chemical Physics 152 (2020), 174105. https://doi.org/10.1063/5.0004860Google ScholarCross Ref
- M. Paul Laiu, Eirik Endeve, Ran Chu, J. Austin Harris, and O. E. Bronson Messer. 2021. A DG-IMEX Method for Two-moment Neutrino Transport: Nonlinear Solvers for Neutrino-Matter Coupling. Astrophys. J., Suppl. Ser. 253, 2, Article 52 (April 2021), 52 pages. https://doi.org/10.3847/1538-4365/abe2a8 arxiv:2102.02186 [astro-ph.HE]Google ScholarCross Ref
- Elijah A MacCarthy, Chengxin Zhang, Yang Zhang, and KC Dukka. 2022. GPU-I-TASSER: a GPU accelerated I-TASSER protein structure prediction tool. Bioinformatics (2022).Google Scholar
- Alexander Matthes, René Widera, Erik Zenker, Benjamin Worpitz, Axel Huebl, and Michael Bussmann. 2017. Tuning and Optimization for a Variety of Many-Core Architectures Without Changing a Single Line of Implementation Code Using the Alpaka Library. In High Performance Computing, Julian M. Kunkel, Rio Yokota, Michela Taufer, and John Shalf (Eds.). Springer International Publishing, Cham, 496–514. https://doi.org/10.1007/978-3-319-67630-2_36Google ScholarCross Ref
- Simon McIntosh-Smith, James Price, Andrei Poenaru, and Tom Deakin. 2020. Benchmarking the first generation of production quality Arm-based supercomputers. Concurrency and Computation: Practice and Experience 32, 20(2020), e5569.Google ScholarCross Ref
- Marcelo C. R. Melo, Rafael C. Bernardi, Till Rudack, Maximilian Scheurer, Christoph Riplinger, James C. Phillips, Julio D. C. Maia, Gerd B. Rocha, João V. Ribeiro, John E. Stone, Frank Nesse, Klaus Schulten, and Zaida Luthey-Schulten. 2018. NAMD goes quantum: An integrative suite for hybrid simulations. Nature Methods 15(2018), 351–354.Google ScholarCross Ref
- James C. Phillips, David J. Hardy, Julio D. C. Maia, John E. Stone, João V. Ribeiro, Rafael C. Bernardi, Ronak Buch, Giacomo Fiorin, Jérôme Hénin, Wei Jiang, Ryan McGreevy, Marcelo C. R. Melo, Brian Radak, Robert D. Skeel, Abhishek Singharoy, Yi Wang, Benoît Roux, Aleksei Aksimentiev, Zaida Luthey-Schulten, Laxmikant V. Kalé, Klaus Schulten, Christophe Chipot, and Emad Tajkhorshid. 2020. Scalable molecular dynamics on CPU and GPU architectures with NAMD. Journal of Chemical Physics 153 (2020), 044130. https://doi.org/10.1063/5.0014475Google ScholarCross Ref
- Nikola Rajovic, Alejandro Rico, Nikola Puzovic, Chris Adeniyi-Jones, and Alex Ramirez. 2014. Tibidabo: Making the case for an ARM-based HPC system. Future Generation Computer Systems 36 (2014), 322–334.Google ScholarCross Ref
- Mitsuhisa Sato, Yutaka Ishikawa, Hirofumi Tomita, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, Hisashi Yashiro, Masaki Aoki, Naoyuki Shida, Ikuo Miyoshi, Kouichi Hirai, Atsushi Furuya, Akira Asato, Kuniki Morita, and Toshiyuki Shimizu. 2020. Co-Design for A64FX Manycore Processor and “Fugaku”. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–15. https://doi.org/10.1109/SC41405.2020.00051Google ScholarCross Ref
- K. Schmidmayer, S. H. Bryngelson, and T. Colonius. 2020. An assessment of multicomponent flow models and interface capturing schemes for spherical bubble dynamics. J. Comput. Phys. 402(2020), 109080. https://doi.org/10.1016/j.jcp.2019.109080Google ScholarDigital Library
- N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole, G. Gabrielli, M. Horsnell, G. Magklis, A. Martinez, N. Premillieu, A. Reid, A. Rico, and P. Walker. 2017. The ARM Scalable Vector Extension. IEEE Micro 37, 02 (mar 2017), 26–39. https://doi.org/10.1109/MM.2017.35Google ScholarDigital Library
- John E. Stone, Michael J. Hallock, James C. Phillips, Joseph R. Peterson, Zaida Luthey-Schulten, and Klaus Schulten. 2016. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads. 2016 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)(2016), 89–100. https://doi.org/10.1109/IPDPSW.2016.130Google ScholarCross Ref
- John E. Stone, David J. Hardy, Jan Saam, Kirby L. Vandivort, and Klaus Schulten. 2011. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals. In GPU Computing Gems, Wen-mei Hwu (Ed.). Morgan Kaufmann Publishers, Chapter 1, 5–18.Google Scholar
- John E. Stone, David J. Hardy, Ivan S. Ufimtsev, and Klaus Schulten. 2010. GPU-Accelerated Molecular Modeling Coming of Age. J. Molecular Graphics and Modelling 29 (2010), 116–125.Google ScholarCross Ref
- John E. Stone, Antti-Pekka Hynninen, James C. Phillips, and Klaus Schulten. 2016. Early Experiences Porting the NAMD and VMD Molecular Simulation and Analysis Software to GPU-Accelerated OpenPOWER Platforms. International Workshop on OpenPOWER for HPC (IWOPH’16) (2016), 188–206.Google ScholarCross Ref
- John E. Stone, Jan Saam, David J. Hardy, Kirby L. Vandivort, Wen-mei W. Hwu, and Klaus Schulten. 2009. High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs. In Proceedings of the 2nd Workshop on General-Purpose Processing on Graphics Processing Units, ACM International Conference Proceeding Series, Vol. 383. ACM, New York, NY, USA, 9–18.Google ScholarDigital Library
- A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in ’t Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, and S. J. Plimpton. 2022. LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comp. Phys. Comm. 271(2022), 108171. https://doi.org/10.1016/j.cpc.2021.108171Google ScholarCross Ref
- Christian R. Trott, Damien Lebrun-Grandié, Daniel Arndt, Jan Ciesko, Vinh Dang, Nathan Ellingwood, Rahulkumar Gayatri, Evan Harvey, Daisy S. Hollman, Dan Ibanez, Nevin Liber, Jonathan Madsen, Jeff Miles, David Poliakoff, Amy Powell, Sivasankaran Rajamanickam, Mikael Simberg, Dan Sunderland, Bruno Turcksin, and Jeremiah Wilke. 2022. Kokkos 3: Programming Model Extensions for the Exascale Era. IEEE Transactions on Parallel and Distributed Systems 33, 4 (2022), 805–817. https://doi.org/10.1109/TPDS.2021.3097283Google ScholarCross Ref
- Verónica G Vergara Larrea, Wayne Joubert, Michael J Brim, Reuben D Budiardja, Don Maxwell, Matt Ezell, Christopher Zimmer, Swen Boehm, Wael Elwasif, Sarp Oral, 2019. Scaling the summit: deploying the world’s fastest supercomputer. In International Conference on High Performance Computing. Springer, 330–351.Google ScholarDigital Library
- Wei Zheng, Chengxin Zhang, Eric W Bell, and Yang Zhang. 2019. I-TASSER gateway: a protein structure and function prediction server powered by XSEDE. Future Generation Computer Systems 99 (2019), 73–85.Google ScholarDigital Library
Index Terms
- Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
Recommendations
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
Highlights- Generate parallel CUDA code from sequential C input code using a compiler-based tool for key operators in Geometric Multigrid.
AbstractGPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model ...
Optimizing linpack benchmark on GPU-accelerated petascale supercomputer
Special issue on Community Analysis and Information RecommendationIn this paper we present the programming of the Linpack benchmark on TianHe-1 system, the first petascale supercomputer system of China, and the largest GPU-accelerated heterogeneous system ever attempted before. A hybrid programming model consisting of ...
A GPU accelerated storage system
HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed ComputingMassively multicore processors, like, for example, Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude ...
Comments