DOI: 10.1145/3581576.3581621

Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

Published: 27 February 2023

ABSTRACT

This paper assesses and reports the experience of ten teams that ported, validated, and benchmarked several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems, each equipped with a server-class Arm CPU from Ampere Computing and two data center GPUs from NVIDIA Corp. The systems are connected by an InfiniBand interconnect. The selected applications and mini-apps are written in several programming languages and use multiple accelerator programming models for GPUs, such as CUDA, OpenACC, and OpenMP offloading. Application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.

Published in

HPCAsia '23 Workshops: Proceedings of the HPC Asia 2023 Workshops
February 2023
101 pages
ISBN: 9781450399890
DOI: 10.1145/3581576

Copyright © 2023 ACM

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

HPCAsia '23 Workshops Paper Acceptance Rate: 9 of 10 submissions, 90%
Overall Acceptance Rate: 69 of 143 submissions, 48%
