
Meta-Programming Design-Flow Patterns for Automating Reusable Optimisations

Published: 9 June 2022

ABSTRACT

Continuing advances in heterogeneous and parallel computing enable massive performance gains in domains such as AI and HPC. Such gains often involve using hardware accelerators, such as FPGAs and GPUs, to speed up specific workloads. However, to make effective use of emerging heterogeneous architectures, optimisation is typically done manually by highly skilled developers with an in-depth understanding of the target hardware. The process is tedious, error-prone, and must be repeated for each new application. This paper introduces Design-Flow Patterns, which capture modular, recurring, application-agnostic elements involved in mapping and optimising application descriptions onto efficient CPU and GPU targets. Our approach is the first to codify and programmatically coordinate these elements into fully automated, customisable, and reusable end-to-end design-flows. We implement key design-flow patterns using the meta-programming tool Artisan, and evaluate automated design-flows applied to three sequential C++ applications. Compared to single-threaded implementations, our approach generates multi-threaded OpenMP CPU designs achieving up to 18 times speedup on a CPU platform with 32 threads, as well as HIP GPU designs achieving up to 1184 times speedup on an NVIDIA GeForce RTX 2080 Ti GPU.


  • Published in

    HEART '22: Proceedings of the 12th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies
    June 2022
    114 pages
    ISBN:9781450396608
    DOI:10.1145/3535044

    Copyright © 2022 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

HEART '22 paper acceptance rate: 10 of 21 submissions (48%). Overall acceptance rate: 22 of 50 submissions (44%).
