Abstract
Application development for modern high-performance systems with graphics processing units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. We present SkelCL, a high-level programming approach for systems with multiple GPUs, and its implementation as a library on top of OpenCL. SkelCL makes three main enhancements to the OpenCL standard: (1) memory management is simplified using parallel container data types (vectors and matrices); (2) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs; (3) computations are conveniently expressed using parallel algorithmic patterns (skeletons). We demonstrate how SkelCL is used to implement parallel applications, and we report an experimental evaluation of our approach in terms of programming effort and performance.
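To make the three enhancements concrete, the following is a minimal conceptual sketch in plain Python, not SkelCL code and not its API. It assumes that "devices" can be simulated by ordinary lists, and all names (`block_distribute`, `map_skeleton`, `zip_skeleton`, `reduce_skeleton`, `dot_product`) are hypothetical, chosen only for illustration. It shows a container being split into contiguous blocks, one per device, and computations expressed as skeletons that operate on those blocks.

```python
from functools import reduce as fold
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def block_distribute(data: Sequence[T], num_devices: int) -> List[List[T]]:
    """Split data into num_devices contiguous blocks of near-equal size.

    Each block stands in for the part of a container held by one device.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(len(data), num_devices)
    blocks, start = [], 0
    for device in range(num_devices):
        # The first `extra` devices receive one additional element.
        size = base + (1 if device < extra else 0)
        blocks.append(list(data[start:start + size]))
        start += size
    return blocks


def map_skeleton(f: Callable[[T], U], blocks: List[List[T]]) -> List[List[U]]:
    """Apply f to every element; each block is processed independently."""
    return [[f(x) for x in block] for block in blocks]


def zip_skeleton(f: Callable[[T, T], U],
                 left: List[List[T]],
                 right: List[List[T]]) -> List[List[U]]:
    """Combine two identically distributed containers element by element."""
    if [len(b) for b in left] != [len(b) for b in right]:
        raise ValueError("inputs must share the same distribution")
    return [[f(x, y) for x, y in zip(lb, rb)] for lb, rb in zip(left, right)]


def reduce_skeleton(op: Callable[[T, T], T],
                    blocks: List[List[T]],
                    identity: T) -> T:
    """Reduce each block locally, then combine the per-block partial results.

    Assumes op is associative and identity is its neutral element.
    """
    partials = [fold(op, block, identity) for block in blocks]
    return fold(op, partials, identity)


def dot_product(a: Sequence[float], b: Sequence[float],
                num_devices: int = 2) -> float:
    """Dot product expressed as a zip skeleton followed by a reduce skeleton."""
    a_blocks = block_distribute(a, num_devices)
    b_blocks = block_distribute(b, num_devices)
    products = zip_skeleton(lambda x, y: x * y, a_blocks, b_blocks)
    return reduce_skeleton(lambda x, y: x + y, products, 0)
```

Calling `dot_product([1, 2, 3, 4], [5, 6, 7, 8], num_devices=2)` distributes both inputs over two simulated devices and combines the partial sums. In a real multi-GPU setting the blocks would live in separate device memories and the per-block work would run in parallel; the sketch only mirrors the structure of the computation.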
Acknowledgments
This work is partially supported by the OFERTIE (FP7) and MONICA projects. We would like to thank NVIDIA for their generous hardware donation.
Cite this article
Steuwer, M., Gorlatch, S. SkelCL: a high-level extension of OpenCL for multi-GPU systems. J Supercomput 69, 25–33 (2014). https://doi.org/10.1007/s11227-014-1213-y
DOI: https://doi.org/10.1007/s11227-014-1213-y