An Empirical Evaluation of GPGPU Performance Models

Madougou, Souley; Varbanescu, Ana Lucia; de Laat, Cees; van Nieuwpoort, Rob

doi:10.1007/978-3-319-14325-5_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8805)

Included in the following conference series: European Conference on Parallel Processing

Abstract

Computing systems today rely on massively parallel and heterogeneous architectures to promise very high peak performance. Yet most applications only achieve small fractions of this performance. While both programmers and architects have clear opinions about the causes of this performance gap, finding and quantifying the real problems remains a topic for performance modeling tools. In this paper, we sketch the landscape of modern GPUs’ performance limiters and optimization opportunities, and examine in detail the modeling attempts for GPU-based systems. We highlight the specific features of the relevant contributions in this field, along with the optimization and design spaces they explore. We further use a typical kernel example (tiled dense matrix multiplication) to assess the efficacy and usability of a set of promising approaches. We conclude that the available GPU performance modeling solutions are very sensitive to application and platform changes, and require significant effort for tuning and calibration when new analyses are required.
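
For readers unfamiliar with the kernel named in the abstract, the following is a minimal CPU-side sketch of tiled dense matrix multiplication in plain Python. It is not the authors' GPU kernel, which is not reproduced on this page; the function name `tiled_matmul` and the `tile` parameter are hypothetical and chosen for illustration. It only shows the blocking idea: the output is computed one tile at a time so that each small block of the inputs is reused while it is "hot" (on a GPU, while it sits in shared memory).

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the matrices tile by tile.

    Illustrative sketch only: 'tile' is a hypothetical block size, and the
    loops run sequentially on the CPU rather than across GPU threads.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer three loops pick a tile of C and the matching tiles of A and B.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops accumulate the partial product of one tile pair.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b, tile=2))
```

The `min(...)` bounds handle matrix dimensions that are not multiples of the tile size. The result does not depend on `tile`; only the order of memory accesses changes, which is the property performance models of this kernel have to capture.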

Author information

Authors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Souley Madougou, Ana Lucia Varbanescu & Cees de Laat
Netherlands eScience Center, Amsterdam, The Netherlands
Rob van Nieuwpoort

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, University of Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Luís Lopes
Vilnius University, 08663, Vilnius, Lithuania
Julius Žilinskas
Inria Rennes - Bretagne Atlantique, 35042, Rennes, France
Alexandru Costan
Inria, Campus Universitaire de Beaulieu, 35042, Rennes, France
Roberto G. Cascella
MTA SZTAKI, Budapest, Hungary
Gabor Kecskemeti
LaBRI, Inria, France
Emmanuel Jeannot
University Magna Graecia of Catanzaro, 88100, Catanzaro, Italy
Mario Cannataro
University of Pisa, Italy
Laura Ricci
Faculty of Computer Science, University of Vienna, Wien, Austria
Siegfried Benkner
Universitat Politècnica de València, Spain
Salvador Petit
ISISLab - Dipartimento di Informatica, Università di Salerno, Italy
Vittorio Scarano
High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, 70550, Stuttgart, Germany
José Gracia
Vienna University of Technology, 1040, Vienna, Austria
Sascha Hunold
Tennessee Tech University and Oak Ridge National Laboratory, 38505, Cookeville, TN, USA
Stephen L. Scott
RWTH Aachen University, Aachen, Germany
Stefan Lankes
Department of Informatics and Mathematics, University of Passau, Germany
Christian Lengauer
Universidad Carlos III de Madrid, 28911, Leganés, Spain
Jesus Carretero
TU München, 85747, Garching bei München, Germany
Jens Breitbart
TU Vienna, 1040, Vienna, Austria
Michael Alexander

About this paper

Cite this paper

Madougou, S., Varbanescu, A.L., de Laat, C., van Nieuwpoort, R. (2014). An Empirical Evaluation of GPGPU Performance Models. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8805. Springer, Cham. https://doi.org/10.1007/978-3-319-14325-5_15

DOI: https://doi.org/10.1007/978-3-319-14325-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14324-8
Online ISBN: 978-3-319-14325-5
eBook Packages: Computer Science, Computer Science (R0)
