An idiom-finding tool for increasing productivity of accelerators

Published: 31 May 2011

ABSTRACT

Suppose one is considering the purchase of a computer equipped with accelerators. Or suppose one has access to such a computer and is considering porting code to take advantage of the accelerators. Is there reason to believe the purchase cost or programmer effort will be worth it? It would be nice to be able to estimate the expected improvements in advance of paying money or time. We exhibit an analytical framework and tool-set for providing such estimates: the tools first look for user-defined idioms, patterns of computation and data access identified in advance as possibly able to benefit from accelerator hardware. A performance model is then applied to estimate how much faster these idioms would run if they were ported to the accelerators. A recommendation is made as to whether each idiom is worth the porting effort, along with an estimate of the overall application speedup that would result.
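As a generic illustration (not the paper's actual model), the final overall-speedup estimate described above is in spirit an Amdahl's-law calculation: if a fraction f of runtime falls in idioms the accelerator speeds up by a local factor s, the whole application speeds up by 1/((1-f) + f/s).

```python
def overall_speedup(f, s):
    """Amdahl's law: whole-application speedup when a fraction f of
    runtime is accelerated by a local speedup factor s."""
    assert 0.0 <= f <= 1.0 and s > 0.0
    return 1.0 / ((1.0 - f) + f / s)

# e.g. if 30% of runtime is in accelerable idioms and the
# accelerator runs them 20x faster:
print(round(overall_speedup(0.30, 20.0), 3))  # → 1.399
```

Note how quickly the non-accelerated fraction dominates: even an infinite local speedup on 30% of runtime caps the overall gain at about 1.43x, which is why rank-ordering candidate idioms by their runtime share matters.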

As a proof of concept we focus our investigation on gather/scatter (G/S) operations and the means available to accelerate them on the Convey HC-1, which has a special-purpose "personality" for accelerating G/S. We test the methodology on two large-scale HPC applications. The idiom recognizer tool saves weeks of programmer effort compared to having the programmer visually examine the code for idioms; the performance models save yet more time by rank-ordering the best candidates for porting; and the models are accurate, predicting the G/S runtime speedup from porting to within 10% of the speedup actually achieved. The G/S hardware on the Convey sped up these operations 20x, and improved total application runtime by as much as 21%.
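For readers unfamiliar with the idiom, gather and scatter are indirect array accesses of the following shape; this is a generic sketch, not code from the applications studied:

```python
def gather(x, idx):
    """Gather: y[i] = x[idx[i]] -- indirect reads through an index array."""
    return [x[j] for j in idx]

def scatter_add(x, idx, y):
    """Scatter: x[idx[i]] += y[i] -- indirect updates through an index array."""
    for i, j in enumerate(idx):
        x[j] += y[i]

vals = [10.0, 20.0, 30.0, 40.0]
print(gather(vals, [3, 0, 2]))  # → [40.0, 10.0, 30.0]
```

Because the index array makes the access pattern data-dependent, these loops stride unpredictably through memory and defeat conventional caches and prefetchers, which is why dedicated G/S hardware support can pay off.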


Published in

ICS '11: Proceedings of the International Conference on Supercomputing
May 2011, 398 pages
ISBN: 9781450301022
DOI: 10.1145/1995896

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 584 of 2,055 submissions, 28%
