Abstract
Optimizing compilers (particularly parallel compilers) are constrained by their ability to predict the performance consequences of the transformations they apply. Many factors, such as unknowns in control structures, the dynamic behavior of programs, and the complexity of the underlying hardware, make it very difficult for compilers to estimate the performance of transformations accurately and efficiently. In this paper, we present a performance prediction framework that combines several innovative approaches to solve this problem. First, the framework employs a detailed, architecture-specific, yet portable cost model that can efficiently estimate the cost of straight-line code. Second, the aggregated costs of loops and conditional statements are computed and represented symbolically, which avoids unnecessary, premature guesses and preserves the precision of the prediction. Third, symbolic comparison allows compilers to choose the best transformation dynamically and systematically. We also discuss methodologies for applying the framework in optimizing parallel compilers to support automatic, performance-guided program restructuring.
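To make the idea concrete, here is a minimal sketch of symbolic cost aggregation and comparison, assuming hypothetical per-block cycle costs and a hypothetical 4-way unrolling comparison, with sympy standing in for the symbolic algebra. The names and numbers are illustrative assumptions, not the paper's actual cost model or API.

```python
# A minimal sketch of the symbolic cost idea, assuming hypothetical
# per-block cycle costs; an illustration, not the paper's model.
import sympy as sp

# Unknowns a compiler cannot resolve statically: the loop trip
# count N and the branch probability p.
N = sp.symbols('N', positive=True, integer=True)
p = sp.symbols('p', nonnegative=True)

# Step 1: an architecture-specific cost model supplies cycle counts
# for straight-line blocks (these numbers are made up).
then_cost     = 12  # cycles for the 'then' block
else_cost     = 7   # cycles for the 'else' block
loop_overhead = 2   # loop-control cycles per iteration

# Step 2: aggregate the costs of conditionals and loops symbolically,
# deferring any guess about N and p.
if_cost   = p * then_cost + (1 - p) * else_cost
loop_cost = N * (if_cost + loop_overhead)

# Step 3: compare candidate transformations symbolically, e.g. the
# original loop against a 4-way unrolled version with a fixed
# (made-up) prologue cost of 10 cycles.
unrolled_cost = (N / 4) * (4 * if_cost + loop_overhead) + 10

diff = sp.simplify(loop_cost - unrolled_cost)
print(diff)  # 3*N/2 - 10: unrolling wins once N exceeds 20/3
```

The symbolic difference stays exact until the compiler either learns N and p or must commit to a choice, which is the sense in which premature guessing is avoided.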