Abstract
Optimizing compilers (particularly parallel compilers) are constrained by their ability to predict the performance consequences of the transformations they apply. Many factors, such as unknowns in control structures, the dynamic behavior of programs, and the complexity of the underlying hardware, make it very difficult for compilers to estimate the performance of transformations accurately and efficiently. In this paper, we present a performance prediction framework that combines several innovative approaches to solve this problem. First, the framework employs a detailed, architecture-specific, yet portable cost model that can efficiently estimate the cost of straight-line code. Second, the aggregated costs of loops and conditional statements are computed and represented symbolically, which avoids unnecessary, premature guesses and preserves the precision of the prediction. Third, symbolic comparison allows compilers to choose the best transformation dynamically and systematically. We also discuss methodologies for applying the framework in optimizing parallel compilers to support automatic, performance-guided program restructuring.
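To make the idea concrete, here is a minimal sketch of symbolic cost aggregation and comparison, assuming hypothetical per-block cycle costs and a hypothetical 4-way unrolling comparison, with sympy standing in for the symbolic algebra. The names and numbers are illustrative assumptions, not the paper's actual cost model or API.

```python
# A minimal sketch of the symbolic cost idea, assuming hypothetical
# per-block cycle costs; an illustration, not the paper's model.
import sympy as sp

# Unknowns a compiler cannot resolve statically: the loop trip
# count N and the branch probability p.
N = sp.symbols('N', positive=True, integer=True)
p = sp.symbols('p', nonnegative=True)

# Step 1: an architecture-specific cost model supplies cycle counts
# for straight-line blocks (these numbers are made up).
then_cost     = 12  # cycles for the 'then' block
else_cost     = 7   # cycles for the 'else' block
loop_overhead = 2   # loop-control cycles per iteration

# Step 2: aggregate the costs of conditionals and loops symbolically,
# deferring any guess about N and p.
if_cost   = p * then_cost + (1 - p) * else_cost
loop_cost = N * (if_cost + loop_overhead)

# Step 3: compare candidate transformations symbolically, e.g. the
# original loop against a 4-way unrolled version with a fixed
# (made-up) prologue cost of 10 cycles.
unrolled_cost = (N / 4) * (4 * if_cost + loop_overhead) + 10

diff = sp.simplify(loop_cost - unrolled_cost)
print(diff)  # 3*N/2 - 10: unrolling wins once N exceeds 20/3
```

The symbolic difference stays exact until the compiler either learns N and p or must commit to a choice, which is the sense in which premature guessing is avoided.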