Abstract
Software pipelining is a technique to improve the performance of a loop by overlapping the execution of several iterations. The execution of a software-pipelined loop goes through three phases: prolog, kernel, and epilog. Software pipelining works best if most of the time is spent in the kernel phase rather than in the prolog or epilog phases. This can happen only if the trip count of a pipelined loop is large enough to amortize the overhead of prolog and epilog phases. When a software-pipelined loop is part of a loop nest, the overhead of filling and draining the pipeline is incurred for every iteration of the outer loop. This paper introduces two novel methods to minimize the overhead of software-pipeline fill/drain in nested loops. In effect, these methods overlap the draining of the software pipeline corresponding to one outer loop iteration with the filling of the software pipeline corresponding to one or more subsequent outer loop iterations. This results in better instruction-level parallelism (ILP) for the loop nest, particularly for loop nests in which the trip counts of inner loops are small. These methods exploit Itanium™ architecture software pipelining features such as predication, register rotation, and explicit epilog stage control, to minimize the code size overhead associated with such a transformation. However, the key idea behind these methods is applicable to other architectures as well. These methods have been prototyped in the Intel optimizing compiler for the Itanium™ processor. Experimental results on SPEC2000 benchmark programs are presented.
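The amortization argument in the abstract can be made concrete with a simple cycle-count model. The sketch below is not taken from the paper. It assumes the textbook modulo-scheduling estimate, in which a loop with `trip_count` iterations, `stages` pipeline stages, and initiation interval `ii` takes `(trip_count + stages - 1) * ii` cycles. It also assumes an idealized overlap in which the drain of one outer iteration is fully hidden behind the fill of the next. All function and parameter names are hypothetical, and any outer-loop code between inner-loop instances is ignored.

```python
def pipelined_cycles(trip_count, stages, ii=1):
    """Cycles for one software-pipelined loop (fill + kernel + drain).

    Assumed textbook estimate: the last iteration starts at cycle
    (trip_count - 1) * ii and needs `stages` more intervals to finish.
    """
    return (trip_count + stages - 1) * ii


def nest_cycles_separate(outer, inner, stages, ii=1):
    """Inner pipeline is filled and drained once per outer iteration."""
    return outer * pipelined_cycles(inner, stages, ii)


def nest_cycles_overlapped(outer, inner, stages, ii=1):
    """Idealized overlap: drain of one outer iteration coincides with the
    fill of the next, so fill/drain is paid only once for the whole nest."""
    return pipelined_cycles(outer * inner, stages, ii)


if __name__ == "__main__":
    # Small inner trip count: fill/drain dominates without overlap.
    print(nest_cycles_separate(100, 4, 5))     # 800
    print(nest_cycles_overlapped(100, 4, 5))   # 404
    # Large inner trip count: the overhead is already amortized.
    print(nest_cycles_separate(100, 1000, 5))
    print(nest_cycles_overlapped(100, 1000, 5))
```

Under these assumptions, an inner trip count of 4 with a 5-stage pipeline spends about half of the non-overlapped cycles on fill and drain, whereas with an inner trip count of 1000 the two variants differ by well under one percent. This is consistent with the abstract's remark that the benefit is greatest when inner trip counts are small.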
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Muthukumar, K., Doshi, G. (2001). Software Pipelining of Nested Loops. In: Wilhelm, R. (eds) Compiler Construction. CC 2001. Lecture Notes in Computer Science, vol 2027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45306-7_12
DOI: https://doi.org/10.1007/3-540-45306-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41861-0
Online ISBN: 978-3-540-45306-2
eBook Packages: Springer Book Archive