Software Pipelining of Nested Loops

Muthukumar, Kalyan; Doshi, Gautam

doi:10.1007/3-540-45306-7_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2027)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

Software pipelining is a technique to improve the performance of a loop by overlapping the execution of several iterations. The execution of a software-pipelined loop goes through three phases: prolog, kernel, and epilog. Software pipelining works best if most of the time is spent in the kernel phase rather than in the prolog or epilog phases. This can happen only if the trip count of a pipelined loop is large enough to amortize the overhead of prolog and epilog phases. When a software-pipelined loop is part of a loop nest, the overhead of filling and draining the pipeline is incurred for every iteration of the outer loop. This paper introduces two novel methods to minimize the overhead of software-pipeline fill/drain in nested loops. In effect, these methods overlap the draining of the software pipeline corresponding to one outer loop iteration with the filling of the software pipeline corresponding to one or more subsequent outer loop iterations. This results in better instruction-level parallelism (ILP) for the loop nest, particularly for loop nests in which the trip counts of inner loops are small. These methods exploit Itanium™ architecture software pipelining features such as predication, register rotation, and explicit epilog stage control, to minimize the code size overhead associated with such a transformation. However, the key idea behind these methods is applicable to other architectures as well. These methods have been prototyped in the Intel optimizing compiler for the Itanium™ processor. Experimental results on SPEC2000 benchmark programs are presented.
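
The fill/drain overhead described in the abstract can be illustrated with a simple counting model. The sketch below is not the authors' method or cost model; it assumes an idealized pipeline in which every stage takes one kernel-length slot, a pipelined inner loop with `stages` stages needs `inner_trips + stages - 1` slots when it fills and drains on its own, and a perfectly overlapped loop nest pays the fill/drain cost only once. The function and parameter names are hypothetical.

```python
def separate_fill_drain_slots(outer_trips: int, inner_trips: int, stages: int) -> int:
    """Slots used when the pipeline fills and drains once per outer iteration.

    Each outer iteration runs the inner loop as its own pipeline:
    inner_trips slots to start every iteration plus (stages - 1) slots
    to drain the last ones.
    """
    return outer_trips * (inner_trips + stages - 1)


def overlapped_slots(outer_trips: int, inner_trips: int, stages: int) -> int:
    """Slots used under the idealized assumption that the drain of one
    outer iteration fully overlaps the fill of the next.

    The whole nest then behaves like one long pipeline that fills once
    and drains once.
    """
    return outer_trips * inner_trips + stages - 1


def kernel_fraction(total_slots: int, outer_trips: int, inner_trips: int) -> float:
    """Fraction of slots that start a new inner iteration (a rough
    proxy for how well the pipeline is utilized)."""
    return (outer_trips * inner_trips) / total_slots


if __name__ == "__main__":
    outer, inner, stages = 100, 4, 3  # small inner trip count, as in the abstract's motivating case
    separate = separate_fill_drain_slots(outer, inner, stages)
    overlapped = overlapped_slots(outer, inner, stages)
    print(separate, overlapped)  # 600 402
    print(round(kernel_fraction(separate, outer, inner), 3))
    print(round(kernel_fraction(overlapped, outer, inner), 3))
```

Under these assumptions, the gap between the two counts grows as the inner trip count shrinks relative to the number of stages, which matches the abstract's remark that loop nests with small inner trip counts benefit most.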

Author information

Authors and Affiliations

Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA, 95052, USA
Kalyan Muthukumar & Gautam Doshi

Editor information

Editors and Affiliations

Fachrichtung Informatik, Universität des Saarlandes, Postfach 15 11 50, 66041, Saarbrücken, Germany
Reinhard Wilhelm

About this paper

Cite this paper

Muthukumar, K., Doshi, G. (2001). Software Pipelining of Nested Loops. In: Wilhelm, R. (eds) Compiler Construction. CC 2001. Lecture Notes in Computer Science, vol 2027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45306-7_12

DOI: https://doi.org/10.1007/3-540-45306-7_12
Published: 23 March 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41861-0
Online ISBN: 978-3-540-45306-2
eBook Packages: Springer Book Archive
