Iterative modulo scheduling: an algorithm for software pipelining loops

Author:
B. Ramakrishna Rau

Hewlett-Packard Laboratories, 1501 Page Mill Road, Bldg. 3L, Palo Alto, CA


MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture, November 1994, Pages 63–74. https://doi.org/10.1145/192724.192731

Published: 30 November 1994


ABSTRACT

Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
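To make the idea of a modulo schedule concrete, the following is a minimal sketch of a greedy modulo scheduler, not the paper's algorithm. It assumes a deliberately simple machine model in which each operation occupies one unit of one resource for a single cycle, and it omits the step that gives iterative modulo scheduling its name: unscheduling and rescheduling operations that are already placed. All names here (`modulo_schedule`, `ops`, `resources`, `deps`, `max_ii`) are hypothetical and chosen only for illustration. The sketch starts from the resource-constrained lower bound on the initiation interval (II), places operations into a modulo reservation table, and raises II whenever placement or a dependence check fails.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from math import ceil


def modulo_schedule(ops, resources, deps, max_ii=64):
    """Greedy modulo scheduling sketch (simplified, illustrative only).

    ops:       {operation: resource it uses for one cycle}
    resources: {resource: number of units available per cycle}
    deps:      [(src, dst, latency, distance)], distance in iterations
    Returns (ii, {operation: issue cycle}).
    """
    # Resource-constrained lower bound on the initiation interval.
    uses = {}
    for res in ops.values():
        uses[res] = uses.get(res, 0) + 1
    res_mii = max(ceil(n / resources[r]) for r, n in uses.items())

    # Visit operations so that same-iteration (distance 0) producers come first.
    preds = {op: set() for op in ops}
    for src, dst, _lat, dist in deps:
        if dist == 0:
            preds[dst].add(src)
    order = list(TopologicalSorter(preds).static_order())

    for ii in range(res_mii, max_ii + 1):
        times = {}
        mrt = {}  # modulo reservation table: (cycle mod ii, resource) -> units used
        placed_all = True
        for op in order:
            # Earliest cycle allowed by predecessors that are already placed.
            earliest = 0
            for src, dst, lat, dist in deps:
                if dst == op and src in times:
                    earliest = max(earliest, times[src] + lat - ii * dist)
            res = ops[op]
            # Only ii consecutive cycles need checking: rows repeat modulo ii.
            for t in range(earliest, earliest + ii):
                if mrt.get((t % ii, res), 0) < resources[res]:
                    mrt[(t % ii, res)] = mrt.get((t % ii, res), 0) + 1
                    times[op] = t
                    break
            else:
                placed_all = False
                break
        # Accept only if every dependence, including loop-carried ones, holds.
        if placed_all and all(
            times[d] >= times[s] + lat - ii * dist for s, d, lat, dist in deps
        ):
            return ii, times
    raise ValueError("no schedule found up to max_ii")


# Hypothetical loop body: load -> mul -> add -> store, with an accumulator
# recurrence on add carried across one iteration.
ops = {"load": "mem", "mul": "alu", "add": "alu", "store": "mem"}
resources = {"mem": 1, "alu": 1}
deps = [
    ("load", "mul", 2, 0),
    ("mul", "add", 3, 0),
    ("add", "store", 1, 0),
    ("add", "add", 1, 1),
]
ii, times = modulo_schedule(ops, resources, deps)
print(ii, times)
```

In this sketch a failed placement simply abandons the current II and retries with a larger one. A scheduler that can displace previously placed operations may find a schedule at a smaller II than this greedy version does.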


Index Terms

Iterative modulo scheduling: an algorithm for software pipelining loops
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Communications management
2. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis
      1. Scheduling algorithms
    2. Online algorithms
      1. Online learning algorithms
        Scheduling algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Reinforcement learning
        Sequential decision making


Published in
MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture
November 1994
233 pages
ISBN: 0897917073
DOI: 10.1145/192724
Chairmen: Hans Mulder (Intel Corp.), Matthew Farrens (Univ. of California, Davis)
Copyright © 1994 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 30 November 1994
Author Tags
instruction scheduling
loop scheduling
modulo scheduling
software pipelining
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 484 of 2,242 submissions, 22%
Article Metrics
- Total Citations: 556
- Total Downloads: 2,962
- Downloads (Last 12 months): 208
- Downloads (Last 6 weeks): 24