ABSTRACT
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.