DOI: 10.1145/192724.192731
Article

Iterative modulo scheduling: an algorithm for software pipelining loops

Published: 30 November 1994

ABSTRACT

Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. It also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.

Published in: MICRO 27: Proceedings of the 27th Annual International Symposium on Microarchitecture, November 1994, 233 pages

ISBN: 0897917073
DOI: 10.1145/192724

Copyright © 1994 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 484 of 2,242 submissions, 22%
