research-article

When polyhedral transformations meet SIMD code generation

Published: 16 June 2013

Abstract

Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compiler optimization schemes that attempt to address data locality and both kinds of parallelism often fail at one of the three objectives.

We address this problem by proposing a 3-step framework, which aims for integrated data locality, multi-core parallelism and SIMD execution of programs. We define the concept of vectorizable codelets, with properties tailored to achieve effective SIMD code generation for the codelets. We leverage the power of a modern high-level transformation framework to restructure a program to expose good ISA-independent vectorizable codelets, exploiting multi-dimensional data reuse. Then, we generate ISA-specific customized code for the codelets, using a collection of lower-level SIMD-focused optimizations.

We demonstrate our approach on a collection of numerical kernels that we automatically tile, parallelize and vectorize, exhibiting significant performance improvements over existing compilers.
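To make the restructuring idea concrete, the following is a minimal sketch in C, not the paper's actual algorithm or generated code. It shows a hand-tiled transposed matrix-vector product whose innermost loop has a fixed trip count, unit-stride accesses, and no loop-carried dependence, which is the general shape of loop a SIMD back end handles well. The names `TILE`, `VLEN`, `axpy_codelet`, and `mvt_tiled` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define TILE 32 /* hypothetical tile size, not taken from the paper */
#define VLEN 8  /* hypothetical codelet trip count, e.g. 8 floats */

/* Innermost kernel: fixed trip count, unit-stride accesses, and no
 * loop-carried dependence, so a compiler can usually map it to SIMD. */
static inline void axpy_codelet(float *restrict y, const float *restrict x,
                                float a)
{
    for (int k = 0; k < VLEN; ++k)
        y[k] += a * x[k];
}

/* Computes y[j] += A[i*n + j] * x[i] for an m x n row-major matrix A.
 * The loops are tiled for locality. Distinct jj tiles write disjoint
 * parts of y, so the jj loop could be distributed across cores. */
void mvt_tiled(size_t m, size_t n, const float *A, const float *x, float *y)
{
    for (size_t jj = 0; jj < n; jj += TILE) {
        size_t j_end = (jj + TILE < n) ? jj + TILE : n;
        for (size_t ii = 0; ii < m; ii += TILE) {
            size_t i_end = (ii + TILE < m) ? ii + TILE : m;
            for (size_t i = ii; i < i_end; ++i) {
                size_t j = jj;
                /* Full-width chunks go through the codelet. */
                for (; j + VLEN <= j_end; j += VLEN)
                    axpy_codelet(&y[j], &A[i * n + j], x[i]);
                /* Scalar remainder for partial chunks at tile edges. */
                for (; j < j_end; ++j)
                    y[j] += A[i * n + j] * x[i];
            }
        }
    }
}
```

Calling `mvt_tiled(m, n, A, x, y)` with a zero-initialized `y` should give the same result as the untiled double loop. In this sketch the split between the tiled outer loops and the small inner kernel is done by hand, whereas the abstract describes finding such a split automatically and then specializing the kernel per ISA.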

    • Published in

      ACM SIGPLAN Notices, Volume 48, Issue 6
      PLDI '13
      June 2013
      515 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2499370

      • PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2013
        546 pages
        ISBN: 9781450320146
        DOI: 10.1145/2491956

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
