Abstract
Automatic data transfer generation is a critical step in guided or automatic code generation for accelerators that use distributed memories. Although good results have been achieved for loop nests, more complex control flow, such as switches or while loops, is generally not handled. This paper shows how to leverage the convex array region abstraction to generate data transfers. The scope of this study ranges from inter-procedural analysis in simple loop nests with function calls to inter-iteration data reuse optimization and arbitrary control flow in loop bodies. Generated transfers are approximated when an exact solution cannot be found. Array regions are also used to extend redundant load-store elimination to array variables. The approach has been successfully applied to GPUs and domain-specific hardware accelerators.
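As an illustration of the idea, the following C sketch derives the read and written regions of a three-point stencil loop and moves only those regions to and from a separate buffer. It is a minimal sketch under stated assumptions, not the paper's implementation: the `region1d` type, the helper names, and the use of `memcpy` as a stand-in for a host-accelerator copy are all hypothetical, and a one-dimensional interval is only the simplest case of a convex region.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 1-D convex region: the index interval [lo, hi], inclusive. */
typedef struct { int lo, hi; } region1d;

/* Read region of `in` for the loop
 *   for (i = first; i <= last; i++) out[i] = in[i-1] + in[i] + in[i+1];
 * For this 3-point stencil the interval is exact. */
static region1d stencil_read_region(int first, int last) {
    region1d r = { first - 1, last + 1 };
    return r;
}

/* Written region of `out` for the same loop. */
static region1d stencil_write_region(int first, int last) {
    region1d r = { first, last };
    return r;
}

/* Number of elements covered by a region. */
static size_t region_size(region1d r) {
    assert(r.hi >= r.lo);
    return (size_t)(r.hi - r.lo + 1);
}

/* Stand-ins for host/accelerator copies (plain memcpy here, an assumption).
 * The accelerator buffer stores element r.lo at offset 0. */
static void load_region(double *dev, const double *host, region1d r) {
    memcpy(dev, host + r.lo, region_size(r) * sizeof *host);
}

static void store_region(double *host, const double *dev, region1d r) {
    memcpy(host + r.lo, dev, region_size(r) * sizeof *dev);
}

/* Offloaded kernel, working only on the local buffers. */
static void stencil_kernel(double *dev_out, const double *dev_in, size_t n_out) {
    for (size_t k = 0; k < n_out; k++)
        dev_out[k] = dev_in[k] + dev_in[k + 1] + dev_in[k + 2];
}

/* Runs the stencil for i in [4, 9] on a 16-element array and
 * returns the number of elements that had to be loaded. */
static size_t run_example(double out[16]) {
    double in[16], dev_in[16], dev_out[16];
    for (int i = 0; i < 16; i++) { in[i] = i; out[i] = -1.0; }

    region1d rd = stencil_read_region(4, 9);   /* in[3..10] */
    region1d wr = stencil_write_region(4, 9);  /* out[4..9] */

    load_region(dev_in, in, rd);               /* transfer only what is read */
    stencil_kernel(dev_out, dev_in, region_size(wr));
    store_region(out, dev_out, wr);            /* transfer only what is written */
    return region_size(rd);
}
```

Calling `run_example` loads 8 of the 16 input elements and stores 6 results; elements of `out` outside the written region are left untouched. When a region cannot be computed exactly, a sketch like this would fall back to a larger enclosing interval, which stays correct for loads but transfers more data than strictly needed.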
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guelton, S., Amini, M., Creusillet, B. (2013). Beyond Do Loops: Data Transfer Generation with Convex Array Regions. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_17
DOI: https://doi.org/10.1007/978-3-642-37658-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0
eBook Packages: Computer Science (R0)