Abstract
As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low-level programming models (e.g., OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition, and machine-specific details, and most programmers have a difficult time using them effectively. To provide a programming model that addresses the productivity and performance requirements of the average programmer, we explore a domain-specific approach to heterogeneous parallel programming.
We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain-specific languages that are embedded in a common host language. We define criteria for language virtualization and present techniques to achieve them. We present two concrete case studies of domain-specific languages that are implemented using our virtualization approach.
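To make the idea of an embedded, virtualized DSL concrete, the following is a minimal sketch in Scala (the paper's host language). It is our own illustrative invention, not the paper's actual API: DSL operations build an intermediate representation instead of executing eagerly, so that a later compilation pass could, in principle, optimize, parallelize, or retarget the computation for heterogeneous hardware.

```scala
import scala.language.implicitConversions

// IR nodes: DSL expressions are reified as a tree rather than evaluated eagerly.
sealed trait Exp
case class Const(v: Double) extends Exp
case class Plus(a: Exp, b: Exp) extends Exp
case class Times(a: Exp, b: Exp) extends Exp

object DSL {
  // Virtualize the host language's operators: + and * on Exp build IR nodes.
  implicit class Ops(private val a: Exp) extends AnyVal {
    def +(b: Exp): Exp = Plus(a, b)
    def *(b: Exp): Exp = Times(a, b)
  }
  // Lift host-language literals into the DSL's representation.
  implicit def lift(v: Double): Exp = Const(v)

  // A trivial backend that interprets the IR. A real virtualized DSL would
  // instead analyze the IR and generate optimized (e.g., parallel) code here.
  def eval(e: Exp): Double = e match {
    case Const(v)    => v
    case Plus(a, b)  => eval(a) + eval(b)
    case Times(a, b) => eval(a) * eval(b)
  }
}
```

DSL programs then look like ordinary host-language arithmetic, e.g. `DSL.eval(DSL.lift(2.0) * 3.0 + 4.0)`, while the implementation retains full control over how (and where) the computation runs.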
Language virtualization for heterogeneous parallel computing. In OOPSLA '10: Proceedings of the ACM International Conference on Object-Oriented Programming Systems, Languages, and Applications.