research-article

Language virtualization for heterogeneous parallel computing

Published: 17 October 2010
Abstract

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low-level programming models (e.g., OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition, and machine-specific details. Most programmers are having a difficult time using these programming models effectively. To provide a programming model that addresses the productivity and performance requirements for the average programmer, we explore a domain-specific approach to heterogeneous parallel programming.

We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain-specific languages that are embedded in a common host language. We define criteria for language virtualization and present techniques to achieve them. We present two concrete case studies of domain-specific languages that are implemented using our virtualization approach.



Published in

ACM SIGPLAN Notices, Volume 45, Issue 10 (OOPSLA '10), October 2010, 957 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1932682

OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, October 2010, 984 pages
ISBN: 9781450302036
DOI: 10.1145/1869459

          Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
