Abstract
Modern parallel computing hardware demands increasingly specialized attention to the details of scheduling and load balancing across heterogeneous execution resources that may include GPU and cloud environments, in addition to traditional CPUs. Many existing solutions address the challenges of particular resources, but do so in isolation, and in general do not compose within larger systems. We propose a general, composable abstraction for execution resources, along with a continuation-based meta-scheduler that harnesses those resources in the context of a deterministic parallel programming library for Haskell. We demonstrate performance benefits of combined CPU/GPU scheduling over either alone, and of combined multithreaded/distributed scheduling over existing distributed programming approaches for Haskell.
A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud
ICFP '12: Proceedings of the 17th ACM SIGPLAN international conference on Functional programming