research-article

A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud

Authors:
Adam Foltzer, Abhishek Kulkarni, Rebecca Swords, Sajith Sasidharan, Eric Jiang, Ryan Newton

Indiana University, Bloomington, IN, USA

ACM SIGPLAN Notices, Volume 47, Issue 9, September 2012, pp. 235–246. https://doi.org/10.1145/2398856.2364562

Published: 9 September 2012

Abstract

Modern parallel computing hardware demands increasingly specialized attention to the details of scheduling and load balancing across heterogeneous execution resources that may include GPU and cloud environments, in addition to traditional CPUs. Many existing solutions address the challenges of particular resources, but do so in isolation, and in general do not compose within larger systems. We propose a general, composable abstraction for execution resources, along with a continuation-based meta-scheduler that harnesses those resources in the context of a deterministic parallel programming library for Haskell. We demonstrate performance benefits of combined CPU/GPU scheduling over either alone, and of combined multithreaded/distributed scheduling over existing distributed programming approaches for Haskell.

Index Terms

Software and its engineering
  Software notations and tools
    General programming languages
      Language types
        Concurrent programming languages
        Distributed programming languages
        Parallel programming languages

Published in
ACM SIGPLAN Notices Volume 47, Issue 9
ICFP '12
September 2012
368 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2398856
ICFP '12: Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
September 2012
392 pages
ISBN:9781450310543
DOI:10.1145/2364527
General Chair: Peter Thiemann, University of Freiburg, Germany
Program Chair: Robby Findler, Northwestern University, USA
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 September 2012
Author Tags
composability
gpu
haskell
work-stealing