research-article

A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud

Published: 09 September 2012
Abstract

Modern parallel computing hardware demands increasingly specialized attention to the details of scheduling and load balancing across heterogeneous execution resources that may include GPU and cloud environments, in addition to traditional CPUs. Many existing solutions address the challenges of particular resources, but do so in isolation, and in general do not compose within larger systems. We propose a general, composable abstraction for execution resources, along with a continuation-based meta-scheduler that harnesses those resources in the context of a deterministic parallel programming library for Haskell. We demonstrate performance benefits of combined CPU/GPU scheduling over either alone, and of combined multithreaded/distributed scheduling over existing distributed programming approaches for Haskell.

Published in

ACM SIGPLAN Notices, Volume 47, Issue 9
ICFP '12
September 2012
368 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2398856

ICFP '12: Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
September 2012
392 pages
ISBN: 9781450310543
DOI: 10.1145/2364527

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States