ABSTRACT
Hardware trends indicate that supercomputers will see fast-growing intra-node parallelism, and future programming models will need to carefully manage the interaction between inter- and intra-node parallelism to cope with this evolution. Many existing programming models expose both levels of parallelism, but they scale poorly as per-node thread counts rise: threading and communication layers interoperate only weakly, adding software overhead and generating avoidable communication. Addressing this requires understanding the limitations of current models and developing new approaches.
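The interoperability problem above can be illustrated with a minimal sketch (hypothetical, not taken from any system in the paper): worker threads funnel all outgoing "messages" through one lock-protected queue, mimicking runtime designs that serialize communication through a single channel. As thread counts grow, every send contends on the same lock, which is the kind of software overhead the abstract refers to.

```cpp
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch: num_workers threads funnel "messages" through one
// lock-protected queue, as in designs where a single communication channel
// (e.g., an MPI_THREAD_FUNNELED-style setup) serializes all sends.
std::mutex queue_lock;
std::queue<int> send_queue;

int funneled_send(int num_workers, int msgs_per_worker) {
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([=] {
            for (int m = 0; m < msgs_per_worker; ++m) {
                // This lock is contended by every worker; contention grows
                // with the per-node thread count.
                std::lock_guard<std::mutex> g(queue_lock);
                send_queue.push(w * msgs_per_worker + m);
            }
        });
    }
    for (auto& t : workers) t.join();
    // Drain the queue, as a dedicated communication thread would.
    int drained = 0;
    while (!send_queue.empty()) { send_queue.pop(); ++drained; }
    return drained;
}
```

The sketch is functionally correct at any thread count; the scaling concern is that throughput is bounded by the single serialized queue rather than by the network.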
We propose a new runtime system design, PPL, which abstracts the important high-level concepts of a typical parallel system for distributed-memory machines. By modularizing these elements, individual layers can be tested in isolation to better understand the needs of future programming models. We present the design and an initial C++11 implementation of PPL, and evaluate the performance of several different module implementations through micro-benchmarks and three applications: Barnes-Hut, Monte Carlo particle tracking, and a sparse triangular solver.
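The modular design described above can be sketched as follows. This is a hypothetical illustration, not PPL's actual API: runtime concerns sit behind abstract interfaces (here a scheduler and a transport, names assumed for the example), so alternative implementations of each module can be swapped in and benchmarked independently.

```cpp
#include <memory>
#include <string>

// Hypothetical sketch (not PPL's actual API): the runtime is assembled from
// swappable modules behind abstract interfaces.
struct Scheduler {
    virtual ~Scheduler() = default;
    virtual std::string name() const = 0;
};
struct Transport {
    virtual ~Transport() = default;
    virtual std::string name() const = 0;
};

// Two interchangeable scheduler modules and one transport module.
struct WorkStealingScheduler : Scheduler {
    std::string name() const override { return "work-stealing"; }
};
struct FifoScheduler : Scheduler {
    std::string name() const override { return "fifo"; }
};
struct SharedMemTransport : Transport {
    std::string name() const override { return "shared-memory"; }
};

// The runtime sees only the interfaces; which module is used is a
// construction-time decision, so each layer can be evaluated in isolation.
struct Runtime {
    std::unique_ptr<Scheduler> sched;
    std::unique_ptr<Transport> net;
    std::string describe() const { return sched->name() + "+" + net->name(); }
};
```

A benchmark harness built this way can hold the application fixed while varying one module at a time, which matches the evaluation strategy the abstract describes.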
Index Terms
- PPL: an abstract runtime system for hybrid parallel programming