DOI: 10.1145/2966884.2966917

Infrastructure and API Extensions for Elastic Execution of MPI Applications

Published: 25 September 2016

ABSTRACT

Support for dynamic processes was added to MPI in version 2.0 of the standard. This feature has not been widely used by application developers, in part because of the performance cost and limitations of the spawn operation. In this paper, we propose an extension to MPI that consists of four new operations. These operations allow an application to be initialized in an elastic mode of execution and to enter an adaptation window when necessary, during which resources are incorporated into or released from the application's world communicator. A prototype based on the MPICH library and the SLURM resource manager is presented and evaluated alongside an elastic scientific application that uses the new MPI extensions. The cost of the new operations is shown to be negligible, due mainly to the latency-hiding design, leaving the application's time for data redistribution as the only significant performance cost.
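To make the execution model concrete, the following minimal sketch shows how an application loop might use such elastic operations. The abstract does not name the four operations, so the MPIX_* functions below, their signatures, and the control flow are hypothetical placeholders chosen only to illustrate the pattern described above (elastic initialization, a periodic check for pending adaptations, and an adaptation window in which the world communicator is resized and application data are redistributed); they are not the paper's actual API.

```c
/*
 * Illustrative sketch only. The four operations proposed in the paper are
 * not named in the abstract; the MPIX_* functions below are hypothetical
 * placeholders (names and signatures assumed), not the actual API.
 */
#include <mpi.h>

/* Hypothetical prototypes for the assumed elastic-execution extensions. */
int MPIX_Init_elastic(int *argc, char ***argv);  /* elastic variant of MPI_Init      */
int MPIX_Probe_adapt(int *adaptation_pending);   /* check for a pending adaptation   */
int MPIX_Adapt_begin(MPI_Comm *new_world);       /* open the adaptation window       */
int MPIX_Adapt_end(void);                        /* commit the resized world         */

int main(int argc, char **argv)
{
    /* Start in an elastic mode of execution instead of plain MPI_Init. */
    MPIX_Init_elastic(&argc, &argv);

    MPI_Comm world = MPI_COMM_WORLD;

    for (int step = 0; step < 1000; ++step) {
        /* ... one time step of the application on `world` ... */

        /* Cheap check: has the resource manager scheduled an expansion or
         * shrink of this job? */
        int adaptation_pending = 0;
        MPIX_Probe_adapt(&adaptation_pending);

        if (adaptation_pending) {
            /* Adaptation window: obtain the resized world communicator,
             * redistribute application data, then commit. */
            MPI_Comm new_world;
            MPIX_Adapt_begin(&new_world);
            /* ... application-specific data redistribution (the only
             *     significant cost, according to the abstract) ... */
            MPIX_Adapt_end();
            world = new_world;
        }
    }

    MPI_Finalize();
    return 0;
}
```

Keeping the check inside the application's main loop is presumably what enables the latency-hiding design mentioned in the abstract: the probe can return immediately while any negotiation with the resource manager proceeds in the background, so the application effectively pays only for the data redistribution itself.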


Published in

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016, 225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
