ABSTRACT
Support for dynamic processes was added to MPI in version 2.0 of the standard. This feature has seen little use by application developers, in part because of the performance cost and limitations of the spawn operation. In this paper, we propose an extension to MPI consisting of four new operations. These operations allow an application to be initialized in an elastic mode of execution and to enter an adaptation window when necessary, during which resources are incorporated into or released from the application's world communicator. A prototype based on the MPICH library and the SLURM resource manager is presented and evaluated alongside an elastic scientific application that uses the new MPI extensions. The cost of the new operations is shown to be negligible, owing mainly to their latency-hiding design, leaving the application's data redistribution time as the only significant performance cost.
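To illustrate the execution model the abstract describes, the following sketch shows how an elastic application might initialize in elastic mode, periodically poll for adaptation requests, and enter an adaptation window in which the world communicator is rebuilt. The MPIX_* operation names, their signatures, and the loop structure are assumptions made for illustration only; they are not the interface defined in the paper.

```c
#include <mpi.h>

/* Hypothetical prototypes for the four proposed operations; the names and
 * signatures below are illustrative assumptions, not the paper's API. */
int MPIX_Init_adapt(int *argc, char ***argv, int *joining);
int MPIX_Probe_adapt(int *adapt_pending);
int MPIX_Comm_adapt_begin(MPI_Comm *intercomm, MPI_Comm *new_world);
int MPIX_Comm_adapt_commit(void);

int main(int argc, char **argv)
{
    int joining = 0;        /* nonzero if this process was added at runtime */
    int adapt_pending = 0;  /* nonzero when the resource manager requests a change */

    /* Initialize in elastic mode; newly added processes learn they are joining. */
    MPIX_Init_adapt(&argc, &argv, &joining);

    if (joining) {
        /* receive redistributed data from the pre-existing processes here */
    }

    for (int step = 0; step < 1000; ++step) {
        /* Cheap, asynchronous check for a pending resource adaptation. */
        MPIX_Probe_adapt(&adapt_pending);

        if (adapt_pending) {
            MPI_Comm intercomm, new_world;

            /* Open the adaptation window: new processes are connected and
             * departing processes are marked for release. */
            MPIX_Comm_adapt_begin(&intercomm, &new_world);

            /* Application-specific data redistribution over new_world goes
             * here; this is the dominant cost reported in the abstract. */

            /* Commit: new_world becomes the application's world communicator. */
            MPIX_Comm_adapt_commit();
        }

        /* regular computation and communication on the current world */
    }

    MPI_Finalize();
    return 0;
}
```

The pattern is consistent with the latency-hiding claim: the probe call can return immediately when no adaptation is pending, so the only significant cost an application pays is its own data redistribution inside the adaptation window.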