DOI: 10.1145/2966884.2966917

Infrastructure and API Extensions for Elastic Execution of MPI Applications

Published: 25 September 2016

ABSTRACT

Support for dynamic processes was added to MPI in version 2.0 of the standard. This feature has not been widely used by application developers, in part because of the performance cost and limitations of the spawn operation. In this paper, we propose an extension to MPI that consists of four new operations. These operations allow an application to be initialized in an elastic mode of execution and to enter an adaptation window when necessary, during which resources are incorporated into or released from the application's world communicator. A prototype based on the MPICH library and the SLURM resource manager is presented and evaluated alongside an elastic scientific application that uses the new MPI extensions. The cost of the new operations is shown to be negligible, due mainly to the latency-hiding design, leaving the application's time for data redistribution as the only significant performance cost.
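To make the execution model concrete, the following minimal sketch shows how an application loop might use such elastic operations. The abstract does not name the four operations, so the MPIX_* functions below, their signatures, and the control flow are hypothetical placeholders chosen only to illustrate the pattern described above (elastic initialization, a periodic check for pending adaptations, and an adaptation window in which the world communicator is resized and application data are redistributed); they are not the paper's actual API.

```c
/*
 * Illustrative sketch only. The four operations proposed in the paper are
 * not named in the abstract; the MPIX_* functions below are hypothetical
 * placeholders (names and signatures assumed), not the actual API.
 */
#include <mpi.h>

/* Hypothetical prototypes for the assumed elastic-execution extensions. */
int MPIX_Init_elastic(int *argc, char ***argv);  /* elastic variant of MPI_Init      */
int MPIX_Probe_adapt(int *adaptation_pending);   /* check for a pending adaptation   */
int MPIX_Adapt_begin(MPI_Comm *new_world);       /* open the adaptation window       */
int MPIX_Adapt_end(void);                        /* commit the resized world         */

int main(int argc, char **argv)
{
    /* Start in an elastic mode of execution instead of plain MPI_Init. */
    MPIX_Init_elastic(&argc, &argv);

    MPI_Comm world = MPI_COMM_WORLD;

    for (int step = 0; step < 1000; ++step) {
        /* ... one time step of the application on `world` ... */

        /* Cheap check: has the resource manager scheduled an expansion or
         * shrink of this job? */
        int adaptation_pending = 0;
        MPIX_Probe_adapt(&adaptation_pending);

        if (adaptation_pending) {
            /* Adaptation window: obtain the resized world communicator,
             * redistribute application data, then commit. */
            MPI_Comm new_world;
            MPIX_Adapt_begin(&new_world);
            /* ... application-specific data redistribution (the only
             *     significant cost, according to the abstract) ... */
            MPIX_Adapt_end();
            world = new_world;
        }
    }

    MPI_Finalize();
    return 0;
}
```

Keeping the check inside the application's main loop is presumably what enables the latency-hiding design mentioned in the abstract: the probe can return immediately while any negotiation with the resource manager proceeds in the background, so the application effectively pays only for the data redistribution itself.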


Published in

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016, 225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
