Role-Dependent Resource Utilization Analysis for Large HPC Centers

Nikitenko, Dmitry; Shvets, Pavel; Voevodin, Vadim; Zhumatiy, Sergey

doi:10.1007/978-3-319-99673-8_4

Conference paper
First Online: 26 August 2018

Part of the book series: Communications in Computer and Information Science (CCIS, volume 910)

Abstract

The resource utilization of HPC systems can be analyzed in different ways. The method of analysis is chosen primarily according to the focus of the study: a particular application or a series of application runs, the utilization of a selected partition or of a whole supercomputer system, the peculiarities of workgroup collaboration, and so on. The larger an HPC center is, the more diverse the scenarios and user roles that arise. In this paper, we share the results of our research on possible roles and scenarios, as well as typical methods of resource utilization analysis for each role and scenario. These results have served as the basis for the development of the corresponding modules in the Octoshell management system, which is used by all users of the largest HPC center in Russia, at Lomonosov Moscow State University.

The results were obtained at the Research Computing Center of Lomonosov Moscow State University. The work was partially funded by the Russian Foundation for Basic Research (grant № 17-07-00719) and by the Russian Science Foundation (grant № 17-71-20114), the latter for the part of the program implementation described in Sect. 4. The research was carried out using the equipment of the shared research facilities of HPC resources at Lomonosov Moscow State University.

Notes

1. JobDigest® is a Russian registered trademark. An application for the creation of the JobDigest approach was filed and the corresponding patent was granted.

Author information

Authors and Affiliations

Research Computing Center, Lomonosov Moscow State University, Moscow, Russia
Dmitry Nikitenko, Pavel Shvets, Vadim Voevodin & Sergey Zhumatiy

Corresponding author

Correspondence to Dmitry Nikitenko.

Editor information

Editors and Affiliations

South Ural State University, Chelyabinsk, Russia
Leonid Sokolinsky
South Ural State University, Chelyabinsk, Russia
Mikhail Zymbler

Cite this paper

Nikitenko, D., Shvets, P., Voevodin, V., Zhumatiy, S. (2018). Role-Dependent Resource Utilization Analysis for Large HPC Centers. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2018. Communications in Computer and Information Science, vol 910. Springer, Cham. https://doi.org/10.1007/978-3-319-99673-8_4

DOI: https://doi.org/10.1007/978-3-319-99673-8_4
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99672-1
Online ISBN: 978-3-319-99673-8
eBook Packages: Computer Science, Computer Science (R0)
