Abstract
Resource utilization on HPC systems can be analyzed in different ways, and the choice of method depends primarily on the focus of the study. The focus may be a particular application or a series of application runs, the utilization of a selected partition or a whole supercomputer system, the peculiarities of workgroup collaboration, and so on. The larger an HPC center is, the more diverse the scenarios and user roles that arise. In this paper, we share the results of our research on possible roles and scenarios, as well as typical methods of resource utilization analysis for each role and scenario. These results served as the basis for developing the corresponding modules of the Octoshell management system, which is used by all users of the largest HPC center in Russia, at Lomonosov Moscow State University.
The results were obtained at the Research Computing Center of Lomonosov Moscow State University. The work was partially funded by the Russian Foundation for Basic Research (grant № 17-07-00719); the program implementation described in Sect. 4 was financially supported by the Russian Science Foundation (grant № 17-71-20114). The research was carried out on the equipment of the shared research facilities of HPC resources at Lomonosov Moscow State University.
Notes
1. JobDigest® is a Russian registered trademark. An application for the creation of the JobDigest approach was filed and the corresponding patent was granted.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Nikitenko, D., Shvets, P., Voevodin, V., Zhumatiy, S. (2018). Role-Dependent Resource Utilization Analysis for Large HPC Centers. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2018. Communications in Computer and Information Science, vol 910. Springer, Cham. https://doi.org/10.1007/978-3-319-99673-8_4
DOI: https://doi.org/10.1007/978-3-319-99673-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99672-1
Online ISBN: 978-3-319-99673-8
eBook Packages: Computer Science (R0)