Role-Dependent Resource Utilization Analysis for Large HPC Centers

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 910)

Abstract

Resource utilization in HPC systems can be analyzed in different ways, and the choice of method depends primarily on the focus of the study. The subject may be a particular application or a series of application runs, the utilization of a selected partition or of a whole supercomputer system, the specifics of workgroup collaboration, and so on. The larger an HPC center is, the more diverse the scenarios and user roles that arise. In this paper, we share the results of our research on possible roles and scenarios, as well as typical methods of resource utilization analysis for each role and scenario. These results have served as the basis for developing the corresponding modules of the Octoshell management system, which is used by all users of the largest HPC center in Russia, at Lomonosov Moscow State University.

The results were obtained at the Research Computing Center of Lomonosov Moscow State University. The work was partially funded by the Russian Foundation for Basic Research (grant № 17-07-00719) and, for the part of the software implementation described in Sect. 4, by the Russian Science Foundation (grant № 17-71-20114). The research was carried out on equipment of the shared research facilities of HPC resources at Lomonosov Moscow State University.

Notes

  1. JobDigest® is a Russian registered trademark. A patent application covering the JobDigest approach was filed, and the corresponding patent was granted.

Author information

Corresponding author

Correspondence to Dmitry Nikitenko.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Nikitenko, D., Shvets, P., Voevodin, V., Zhumatiy, S. (2018). Role-Dependent Resource Utilization Analysis for Large HPC Centers. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2018. Communications in Computer and Information Science, vol 910. Springer, Cham. https://doi.org/10.1007/978-3-319-99673-8_4

  • DOI: https://doi.org/10.1007/978-3-319-99673-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99672-1

  • Online ISBN: 978-3-319-99673-8

  • eBook Packages: Computer Science, Computer Science (R0)
