Capacity Estimation in HPC Systems: Simulation Approach

Anghelescu, A.; Lenin, R. B.; Ramaswamy, S.; Yoshigoe, K.

doi:10.1007/978-3-642-19056-8_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6536)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology

Abstract

As high performance computing (HPC) systems are extensively employed for heavy computational problems across heterogeneous environments, the scale and complexity of applications raise the issue of capacity planning. The job scheduler is a cardinal factor in the efficiency of any HPC system: job scheduling techniques can worsen or mitigate issues such as job starvation, increased queue time, and decreased system utilization. Since the impact of a scheduling technique depends on the workload of a supercomputer, this research proposes to analyze various scheduling disciplines on a given workload. By simulating an HPC system, we can find, for any given workload, the scheduling discipline that yields the best performance, i.e., the one that minimizes the wait time of jobs in the queue while maximizing resource utilization. Furthermore, given a fixed configuration of an HPC system, this research can be used to determine an appropriate workload that optimizes the system’s performance. A simulation framework of this complexity for HPC does not yet exist in the HPC literature. The efficiency of the proposed simulation framework is illustrated through simulation results for performance measures such as average queuing time, average number of jobs in the queue, and system utilization. These results are verified against a mathematical model developed for job load characterization.
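To make the idea of comparing scheduling disciplines on a fixed workload concrete, the following is a minimal sketch of an event-driven simulation. It is not the framework described in the chapter; the `Job` fields, the `simulate` function, the two policies shown (first-come-first-served and shortest-job-first, both without backfilling), and the three-job workload are all illustrative assumptions. It reports the three measures named in the abstract: average queuing time, time-averaged number of jobs in the queue, and system utilization.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    arrival: float  # submission time
    nodes: int      # number of nodes requested
    runtime: float  # execution time once started


def simulate(jobs, total_nodes, policy="fcfs"):
    """Return (avg_wait, avg_queue_len, utilization) for one policy.

    Hypothetical helper: assumes a non-empty workload with positive
    runtimes; time is measured from t = 0 to the last completion.
    """
    if any(j.nodes > total_nodes for j in jobs):
        raise ValueError("a job requests more nodes than the system has")
    # Queue ordering: arrival time for FCFS, runtime for SJF.
    key = (lambda j: j.arrival) if policy == "fcfs" else (lambda j: j.runtime)
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, running = [], []  # running is a heap of (finish_time, nodes)
    free, now, i = total_nodes, 0.0, 0
    total_wait = queue_area = busy_area = 0.0

    while i < len(pending) or queue or running:
        # Advance the clock to the next arrival or completion.
        next_arrival = pending[i].arrival if i < len(pending) else float("inf")
        next_finish = running[0][0] if running else float("inf")
        t = min(next_arrival, next_finish)
        queue_area += len(queue) * (t - now)          # job-time spent waiting
        busy_area += (total_nodes - free) * (t - now)  # node-time spent busy
        now = t
        # Release nodes of jobs that finish now.
        while running and running[0][0] <= now:
            _, n = heapq.heappop(running)
            free += n
        # Admit jobs that arrive now.
        while i < len(pending) and pending[i].arrival <= now:
            queue.append(pending[i])
            i += 1
        # Start jobs from the head of the ordered queue (no backfilling).
        queue.sort(key=key)
        while queue and queue[0].nodes <= free:
            job = queue.pop(0)
            free -= job.nodes
            total_wait += now - job.arrival
            heapq.heappush(running, (now + job.runtime, job.nodes))

    makespan = now
    return (total_wait / len(jobs),
            queue_area / makespan,
            busy_area / (total_nodes * makespan))


# Illustrative workload: three full-machine jobs on a 4-node system.
workload = [Job(0, 4, 10), Job(1, 4, 8), Job(2, 4, 2)]
for policy in ("fcfs", "sjf"):
    wait, qlen, util = simulate(workload, total_nodes=4, policy=policy)
    print(f"{policy}: wait={wait:.2f} queue={qlen:.2f} util={util:.2f}")
```

On this toy workload both policies keep the machine fully utilized, but shortest-job-first lowers the average wait because the short job no longer queues behind the long one. A real study would replace the hand-written workload with a trace and add further disciplines.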

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Emory University, Atlanta, GA, 30322, USA
A. Anghelescu
Department of Mathematics, University of Central Arkansas, Conway, AR, 72035, USA
R. B. Lenin
Industrial Software Systems, ABB Corporate Research, Bangalore, 560048, India
S. Ramaswamy
Department of Computer Science, University of Arkansas at Little Rock, Little Rock, AR, 72204, USA
K. Yoshigoe

Editor information

Editors and Affiliations

Tata Institute of Fundamental Research, School of Technology & Computer Science, Homi Bhabha Road, Colaba, 400005, Mumbai, India
Raja Natarajan
International Institute of Software Technology, Center for Electronic Governance, United Nations University, P.O. Box 3058, Macao
Adegboyega Ojo

Cite this paper

Anghelescu, A., Lenin, R.B., Ramaswamy, S., Yoshigoe, K. (2011). Capacity Estimation in HPC Systems: Simulation Approach. In: Natarajan, R., Ojo, A. (eds) Distributed Computing and Internet Technology. ICDCIT 2011. Lecture Notes in Computer Science, vol 6536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19056-8_8

DOI: https://doi.org/10.1007/978-3-642-19056-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19055-1
Online ISBN: 978-3-642-19056-8
eBook Packages: Computer Science, Computer Science (R0)