
Way of Measuring Data Transfer Delays among Graphics Processing Units at Different Nodes of a Computer Cluster

Moscow University Computational Mathematics and Cybernetics

Abstract

The design of load tests for a computer cluster with a large number of GPUs (graphics processing units) distributed over the cluster's nodes is presented and implemented as program code. The tests collect the time delays observed when transferring data of various sizes between every pair of GPUs in the system. Two test modes are developed: ‘‘all to all,’’ in which every GPU transfers data to every other GPU simultaneously, and ‘‘one to one,’’ in which only a single pair of GPUs exchanges data at any moment in time. Test results obtained on the K60 computer cluster at the Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, show that the interconnect medium of the supercomputer is inhomogeneous with respect to data transfer among GPUs, not only for transfers across the network but also between GPUs within a single node of the cluster.
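To make the ‘‘one to one’’ mode concrete, the following is a minimal sketch in MPI + CUDA (the technologies the tests are built on), not the authors' actual implementation. It assumes a CUDA-aware MPI build so that MPI_Send/MPI_Recv accept device pointers, one GPU per MPI rank, and an arbitrarily chosen message size and repeat count; all names and parameters are illustrative.

```c
/*
 * Minimal sketch of the ''one to one'' mode (not the authors' code).
 * Rank 0 and rank 1 bounce a GPU-resident buffer back and forth;
 * half of the averaged round-trip time estimates the one-way delay.
 * Assumes a CUDA-aware MPI build, so MPI_Send/MPI_Recv may be given
 * device pointers directly.
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    if (nranks < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int msg_size = 1 << 20;   /* 1 MiB message, chosen arbitrarily */
    const int repeats  = 100;       /* averaging suppresses timer noise  */

    /* One GPU per rank is assumed; a real test would select the device
       from the local rank instead of hard-coding device 0. */
    char *dev_buf;
    cudaSetDevice(0);
    cudaMalloc((void **)&dev_buf, msg_size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < repeats; ++i) {
        if (rank == 0) {
            MPI_Send(dev_buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(dev_buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(dev_buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(dev_buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("estimated one-way delay for %d bytes: %g s\n",
               msg_size, (t1 - t0) / (2.0 * repeats));

    cudaFree(dev_buf);
    MPI_Finalize();
    return 0;
}
```

In the ‘‘all to all’’ mode, the same kind of timed exchange would be initiated by all GPU pairs at once (for example, with nonblocking MPI_Isend/MPI_Irecv followed by MPI_Waitall), so that contention on the shared interconnect is reflected in the measured delays.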



Author information

Correspondence to A. A. Begaev or A. N. Salnikov.

Additional information

Translated by E. Oborin

About this article


Cite this article

Begaev, A.A., Salnikov, A.N. Way of Measuring Data Transfer Delays among Graphics Processing Units at Different Nodes of a Computer Cluster. Moscow Univ. Comput. Math. Cybern. 44, 1–10 (2020). https://doi.org/10.3103/S0278641920010021

