Abstract
Many-core processors, such as graphics processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both the one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
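The one-dimensional decomposition of space mentioned above can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it uses no CUDA streams, OpenMP or MPI, and it does not overlap communication with computation. It only shows that an LBM streaming step performed on slabs with one ghost column per side, filled by a halo exchange, reproduces the streaming step on the undivided domain. The D2Q9 velocity set, the periodic boundaries, and the function names `stream_periodic` and `stream_decomposed` are assumptions made for illustration.

```python
import numpy as np

# D2Q9 lattice velocities (assumed; the abstract does not name the lattice model).
EX = [0, 1, 0, -1, 0, 1, -1, -1, 1]
EY = [0, 0, 1, 0, -1, 1, 1, -1, -1]


def stream_periodic(f):
    """Streaming step on the whole domain with periodic boundaries.

    f has shape (9, nx, ny): one population per lattice direction.
    """
    out = np.empty_like(f)
    for k in range(9):
        out[k] = np.roll(f[k], shift=(EX[k], EY[k]), axis=(0, 1))
    return out


def stream_decomposed(f, n_parts=2):
    """Same streaming step, with the domain cut into slabs along x.

    Each slab carries one ghost column on each x side. Plain array copies
    stand in for the messages a real code would send between processes.
    """
    nx, ny = f.shape[1], f.shape[2]
    bounds = np.linspace(0, nx, n_parts + 1).astype(int)

    # Scatter: copy each slab into a local array padded with ghost columns.
    slabs = []
    for p in range(n_parts):
        lo, hi = bounds[p], bounds[p + 1]
        local = np.zeros((9, hi - lo + 2, ny), dtype=f.dtype)
        local[:, 1:-1, :] = f[:, lo:hi, :]
        slabs.append(local)

    # Halo exchange: fill ghost columns from the neighbours' edge columns.
    for p in range(n_parts):
        left = slabs[(p - 1) % n_parts]
        right = slabs[(p + 1) % n_parts]
        slabs[p][:, 0, :] = left[:, -2, :]
        slabs[p][:, -1, :] = right[:, 1, :]

    # Local streaming: the wrap-around in x only pollutes ghost columns,
    # which are discarded when the interior is gathered back.
    out = np.empty_like(f)
    for p in range(n_parts):
        lo, hi = bounds[p], bounds[p + 1]
        local = slabs[p]
        new = np.empty_like(local)
        for k in range(9):
            new[k] = np.roll(local[k], shift=(EX[k], EY[k]), axis=(0, 1))
        out[:, lo:hi, :] = new[:, 1:-1, :]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = rng.random((9, 8, 5))
    same = np.array_equal(stream_periodic(f0), stream_decomposed(f0, n_parts=2))
    print("decomposed streaming matches global streaming:", same)
```

In a multi-GPU setting, only the edge columns copied in the halo-exchange loop would need to cross device or node boundaries, which is why that traffic is a natural candidate for being hidden behind computation on the slab interior.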
This article is published with open access at Springerlink.com
About this article
Cite this article
Xiong, Q., Li, B., Xu, J. et al. Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units. Chin. Sci. Bull. 57, 707–715 (2012). https://doi.org/10.1007/s11434-011-4908-y
DOI: https://doi.org/10.1007/s11434-011-4908-y