Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

  • Invited Article
  • Computer Science & Technology
  • Open access
  • Published: 25 February 2012
  • Volume 57, pages 707–715, (2012)
  • QinGang Xiong1,2,
  • Bo Li1,2,
  • Ji Xu1,2,
  • XiaoJian Fang1,2,
  • XiaoWei Wang1,
  • LiMin Wang1,
  • XianFeng He1 &
  • Wei Ge1 
Abstract

Many-core processors, such as graphics processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedups have been obtained on a single GPU relative to mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively or systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each holding multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication is incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results agree well with the analytical solution. For both one- and two-dimensional decompositions of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of the algorithm for large-scale engineering applications. The algorithm can be extended directly to three-dimensional decompositions of space and to other modeling methods, including explicit grid-based methods.
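The Couette-flow validation mentioned in the abstract can be illustrated at small scale with a minimal single-CPU sketch. This is not the authors' multi-GPU code; it is a textbook D2Q9 BGK lattice Boltzmann solver with halfway bounce-back walls (grid size, relaxation time tau and wall speed U are illustrative choices), whose steady velocity profile should approach the analytical linear solution u_x(y) = U·y/H of plane Couette flow.

```python
import numpy as np

# D2Q9 lattice: discrete velocities, weights, and opposite-direction index
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])

def equilibrium(rho, ux, uy):
    """Second-order Maxwellian equilibrium for the BGK collision."""
    cu = 3.0*(c[:, 0, None, None]*ux + c[:, 1, None, None]*uy)
    usq = 1.5*(ux**2 + uy**2)
    return w[:, None, None]*rho*(1.0 + cu + 0.5*cu**2 - usq)

def couette_profile(nx=4, ny=17, U=0.1, tau=0.8, steps=6000):
    """Plane Couette flow: bottom wall at rest, top wall sliding at speed U.

    Periodic in x; halfway bounce-back walls sit half a cell beyond the
    first and last fluid rows. Returns the steady u_x(y) profile.
    """
    f = equilibrium(np.ones((ny, nx)), np.zeros((ny, nx)), np.zeros((ny, nx)))
    for _ in range(steps):
        rho = f.sum(axis=0)
        ux = np.einsum('i,iyx->yx', c[:, 0], f)/rho
        uy = np.einsum('i,iyx->yx', c[:, 1], f)/rho
        f += (equilibrium(rho, ux, uy) - f)/tau        # BGK collision
        fpost = f.copy()                               # post-collision state
        for i in range(9):                             # streaming (periodic)
            f[i] = np.roll(np.roll(fpost[i], c[i, 1], axis=0), c[i, 0], axis=1)
        for i in (2, 5, 6):    # links returning from the resting bottom wall
            f[i, 0, :] = fpost[opp[i], 0, :]
        for i in (4, 7, 8):    # links returning from the moving top wall
            f[i, -1, :] = fpost[opp[i], -1, :] + 6.0*w[i]*rho[-1, :]*c[i, 0]*U
    return ux[:, 0]

if __name__ == "__main__":
    ny, U = 17, 0.1
    u = couette_profile(ny=ny, U=U)
    exact = U*(np.arange(ny) + 0.5)/ny   # walls at y = -0.5 and y = ny - 0.5
    print(np.max(np.abs(u - exact)))
```

In a multi-GPU setting, the same collide-stream loop would be split across subdomains, with the boundary rows exchanged between neighbors each step; the paper's point is that this exchange can be overlapped with interior computation via CUDA streams and non-blocking MPI.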



Author information

Authors and Affiliations

  1. State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China

    QinGang Xiong, Bo Li, Ji Xu, XiaoJian Fang, XiaoWei Wang, LiMin Wang, XianFeng He & Wei Ge

  2. Graduate University of Chinese Academy of Sciences, Beijing, 100049, China

    QinGang Xiong, Bo Li, Ji Xu & XiaoJian Fang


Corresponding authors

Correspondence to XiaoWei Wang or LiMin Wang.

Additional information

This article is published with open access at Springerlink.com

About this article

Cite this article

Xiong, Q., Li, B., Xu, J. et al. Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units. Chin. Sci. Bull. 57, 707–715 (2012). https://doi.org/10.1007/s11434-011-4908-y


  • Received: 23 May 2011

  • Accepted: 19 October 2011

  • Published: 25 February 2012

  • Issue Date: March 2012

  • DOI: https://doi.org/10.1007/s11434-011-4908-y


Keywords

  • asynchronous execution
  • compute unified device architecture
  • graphic processing unit
  • lattice Boltzmann method
  • non-blocking message passing interface
  • OpenMP