Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

  • Invited Article
  • Computer Science & Technology
  • Open access
  • Published: 25 February 2012
  • Volume 57, pages 707–715, (2012)
  • QinGang Xiong1,2,
  • Bo Li1,2,
  • Ji Xu1,2,
  • XiaoJian Fang1,2,
  • XiaoWei Wang1,
  • LiMin Wang1,
  • XianFeng He1 &
  • Wei Ge1 
Abstract

Many-core processors, such as graphics processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedups have been obtained on a single GPU relative to mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively or systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each holding multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication is incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results agree well with the analytical solution. For both one- and two-dimensional decompositions of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of the algorithm for large-scale engineering applications. The algorithm can be extended directly to three-dimensional decompositions of space and to other modeling methods, including explicit grid-based methods.
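The Couette-flow validation mentioned in the abstract can be illustrated at small scale with a minimal single-CPU sketch. This is not the authors' multi-GPU code; it is a textbook D2Q9 BGK lattice Boltzmann solver with halfway bounce-back walls (grid size, relaxation time tau and wall speed U are illustrative choices), whose steady velocity profile should approach the analytical linear solution u_x(y) = U·y/H of plane Couette flow.

```python
import numpy as np

# D2Q9 lattice: discrete velocities, weights, and opposite-direction index
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])

def equilibrium(rho, ux, uy):
    """Second-order Maxwellian equilibrium for the BGK collision."""
    cu = 3.0*(c[:, 0, None, None]*ux + c[:, 1, None, None]*uy)
    usq = 1.5*(ux**2 + uy**2)
    return w[:, None, None]*rho*(1.0 + cu + 0.5*cu**2 - usq)

def couette_profile(nx=4, ny=17, U=0.1, tau=0.8, steps=6000):
    """Plane Couette flow: bottom wall at rest, top wall sliding at speed U.

    Periodic in x; halfway bounce-back walls sit half a cell beyond the
    first and last fluid rows. Returns the steady u_x(y) profile.
    """
    f = equilibrium(np.ones((ny, nx)), np.zeros((ny, nx)), np.zeros((ny, nx)))
    for _ in range(steps):
        rho = f.sum(axis=0)
        ux = np.einsum('i,iyx->yx', c[:, 0], f)/rho
        uy = np.einsum('i,iyx->yx', c[:, 1], f)/rho
        f += (equilibrium(rho, ux, uy) - f)/tau        # BGK collision
        fpost = f.copy()                               # post-collision state
        for i in range(9):                             # streaming (periodic)
            f[i] = np.roll(np.roll(fpost[i], c[i, 1], axis=0), c[i, 0], axis=1)
        for i in (2, 5, 6):    # links returning from the resting bottom wall
            f[i, 0, :] = fpost[opp[i], 0, :]
        for i in (4, 7, 8):    # links returning from the moving top wall
            f[i, -1, :] = fpost[opp[i], -1, :] + 6.0*w[i]*rho[-1, :]*c[i, 0]*U
    return ux[:, 0]

if __name__ == "__main__":
    ny, U = 17, 0.1
    u = couette_profile(ny=ny, U=U)
    exact = U*(np.arange(ny) + 0.5)/ny   # walls at y = -0.5 and y = ny - 0.5
    print(np.max(np.abs(u - exact)))
```

In a multi-GPU setting, the same collide-stream loop would be split across subdomains, with the boundary rows exchanged between neighbors each step; the paper's point is that this exchange can be overlapped with interior computation via CUDA streams and non-blocking MPI.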



Author information

Authors and Affiliations

  1. State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China

    QinGang Xiong, Bo Li, Ji Xu, XiaoJian Fang, XiaoWei Wang, LiMin Wang, XianFeng He & Wei Ge

  2. Graduate University of Chinese Academy of Sciences, Beijing, 100049, China

    QinGang Xiong, Bo Li, Ji Xu & XiaoJian Fang


Corresponding authors

Correspondence to XiaoWei Wang or LiMin Wang.

Additional information

This article is published with open access at Springerlink.com

About this article

Cite this article

Xiong, Q., Li, B., Xu, J. et al. Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units. Chin. Sci. Bull. 57, 707–715 (2012). https://doi.org/10.1007/s11434-011-4908-y


  • Received: 23 May 2011

  • Accepted: 19 October 2011

  • Published: 25 February 2012

  • Issue Date: March 2012

  • DOI: https://doi.org/10.1007/s11434-011-4908-y


Keywords

  • asynchronous execution
  • compute unified device architecture
  • graphic processing unit
  • lattice Boltzmann method
  • non-blocking message passing interface
  • OpenMP