
Computers & Fluids

Volume 70, 30 November 2012, Pages 86-94

Technical note
Efficient 3D DNS of gas–solid flows on Fermi GPGPU

https://doi.org/10.1016/j.compfluid.2012.08.026

Abstract

Three-dimensional (3D) gas–solid Direct Numerical Simulation (DNS) requires enormous computational resources, which poses a great challenge to current hardware and software. In this article, an efficient implementation of 3D gas–solid DNS with the Lattice Boltzmann Method and the Discrete Element Method is developed on a Fermi GPGPU. An Immersed Moving Boundary approach is utilized to impose the no-slip condition at particle–fluid interfaces. Optimization strategies, such as changing the sequence of collision and propagation in the grid evolution and executing multiple kernels concurrently, are discussed in detail. The algorithm is demonstrated to be competitive in terms of both accuracy and performance. Approximately 131 million lattice updates per second are achieved, indicating that this GPGPU implementation is very suitable for 3D gas–solid DNS.

Highlights

► Exchanging the sequence of grid collision and propagation improves modeling speed. ► Executing multiple kernels concurrently improves overall speed by about 4%. ► A rather simple control-volume calculation method is introduced in the IMB. ► An approximately 27-fold speedup is achieved on a Tesla C2050 relative to an Intel Core i5.

Introduction

Gas–solid flows, which are frequently encountered in process engineering, have received extensive attention during the last century. However, theoretical and experimental work still faces fundamental challenges due to the inherently non-linear, non-equilibrium and multi-scale characteristics [1] of such complex multi-phase systems. At the same time, computational modeling has emerged as the third pillar of scientific studies owing to its controllability, non-intrusiveness and ease of implementation. Thus, computational modeling has been employed in a wide range of fields, including gas–solid flows [2], [3].

In gas–solid simulations, Direct Numerical Simulation (DNS) plays an indispensable role for its ability to investigate micro-scale details and provide constitutive laws for higher-level methods such as Two-Fluid Modeling (TFM) [4] and Discrete Particle Modeling (DPM) [5]. These capabilities have made DNS a focus of the gas–solid flow simulation community in recent years [6]. Despite these advantages, DNS of a typical industrial-scale, or even laboratory-scale, system seems out of reach for conventional hardware and software. This is essentially caused by the requirement to resolve the local no-slip condition at gas–particle interfaces, which demands substantially more resolution, and thus computational power, than single-phase flows. Moreover, DNS of gas–solid flows is intrinsically computationally demanding, as the gas phase needs to be resolved at least one order of magnitude below the particle scale to obtain accurate surface integrals. This means that for a three-dimensional (3D) case, the number of gas grid nodes exceeds the number of solid particles by at least three orders of magnitude. For a system containing thousands to millions of solid particles, this resolution requirement inevitably implies a huge gap to conventional computer capability.
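
As a rough illustration of this scaling (the specific numbers below are our own example, not taken from the article): if each particle diameter d_p is resolved with about ten lattice spacings Δx, a single particle alone occupies on the order of (d_p/Δx)^3 lattice nodes, so

\[
\frac{N_{\mathrm{nodes}}}{N_{\mathrm{particles}}} \;\gtrsim\; \Bigl(\frac{d_p}{\Delta x}\Bigr)^{3} \approx 10^{3},
\qquad
N_{\mathrm{particles}} = 10^{6} \;\Rightarrow\; N_{\mathrm{nodes}} \gtrsim 10^{9},
\]

before even counting the gas nodes between particles.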

Regarding the imposition of the no-slip condition at the gas–solid interface in DNS, several solutions have been proposed in recent years in both grid-based and particle-based formulations. In the grid-based approach, the simulation domain is discretized into structured or unstructured grids, where special techniques are needed to treat the gas–solid interface, such as applying a semi-analytical expression based on a Stokes formulation to correct the gas velocity adjacent to the solid boundary [7], [8], or using adaptive grids that conform to the solid boundary [9]. Although grid-based techniques have a solid theoretical foundation and relatively higher accuracy, the implicit solution of the Poisson equation and the dynamic adaptation of body-fitted grids consume a substantial part of the overall computational effort and may cause poor scalability. In particle-based methods such as Smoothed Particle Hydrodynamics [10] and the Macro-scale Pseudo-particle Method [11], the simulation domain is discretized by Lagrangian particles for both the gas and solid phases (for the solid phase these particles are termed "frozen" particles), and a mirror velocity is associated with both gas and "frozen" particles to prescribe consistent velocities at the respective interface boundaries [12]. Although particle-based methods are much easier to implement and can achieve good scalability, many more gas particles are needed to discretize the modeling domain to obtain numerical accuracy comparable to grid-based schemes. Thus in most cases, the required number of gas particles is undoubtedly beyond present reach. In the past two decades, a popular numerical scheme named the Lattice Boltzmann Method (LBM) [13], which is based on a simplified Boltzmann equation, has attracted much attention for modeling particle–fluid flows [14]. In combination with appropriate kinetic boundary conditions, its data locality and second-order accuracy provide efficiency at least comparable to traditional grid-based methods [15]. At the same time, the explicit nature of this scheme allows faster modeling speed and improved parallel efficiency. In addition, Noble and Torczynski [16] proposed the so-called Immersed Moving Boundary (IMB) condition with sub-grid resolution to enforce the no-slip boundary condition at the gas–solid interface. This interface treatment was applied by Feng et al. [17] and Wang et al. [18], who obtained stable and reasonable results.
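
For reference, the commonly cited form of the Noble–Torczynski IMB (reproduced here from the standard literature; the article's exact notation may differ) adds a solid-collision term weighted by the solid coverage ε_s of each lattice cell:

\[
f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\,t+\Delta t) = f_i(\mathbf{x},t)
 - \frac{1-B}{\tau}\bigl[f_i(\mathbf{x},t)-f_i^{\mathrm{eq}}(\mathbf{x},t)\bigr]
 + B\,\Omega_i^{s},
\]
\[
\Omega_i^{s} = f_{-i}(\mathbf{x},t) - f_i(\mathbf{x},t)
 + f_i^{\mathrm{eq}}(\rho,\mathbf{u}_s) - f_{-i}^{\mathrm{eq}}(\rho,\mathbf{u}),
\qquad
B(\varepsilon_s,\tau) = \frac{\varepsilon_s\,(\tau-\tfrac{1}{2})}{(1-\varepsilon_s)+(\tau-\tfrac{1}{2})},
\]

where u_s is the local solid velocity and -i denotes the direction opposite to i; the hydrodynamic force on a particle is obtained by summing B_n Σ_i Ω_i^s e_i over the covered nodes n. The solid coverage ε_s is presumably the "control volume" quantity whose simplified calculation is mentioned in the Highlights.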

As for computing power, driven by the insatiable demand for high-resolution real-time display in graphics rendering, General Purpose Graphics Processing Units (GPGPUs) have evolved into highly parallel platforms with tremendous computational performance and memory bandwidth for SIMD applications. The latest generation of GPGPU cards provides a peak performance of about 1.5 Tflops in single precision, which is significantly higher than conventional CPU chips [19]. Besides, the introduction of the Compute Unified Device Architecture (CUDA) by Nvidia Corporation in 2007 provides a rather convenient programming interface, with a very low learning curve, for developers who are familiar with the C/C++ programming language. Because both collision and propagation are performed locally for standard stencils, LBM has proven very suitable for fine-grained GPGPU computing, with attractive speedups of up to about two orders of magnitude compared to mainstream CPUs [20]. Thus, LBM in conjunction with IMB implemented on a GPGPU may offer an alternative path to 3D gas–solid DNS.

In this paper, we describe an efficient implementation of 3D gas–solid DNS with LBM on a single Fermi GPGPU card, since this is the starting point for high-performance parallel computing on multiple GPGPUs. The LBM scheme for the gas flow and the Discrete Element Method (DEM) for solid particle collisions are briefly described first, together with the IMB treatment of the no-slip gas–particle interface. Next, the detailed implementation of each part on the GPGPU is discussed with emphasis on optimization issues. Validation and performance are then assessed quantitatively. Finally, a 3D doubly-periodic gas–solid suspension with a solid/gas density ratio of over 1000 shows the potential of applying this efficient algorithm to both scientific and engineering investigations.

Section snippets

Lattice Boltzmann Method BGK D3Q19 scheme

Among different LBM schemes, such as TRT [21] and MRT [22], [23], the D3Q19 scheme proposed by Qian et al. [24] is chosen since it gives the best balance between simulation accuracy and cost for gas modeling [25]. Although D3Q19 is known for its anisotropy effects [26] at intermediate or high Reynolds numbers, it can still be adopted in this study since the particle Reynolds number is O(1). If gas–solid DNS enters the high Reynolds number regime, more complex stencils (e.g. D3Q27) could be employed but
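
For reference, the standard LBGK evolution equation and the D3Q19 equilibrium distribution (reproduced here in their textbook form; the article's exact notation may differ) are

\[
f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\,t+\Delta t) = f_i(\mathbf{x},t)
 - \frac{1}{\tau}\bigl[f_i(\mathbf{x},t)-f_i^{\mathrm{eq}}(\mathbf{x},t)\bigr],
\]
\[
f_i^{\mathrm{eq}} = w_i\,\rho\Bigl[1 + \frac{3\,\mathbf{e}_i\cdot\mathbf{u}}{c^2}
 + \frac{9\,(\mathbf{e}_i\cdot\mathbf{u})^2}{2c^4} - \frac{3\,\mathbf{u}^2}{2c^2}\Bigr],
\]

with weights w_0 = 1/3, w_{1–6} = 1/18, w_{7–18} = 1/36, lattice speed c = Δx/Δt, and kinematic viscosity ν = c_s²(τ − 1/2)Δt, where c_s² = c²/3.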

GPGPU implementation

Before we discuss the details of how to carry out 3D gas–solid DNS on a Fermi GPGPU card, it is necessary to give a brief overview of the hardware configuration and the CUDA environment. Taking the Fermi GPGPU card Tesla C2050 as an example, the chip has 14 multiprocessors with 32 processors each, summing up to 448. These are generalized floating-point cores operating on integer and floating-point types. These cores are clocked at 1.15 GHz, giving the Tesla C2050 2 × 1.15 GHz × 448 ≈ 1 Tflops in single precision if two floating-point
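
As an illustration of the two optimizations discussed in this section (a minimal sketch of ours, not the authors' code; the array layout, kernel names and launch configuration are assumptions), a pull-style grid-evolution kernel performs propagation first by gathering from neighbours and then collides in registers, while independent kernels can be overlapped on separate CUDA streams:

// Hypothetical sketch of a "propagate-then-collide" (pull) D3Q19 BGK kernel.
// Array layout, names and launch parameters are illustrative assumptions,
// not the implementation described in the article.
#include <cuda_runtime.h>

#define Q 19
__constant__ int   c_e[Q][3];   // discrete velocities, copied from host
__constant__ float c_w[Q];      // lattice weights
__constant__ float c_omega;     // relaxation frequency 1/tau

// f_src/f_dst: structure-of-arrays of size Q*nx*ny*nz, periodic domain assumed
__global__ void lbmPullKernel(const float* __restrict__ f_src,
                              float* __restrict__ f_dst,
                              int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;

    long nNodes = (long)nx * ny * nz;
    long node   = (long)z * nx * ny + (long)y * nx + x;

    // --- propagation first: pull post-collision values from neighbours ---
    float f[Q];
    for (int i = 0; i < Q; ++i) {
        int xs = (x - c_e[i][0] + nx) % nx;   // periodic wrap-around
        int ys = (y - c_e[i][1] + ny) % ny;
        int zs = (z - c_e[i][2] + nz) % nz;
        long src = (long)zs * nx * ny + (long)ys * nx + xs;
        f[i] = f_src[i * nNodes + src];
    }

    // --- then collision (BGK) on the freshly gathered distributions ---
    float rho = 0.f, ux = 0.f, uy = 0.f, uz = 0.f;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        ux  += f[i] * c_e[i][0];
        uy  += f[i] * c_e[i][1];
        uz  += f[i] * c_e[i][2];
    }
    ux /= rho; uy /= rho; uz /= rho;
    float usq = ux * ux + uy * uy + uz * uz;

    for (int i = 0; i < Q; ++i) {
        float eu  = c_e[i][0] * ux + c_e[i][1] * uy + c_e[i][2] * uz;
        float feq = c_w[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * usq);
        f_dst[i * nNodes + node] = f[i] - c_omega * (f[i] - feq);
    }
}

// Independent work (e.g. the fluid update and a kernel acting on disjoint
// particle data) could be overlapped by issuing it on separate streams:
//   cudaStream_t s1, s2;
//   cudaStreamCreate(&s1); cudaStreamCreate(&s2);
//   lbmPullKernel<<<gridF, blockF, 0, s1>>>(f_a, f_b, nx, ny, nz);
//   someIndependentKernel<<<gridP, blockP, 0, s2>>>(/* ... */);
//   cudaDeviceSynchronize();

The pull scheme lets each thread write only to its own node, so propagation involves scattered reads rather than scattered writes, which is generally friendlier to coalesced memory access on Fermi hardware.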

Validation

The accuracy of LBM + IMB was intensively investigated by Strack and Cook [34] and Xiong et al. [36], who demonstrated that its convergence is very good from low to moderate particle Reynolds numbers if ten to twenty grid nodes are used for particle discretization. Here, to test the correctness of the code and to assess to what extent single precision influences the final results, a scenario in which a spherical particle sinks in a quiescent liquid within a container is simulated with both single-
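
One common analytical reference for such a settling test (our suggestion; the truncated snippet does not state which quantity the authors compare against) is the Stokes terminal velocity, valid at low particle Reynolds number:

\[
u_t = \frac{(\rho_s-\rho_f)\,g\,d_p^{2}}{18\,\mu},
\qquad
Re_p = \frac{\rho_f\,u_t\,d_p}{\mu} \ll 1,
\]

with wall corrections applied when the container width is not much larger than d_p.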

Application

In order to demonstrate the applicability of the code to relevant gas–solid studies, a 3D doubly-periodic suspension with a lattice size of 160 × 160 × 160 and 256 solid particles is directly simulated. Solid particles are initially uniformly placed in this domain and a random velocity is assigned to each particle. The gas phase is initially at rest and we apply an external body force to counter-balance solid gravity. The solid/gas density ratio is set to 1500/1.3. All physical parameters are
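
A common choice for this body force in doubly-periodic suspensions (our illustration; the truncated text does not give the authors' exact expression) is to let it balance the total buoyant weight of the particles so that the net momentum of the system is conserved:

\[
\mathbf{f}_b\,(1-\phi_s)\,V = -N_p V_p\,(\rho_s-\rho_f)\,\mathbf{g}
\quad\Longrightarrow\quad
\mathbf{f}_b = -\frac{\phi_s}{1-\phi_s}\,(\rho_s-\rho_f)\,\mathbf{g},
\]

where φ_s = N_p V_p / V is the solid volume fraction, V the domain volume, and f_b the force per unit gas volume.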

Conclusion and prospect

This article presents an improved GPGPU implementation of gas–solid DNS with LBM, DEM and IMB. The LBM for the gas phase is realized on the GPGPU in much the same way as for single-phase flow. Performing propagation before collision gains performance compared with collision before propagation. Special care is also taken of register pressure: variables in the grid-evolution kernel are placed in shared memory as much as possible to avoid the high latency of local memory. A spherical

Acknowledgments

The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China under Grants Nos. 20221603 and 2008BAF33B01, and the Chinese Academy of Sciences under Grant KGCX2-YW-124. Helpful suggestions on optimization skills from Dr. Peng Wang at Nvidia Corporation (China) and language aid from Dr. Cuong V. Huynh and Dr. Sujith Sukumaran at Iowa State University are also appreciated.

References (36)
