
Computers & Fluids

Volume 70, 30 November 2012, Pages 86-94

Technical note
Efficient 3D DNS of gas–solid flows on Fermi GPGPU

https://doi.org/10.1016/j.compfluid.2012.08.026

Abstract

Three-dimensional (3D) gas–solid Direct Numerical Simulation (DNS) requires enormous computational resources, which poses a great challenge to current hardware and software. In this article, an efficient implementation of 3D gas–solid DNS with the Lattice Boltzmann Method and the Discrete Element Method is developed on a Fermi GPGPU. An Immersed Moving Boundary approach is utilized to impose the no-slip condition at particle–fluid interfaces. Optimization strategies, such as changing the sequence of collision and propagation in the grid evolution and executing multiple kernels concurrently, are discussed in detail. The algorithm is demonstrated to be competitive in terms of both accuracy and performance. Approximately 131 million lattice updates per second are achieved, indicating that this GPGPU implementation is very suitable for 3D gas–solid DNS.

Highlights

► Exchanging the sequence of grid collision and propagation improves modeling speed. ► Executing multiple kernels concurrently improves overall speed by about 4%. ► A rather simple control-volume calculation method is introduced in the IMB. ► An approximately 27-fold speedup is achieved on a Tesla C2050 relative to an Intel Core i5.

Introduction

Gas–solid flows, which are frequently encountered in process engineering, have received extensive attention during the last century. However, theoretical and experimental work still faces fundamental challenges due to the inherently non-linear, non-equilibrium and multi-scale characteristics [1] of such complex multi-phase systems. At the same time, computational modeling has emerged as the third pillar of scientific studies owing to its controllability, non-intrusiveness and ease of implementation. Thus, computational modeling has been employed in a wide range of fields, including gas–solid flows [2], [3].

In gas–solid simulations, Direct Numerical Simulation (DNS) plays an indispensable role for its ability to investigate micro-scale details and provide constitutive laws for higher-level methods such as Two-Fluid Modeling (TFM) [4] and Discrete Particle Modeling (DPM) [5]. These capabilities have made DNS a focus of the gas–solid flow simulation community in recent years [6]. Despite these advantages, DNS of a typical industrial-scale, or even laboratory-scale, system seems out of reach for conventional hardware and software. This is essentially caused by the requirement to resolve the local no-slip condition at gas–particle interfaces, which demands substantially more resolution, and thus computational power, than single-phase flows. Moreover, DNS of gas–solid flows is intrinsically computationally demanding, as the gas phase needs to be resolved at least one order of magnitude below the particle scale to obtain accurate surface integrals. This means that for a three-dimensional (3D) case, the number of gas grid nodes exceeds the number of solid particles by at least three orders of magnitude. For a system containing thousands to millions of solid particles, this resolution requirement inevitably implies a huge gap to conventional computer capability.
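
As a rough illustration of this scaling (the specific numbers below are our own example, not taken from the article): if each particle diameter d_p is resolved with about ten lattice spacings Δx, a single particle alone occupies on the order of (d_p/Δx)^3 lattice nodes, so

\[
\frac{N_{\mathrm{nodes}}}{N_{\mathrm{particles}}} \;\gtrsim\; \Bigl(\frac{d_p}{\Delta x}\Bigr)^{3} \approx 10^{3},
\qquad
N_{\mathrm{particles}} = 10^{6} \;\Rightarrow\; N_{\mathrm{nodes}} \gtrsim 10^{9},
\]

before even counting the gas nodes between particles.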

Regarding the imposition of the no-slip condition at the gas–solid interface in DNS, several solutions have been proposed in recent years in both grid-based and particle-based formulations. In the grid-based approach, the simulation domain is discretized into structured or unstructured grids, where special techniques are needed to treat the gas–solid interface, such as applying a semi-analytical expression based on a Stokes formulation to correct the gas velocity adjacent to the solid boundary [7], [8], or using adaptive grids that conform to the solid boundary [9]. Although grid-based techniques have a solid theoretical foundation and relatively higher accuracy, the implicit solution of the Poisson equation and the dynamic adaptation of body-fitted grids consume a substantial part of the overall computational effort and may cause poor scalability. In particle-based methods such as Smoothed Particle Hydrodynamics [10] and the Macro-scale Pseudo-particle Method [11], the simulation domain is discretized by Lagrangian particles for both the gas and solid phases (for the solid phase these particles are termed "frozen" particles), and a mirror velocity is associated with both gas and "frozen" particles to prescribe consistent velocities at the respective interface boundaries [12]. Although particle-based methods are much easier to implement and can achieve good scalability, many more gas particles are needed to discretize the modeling domain to obtain numerical accuracy comparable to grid-based schemes. Thus in most cases, the required number of gas particles is undoubtedly beyond present reach. In the past two decades, a popular numerical scheme named the Lattice Boltzmann Method (LBM) [13], which is based on a simplified Boltzmann equation, has attracted much attention for modeling particle–fluid flows [14]. In combination with appropriate kinetic boundary conditions, its data locality and second-order accuracy provide efficiency at least comparable to traditional grid-based methods [15]. At the same time, the explicit nature of this scheme allows faster modeling speed and improved parallel efficiency. In addition, Noble and Torczynski [16] proposed the so-called Immersed Moving Boundary (IMB) condition with sub-grid resolution to enforce the no-slip boundary condition at the gas–solid interface. This interface treatment was applied by Feng et al. [17] and Wang et al. [18], who obtained stable and reasonable results.
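
For reference, the commonly cited form of the Noble–Torczynski IMB (reproduced here from the standard literature; the article's exact notation may differ) adds a solid-collision term weighted by the solid coverage ε_s of each lattice cell:

\[
f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\,t+\Delta t) = f_i(\mathbf{x},t)
 - \frac{1-B}{\tau}\bigl[f_i(\mathbf{x},t)-f_i^{\mathrm{eq}}(\mathbf{x},t)\bigr]
 + B\,\Omega_i^{s},
\]
\[
\Omega_i^{s} = f_{-i}(\mathbf{x},t) - f_i(\mathbf{x},t)
 + f_i^{\mathrm{eq}}(\rho,\mathbf{u}_s) - f_{-i}^{\mathrm{eq}}(\rho,\mathbf{u}),
\qquad
B(\varepsilon_s,\tau) = \frac{\varepsilon_s\,(\tau-\tfrac{1}{2})}{(1-\varepsilon_s)+(\tau-\tfrac{1}{2})},
\]

where u_s is the local solid velocity and -i denotes the direction opposite to i; the hydrodynamic force on a particle is obtained by summing B_n Σ_i Ω_i^s e_i over the covered nodes n. The solid coverage ε_s is presumably the "control volume" quantity whose simplified calculation is mentioned in the Highlights.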

As for computing power, driven by the insatiable demand for high-resolution real-time display in graphics rendering, General Purpose Graphics Processing Units (GPGPUs) have evolved into highly parallel platforms with tremendous computational performance and memory bandwidth for SIMD applications. The latest generation of GPGPU cards provides a peak performance of about 1.5 Tflops in single precision, which is significantly higher than conventional CPU chips [19]. Besides, the introduction of the Compute Unified Device Architecture (CUDA) by Nvidia Corporation in 2007 provides a rather convenient programming interface, with a very low learning curve, for developers who are familiar with the C/C++ programming language. Because both collision and propagation are performed locally for standard stencils, LBM has proven very suitable for fine-grained GPGPU computing, with attractive speedups of up to about two orders of magnitude compared to mainstream CPUs [20]. Thus, LBM in conjunction with IMB implemented on a GPGPU may offer an alternative path to 3D gas–solid DNS.

In this paper, we describe an efficient implementation of 3D gas–solid DNS with LBM on a single Fermi GPGPU card, since this is the starting point for high-performance parallel computing on multiple GPGPUs. The LBM scheme for the gas flow and the Discrete Element Method (DEM) for solid particle collisions are briefly described first, together with the IMB treatment of the no-slip gas–particle interface. Next, the detailed implementation of each part on the GPGPU is discussed with emphasis on optimization issues. Validation and performance are then assessed quantitatively. Finally, a 3D doubly-periodic gas–solid suspension with a solid/gas density ratio of over 1000 shows the potential of applying this efficient algorithm to both scientific and engineering investigations.

Section snippets

Lattice Boltzmann Method BGK D3Q19 scheme

Among different LBM schemes, such as TRT [21] and MRT [22], [23], the D3Q19 scheme proposed by Qian et al. [24] is chosen since it gives the best balance between simulation accuracy and cost for gas modeling [25]. Although D3Q19 is known for its anisotropy effects [26] at intermediate or high Reynolds numbers, it can still be adopted in this study since the particle Reynolds number is O(1). If gas–solid DNS enters the high Reynolds number regime, more complex stencils (e.g. D3Q27) could be employed but
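
For reference, the standard LBGK evolution equation and the D3Q19 equilibrium distribution (reproduced here in their textbook form; the article's exact notation may differ) are

\[
f_i(\mathbf{x}+\mathbf{e}_i\Delta t,\,t+\Delta t) = f_i(\mathbf{x},t)
 - \frac{1}{\tau}\bigl[f_i(\mathbf{x},t)-f_i^{\mathrm{eq}}(\mathbf{x},t)\bigr],
\]
\[
f_i^{\mathrm{eq}} = w_i\,\rho\Bigl[1 + \frac{3\,\mathbf{e}_i\cdot\mathbf{u}}{c^2}
 + \frac{9\,(\mathbf{e}_i\cdot\mathbf{u})^2}{2c^4} - \frac{3\,\mathbf{u}^2}{2c^2}\Bigr],
\]

with weights w_0 = 1/3, w_{1–6} = 1/18, w_{7–18} = 1/36, lattice speed c = Δx/Δt, and kinematic viscosity ν = c_s²(τ − 1/2)Δt, where c_s² = c²/3.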

GPGPU implementation

Before we discuss the details of how to carry out 3D gas–solid DNS on a Fermi GPGPU card, it is necessary to give a brief overview of the hardware configuration and the CUDA environment. Taking the Fermi GPGPU card Tesla C2050 as an example, the chip has 14 multiprocessors with 32 processors each, summing up to 448. These are generalized floating-point cores operating on integer and floating-point types. These cores are clocked at 1.15 GHz, giving the Tesla C2050 2 × 1.15 GHz × 448 ≈ 1 Tflops in single precision if two floating-point
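
As an illustration of the two optimizations discussed in this section (a minimal sketch of ours, not the authors' code; the array layout, kernel names and launch configuration are assumptions), a pull-style grid-evolution kernel performs propagation first by gathering from neighbours and then collides in registers, while independent kernels can be overlapped on separate CUDA streams:

// Hypothetical sketch of a "propagate-then-collide" (pull) D3Q19 BGK kernel.
// Array layout, names and launch parameters are illustrative assumptions,
// not the implementation described in the article.
#include <cuda_runtime.h>

#define Q 19
__constant__ int   c_e[Q][3];   // discrete velocities, copied from host
__constant__ float c_w[Q];      // lattice weights
__constant__ float c_omega;     // relaxation frequency 1/tau

// f_src/f_dst: structure-of-arrays of size Q*nx*ny*nz, periodic domain assumed
__global__ void lbmPullKernel(const float* __restrict__ f_src,
                              float* __restrict__ f_dst,
                              int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;

    long nNodes = (long)nx * ny * nz;
    long node   = (long)z * nx * ny + (long)y * nx + x;

    // --- propagation first: pull post-collision values from neighbours ---
    float f[Q];
    for (int i = 0; i < Q; ++i) {
        int xs = (x - c_e[i][0] + nx) % nx;   // periodic wrap-around
        int ys = (y - c_e[i][1] + ny) % ny;
        int zs = (z - c_e[i][2] + nz) % nz;
        long src = (long)zs * nx * ny + (long)ys * nx + xs;
        f[i] = f_src[i * nNodes + src];
    }

    // --- then collision (BGK) on the freshly gathered distributions ---
    float rho = 0.f, ux = 0.f, uy = 0.f, uz = 0.f;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        ux  += f[i] * c_e[i][0];
        uy  += f[i] * c_e[i][1];
        uz  += f[i] * c_e[i][2];
    }
    ux /= rho; uy /= rho; uz /= rho;
    float usq = ux * ux + uy * uy + uz * uz;

    for (int i = 0; i < Q; ++i) {
        float eu  = c_e[i][0] * ux + c_e[i][1] * uy + c_e[i][2] * uz;
        float feq = c_w[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * usq);
        f_dst[i * nNodes + node] = f[i] - c_omega * (f[i] - feq);
    }
}

// Independent work (e.g. the fluid update and a kernel acting on disjoint
// particle data) could be overlapped by issuing it on separate streams:
//   cudaStream_t s1, s2;
//   cudaStreamCreate(&s1); cudaStreamCreate(&s2);
//   lbmPullKernel<<<gridF, blockF, 0, s1>>>(f_a, f_b, nx, ny, nz);
//   someIndependentKernel<<<gridP, blockP, 0, s2>>>(/* ... */);
//   cudaDeviceSynchronize();

The pull scheme lets each thread write only to its own node, so propagation involves scattered reads rather than scattered writes, which is generally friendlier to coalesced memory access on Fermi hardware.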

Validation

The accuracy of LBM + IMB was intensively investigated by Strack and Cook [34] and Xiong et al. [36], who demonstrated that its convergence is very good from low to moderate particle Reynolds numbers if ten to twenty grid nodes are used for particle discretization. Here, to test the correctness of the code and to assess to what extent single precision influences the final results, a scenario in which a spherical particle sinks in a quiescent liquid within a container is simulated with both single-
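
One common analytical reference for such a settling test (our suggestion; the truncated snippet does not state which quantity the authors compare against) is the Stokes terminal velocity, valid at low particle Reynolds number:

\[
u_t = \frac{(\rho_s-\rho_f)\,g\,d_p^{2}}{18\,\mu},
\qquad
Re_p = \frac{\rho_f\,u_t\,d_p}{\mu} \ll 1,
\]

with wall corrections applied when the container width is not much larger than d_p.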

Application

In order to demonstrate the applicability of the code to relevant gas–solid studies, a 3D doubly-periodic suspension with a lattice size of 160 × 160 × 160 and 256 solid particles is directly simulated. Solid particles are initially uniformly placed in this domain and a random velocity is assigned to each particle. The gas phase is initially at rest and we apply an external body force to counter-balance solid gravity. The solid/gas density ratio is set to 1500/1.3. All physical parameters are
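
A common choice for this body force in doubly-periodic suspensions (our illustration; the truncated text does not give the authors' exact expression) is to let it balance the total buoyant weight of the particles so that the net momentum of the system is conserved:

\[
\mathbf{f}_b\,(1-\phi_s)\,V = -N_p V_p\,(\rho_s-\rho_f)\,\mathbf{g}
\quad\Longrightarrow\quad
\mathbf{f}_b = -\frac{\phi_s}{1-\phi_s}\,(\rho_s-\rho_f)\,\mathbf{g},
\]

where φ_s = N_p V_p / V is the solid volume fraction, V the domain volume, and f_b the force per unit gas volume.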

Conclusion and prospect

This article presents an improved GPGPU implementation of gas–solid DNS with LBM, DEM and IMB. The LBM for the gas phase is realized on the GPGPU in much the same way as for single-phase flow. Performing propagation before collision gains performance compared with collision before propagation. Special care is also taken of register pressure: variables in the grid-evolution kernel are placed in shared memory as much as possible to avoid the high latency of local memory. A spherical

Acknowledgments

The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China under Grants Nos. 20221603 and 2008BAF33B01, and the Chinese Academy of Sciences under Grant KGCX2-YW-124. Helpful suggestions on optimization skills from Dr. Peng Wang at Nvidia Corporation (China) and language aid from Dr. Cuong V. Huynh and Dr. Sujith Sukumaran at Iowa State University are also appreciated.

References (36)
