Scientific multi-agent reinforcement learning for wall-models of turbulent flows

Bae, H. Jane; Koumoutsakos, Petros

doi:10.1038/s41467-022-28957-7

Download PDF

Article
Open access
Published: 17 March 2022

Scientific multi-agent reinforcement learning for wall-models of turbulent flows

Nature Communications volume 13, Article number: 1443 (2022) Cite this article

10k Accesses
52 Citations
102 Altmetric
Metrics details

Subjects

Abstract

The predictive capabilities of turbulent flow simulations, critical for aerodynamic design and weather prediction, hinge on the choice of turbulence models. The abundance of data from experiments and simulations and the advent of machine learning have provided a boost to turbulence modeling efforts. However, simulations of turbulent flows remain hindered by the inability of heuristics and supervised learning to model the near-wall dynamics. We address this challenge by introducing scientific multi-agent reinforcement learning (SciMARL) for the discovery of wall models for large-eddy simulations (LES). In SciMARL, discretization points act also as cooperating agents that learn to supply the LES closure model. The agents self-learn using limited data and generalize to extreme Reynolds numbers and previously unseen geometries. The present simulations reduce by several orders of magnitude the computational cost over fully-resolved simulations while reproducing key flow quantities. We believe that SciMARL creates unprecedented capabilities for the simulation of turbulent flows.

Automating turbulence modelling by multi-agent reinforcement learning

Article 04 January 2021

Guido Novati, Hugues Lascombes de Laroussilhe & Petros Koumoutsakos

Machine learning for flow-informed aerodynamic control in turbulent wind conditions

Article Open access 16 December 2022

Peter I. Renn & Morteza Gharib

Predictive scale-bridging simulations through active learning

Article Open access 27 September 2023

Satish Karra, Mohamed Mehana, … Hari S. Viswanathan

Introduction

Simulations of wall-bounded turbulent flows have become a key element in the design cycle of wind farms¹ and aircraft² and the major factor in the predictive capabilities of simulations of atmospheric flows³. Due to the high Reynolds numbers associated with these flows, direct numerical simulations (DNS), where all scales of motion are resolved, are not attainable with current computing capabilities. LES aims to reduce the necessary grid requirements by resolving only the energy-containing eddies and modeling the smaller scale motions. However, this requirement is still hard to meet in the near-wall region, as the stress-producing eddies become progressively smaller, scaling linearly in size with the distance to the wall. Several studies^4,5,6 have estimated that the number of grid points necessary for wall-resolved LES scales as ${{{{{{{\mathcal{O}}}}}}}}(R{e}^{13/7})$, where Re is the characteristic Reynolds number of the flow. This number of computational elements is orders of magnitude smaller than that required for DNS, yet it remains prohibitive. In turn, modeling the near-wall flow such that only the large-scale motions in the outer region of the boundary layer are resolved, the grid-point requirement for wall-modeled LES (WMLES) scale at most as ${{{{{{{\mathcal{O}}}}}}}}(Re)$. With WMLES, certification by analysis—prediction of the aerodynamic quantities of interest for engineering applications by numerical simulations alone—may soon be a reality. Certification by analysis is expected to narrow the number of wind tunnel experiments, reducing both the turnover time and cost of the design cycle.

Several strategies for modeling the near-wall region have been explored^7,8,9,10. The taxonomy of WMLES approaches can be broadly categorized as Hybrid LES/RANS methods and wall-flux modeling. Hybrid LES/RANS and its variants⁸ combine Reynolds-averaged Navier–Stokes (RANS) equations close to the wall and LES in the outer layer, with the interface between RANS and LES domains enforced implicitly through the change in the turbulence model. In wall-flux modeling, the usual no-slip and thermal wall boundary conditions are replaced with stress and heat flux boundary conditions provided by the wall model. Examples of well-known approaches involve computing the wall stress using either the law of the wall^11,12,13 or the RANS equations^{14,15,16,17,18,19,20}. Models account for nonlinear advection and pressure gradient by solving the unsteady three-dimensional RANS equations^15,17 or accounting only for the wall-normal diffusion reducing the computational requirements to the solution of a system of ordinary differential equations^19,20.

The main impediments of the above-mentioned models are that they rely on RANS parametrization that requires the use of a priori empirical coefficients calibrated for a particular flow state (usually fully developed turbulence in equilibrium over a flat plate). Such wall models do not function as intended in real-world applications, where various flow states coexist (e.g., separated flows, flow over roughness, predicting transition, etc.)⁷. The use of RANS parametrization for wall modeling was challenged with a dynamic wall model that is free of a priori specified coefficients at a negligible additional cost^21,22. The two approaches were formulated by requiring consistency between the filtered velocity field at the wall and a differential filter kernel.

Dynamic wall models provide encouraging results, but they also face significant challenges. They are robust for changes in Reynolds number and grid resolution but sensitive to numerical methods employed in the flow solver and the choice of the subgrid-scale (SGS) model. This is attributed to the dominance of numerical errors close to the wall that in turn affect the evaluation of the necessary wall-model constants²³. Furthermore, the methodology has only been exploited specifically for structured, incompressible flow solvers, with limited applicability for compressible flows or complex geometries.

The essential requirements for a successful dynamic wall model are that it (i) accommodates diverse flow solvers and SGS models and (ii) generalizes beyond their calibration flow fields. Recent advances in machine learning and data science aim to address these issues and complement the existing turbulence modeling approaches. To date, most efforts have focused on the application of supervised learning to SGS modeling^{24,25,26,27,28,29,30} and wall modeling^31,32,33. However, despite the demonstrated promise, these methods encounter difficulties in generalizing beyond the distributions of the training data. In supervised learning, the parameters of the neural network are commonly derived by minimizing the model prediction error, which is often based on single-step target values to limit computational challenge. Therefore, it is necessary to differentiate between a priori and a posteriori testing. The first measures the accuracy of the supervised learning model in predicting the target values on a database of reference simulations, typically obtained via DNS. A posteriori testing is performed after training, by integrating in time the Navier–Stokes equations along with trained supervised learning closure and comparing the obtained statistical quantities to that of DNS or other reference data. Due to the single-step cost function, the resultant neural network model is not trained to compensate for the systematic discrepancies between DNS and LES (or WMLES) and the compounding errors. The issue of ill-conditioning of data-driven SGS models has been exposed by studies that perform a posteriori testing^27,34,35,36. Wall models are more sensitive than SGS models²², and we expect the compounding of errors to play a more detrimental role in WMLES.

Here, we propose SciMARL for the development of wall models in LES. Reinforcement learning (RL) identifies optimal strategies for agents that perform actions, contingent on their information about the environment, and measures their performance via scalar reward functions. In this work, the agents correspond with the computational elements and their actions compensate both for the closure terms and errors associated with the numerics of the flow solver. RL is a semi-supervised learning framework with foundations on dynamic programming³⁷ and a broad range of applications ranging from robotics^38,39, games^40,41, and more recently flow control^{39,42,43,44,45}. We note that SciMARL has been used in fluid mechanics only recently for the development of SGS models in LES of homogeneous turbulent flow⁴⁶.

In the case of WMLES, the performance of the SciMARL can be measured by comparing the statistical properties of the simulation to those of reference data such as the wall-shear stress. SciMARL is a semi-supervised learning algorithm that requires information about the flow formulated in terms of a reward rather than detailed spatiotemporal data as in the case of supervised learning. In the case of wall modeling, SciMARL does not rely on a priori knowledge of the log-law coefficients but rather aims to discover active closure policies according to patterns in the flow physics captured by the filtered equations. The respective wall models are robust with respect to the numerical discretizations, as these errors are taken into consideration in the training process. Furthermore, the model discovery method can be readily extended to complex geometries and different flow configurations, such as flow over rough surfaces and stratified and compressible boundary layers.

Results

Multi-agent reinforcement learning for wall modeling

In RL, the agent interacts with its environment by sampling its states (s), performing actions (a), and receiving rewards (r). At each time step, the agent performs the action and the system is advanced in time before the agent can observe its new state, receive a scalar reward, and choose a new action. The agent infers a policy π(s, a) through its repeated interactions with the environment to maximize its long-term rewards. The optimal policy π^*(s, a) is found by maximizing the expected utility, which is given by the expected cumulative reward. Throughout the paper, x, y, and z denote the streamwise, wall-normal, and spanwise directions, respectively. The corresponding velocity components are u, v, and w. RL agents are distributed evenly on each channel wall with each agent located at (x, z) receiving local states s_n(x, z) and rewards r_n(x, z) and providing local actions a_n(x, z) at each time step t_n. A single policy is maintained and updated by the multiple agents present in the domain (Fig. 1).

**Fig. 1: Diagram of the SciMARL setup.**

In order for the RL to be universally applicable for a wide range of flow parameters, the states are nondimensionalized using viscosity ν and the modeled instantaneous friction velocity

$${u}_{\tau }^{m}(x,z,t)={\left(\frac{{\tau }_{w}^{m}(x,z,t)}{\rho }\right)}^{1/2},$$

(1)

where ${\tau }_{w}^{m}$ is the modeled wall-shear stress and ρ is the density. These quantities are only dependent on the output of the wall model and can be obtained without any prior knowledge of the flow. This non-dimensionalization is noted by the superscript * and is distinct from the one by the true friction velocity u_τ, noted by the superscript + , which will be used for the assessment of model performance. The goal of the wall model is to predict the correct wall-shear stress τ_w, and thus the u_τ, which will allow for good predictions of quantities such as the mean velocity profile and turbulence intensities⁴⁷.

Velocity-based wall model

We first train the model to adapt to the variation of the velocity with the wall-normal height, which has a universal behavior in the log region. We set as states s_n(x, z) the instantaneous velocity u^*(x, h^m, z, t_n), the wall-normal derivative ∂u^*/∂y^*(x, h^m, z, t_n), and the wall-normal location ${y}^{* }={({h}^{m})}^{* }$ of the sampling point. Agents act to adjust the wall-shear stress through a multiplication factor a_n(x, z) ∈ [0.9, 1.1] such that ${\tau }_{w}^{m}(x,z,{t}_{n+1})={a}_{n}(x,z){\tau }_{w}^{m}(x,z,{t}_{n})$. This choice does not require the model to produce the exact wall-shear stress (which is dependent on Reynolds number), but rather proposes an action that adjusts the wall-shear stress. The reward (see “Methods” for definition) is also incremental and proportional to the improvement in the prediction of the wall-shear stress compared to the one obtained in the previous time step. The agent behavior is rendered stable by providing additional reward if the predicted wall-shear stress is within 1% of the true value.

Log-law-based wall model

The second model is based on the existence of a logarithmic (log) layer in the near-wall region of turbulent flows, present in all flows with an inner–outer scale separation⁴⁸. In the log layer, the velocity profile is expressed as:

$${u}^{+}=\frac{1}{\kappa }\log {y}^{+}+B,$$

(2)

where κ is the von Karman constant and B is the intercept constant. The exact value of κ and B depends on the flow configurations and wall roughness; however, for the current study, we use values attributed to a canonical smooth zero-pressure-gradient boundary layer. The states for the second model are the local instantaneous coefficients for the log law κ^m and B^m, computed from the instantaneous velocity, velocity gradient, and wall-normal location information. We emphasize that this model does not take as input the a priori known values of κ and B from the log law, but rather derived quantities from the instantaneous flow. This has an advantage over the first model in that the values do not depend on the value of y^* and thus the model can learn the log-law behavior for y^* outside the range of values it trained on. This allows the model to be extended to higher Reynolds numbers or coarser grids more readily. The actions and rewards are the same as the first model.

State-action map

We inquire into the learned models by examining the state-action map conditioned to positive rewards for the channel flow at friction Reynolds number Re_τ = 2000, 4200, 8000. As seen in Fig. 2a, the velocity-based wall model (VWM) has distinguished states (y^*, u^*) with distinct actions corresponding to positive rewards. The model is able to up/down-shift the wall-shear stress based on whether the (y^*, u^*) pair is located above or below the log-law profile. The model initially does not have any prior knowledge of the log-law coefficients, yet it is able to learn to adjust the wall-shear stress through the RL process. However, because the model is trained on a limited range of ${({h}^{m})}^{+}$ in the training set, the extrapolation of this behavior to much larger values of ${({h}^{m})}^{+}$ may be challenging. This can be alleviated by refining the grid in the wall-normal direction with N_y~Re^4,5,6.

The log-law-based wall model (LLWM) similarly has distinct states with different actions corresponding to positive rewards (see Fig. 2b). The main mechanism for controlling the wall-shear stress is similar to the VWM, with the wall-shear stress being up/down-shifted based on whether the point corresponding to the slope and intercept of 1/κ^m and B^m are under/over-predicting the log law. Depending on the wall-normal location of h^m, the classification of whether the point is above or below the log law may vary, especially for points farther away from the origin. However, the majority of the states are centered around the true value of 1/κ and B, and the mechanism will work as expected.

Testing: turbulent channel flow

We examine the model predictions on turbulent channel flow for Reynolds numbers in the range from 5200 to 10⁶ (Fig. 3). In the case of VWM, we expect that as long as ${({h}^{m})}^{+}$ is within the range observed during the training process ($150 \; < \; {({h}^{m})}^{+} < \; 1200$), the model will perform as expected. Cases at Re_τ = 2 × 10⁴ and 5 × 10⁴ produce high errors as the ${({h}^{m})}^{+}$ is not within the trained range. Once the values of ${({h}^{m})}^{+}$ are adjusted to be within the range by refining the grid, the errors decrease significantly. This entails refining the grid for higher Reynolds numbers to allow the first grid point off the wall to be within the trained range of ${({h}^{m})}^{+}$. In the case of LLWM, we observe that the prediction error in the friction velocity is less than 4% while the mean velocity profiles are well-aligned with the log law regardless of the value of ${({h}^{m})}^{+}$. The error increases with Reynolds number, most likely due to the high variation of the streamwise wall-normal gradient with increasing Reynolds number as well as the departure of ${({h}^{m})}^{+}$ from the trained range of values. Still, the results are comparable to the results obtained from the widely used equilibrium wall model (EQWM) up to Re_τ ≈ 10⁵, which uses an empirical coefficient tuned for this particular flow configuration. This range of Reynolds numbers is sufficient for various external aerodynamic and geophysical flows. The predicted mean velocity profiles for both models are shown in Fig. 4.

**Fig. 3: Errors in the friction velocities.**

**Fig. 4: Predicted mean velocity profiles for turbulent channel flow.**

Testing: spatially evolving turbulent boundary layer

The predictive performance of the LLWM is assessed in a zero-pressure-gradient flat-plate turbulent boundary layer. The simulation ranges from Re_θ = 1000–7000, where Re_θ is the Reynolds number based on the momentum thickness.

The modeled skin friction coefficient ${C}_{f}^{m}={\tau }_{w}^{m}/(\rho {U}_{\infty }^{2}/2)$ for the full simulation domain is comparable to the C_f from the empirical values⁴⁹ (Fig. 5a). This shows that the model is capable of adapting to variations of wall-shear stress in the streamwise direction, even when it was only trained on a channel-flow simulation.

**Fig. 5: Predicted friction coefficients for the turbulent boundary layer.**

Distribution of wall-shear stress

A growing body of studies in wall-bounded turbulence has shown that the generation of wall-shear stress fluctuations is directly connected with outer-layer large-scale motions^50,51. This observation supports the idea that the log-layer flow contains the information necessary to predict not only the mean wall-shear stress but also the fluctuations. However, in deterministic wall models such as the EQWM, the wall-shear stress is perfectly correlated with the velocity at the sampling location^52,53, as opposed to a correlation coefficient of 0.3 observed in DNS⁵⁰. This can be observed in Figs. 6a and 7, where the wall-shear stress predicted by the EQWM is perfectly correlated with the velocity fluctuations at the sampling location h_wm. On the other hand, LLWM results in a smaller correlation between the velocity at an off-wall location and the wall-shear stress (Figs. 6b and 7) with a maximum correlation of ~0.3, which matches the expected correlation from DNS.

**Fig. 6: Comparison of the instantaneous off-wall streamwise velocity and wall-shear stress.**

**Fig. 7: Correlation of instantaneous off-wall streamwise velocity and wall-shear stress.**

Potential of SciMARL wall models

We demonstrate that the SciMARL wall models perform as well as the RANS-based EQWM, which has been tuned for this particular flow configuration. The SciMARL wall model is able to achieve these results by training on moderate Reynolds number flows with a reward function only based on the mean wall-shear stress rather. Moreover, RL models are trained in-situ with WMLES and do not require any DNS simulation data. This is in contrast to supervised learning methods, where a vast amount of data need to be generated using high-fidelity DNS simulations to proceed with the learning process. For example, in the case of a moderate Reynolds number channel flow (Re_τ = 4200), LLWM can be trained using O(10³) CPU-hours with less than 1 GB of storage. For supervised learning, generating the DNS data will require O(10⁷) CPU-Hours with more than 100 TB storage. DNS databases might be already available for canonical cases such as channel flow, but it would be more difficult to obtain for cases regarding wall roughness or adverse pressure gradients, where wall models will be more useful. The additional overhead for generating data for supervised learning makes it less practical for real-world applications of wall modeling.

The LLWM is easy to extend to complex geometries and flow simulations utilizing different numerical methods or SGS models, as it only takes as states the instantaneous streamwise (or wall-parallel) velocity, its wall-normal gradient, and the distance from the wall. These quantities do not depend heavily on numerics or SGS models, unlike filtered velocities or eddy viscosity values required in the dynamic model²². Thus, the model can be used in a wide range of simulations, much like the EQWM, but without prescribed tunable parameters. Furthermore, the RL framework can be extended to various flow configurations by adding an additional dimension to the state vector. Since all flow with an inner–outer scale separation exhibit a log law⁴⁸ in the overlap region, the current configuration for wall-model development can be extended to flows exhibiting roughness, stratified flows, compressible flows, among many others. These flows usually have different log-law coefficients κ and B that are manually tuned for existing wall models. However, in this work, these values are adjusted automatically using a SciMARL-based model.” This gives the LLWM a distinct advantage over existing models. For example, in cases with varying pressure gradients over the simulation domain, traditional methods will have to assign different model parameters for each location containing different pressure gradients. In contrast, the SciMARL model can smoothly transition between various pressure-gradient effects with a single policy trained from various canonical cases when the parameters such as pressure and velocity gradients are included as a state. A similar argument can be applied to simulations with varying levels of stratification or compressible effects within a simulation domain. In addition, the evaluation of the LLWM involves evaluating the weights of the trained neural net, which is an order of magnitude faster than the EQWM that solves an ODE at each time step.

Discussion

We have introduced a potent method for the automated discovery of closures in simulations of wall-bounded turbulent flows that uses limited data by fusing scientific computing and multi-agent reinforcement learning (SciMARL). In this method, we solve the filtered Navier–Stokes equations using LES and develop a wall model as a control policy enacted by cooperating agents using the recovery of the correct mean wall-shear stress as a reward. SciMARL requires limited data in contrast to supervised learning methods. The training was performed using LES of a turbulent channel flow at moderate Reynolds numbers (Re_τ = 2000, 4200, and 8000). Remarkably, the method generalizes on LES of a turbulent boundary layer and turbulent channel flow at extreme Reynolds numbers.

We examine the robustness of the method by studying two models (VWM and LLWM) with different state spaces. In the VWM, the state space comprises the streamwise velocity and its wall-normal derivatives. This model adjusts the wall-shear stress based on the discrepancy of the velocity profile from the log law. The model captures the mean velocity profile for a wide range of Reynolds numbers when the wall-normal location of the sampling point is within the training set. Alternatively in the LLWM, the state space is based on the instantaneous log-law coefficients. This model generalizes to a broader set of grid resolutions and Reynolds numbers than the VWM. Moreover, despite training in turbulent channel flows we find that the LLWM generalizes to spatially evolving turbulent boundary layer and it recovers the correct skin friction coefficient at a fraction of the cost of high-fidelity simulations.

We note that the LLWM produces correlations between the predicted wall-shear stress and the off-wall velocity profile that are similar to fully resolved flow. This is in contrast to the correlations obtained by the classical RANS models. This implies that the policy of the LLWM replicates the natural mechanisms of wall-shear stress control that can be obtained so far only through highly resolved simulations. Furthermore, as the model only requires instantaneous flow information at one off-wall location, it could be extended to more complex geometries and different numerical methods without additional modifications.

We anticipate that the model can be easily expanded for all wall-bounded flows that exhibit a log law through an inner–outer scale separation⁴⁸. We envision that when SciMARL is trained over a wide range of flows, the model will also acquire experiences for the key flow patterns that are omnipresent in the fundamental physics of flows in complex configurations. This advance will present a paradigm shift in wall-model development for LES in the prediction and control of industrial aerodynamics and environmental flows.

Methods

Reinforcement learning

Learning is performed through the open-source RL library smarties⁵⁴. The library leverages efficiently the computing resources by separating the task of updating the policy parameters from the task of collecting interaction data. The flow simulations are distributed across workers who collect, for each agent, experiences organized into episodes,

$${E}_{i}={\{{s}_{n}^{(i)},{r}_{n}^{(i)},{\mu }_{n}^{(i)},{\sigma }_{n}^{(i)},{a}_{n}^{(i)}\}}_{n = 0,...,N},$$

where n tracks in-episode RL steps, μ and σ are the statistics of the Gaussian policy used to sample a, and t_N is the final time step for each episode. When a simulation concludes, the worker sends one episode per agent to the central learning process (master) and receives updated policy parameters. The master stores the episodes to a replay memory (RM), which is sampled to update the policy parameters according to Remember-and-Forget Experience Replay (ReF-ER)⁵⁴. ReF-ER is combined with an off-policy actor-critic algorithm V-RACER which supports continuous state and action spaces.

V-RACER trains a neural network defined by weights w which, given input state s, outputs the mean μ^w(s) and standard deviation σ^w(s) of the policy π^w and a state-value estimate v^w(s). The statistics μ^w and σ^w are improved through the policy gradient estimator

$${g}_{\pi }({\mathtt{w}})=\; {\mathbb{E}}\bigg[\left.{g}_{\pi ,n}({\mathtt{w}})\equiv \frac{{\pi }^{{\mathtt{w}}}({a}_{n}| {s}_{n})}{{{{{{{{\mathcal{P}}}}}}}}({a}_{n}| {\mu }_{n},\,{\sigma }_{n})}({\hat{q}}_{n}-{v}^{{\mathtt{w}}}({s}_{n})){\nabla }_{{\mathtt{w}}}\log {\pi }^{{\mathtt{w}}}({a}_{n}| {s}_{n})\right|\\ \{{s}_{n},{r}_{n},{\mu }_{n},{\sigma }_{n},{a}_{n}\} \sim \,{{\mbox{RM}}}\bigg],$$

(3)

where ${{{{{{{\mathcal{P}}}}}}}}({a}_{n}| {\mu }_{n},{\sigma }_{n})$ is the probability of sampling a_n from ${{{{{{{\mathcal{N}}}}}}}}({\mu }_{n},{\sigma }_{n})$, and ${\hat{q}}_{n}$ is an estimator of the action value which is computed recursively from a Retrace algorithm⁵⁵ as

$${\hat{q}}_{n}={r}_{n+1}+\gamma {v}^{{\mathtt{w}}}({s}_{n+1})+\gamma \min \left\{1,\frac{{\pi }^{{\mathtt{w}}}({a}_{n}| {s}_{n})}{{{{{{{{\mathcal{P}}}}}}}}({a}_{n}| {\mu }_{n},{\sigma }_{n})}\right\}({\hat{q}}_{n+1}-{v}^{{\mathtt{w}}}({s}_{n+1})),$$

(4)

where γ = 0.995 is the discount factor for rewards into the future. Retrace is also used to derive the gradients for the state-value estimate

$${g}_{v}({\mathtt{w}})= {\mathbb{E}} \bigg[\left.{g}_{v,n}({\mathtt{w}})\equiv \min \left\{1,\frac{{\pi }_{{\mathtt{w}}}({a}_{n}| {s}_{n})}{{{{{{{{\mathcal{P}}}}}}}}({a}_{n}| {\mu }_{n},\,{\sigma }_{n})}\right\}\left({\hat{q}}_{n}-{v}^{{\mathtt{w}}}({s}_{n})\right)\right|\\ \{{s}_{n},{r}_{n},{\mu }_{n},{\sigma }_{n},{a}_{n}\} \sim \,{{\mbox{RM}}}\,\bigg].$$

(5)

The expectations in Eqs. (3) and (5) are approximated by Monte Carlo sampling B observations from RM.

Due to the use of experience replay, V-RACER and similar algorithms become unstable if the policy diverges from the distribution of experiences in the RM. We circumvent this issue by using an importance weight ρ_t to classify whether an experience is “near-policy" or “far-policy" and clip the gradients computed from far-policy samples to zero⁵⁴. In ReF-ER, the gradients are computed as

$${\hat{g}}_{n}({\mathtt{w}})=\left\{\begin{array}{ll}\beta {g}_{n}({\mathtt{w}})-(1-\beta ){g}_{n}^{D}({\mathtt{w}}),\quad &\,{{\mbox{if}}}\,\,1/C \; < \; {\rho }_{t} \; < \; C\\ -(1-\beta ){g}_{n}^{D}({\mathtt{w}}),\quad &\,{{\mbox{otherwise}}}\,,\end{array}\right.$$

(6)

where ${\rho }_{t}={\pi }_{{\mathtt{w}}}({a}_{t}| {s}_{t})/{{{{{{{\mathcal{P}}}}}}}}({a}_{t}| {\mu }_{t},{\sigma }_{t})$. Here, ${g}^{D}={\nabla }_{{\mathtt{w}}}{D}_{KL}({\pi }_{{\mathtt{w}}}(\cdot | {s}_{t}))\parallel {{{{{{{\mathcal{P}}}}}}}}(\cdot | {\mu }_{t},{\sigma }_{t})$, where D_KL(P∥Q) is the Kullback–Leibler divergence measure between distributions P and Q. The coefficient β is iteratively updated to keep a constant fraction of samples in the RM within the trust region by

$$\beta \leftarrow \left\{\begin{array}{ll}(1-\eta )\beta ,\quad &\,{{\mbox{if}}}\,\,{r}_{RM} \; > \; D,\\ \beta +(1-\eta )\beta ,\quad &\,{{\mbox{otherwise}}}\,,\end{array}\right.$$

(7)

where r_RM is the fraction of the RM with importance weights outside the trust region [1/C, C] and D is a parameter.

The most notable hyperparameters used in our description of the MARL setup are the spatial resolution for the interpolation of the actions onto the grid (determined by ${{{\Delta }}}_{x}^{m}/{{{\Delta }}}_{x}$, and ${{{\Delta }}}_{z}^{m}/{{{\Delta }}}_{z}$). The default values ${{{\Delta }}}_{x}^{m}/{{{\Delta }}}_{x}$, and ${{{\Delta }}}_{z}^{m}/{{{\Delta }}}_{z}$ reduce the number of experiences generated per simulation to O(10⁵). This value is similar to the number of experiences generated per simulation used for SciMARL of SGS model development⁴⁶. Consistent with previous studies, we found that further reducing the number of agents per simulation reduced the model’s adaptability and therefore exhibit slightly lower performance. Because we use conventional reinforcement learning update rules in a multi-agent setting, single parameter updates are imprecise. We found that ReF-ER with hyperparameters C = 1.5 and D = 0.05 (Eqs. (6) and (7)) stabilizes training. We ran multiple training runs per reward function and whenever we vary the hyperparameters, but we observe consistent training progress regardless of the initial random seed.

Further implementation details of the algorithm can be found in Novati et al.⁵⁴.

Overview of the training setup

The models are trained on turbulent channel flow simulations of Re_τ = u_τδ/ν ≈ 2000, 4200, and 8000, where δ is the channel half-height with grid resolution Δ_x,y,z ≃ 0.05δ. Each WMLES is initialized for uniformly sampled Re_τ ∈ {2000, 4200, 8000} with the initial velocity field for the training obtained by superposing white noise sampled from ${{{{{{{\mathcal{N}}}}}}}}(0,0.5{u}_{\tau })$ to a previously obtained WMLES flow field at the given Re_τ that is run for a short period of time to remove numerical artifacts. The initial wall-shear stress is set to overestimate or underestimate the correct wall-shear stress within ± 20%. At each time step of the WMLES, the location h^m is randomly selected between 0.075δ and 0.15δ to train over a smooth range of ${({h}^{m})}^{+}$ within the log layer. The velocity and its wall-normal gradient are then interpolated to the chosen wall-normal location h_m to form the state vector. The agents are located with spacings ${{{\Delta }}}_{x}^{m}=4{{{\Delta }}}_{x}$ and ${{{\Delta }}}_{z}^{m}=4{{{\Delta }}}_{z}$. Each iteration of the learning algorithm runs the simulation for 2δ/u_τ, updating the model at every time step.

The policy is parameterized by a neural network with two hidden layers of 128 units each, with soft sign activations and skip connections. The neural network is initialized with small outer weights and bias shifted such that the initial policy is approximately ${{{{{{{\mathcal{N}}}}}}}}(1,1{0}^{-4})$⁵⁶. Gradients are computed with Monte Carlo estimates with sample size B = 512 from an RM of size 10⁶. The parameters are updated with the Adam algorithm⁵⁷ with learning rate η = 10⁻⁵. ReF-ER hyperparameters of C = 1.5 and D = 0.05 are used to stabilize training. Each training run is advanced for 10⁷ policy gradient steps.

For both VWM and LLWM, the action is given by a multiplication factor a_n(x, z) ∈ [0.9, 1.1] such that ${\tau }_{w}^{m}(x,z,{t}_{n+1})={a}_{n}(x,z){\tau }_{w}^{m}(x,z,{t}_{n})$. The reward is given by

$${r}_{n}(x,z)=\, \left(\frac{| {\tau }_{w}-{\tau }_{w}^{m}(x,z,{t}_{n})| -| {\tau }_{w}-{\tau }_{w}^{m}(x,z,{t}_{n-1})| }{{\tau }_{w}}\right)\\ +{\mathbb{1}}\left(\frac{| {\tau }_{w}-{\tau }_{w}^{m}(x,z,{t}_{n})| }{{\tau }_{w}} < 0.01\right),$$

(8)

where ${\mathbb{1}}$ is an indicator function and τ_w is the true mean wall-shear stress. This gives a reward that is proportional to the improvement in the prediction of the wall-shear stress compared to the one obtained in the previous time step with an additional reward if the predicted wall-shear stress is within 1% of the true value. The states of the VWM are the instantaneous velocity u^*(x, h^m, z, t_n), the wall-normal derivative ∂u^*/∂y^*(x, h^m, z, t_n), and the wall-normal location ${y}^{* }={({h}^{m})}^{* }$ of the sampling point. The states of the LLWM are

$$\frac{1}{{\kappa }^{m}}(x,z,{t}_{n}) =\left(\frac{\partial {u}^{* }}{\partial {y}^{* }}{y}^{* }\right)(x,{h}^{m},z,{t}_{n}),\,\,{{\mbox{and}}}\,{B}^{m}(x,z,{t}_{n})\\ ={u}^{* }(x,{h}^{m},z,{t}_{n})-\frac{1}{{\kappa }^{m}}(x,z,{t}_{n})\log {\left({h}^{m}\right)}^{* }.$$

Details of the flow simulation

We solve the filtered incompressible Navier–Stokes equations in a channel using LES with a staggered second-order finite difference in space⁵⁸ with a fractional-step method⁵⁹ and a third-order Runge–Kutta time-advancing scheme⁶⁰. The SGS model is given by the anisotropic minimum dissipation (AMD) model⁶¹, which is known to perform well in highly anisotropic grids⁶².

For the channel flow, periodic boundary conditions are imposed in the streamwise and spanwise directions, and the no-slip and no-penetration boundary conditions at the top and bottom walls. The modeled wall stress ${\tau }_{w}^{m}$ is applied to the LES domain through the eddy viscosity at the wall⁶³,

$${\left.{\nu }_{t}\right|}_{w}={\left.\left(\frac{\partial u}{\partial y}\right)\right|}_{w}^{-1}\frac{{\tau }_{w}^{m}}{\rho }-\nu ,$$

(9)

where ν_t is the eddy viscosity and the subscript w indicates values evaluated at the wall. This boundary condition, compared to the more widely used Neumann boundary condition, is better at resolving the so-called log-layer mismatch for WMLES⁶³. The channel is driven by a constant pressure gradient for the testing cases. For training, the channel is driven by a constant mass flow rate computed from the mean velocity profile of channel flow. The domain size is given by L_x = 2πδ, L_y = 2δ, and L_z = πδ, where δ is the channel half-height.

For the spatially evolving boundary layer, periodic boundary conditions are imposed in the spanwise direction. No-slip and no-penetration boundary condition with viscosity augmentation (Eq. (9)) is used at the wall. In the top plane, we impose u = U_∞ (free-stream velocity), w = 0, and v estimated from the known experimental growth of the displacement thickness for the corresponding range of Reynolds numbers⁴⁹. This controls the average streamwise pressure gradient, whose nominal value is set to zero. The turbulent inflow is generated by the recycling scheme⁶⁴, in which the velocities from a reference downstream plane, x_ref, are used to synthesize the incoming turbulence. The reference plane is located well beyond the end of the inflow region to avoid spurious feedback^65,66. A convective boundary condition is applied at the outlet with convective velocity U_∞⁶⁷ with a small correction to enforce global mass conservation⁶⁶. The spanwise direction is periodic.

The code has been validated in previous studies in turbulent channel flows^22,68,69,70 and flat-plate boundary layers^22,71.

Testing: channel flow

The model predictions of VWM and LLWM are tested on turbulent channel flow for Reynolds numbers in the range from 5200 to 10⁶ (see Table 1) and for a time span of 300δ/u_τ significantly longer than the training period 2δ/u_τ. While only results using Δ_x ≈ Δ_z ≈ 0.05δ are reported here, using different grid resolutions representative of WMLES also produce similar results.

Table 1 List of channel flow test cases and corresponding Reynolds number, wall-normal grid resolution, and matching location h^m.

Full size table

Note that for LLWM, one of the states, 1/κ^m = (∂u^*/∂y^*)y^*, depends on the choice of y with respect to the discrete points of the simulation. For example, if y is located at the midpoint of two computational grid points, a central finite difference can be used to compute the wall-normal derivative ∂u^*/∂y^*. On the other hand, if y is located on the computational grid point, either a left- or right-finite difference is used. In this case, we chose y values that are midpoints of the two computational grid points. Changing the location of y had minor effects on the results, with the wall-shear stress changing ~5% when the location of y was chosen to be on the computational grid point.

Testing: spatially evolving turbulent boundary layer

The predictive performance of LLWM is assessed in a zero-pressure-gradient flat-plate turbulent boundary layer with Re_θ ranging from 1000 to 7000. This range was chosen so that the results can be compared against relevant DNS⁷². The recycling plane for the inlet boundary condition is set to x_ref/θ₀ = 890, where θ₀ is the momentum thickness at the inlet. The length, height and width of the simulated box are L_x = 3570θ₀, L_y = 100θ₀, and L_z = 200θ₀. The streamwise and spanwise resolutions are Δ_x/δ = 0.06 (${{{\Delta }}}_{x}^{+}=128$) and Δ_z/δ = 0.05 (${{{\Delta }}}_{z}^{+}=105$) at Re_θ = 6500. The grid is uniform in the wall-normal direction with Δ_y/δ = 0.03 (${{{\Delta }}}_{y}^{+}=64$) at Re_θ = 6500. The number of wall-normal grid points per boundary layer thickness is chosen to be ~10 at the inlet, which is in line with the channel flow simulations. The sampling point h^m was chosen to be at the third grid point off the wall in the wall-normal direction¹⁶, which places the point in the log region for most of the domain. All computations were run for 50 washout times after transients.

Data availability

All the data analyzed in this paper were produced with an in-house flow solver and an open-source reinforcement learning software described in the code availability statement. Reference data and the scripts used to produce the data figures is available through GitHub (https://github.com/hjbae/SciMARL_WMLES).

Code availability

The wall-modeled large-eddy simulations were performed with a in-house flow solver, which is available on demand. The wall models were trained with the reinforcement learning library smarties (https://github.com/cselab/smarties).

References

Sørensen, J. N. Aerodynamic aspects of wind energy conversion. Annu. Rev. Fluid Mech. 43, 427–448 (2011).
Article ADS MATH Google Scholar
Slotnick, J. et al. CFD vision 2030 study: a path to revolutionary computational aerosciences. NASA Contractor Report, NASA/CR–2014-218178 (2013).
Stoll, R., Gibbs, J. A., Salesky, S. T., Anderson, W. & Calaf, M. Large-eddy simulation of the atmospheric boundary layer. Bound.-Layer Meteorol. 177, 541–581 (2020).
Article ADS Google Scholar
Chapman, D. R. Computational aerodynamics development and outlook. AIAA J. 17, 1293–1313 (1979).
Article ADS MATH Google Scholar
Choi, H. & Moin, P. Grid-point requirements for large eddy simulation: Chapman’s estimates revisited. Phys. Fluids 24, 011702 (2012).
Article ADS Google Scholar
Yang, X. I. A. & Griffin, K. P. Grid-point and time-step requirements for direct numerical simulation and large-eddy simulation. Phys. Fluids 33, 015108 (2021).
Article ADS CAS Google Scholar
Piomelli, U. & Balaras, E. Wall-layer models for large-eddy simulations. Annu. Rev. Fluid Mech. 34, 349–374 (2002).
Article ADS MathSciNet MATH Google Scholar
Spalart, P. R. Detached-eddy simulation. Annu. Rev. Fluid Mech. 41, 181–202 (2009).
Article ADS MATH Google Scholar
Larsson, J., Kawai, S., Bodart, J. & Bermejo-Moreno, I. Large eddy simulation with modeled wall-stress: recent progress and future directions. Mech. Eng. Rev. 3, 15–00418 (2016).
Article Google Scholar
Bose, S. T. & Park, G. I. Wall-modeled large-eddy simulation for complex turbulent flows. Annu. Rev. Fluid Mech. 50, 535–561 (2018).
Article ADS MathSciNet PubMed PubMed Central MATH Google Scholar
Deardorff, J. W. A numerical study of three-dimensional turbulent channel flow at large Reynolds numbers. J. Fluid Mech. 41, 453–480 (1970).
Article ADS MATH Google Scholar
Schumann, U. Subgrid scale model for finite difference simulations of turbulent flows in plane channels and annuli. J. Comput. Phys. 18, 376–404 (1975).
Article ADS MATH Google Scholar
Piomelli, U., Ferziger, J., Moin, P. & Kim, J. New approximate boundary conditions for large eddy simulations of wall-bounded flows. Phys. Fluids A 1, 1061–1068 (1989).
Article ADS Google Scholar
Balaras, E., Benocci, C. & Piomelli, U. Two-layer approximate boundary conditions for large-eddy simulations. AIAA J. 34, 1111–1119 (1996).
Article ADS MATH Google Scholar
Wang, M. & Moin, P. Dynamic wall modeling for large-eddy simulation of complex turbulent flows. Phys. Fluids 14, 2043–2051 (2002).
Article ADS MathSciNet CAS MATH Google Scholar
Kawai, S. & Larsson, J. Wall-modeling in large eddy simulation: length scales, grid resolution, and accuracy. Phys. Fluids 24, 015105 (2012).
Article ADS Google Scholar
Park, G. I. & Moin, P. An improved dynamic non-equilibrium wall-model for large eddy simulation. Phys. Fluids 26, 37–48 (2014).
Article Google Scholar
Yang, X. I. A., Sadique, J., Mittal, R. & Meneveau, C. Integral wall model for large eddy simulations of wall-bounded turbulent flows. Phys. Fluids 27, 025112 (2015).
Article ADS Google Scholar
Bodart, J. & Larsson, J. Wall-modeled large eddy simulation in complex geometries with application to high-lift devices. Annu. Res. Briefs Center Turbulence Res. 2011, 37–48 (2011).
Google Scholar
Bermejo-Moreno, I. et al. Confinement effects in shock wave/turbulent boundary layer interactions through wall-modelled large-eddy simulations. J. Fluid Mech. 758, 5–62 (2014).
Article ADS CAS Google Scholar
Bose, S. T. & Moin, P. A dynamic slip boundary condition for wall-modeled large-eddy simulation. Phys. Fluids 26, 015104 (2014).
Article ADS Google Scholar
Bae, H. J., Lozano-Durán, A., Bose, S. T. & Moin, P. Dynamic slip wall model for large-eddy simulation. J. Fluid Mech. 859, 400–432 (2019).
Article ADS MathSciNet CAS PubMed PubMed Central MATH Google Scholar
Meyers, J. & Sagaut, P. Is plane-channel flow a friendly case for the testing of large-eddy simulation subgrid-scale models? Phys. Fluids 19, 048105 (2007).
Article ADS MATH Google Scholar
Sarghini, F., De Felice, G. & Santini, S. Neural networks based subgrid scale modeling in large eddy simulations. Comput. Fluids 32, 97–108 (2003).
Article MATH Google Scholar
Hickel, S., Franz, S., Adams, N. A. & Koumoutsakos, P. Optimization of an implicit subgrid-scale model for LES. in Proceedings of the 21st International Congress of Theoretical and Applied Mechanics (eds Gutkowski, W. & Kowalewski, T. A.) FM24_11256 (Springer, Warsaw, Poland, 2004).
Maulik, R. & San, O. A neural network approach for the blind deconvolution of turbulent flows. J. Fluid Mech. 831, 151–181 (2017).
Article ADS MathSciNet CAS MATH Google Scholar
Gamahara, M. & Hattori, Y. Searching for turbulence models by artificial neural network. Phys. Rev. Fluids 2, 054604 (2017).
Article ADS Google Scholar
Vollant, A., Balarac, G. & Corre, C. Subgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures. J. Turbul. 18, 854–878 (2017).
Article ADS MathSciNet Google Scholar
Xie, C., Wang, J., Li, H., Wan, M. & Chen, S. Artificial neural network mixed model for large eddy simulation of compressible isotropic turbulence. Phys. Fluids 31, 085112 (2019).
Article ADS Google Scholar
Fukami, K., Fukagata, K. & Taira, K. Super-resolution reconstruction of turbulent flows with machine learning. J. Fluid Mech. 870, 106–120 (2019).
Article ADS MathSciNet CAS MATH Google Scholar
Milano, M. & Koumoutsakos, P. Neural network modeling for near wall turbulent flow. J. Comput. Phys. 182, 1–26 (2002).
Article ADS CAS MATH Google Scholar
Yang, X. I. A., Zafar, S., Wang, J.-X. & Xiao, H. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev. Fluids 4, 034602 (2019).
Article ADS Google Scholar
Lozano-Durán, A. & Bae, H. J. Self-critical machine-learning wall-modeled LES for external aerodynamics. Annu. Res. Briefs Center Turbulence Res. 2020, 197–210 (2020).
Google Scholar
Nadiga, B. T. & Livescu, D. Instability of the perfect subgrid model in implicit-filtering large eddy simulation of geostrophic turbulence. Phys. Rev. E 75, 046303 (2007).
Article ADS CAS Google Scholar
Wu, J.-L., Xiao, H. & Paterson, E. Physics-informed machine learning approach for augmenting turbulence models: a comprehensive framework. Phys. Rev. Fluids 3, 074602 (2018).
Article ADS Google Scholar
Beck, A., Flad, D. & Munz, C.-D. Deep neural networks for data-driven LES closure models. J. Comput. Phys. 398, 108910 (2019).
Article MathSciNet Google Scholar
Bertsekas, D. P. Reinforcement Learning and Optimal Control (Athena Scientific, 2019).
Google Scholar
Levine, S., Finn, C., Darrell, T. & Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1334–1373 (2016).
MathSciNet MATH Google Scholar
Reddy, G., Celani, A., Sejnowski, T. J. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl. Acad. Sci. USA 113, E4877–E4884 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Article ADS CAS PubMed Google Scholar
Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nature 529, 484 (2016).
Article ADS CAS PubMed Google Scholar
Gazzola, M., Hejazialhosseini, B. & Koumoutsakos, P. Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM J. Sci. Comput. 36, B622–B639 (2014).
Article MathSciNet MATH Google Scholar
Novati, G. et al. Synchronisation through learning for two self-propelled swimmers. Bioinspir. Biomim. 12, 036001 (2017).
Article ADS PubMed Google Scholar
Verma, S., Novati, G. & Koumoutsakos, P. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl. Acad. Sci. USA 115, 5849–5854 (2018).
Article CAS PubMed PubMed Central Google Scholar
Biferale, L., Bonaccorso, F., Buzzicotti, M., Clark Di Leoni, P. & Gustavsson, K. Zermelo’s problem: optimal point-to-point navigation in 2D turbulent flows using reinforcement learning. Chaos 29, 103138 (2019).
Article ADS MathSciNet CAS PubMed Google Scholar
Novati, G., Lascombes de Laroussilhe, H. & Koumoutsakos, P. Automating turbulence modeling by multi-agent reinforcement learning. Nat. Mach. Intell. 3, 87–96 (2020).
Article Google Scholar
Lee, J., Cho, M. & Choi, H. Large eddy simulations of turbulent channel and boundary layer flows at high Reynolds number with mean wall shear stress boundary condition. Phys. Fluids 25, 110808 (2013).
Article ADS Google Scholar
Millikan, C. B. A critical discussion of turbulent flows in channels and circular tubes. In Proceedings of Fifth International Congress of Applied Mechanics, (eds Hartog, J.P.D. & Peters, H.) 386–392 (Wiley, 1939).
Schlichting, H. & Kestin, J. Boundary Layer Theory, Vol. 121 (Springer, 1961).
Mathis, R., Marusic, I., Chernyshenko, S. I. & Hutchins, N. Estimating wall-shear-stress fluctuations given an outer region input. J. Fluid Mech. 715, 163 (2013).
Article ADS MathSciNet CAS MATH Google Scholar
Cheng, C., Li, W., Lozano-Durán, A. & Liu, H. On the structure of streamwise wall-shear stress fluctuations in turbulent channel flows. J. Fluid Mech. 903, A29 (2020).
Article ADS MathSciNet CAS MATH Google Scholar
Park, G. I. & Moin, P. Space-time characteristics of wall-pressure and wall shear-stress fluctuations in wall-modeled large eddy simulation. Phys. Rev. Fluids 1, 024404 (2016).
Article ADS Google Scholar
Yang, X. I. A., Park, G. I. & Moin, P. Log-layer mismatch and modeling of the fluctuating wall stress in wall-modeled large-eddy simulations. Phys. Rev. Fluids 2, 104601 (2017).
Article ADS PubMed PubMed Central Google Scholar
Novati, G. & Koumoutsakos, P. Remember and forget for experience replay. In Proceedings of the 36th International Conference on Machine Learning. (eds Chaudhuri, K. & Salakhutdinov, R.) 4851–4860 (Proceedings of Machine Learning Research, 2019).
Munos, R., Stepleton, T., Harutyunyan, A. & Bellemare, M. G. Safe and efficient off-policy reinforcement learning. In Advances in Neural Information Processing Systems Vol. 29, 1054–1062 (eds Lee, D. D., von Luxburg, U. Garnett, R., Sugiyama, M. & Guyon, I.) (Curran Associates, Inc., 2016).
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256 (eds Teh, Y. W. & Titterington, M.) (JMLR Workshop and Conference Proceedings, 2010).
Kingma, D. P. & Ba, J. L. Adam: A method for stochastic optimization. in Proc. 3rd International Conference on Learning Representations (ICLR, 2014).
Orlandi, P. Fluid Flow Phenomena: A Numerical Toolkit. Fluid Flow Phenomena: A Numerical Toolkit (Springer, 2000).
Kim, J. & Moin, P. Application of a fractional-step method to incompressible Navier-Stokes equations. J. Comp. Phys. 59, 308–323 (1985).
Article ADS MathSciNet MATH Google Scholar
Wray, A. A. Minimal-storage time advancement schemes for spectral methods. Tech. Rep. NASA Ames Research Center, California, Report No. MS 202 (1990).
Rozema, W., Bae, H. J., Moin, P. & Verstappen, R. Minimum-dissipation models for large-eddy simulation. Phys. Fluids 27, 085107 (2015).
Article ADS Google Scholar
Haering, S. W., Lee, M. & Moser, R. D. Resolution-induced anisotropy in large-eddy simulations. Phys. Rev. Fluids 4, 114605 (2019).
Article ADS Google Scholar
Bae, H. J. & Lozano-Durán, A. Effect of wall boundary conditions on wall-modeled large-eddy simulation in a finite-difference framework. Fluids 6, 112 (2021).
Article ADS Google Scholar
Lund, T. S., Wu, X. & Squires, K. D. Generation of turbulent inflow data for spatially-developing boundary layer simulations. J. Comp. Phys. 140, 233–258 (1998).
Article ADS MathSciNet MATH Google Scholar
Nikitin, N. Spatial periodicity of spatially evolving turbulent flow caused by inflow boundary condition. Phys. Fluids 19, 091703 (2007).
Article ADS MATH Google Scholar
Simens, M. P., Jiménez, J., Hoyas, S. & Mizuno, Y. A high-resolution code for turbulent boundary layers. J. Comp. Phys. 228, 4218–4231 (2009).
Article ADS CAS MATH Google Scholar
Pauley, L. L., Moin, P. & Reynolds, W. C. The structure of two-dimensional separation. J. Fluid Mech. 220, 397–411 (1990).
Article ADS Google Scholar
Bae, H. J., Lozano-Durán, A., Bose, S. T. & Moin, P. Turbulence intensities in large-eddy simulation of wall-bounded flows. Phys. Rev. Fluids 3, 014610 (2018).
Article ADS Google Scholar
Lozano-Durán, A. & Bae, H. J. Characteristic scales of Townsend’s wall-attached eddies. J. Fluid Mech. 868, 698 (2019).
Article ADS MathSciNet PubMed PubMed Central MATH Google Scholar
Lozano-Durán, A. & Bae, H. J. Error scaling of large-eddy simulation in the outer region of wall-bounded turbulence. J. Comput. Phys. 392, 532–555 (2019).
Article ADS MathSciNet PubMed PubMed Central Google Scholar
Lozano-Durán, A., Hack, M. J. P. & Moin, P. Modeling boundary-layer transition in direct and large-eddy simulations using parabolized stability equations. Phys. Rev. Fluids 3, 023901 (2018).
Article ADS PubMed PubMed Central Google Scholar
Sillero, J. A., Jiménez, J. & Moser, R. D. Two-point statistics for turbulent boundary layers and channels at Reynolds numbers up to δ⁺ ≈ 2000. Phys. Fluids 26, 105109 (2014).
Article ADS Google Scholar

Download references

Acknowledgements

The authors acknowledge the support of Air Force Office of Scientific Research (AFOSR) Multidisciplinary University Research Initiative (MURI) project: Prediction, Statistical Quantification, and Mitigation of Extreme Events Caused by Exogenous Causes or Intrinsic Instabilities under grant number FA9550-21-1-0058. Computational resources were provided by the Swiss National Supercomputing Centre (CSCS) Project s929.

Author information

Authors and Affiliations

School of Engineering and Applied Sciences, Harvard University, 29 Oxford Street, Cambridge, MA, 02138, USA
H. Jane Bae & Petros Koumoutsakos
Graduate Aerospace Laboratories, California Institute of Technology, 1200 E. California Boulevard, Pasadena, CA, 91125, USA
H. Jane Bae
Computational Science and Engineering Laboratory, ETH Zurich, Clausiusstrasse 33, Zurich, CH-8092, Switzerland
Petros Koumoutsakos

Authors

H. Jane Bae
View author publications
You can also search for this author in PubMed Google Scholar
Petros Koumoutsakos
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

H.J.B. jointly conceived the study with P.K., designed and performed experiments, analyzed the data, and wrote the paper; P.K. devised the concept of SciMARL, supervised the project, and edited the manuscript.

Corresponding authors

Correspondence to H. Jane Bae or Petros Koumoutsakos.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Bae, H.J., Koumoutsakos, P. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nat Commun 13, 1443 (2022). https://doi.org/10.1038/s41467-022-28957-7

Download citation

Received: 29 May 2021
Accepted: 14 February 2022
Published: 17 March 2022
DOI: https://doi.org/10.1038/s41467-022-28957-7

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Automating turbulence modelling by multi-agent reinforcement learning

Machine learning for flow-informed aerodynamic control in turbulent wind conditions

Predictive scale-bridging simulations through active learning

Introduction

Results

Multi-agent reinforcement learning for wall modeling

Velocity-based wall model

Log-law-based wall model

State-action map

Testing: turbulent channel flow

Testing: spatially evolving turbulent boundary layer

Distribution of wall-shear stress

Potential of SciMARL wall models

Discussion

Methods

Reinforcement learning

Overview of the training setup

Details of the flow simulation

Testing: channel flow

Testing: spatially evolving turbulent boundary layer

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Peer Review File

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links