1 Introduction

Compressible multiphase flows are encountered in many technical applications such as the fuel injection systems of rocket or jet engines. Due to their multi-scale nature, the accurate and efficient simulation of such flows poses a challenging task and is still an active field of research. For the modeling and computation of multiphase flows, two main approaches can be distinguished in the literature: the sharp interface method (SIM) and the diffuse interface method (DIM). In the DIM, the phase interface is modeled as a smooth transition layer between two fluids by introducing a mixture of the phases. Consequently, the DIM requires a high resolution of the numerical approximation near the smoothed interface to preserve the physical properties. Well-known implementations of this ansatz comprise the Navier–Stokes–Korteweg equations [1] and the Baer–Nunziato equations [3]. The SIM, by contrast, models the interface as a discontinuity separating the bulk phases. This approach is advantageous when the width of the physical interface is infinitesimal with respect to the macroscopic flow scale. The SIM consists of two major building blocks: an algorithm to track the phase interface and a physically consistent coupling of the bulk phases.

In this work, we use a SIM with a level-set approach [58] to track the phase interface. A common strategy to couple the bulk phases in the SIM is the ghost-fluid method of Fedkiw et al. [22]. It relies on the definition of ghost states at the interface to obtain boundary conditions for the bulk phases. Merkle and Rohde [43] pioneered a strategy to obtain the required ghost states from the solution of a two-phase Riemann problem across the phase interface. Variations of this idea, like the modified ghost-fluid method of Xu and Liu [66], have recently been reported and show promising results for complex, multi-dimensional compressible droplet dynamics. In the present work, we build on the approach by Fechter et al. [18, 20] that employs a ghost-fluid method in combination with two-phase Riemann solvers at the phase interface.

The simulation of compressible multiphase flows with the SIM poses demanding requirements on a numerical scheme. In smooth regions inside the bulk phases, a high order scheme is beneficial for its accuracy and efficiency. At the phase interface, a robust, non-oscillatory scheme is required to cope with the strong discontinuity caused by the sharp interface. Furthermore, for compressible flows, the discretization needs to handle potential shocks in the bulk phase. A multitude of numerical schemes exists in the literature to cope with these requirements. From the class of Finite Volume (FV) schemes, the Weighted Essentially Non-Oscillatory (WENO) [14, 62] and the Central Weighted Essentially Non-Oscillatory (CWENO) [16] schemes have been successfully applied to compressible multiphase flows, e.g. by Hu et al. [31] and Tsoutsanis et al. [63]. An alternative is the Discontinuous Galerkin (DG) method, which has gained popularity due to its efficiency and accuracy [9, 27]. To avoid spurious oscillations related to Gibbs instabilities at discontinuous solution features, different stabilization and shock capturing techniques have been developed. One approach is to smooth out the oscillations by locally switching on artificial viscosity [48, 51, 68]. A different method is to combine the DG method with a WENO approach to avoid instabilities at shocks [13, 15].

In this work, we follow the idea of Huerta et al. [32] and Persson et al. [52] to introduce piecewise constant ansatz functions and apply them on a sub-cell grid inside the original DG element. The approach presented in this paper is based on the work of Sonntag and Munz [55], who introduced an a priori FV sub-cell limiting scheme. If a shock or strong gradient is observed inside a DG element, the element is subdivided into sub-cells and discretized by a second-order FV scheme. In a recent publication [46], we extended this hybrid DG/FV approach by introducing a p-adaptive DG operator and allowing for a FV sub-cell resolution independent of the element-local DG degree N. The resulting scheme combines the high order accuracy of the DG scheme with the robustness of the FV method by applying p-adaptivity in smooth regions and an h-refined sub-cell grid to improve shock localization. In this paper, we introduce this adaptive hybrid scheme [46] into our sharp interface framework [20, 35, 47].

Our sharp interface framework solves a level-set transport equation to compute the position and geometry of the phase interfaces. To provide accurate geometric properties, high order accuracy is required at the phase interface. However, in case of merging phases, the level-set field may exhibit discontinuities and kinks. Therefore, we employ a hybrid p-adaptive DG scheme combined with FV sub-cell limiting, in analogy to the proposed fluid discretization. The sub-cell limiting and the p-refinement of both the fluid and level-set discretization are controlled at runtime by means of an a priori indicator based on the modal decay of the solution polynomials [41, 46].

The adaptive discretization and the interface tracking algorithm cause an inhomogeneous distribution of the element-local computational costs, which implies a workload imbalance across the processor units in parallel simulations. Therefore, dynamic load balancing (DLB) is essential to ensure the parallel efficiency of the code. Appel et al. [2] introduced a DLB scheme based on repartitioning along a space filling curve to treat the imbalance due to the interface tracking. In the present study, we extend this algorithm to cover imbalances caused by the adaptive discretization and propose an efficient communication strategy to speed up the data redistribution during rebalancing.

In summary, the goal of the present work is to introduce an hp-adaptive hybrid DG/FV method into a sharp interface framework. The adaptive discretization is applied to both the interface tracking and the fluid flow. To maintain favorable parallel scaling, the key challenge of an equal load distribution at runtime is addressed with an efficient communication and dynamic load balancing scheme.

The paper is organized as follows: In Sect. 2 we introduce the governing equations for the bulk flow and the interface tracking algorithm. Section 3 provides the numerical framework to solve the conservation equations and the level-set interface tracking with an adaptive hybrid DG/FV scheme. The parallelization strategy, with a focus on an efficient dynamic load balancing approach, is addressed in Sect. 4. Finally, we apply the code framework to well-known shock-droplet and shock-bubble interactions in Sect. 5. We assess the efficiency and accuracy of our scheme by comparing with numerical and experimental results from the literature. We close with a conclusion and a summary of our findings.

2 Governing Equations

In this work, we investigate compressible multiphase flows with two pure, immiscible phases separated by a sharp interface of zero thickness. To this end, we consider a computational domain \(\Omega \), consisting of a liquid part \(\Omega _l\) and a vapor part \(\Omega _v\) separated by the phase interface \(\Gamma \), on a finite time interval [0, T]. The bulk fluid behavior is governed by the compressible Euler equations

$$\begin{aligned} \frac{\partial {\varvec{Q}}}{\partial t} + \nabla _{{\varvec{x}}} \cdot {\varvec{F}}({\varvec{Q}})={\varvec{0}} \quad \text {in} \quad \Omega \times [0,T] \end{aligned}$$
(1)

with the vector of conservative variables \({\varvec{Q}}= \left[ \rho ,\rho {\varvec{u}},E\right] ^\intercal \) and the convective flux vector \({\varvec{F}}= \left[ \rho {\varvec{u}},\rho {\varvec{u}}\otimes {\varvec{u}}+ p{\varvec{I}},{\varvec{u}}(E+p)\right] ^\intercal \) given in terms of the density \(\rho \), the velocity vector \({\varvec{u}}=\left( u_1,u_2,u_3\right) ^\intercal \), the pressure p and the total energy E. The total energy E is defined as the sum of the internal energy and the kinetic energy

$$\begin{aligned} E=\rho e = \rho \epsilon + \frac{1}{2}\rho {\varvec{u}}\cdot {\varvec{u}}. \end{aligned}$$
(2)

The equation system is closed with an equation of state (EOS) that provides expressions for the pressure p and the specific internal energy \(\epsilon \):

$$\begin{aligned} p = p\left( \rho ,\epsilon \right) ,\quad \epsilon =\epsilon \left( \rho ,p\right) . \end{aligned}$$
(3)

Both algebraic and multi-parameter EOS are implemented in our code framework and can be efficiently evaluated by the tabulation method introduced by Föll et al. [23]. Here, we model the liquid phase by a stiffened gas EOS and apply the perfect gas law to model the gaseous phase. Following Le Métayer et al. [45], the stiffened gas EOS is given as

$$\begin{aligned} p = (\gamma -1)\rho \epsilon -\gamma p_{\infty } \end{aligned}$$
(4)

with the ratio of specific heats \(\gamma \) and the reference pressure \(p_{\infty }\). For \(p_{\infty }=0\), the stiffened gas equation reduces to the perfect gas law.
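
For illustration, the closure relations (3) and (4) can be condensed into a few lines. The following is a minimal sketch; the stiffened gas parameters below are illustrative values for a water-like liquid, not the parameters used in our computations:

```python
import numpy as np

def pressure_sg(rho, eps, gamma=4.4, p_inf=6.0e8):
    """Stiffened gas EOS, Eq. (4); p_inf = 0 recovers the perfect gas law."""
    return (gamma - 1.0) * rho * eps - gamma * p_inf

def internal_energy_sg(rho, p, gamma=4.4, p_inf=6.0e8):
    """Inverse relation eps(rho, p), obtained by solving Eq. (4) for eps."""
    return (p + gamma * p_inf) / ((gamma - 1.0) * rho)

# round trip at water-like conditions (illustrative parameters)
eps = internal_energy_sg(rho=1000.0, p=1.0e5)
assert np.isclose(pressure_sg(1000.0, eps), 1.0e5)
```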

A necessary building block for the SIM is the tracking algorithm for the phase interface \(\Gamma \). Following Sussman et al. [58], we obtain the position of the phase interface as the root of a level-set function \(\Phi \) that is advected by a velocity field \({\varvec{s}}\). The level-set transport equation is defined as

$$\begin{aligned} \frac{\partial \Phi }{\partial t}+{\varvec{s}}\cdot \nabla _{{\varvec{x}}} \Phi =0. \end{aligned}$$
(5)

An essential feature of the level-set function is the signed distance property. It is necessary for the accurate derivation of geometric properties such as the unit normal vector \({\varvec{n}}_{\textrm{ls}}\) and the interface curvature \(\kappa _{\textrm{ls}}\). Since the level-set transport equation (5) does not preserve the signed distance property, a reinitialization procedure is required to regularize the level-set field. Following Sussman et al. [58], this is accomplished by solving a Hamilton–Jacobi equation in pseudo time \(\tau \):

$$\begin{aligned} \frac{\partial \Phi }{\partial \tau }+\text {sign}\left( \Phi \right) \left( |\nabla _{{\varvec{x}}} \Phi |-1\right) =0 \end{aligned}$$
(6)

Fast marching [54] or fast sweeping [61] methods could be used as an alternative here. According to [7], geometric properties of the phase boundary can be expressed in terms of derivatives of the level-set field, with the normal vector \({\varvec{n}}_{\textrm{ls}}\) given as

$$\begin{aligned} {\varvec{n}}_{\textrm{ls}}:= \frac{\nabla _{{\varvec{x}}}\Phi }{|\nabla _{{\varvec{x}}}\Phi |} \end{aligned}$$
(7)

and the curvature \(\kappa _{\textrm{ls}}\) computed by

$$\begin{aligned} \kappa _{\textrm{ls}}&:= \frac{\Phi _{x_1}^2\Phi _{x_2x_2}-2\Phi _{x_1}\Phi _{x_2}\Phi _{x_1x_2} + \Phi _{x_2}^2\Phi _{x_1x_1}}{|\nabla _{{\varvec{x}}} \Phi |^3} \nonumber \\&\quad +\frac{\Phi _{x_1}^2\Phi _{x_3x_3}-2\Phi _{x_1}\Phi _{x_3}\Phi _{x_1x_3} + \Phi _{x_3}^2\Phi _{x_1x_1}}{|\nabla _{{\varvec{x}}} \Phi |^3} \nonumber \\&\quad +\frac{\Phi _{x_2}^2\Phi _{x_3x_3}-2\Phi _{x_2}\Phi _{x_3}\Phi _{x_2x_3} + \Phi _{x_3}^2\Phi _{x_2x_2}}{|\nabla _{{\varvec{x}}} \Phi |^3} \end{aligned}$$
(8)

As a final building block, a velocity field \({\varvec{s}}\) has to be provided for the level-set transport equation (5). The velocity \({\varvec{s}}\) at the phase boundary results from the two-phase Riemann problem [19, 21, 43]. To obtain a velocity field in the volume, the velocities computed by the Riemann solver are first copied to neighboring cells. These pointwise data are then extrapolated into the volume by solving a Hamilton–Jacobi equation in pseudo time for each velocity component \(s^i\) [50]

$$\begin{aligned} \frac{\partial s^i}{\partial \tau }+\text {sign}(\Phi ){\varvec{n}}_{\textrm{ls}}\cdot \nabla _{{\varvec{x}}} s^i=0, \quad \text {with} \quad i=1,\ldots ,d \end{aligned}$$
(9)

To save computational cost, the level-set transport equation (5), the reinitialization (6) and the velocity extrapolation (9) are only evaluated in a narrow band around the phase boundary.

3 Numerical Methods

In this section, we first provide the numerical methods used to approximate the governing equations of the bulk flow and the level-set interface tracking algorithm. We then assemble the building blocks to obtain an hp-adaptive level-set ghost-fluid framework. All equations are discretized on a computational domain \(\Omega \subset {\mathbb {R}}^3\) that is subdivided into \(K\in {\mathbb {N}}\) non-overlapping hexahedral elements \(\Omega ^E_e\) such that \(\Omega =\bigcup ^K_{e=1}\Omega ^E_e\) and \(\Omega ^E_i\cap \Omega ^E_j=\emptyset \) for \(i\ne j\).

We apply a hybrid discretization based on a p-adaptive Discontinuous Galerkin spectral element method (DGSEM) with a FV sub-cell shock capturing scheme for both the Euler equations (1) and the level-set transport equation (5). The hybrid discretization is illustrated in Fig. 1, where a single droplet and the underlying mesh are visualized. DG elements \(\Omega ^E\) containing the physical phase boundary \(\Gamma \) are subdivided into \(N_{\text {FV}}\) FV sub-cells \(\Omega ^e\) per direction. At the discontinuous phase interface, a switch to the robust FV sub-cell operator avoids spurious oscillations caused by the jump in the solution. However, there is a further motivation for the FV sub-cell approach:

Fig. 1 Spatial discretization of a droplet in a vapor environment. Elements in the bulk region containing a smooth solution are discretized by a DG method with a local degree \(N_{\textrm{DG}}\). Elements containing the phase boundary are subdivided into \(N_{\textrm{FV}}\) sub-cells per direction to improve the approximation of the phase boundary by the surrogate surface

The physical phase boundary \(\Gamma \) is approximated on a discrete level by a surrogate phase interface, obtained as the projection of \(\Gamma \) onto the grid. A high spatial resolution is thus vital to obtain an accurate approximation of the physical interface \(\Gamma \). Therefore, the FV sub-cell approach is essential to achieve a precise interface localization. For a detailed derivation of the DGSEM and FV sub-cell methods, the reader is referred to [38, 39, 55]. The hp-adaptivity is covered in more depth in [46], while the foundations of the multiphase framework can be found in [20, 35, 47].

3.1 Spatial Discretization

3.1.1 DGSEM with p-Adaptation

In this section, we apply a p-adaptive formulation of the DGSEM in combination with a FV sub-cell scheme to both the hyperbolic conservation law of the bulk flow (1) and the level-set transport equation (5), which is a hyperbolic equation with a non-conservative product.

As a first step, we introduce the mapping from the physical space \({\varvec{x}}=\left( x_1,x_2,x_3\right) ^\intercal \) to reference space \(\varvec{\xi }=\left( \xi _1,\xi _2,\xi _3\right) ^\intercal \) to transform the physical elements \(\Omega ^E\) to a unit reference element \(E=[-1,1]^3\). We thus obtain the conservation law and the level-set transport equation in reference space as

$$\begin{aligned} J_{\textrm{geo}}\frac{\partial {\varvec{Q}}}{\partial t} + \nabla _{\varvec{\xi }} \cdot \varvec{{\mathcal {F}}}({\varvec{Q}})&=0 \end{aligned}$$
(10)
$$\begin{aligned} J_{\textrm{geo}}\frac{\partial \Phi }{\partial t} + {\varvec{s}}\ \cdot \nabla _{\varvec{\xi }}\varvec{{\mathcal {L}}}&=0 \end{aligned}$$
(11)

with the Jacobian determinant of the mapping \(J_{\textrm{geo}}\), the contravariant flux \(\varvec{{\mathcal {F}}}\) and the contravariant level-set field \(\varvec{{\mathcal {L}}}\), respectively. Projection onto a space of test functions \(\psi \in {\mathbb {P}}\) leads to the weak form of Eqs. (10) and (11)

$$\begin{aligned} \int _{E}\left( J_{\textrm{geo}}\frac{\partial {\varvec{Q}}}{\partial t}\right) \psi d\Omega + \oint _{\partial E} (\varvec{{\mathcal {F}}}\cdot {\varvec{n}}_{\varvec{\xi }}) \psi d S_\xi - \int _{E}(\varvec{{\mathcal {F}}}\cdot \nabla _{\varvec{\xi }})\psi d\Omega&= 0 \end{aligned}$$
(12)
$$\begin{aligned} \int _{E}\left( J_{\textrm{geo}}\frac{\partial \Phi }{\partial t}\right) \psi d\Omega + \oint _{\partial E} ({\varvec{s}}\ \cdot \nabla _{\varvec{\xi }}\varvec{{\mathcal {L}}}) \psi d S_\xi + \int _{E}({\varvec{s}}\ \cdot \nabla _{\varvec{\xi }}\varvec{{\mathcal {L}}})\psi d\Omega&= 0 \end{aligned}$$
(13)

The term \(\varvec{{\mathcal {F}}}\cdot {\varvec{n}}_{\varvec{\xi }}\) denotes the contravariant flux across a surface element with unit normal \({\varvec{n}}_{\varvec{\xi }}\) and is approximated by a numerical flux function \((\varvec{{\mathcal {F}}}\cdot {\varvec{n}}_{\varvec{\xi }})^*\), provided either by an HLLC [60] or an HLLE [17] approximate Riemann solver in the present study. To evaluate the surface integral of the level-set transport equation, a path-conservative jump term \((\varvec{{\mathcal {D}}}\cdot {\varvec{n}}_\xi )^*:={\varvec{s}}\ \cdot \nabla _{\varvec{\xi }}\varvec{{\mathcal {L}}}\) is introduced. Following [6, 13, 35], it can be approximated by a Rusanov-type Riemann solver as

$$\begin{aligned} (\varvec{{\mathcal {D}}}\cdot {\varvec{n}}_\xi )^* \approx \frac{1}{4}\left( (s^+ +s^-)-2s_{\textrm{max}}\right) \left( \Phi ^+-\Phi ^-\right) \end{aligned}$$
(14)

with the maximum signal speed \(s_{\textrm{max}}:=\textrm{max}\left( |{\varvec{s}}_L\cdot {\varvec{n}}_{\xi }|,|{\varvec{s}}_R\cdot {\varvec{n}}_{\xi }|\right) \). The transport velocities \(s^\pm \) are determined from \({\varvec{s}}_L\) and \({\varvec{s}}_R\) in an upwinding procedure to allow for topological changes like merging interface contours. A detailed explanation of this procedure is given in [35, 67]. The solution vectors \({\varvec{Q}}\) and \(\Phi \) as well as the contravariant flux \(\varvec{{\mathcal {F}}}\) and contravariant level-set \(\varvec{{\mathcal {L}}}\) are approximated as piecewise polynomials

$$\begin{aligned} {\varvec{Q}}(\varvec{\xi },t)&\approx \sum ^N_{i,j,k=0}{\hat{{\varvec{Q}}}}_{ijk}(t)\zeta _{ijk}(\varvec{\xi }) ,\quad \Phi (\varvec{\xi },t) \approx \sum ^N_{i,j,k=0}{\hat{\Phi }}_{ijk}(t)\zeta _{ijk}(\varvec{\xi })\end{aligned}$$
(15)
$$\begin{aligned} \varvec{{\mathcal {F}}}(\varvec{\xi },t)&\approx \sum ^N_{i,j,k=0}\varvec{{\mathcal {F}}}_{ijk}({\hat{{\varvec{Q}}}}_{ijk})\zeta _{ijk}(\varvec{\xi }) ,\quad \varvec{{\mathcal {L}}}(\varvec{\xi },t) \approx \sum ^N_{i,j,k=0}{\hat{\varvec{{\mathcal {L}}}}}_{ijk}(t)\zeta _{ijk}(\varvec{\xi }) \end{aligned}$$
(16)

in the space spanned by tensor products of one-dimensional Lagrange polynomials

$$\begin{aligned} \zeta _{ijk}(\varvec{\xi }) = \ell _i(\xi _1)\ell _j(\xi _2)\ell _k(\xi _3). \end{aligned}$$
(17)

According to the Galerkin ansatz, we use the tensor product Lagrange polynomials for both the ansatz functions \(\zeta \) and the test functions \(\psi \).

A semi-discrete scheme is eventually obtained by introducing a numerical quadrature for the integrals in Eqs. (12) and (13). A central idea of the DGSEM is to use the same node distribution, in our case Legendre–Gauss nodes, for both the quadrature and the interpolation. This collocation approach results in a tensor product structure of the DG operator and reduces the number of required operations per degree of freedom (DOF). A multi-dimensional operator can thus be constructed as a succession of multiple one-dimensional operations.
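
As an illustration of this tensor product structure, the following minimal numpy sketch applies a one-dimensional operator matrix successively along the three indices of the nodal element data; the matrix D merely stands in for the 1D collocation derivative operator and is filled with random entries here:

```python
import numpy as np

N = 4
D = np.random.rand(N + 1, N + 1)          # stand-in for the 1D derivative matrix
u = np.random.rand(N + 1, N + 1, N + 1)   # nodal data of one element

du_dxi1 = np.einsum('il,ljk->ijk', D, u)  # 1D operator along xi_1
du_dxi2 = np.einsum('jl,ilk->ijk', D, u)  # 1D operator along xi_2
du_dxi3 = np.einsum('kl,ijl->ijk', D, u)  # 1D operator along xi_3
# cost per element: O(3 (N+1)^4) instead of O((N+1)^6) for a full 3D operator
```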

The extension to a p-adaptive DGSEM discretization is straightforward by allowing for a variable element-local degree \(N\in [N_{\textrm{min}},N_{\textrm{max}}]\). The bounds \(N_{\textrm{min}}\) and \(N_{\textrm{max}}\) for the allowed polynomial degree need to be set prior to each computation. In the general case, the degrees of two adjacent elements differ, say \(M\) and \(N\) with \(M>N\), such that the flux computation needs to be adjusted. As depicted in Fig. 2, the surface solution \({\varvec{Q}}_N^-\) of the element with the lower degree N is interpolated to the higher degree, yielding \({\varvec{Q}}_M^-\). With surface solutions \({\varvec{Q}}_M^-\) and \({\varvec{Q}}_M^+\) of common degree M, the numerical flux \(\varvec{{\mathcal {F}}}^*_M\) can be computed. For the element of degree N, the flux is subsequently projected back to the lower degree, yielding \(\varvec{{\mathcal {F}}}^*_N\).

Fig. 2 Flux calculation for two adjacent elements with degree \(N=3\) (left) and \(M=4\) (right). The node distribution is indicated by dots in the volume and by squares on the surface. The numerical flux is computed on the solution representation of the higher polynomial degree

We follow an analogous procedure for the level-set transport equation, using the path-conservative jump term \((\varvec{{\mathcal {D}}}\cdot {\varvec{n}}_\xi )^*\) instead of the numerical flux. When the local degree of an element changes during runtime, the solution is either interpolated to a higher degree or projected to a lower degree. For a more detailed derivation of the p-adaptive scheme, we refer to [46].
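
The interpolation and projection between two adjacent degrees can be sketched in one dimension as follows. The helper lagrange_basis and the Burgers-type placeholder flux are assumptions of this sketch, not the framework's implementation; the projection exploits the diagonal mass matrix at Legendre–Gauss nodes, so the round trip of projection after interpolation recovers the identity:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def lagrange_basis(nodes, x):
    """Evaluate all Lagrange polynomials through `nodes` at the points `x`."""
    L = np.ones((len(x), len(nodes)))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                L[:, i] *= (x - xj) / (xi - xj)
    return L

N, M = 3, 4                          # degrees of the two adjacent elements
xN, wN = leggauss(N + 1)             # Legendre-Gauss nodes and weights
xM, wM = leggauss(M + 1)

V_NtoM = lagrange_basis(xN, xM)      # interpolation: degree N -> degree M
# L2 projection back to degree N (diagonal mass matrix at Gauss nodes)
P_MtoN = (lagrange_basis(xN, xM) * wM[:, None]).T / wN[:, None]
assert np.allclose(P_MtoN @ V_NtoM, np.eye(N + 1))

u_minus = np.sin(xN)                 # surface trace of the degree-N element
u_common = V_NtoM @ u_minus          # lift to the common degree M
flux_M = 0.5 * u_common**2           # placeholder numerical flux
flux_N = P_MtoN @ flux_M             # flux projected back for the degree-N element
```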

3.1.2 h-Refined FV Scheme for DGSEM

In the presence of strong gradients or discontinuities within an element, high order DGSEM schemes produce oscillatory solutions (Gibbs phenomenon). For the Euler equations, shock waves and phase boundaries are typical discontinuous solution features that necessitate a stabilization technique. The level-set field can develop sharp gradients for large curvatures, merging interface contours and the cut-off at the edge of the narrow band. To stabilize the solution, we combine the DGSEM with a robust FV scheme on an h-refined sub-cell grid. The FV scheme is formally obtained by introducing piecewise constant test and basis functions and is thus equivalent to a first order DGSEM. With piecewise constant basis functions, the derivative in the volume integral vanishes, simplifying the weak formulations (12) and (13) to

$$\begin{aligned} \int _{e}\left( J_{\textrm{geo}}\frac{\partial {\varvec{Q}}}{\partial t}\right) \psi d\Omega + \oint _{\partial e} (\varvec{{\mathcal {F}}}\cdot {\varvec{n}}_{\varvec{\xi }}) \psi d S_\xi&= 0 \end{aligned}$$
(18)
$$\begin{aligned} \int _{e}\left( J_{\textrm{geo}}\frac{\partial \Phi }{\partial t}\right) \psi d\Omega + \oint _{\partial e} ({\varvec{s}}\ \cdot \nabla _{\varvec{\xi }}\varvec{{\mathcal {L}}}) \psi d S_\xi&= 0 \end{aligned}$$
(19)

The reduced approximation order is compensated by refining an affected DG element E to a sub-cell grid of \(N_{\textrm{FV}}\) equidistant sub-cells e per direction. This improves the localization of shocks and increases the resolution of the surrogate phase boundary significantly. In the previous implementation of our framework [35], a DG operator with constant degree N and a sub-cell resolution \(N_{\textrm{FV}}=N+1\) were used. With this choice, both DG and FV elements use the same number of DOFs, which allows a common data structure and thereby eases the implementation of the sub-cell approach. Following Dumbser et al. [15], we choose an increased sub-cell resolution of \(N_{\textrm{FV}}=2N_{\textrm{max}}+1\). This particular choice is motivated by the time step restriction of the DGSEM compared to the FV scheme. In case of an equidistant sub-cell grid, \(N_{\textrm{FV}}=2N_{\textrm{max}}+1\) is the maximum sub-cell resolution that does not impose a more restrictive time step than a DG element of degree \(N_{\textrm{max}}\). Thus, the increased sub-cell resolution of the phase boundary can be achieved without compromising the time step. The accuracy of the FV sub-cell scheme is further improved through a TVD second-order reconstruction scheme, as proposed in [56].

Switching between the DG and FV operators requires transformations between piecewise polynomial and piecewise constant solution representations. The transformation \(DG\rightsquigarrow FV\) can be performed in a conservative manner by integrating the DG polynomial over each sub-cell e. Conversely, a polynomial representation \(FV\rightsquigarrow DG\) can be recovered from FV sub-cell data with a discrete projection, equivalent to a least-squares approach with integral conservation as a constraint. This transformation was proposed by Dumbser et al. [13] for the case \(N_{\textrm{FV}} > N+1\), which leads to an over-determined system. Switching between a DG and FV solution is performed as a matrix–vector operation with the transformation matrices \({\varvec{V}}^{\textrm{DG}\rightsquigarrow \textrm{FV}}\) and \({\varvec{V}}^{\textrm{FV}\rightsquigarrow \textrm{DG}}\), for which holds

$$\begin{aligned} {\varvec{V}}^{\textrm{FV}\rightsquigarrow \textrm{DG}}\circ {\varvec{V}}^{\textrm{DG}\rightsquigarrow \textrm{FV}}= {\varvec{I}}. \end{aligned}$$
(20)
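
A sketch of how the two transformation matrices can be constructed in one dimension is given below; for brevity, the conservation-constrained least-squares solve of [13] is replaced by a plain pseudo-inverse, so this illustrates the structure rather than the framework's exact operator:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def lagrange_basis(nodes, x):        # same helper as in the previous sketch
    L = np.ones((len(x), len(nodes)))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                L[:, i] *= (x - xj) / (xi - xj)
    return L

N, N_FV = 4, 9                       # DG degree and sub-cells per direction
xN, _ = leggauss(N + 1)
xq, wq = leggauss(N + 1)             # quadrature rule inside each sub-cell
edges = np.linspace(-1.0, 1.0, N_FV + 1)

# V_dg2fv[e, i]: mean value of Lagrange polynomial l_i over sub-cell e
V_dg2fv = np.zeros((N_FV, N + 1))
for e in range(N_FV):
    a, b = edges[e], edges[e + 1]
    x_sub = 0.5 * (a + b) + 0.5 * (b - a) * xq   # map quadrature to sub-cell
    V_dg2fv[e] = 0.5 * wq @ lagrange_basis(xN, x_sub)

V_fv2dg = np.linalg.pinv(V_dg2fv)    # least-squares recovery, FV -> DG
assert np.allclose(V_fv2dg @ V_dg2fv, np.eye(N + 1))   # Eq. (20)
```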

Adjacent FV sub-cells are coupled via a numerical flux \(\varvec{{\mathcal {F}}}^*\). Following [56], we use the FV sub-cell scheme as an a priori limiter to stabilize the bulk fluid flow. Thus, every element is discretized either by the DG or the FV operator, according to an indicator evaluated prior to each time step. In case of a mixed interface between a DG and a FV element, as depicted in Fig. 3, a common solution representation needs to be provided for the flux computation. To this end, the DG solution is transformed to a FV sub-cell representation at the element side. With a consistent FV solution representation from both sides, the numerical flux can be evaluated as usual and is subsequently transformed back to a DG discretization for the DG element.

Fig. 3 Flux computation for a mixed interface with a DG element of degree \(N=3\) (left) and a FV sub-cell element with \(N_{\textrm{FV}}=4\) sub-cells (right). DG volume nodes and FV sub-cell centers are indicated by dots. Surface nodes are indicated by squares. The numerical flux is computed on a piecewise constant FV representation \(\varvec{{\mathcal {F}}}^*_{\textrm{FV}}\) and projected afterwards to a DG solution \(\varvec{{\mathcal {F}}}^*_{\textrm{DG}}\) to provide the flux for the DG element

For the level-set transport equation, the FV sub-cell scheme is applied as an a posteriori limiter. We compute the time update for a pure DG representation as well as a pure FV representation. The update candidate for each element is then chosen according to an a priori indicator. This procedure is motivated by the occurrence of spurious oscillations if adjacent DG and FV elements are coupled via the path-conservative jump term [13]. It should be noted that evaluating both the DG and the FV operator for each element has a minor effect on the overall computation time since the level-set transport equation is only solved in a narrow band around the phase interface.

3.2 Time Discretization

The Euler equations and the level-set transport are both integrated in time with an explicit fourth-order low-storage Runge–Kutta scheme [37]. The time integration is subject to time step restrictions, derived from the CFL condition by Courant, Friedrichs and Lewy [10]. For the DGSEM and FV discretizations of the Euler equations and the level-set transport, time step restrictions can be formulated as follows

$$\begin{aligned} \Delta t_{\textrm{DG}}&= \text {CFL} \cdot \alpha _{\textrm{RK}}(N) \cdot \frac{\Delta x_{\textrm{DG}}}{\left( 2N+1 \right) \left| \lambda _{\textrm{max}} \right| } \end{aligned}$$
(21)
$$\begin{aligned} \Delta t_{\textrm{FV}}&= \text {CFL} \cdot \alpha _{\textrm{RK}}(0) \cdot \frac{\Delta x_{\textrm{FV}}}{ \left| \lambda _{\textrm{max}} \right| }. \end{aligned}$$
(22)

They depend on the maximum signal velocity \(\left| \lambda _{\textrm{max}} \right| \), the grid spacing \(\Delta x_{\textrm{DG}}\) or \(\Delta x_{\textrm{FV}}\) and the empirical scaling factor \(\alpha _{\textrm{RK}}(N)>0\) for the Runge–Kutta scheme. The factor \(\alpha _{\textrm{RK}}(N)\) accounts for the size of the stability region of a particular Runge–Kutta method for a spatial discretization operator and is determined through numerical experiments in [5]. It decreases monotonically with increasing degree N, thus \(\alpha _{\textrm{RK}}(N+1)<\alpha _{\textrm{RK}}(N)\) holds. The DG time step \(\Delta t_{\textrm{DG}}\) is further scaled by the factor \((2N+1)\) to account for the element-local degree N. A comparison between the time step restrictions (21) and (22) reveals that for \(\Delta x_{\textrm{FV}}>\Delta x_{\textrm{DG}}/(2N+1)\), the FV method will not impose a smaller time step than a DG element of degree N. This motivates the choice of a sub-cell resolution of \(N_{\text {FV}}=2N_{\text {max}}+1\), with \(N_{\text {max}}\) denoting the highest DG degree within the computation.

In the case of the Euler equations, the maximum signal velocity is given by the maximum absolute eigenvalue \(\lambda _{\textrm{max}}=\max (|c-u|,|u|,|c+u|)\) with the speed of sound c. For the level-set transport, the maximum signal velocity corresponds to the transport velocity \(\lambda _{\textrm{max}}=s\). In two-phase flows, an additional time step restriction is imposed by capillary waves at the phase interface. According to Denner et al. [11], the dynamic capillary time step constraint depends on the signal velocity

$$\begin{aligned} \lambda _{\textrm{max}} = \sqrt{\frac{2\pi \sigma }{\rho \Delta x_{\textrm{FV}}}}+|{\varvec{s}}| \end{aligned}$$
(23)

with the surface tension \(\sigma \). Instead of the sum of the liquid and the vapor densities, we only use the density of the current degree of freedom to simplify the implementation. Since the sum of liquid and vapor density is always larger than either the liquid or the vapor density, this is a conservative assumption and thus does not compromise numerical stability. The capillary time step criterion is only relevant at the phase boundary that is discretized with the FV sub-cell scheme. Therefore, the signal velocity is formulated for a FV sub-cell element with the grid spacing \(\Delta x_{\textrm{FV}}\).

The maximum signal velocity of the level-set transport s is of the same order of magnitude as the convective velocity of the bulk flow u. Thus, the signal speed of the Euler equations \(c+u\) is usually larger and therefore the dominant time step restriction in practical applications.
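
The interplay of the three restrictions (21)–(23) can be condensed into a short routine; the \(\alpha _{\textrm{RK}}\) values below are placeholders, the actual factors are tabulated in [5]:

```python
import numpy as np

ALPHA_RK = {0: 1.0, 2: 0.8, 3: 0.7, 4: 0.6}   # placeholder stability factors

def max_time_step(dx, N, lam_max, sigma=0.0, rho=1.0, s_norm=0.0, cfl=0.9):
    """Minimum of the DG (21), FV sub-cell (22) and capillary (23) limits
    for one element; dx is the DG element size, with N_FV = 2N+1 sub-cells."""
    dx_fv = dx / (2 * N + 1)
    dt = min(cfl * ALPHA_RK[N] * dx / ((2 * N + 1) * lam_max),   # Eq. (21)
             cfl * ALPHA_RK[0] * dx_fv / lam_max)                # Eq. (22)
    if sigma > 0.0:                                              # Eq. (23)
        lam_cap = np.sqrt(2.0 * np.pi * sigma / (rho * dx_fv)) + s_norm
        dt = min(dt, cfl * ALPHA_RK[0] * dx_fv / lam_cap)
    return dt

# since alpha_rk(0) >= alpha_rk(N), the FV limit never undercuts the DG one
print(max_time_step(dx=0.1, N=4, lam_max=340.0))
```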

3.3 Indicator for FV Sub-Cell Limiting and p-Refinement

p-adaptivity and FV sub-cell limiting are controlled on an element-local level through a combination of different indicators. First, we infer an estimate for the error and the smoothness of the solution from the decay rate of the modal polynomial solution representation [41, 46]. To this end, we transform the nodal polynomial solution representation \({\hat{{\varvec{u}}}}_{\textrm{nod}}\) of the DGSEM scheme into a modal Legendre basis \({\hat{{\varvec{u}}}}_{\textrm{mod}}\) using a Vandermonde matrix \({\varvec{V}}_{\textrm{Leg}}\)

$$\begin{aligned} {\hat{{\varvec{u}}}}_{\textrm{mod}} = {\varvec{V}}_{\textrm{Leg}} {\hat{{\varvec{u}}}}_{\textrm{nod}}. \end{aligned}$$
(24)

The relative contribution \(w_m\) of the \(m^\textrm{th}\) mode in \(\xi _1\) direction to the solution can be expressed as

$$\begin{aligned} w_{m}=\frac{\sum ^{N}_{j,k=0}{\hat{{\varvec{u}}}}_{\textrm{mod},mjk}^2}{\sum ^{N}_{i,j,k=0}{\hat{{\varvec{u}}}}_{\textrm{mod},ijk}^2}. \end{aligned}$$
(25)

The contributions \(w_m\) are subsequently fitted to the exponential function \(w_{m} = ae^{-\sigma m}\) for every direction and the indicator is obtained as

$$\begin{aligned} {\mathcal {I}}_{\textrm{modal}} = \textrm{min}(|\sigma _1|,|\sigma _2|,|\sigma _3|) \end{aligned}$$
(26)

The final indicator value \({\mathcal {I}}_{\textrm{modal}}\) is then compared against user-defined thresholds to decide whether p-refinement or FV sub-cell limiting is required. For every degree N, lower thresholds \({\mathcal {T}}_{\textrm{low}}(N)\) and upper thresholds \({\mathcal {T}}_{\textrm{up}}(N)\) are defined for both the DG p-refinement and the FV sub-cell limiting:

$$\begin{aligned} {\mathcal {T}}_{\textrm{low},\textrm{FV}}(N)&= {\mathcal {T}}_{\textrm{min},\textrm{FV}}+N \frac{{\mathcal {T}}_{\textrm{max},\textrm{FV}}-{\mathcal {T}}_{\textrm{min},\textrm{FV}}}{N_{\textrm{max}}-N_{\textrm{min}}} \end{aligned}$$
(27)
$$\begin{aligned} {\mathcal {T}}_{\textrm{up},\textrm{FV}}(N)&= {\mathcal {T}}_{\textrm{min},\textrm{FV}}+(N+1) \frac{{\mathcal {T}}_{\textrm{max},\textrm{FV}}-{\mathcal {T}}_{\textrm{min},\textrm{FV}}}{N_{\textrm{max}}-N_{\textrm{min}}} \end{aligned}$$
(28)
$$\begin{aligned} {\mathcal {T}}_{\textrm{low},\textrm{DG}}(N)&= {\mathcal {T}}_{\textrm{min},\textrm{DG}}+N \frac{{\mathcal {T}}_{\textrm{max},\textrm{DG}}-{\mathcal {T}}_{\textrm{min},\textrm{DG}}}{N_{\textrm{max}}-N_{\textrm{min}}} \end{aligned}$$
(29)
$$\begin{aligned} {\mathcal {T}}_{\textrm{up},\textrm{DG}}(N)&= {\mathcal {T}}_{\textrm{min},\textrm{DG}}+(N+1) \frac{{\mathcal {T}}_{\textrm{max},\textrm{DG}}-{\mathcal {T}}_{\textrm{min},\textrm{DG}}}{N_{\textrm{max}}-N_{\textrm{min}}}. \end{aligned}$$
(30)

The thresholds depend on four free parameters \({\mathcal {T}}_{\textrm{min},\textrm{FV}}\), \({\mathcal {T}}_{\textrm{max},\textrm{FV}}\), \({\mathcal {T}}_{\textrm{min},\textrm{DG}}\) and \({\mathcal {T}}_{\textrm{max},\textrm{DG}}\) that are tuned empirically for the bulk flow and the level-set transport in the test problems of Sect. 5. For a more in-depth discussion of the modal indicator and the threshold definitions, the reader is referred to [46].
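
A condensed sketch of the indicator evaluation (24)–(26) for a single element is given below; the log-linear least-squares fit and the small guard against vanishing modes are implementation choices of this sketch:

```python
import numpy as np

def modal_indicator(u_mod):
    """Decay-rate indicator, Eqs. (24)-(26), for one element.
    u_mod: modal Legendre coefficients of shape (N+1, N+1, N+1)."""
    N = u_mod.shape[0] - 1
    m = np.arange(N + 1)
    sigmas = []
    for axis in range(3):
        E = np.moveaxis(u_mod, axis, 0) ** 2
        w = E.reshape(N + 1, -1).sum(axis=1) / E.sum()   # Eq. (25)
        keep = w > 1e-14                                 # guard against log(0)
        slope, _ = np.polyfit(m[keep], np.log(w[keep]), 1)
        sigmas.append(abs(slope))                        # w_m ~ a * exp(-sigma m)
    return min(sigmas)                                   # Eq. (26)
```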

To stabilize both the bulk flow and the level-set transport in the presence of phase boundaries, additional criteria for the FV sub-cell limiting are defined. Using the position of the level-set zero-line, elements containing a phase boundary are detected and flagged for FV sub-cell limiting in the Euler equations by the indicator \({\mathcal {I}}_{\textrm{interface}}\). In case the level-set function changes its sign multiple times within a DG element, we assume approaching level-set contours and flag affected elements with the indicator \({\mathcal {I}}_{\textrm{topo}}\) for FV sub-cell limiting [35, 67]. In conclusion, p-refinement is controlled for the bulk flow and the level-set by the indicator \({\mathcal {I}}_{\textrm{modal}}\). FV sub-cell limiting depends on the combination of \({\mathcal {I}}_{\textrm{modal}}\) and \({\mathcal {I}}_{\textrm{interface}}\) in the bulk, and on \({\mathcal {I}}_{\textrm{modal}}\) and \({\mathcal {I}}_{\textrm{topo}}\) for the level-set.

3.4 Discretization of the Reinitialization and Velocity Extrapolation

The scalar Hamilton–Jacobi equations of the reinitialization (6) and the velocity extrapolation (9) are discretized with a fifth-order WENO scheme [33] on the FV sub-cell grid. We apply a local Lax–Friedrichs flux for the velocity extrapolation and a Godunov-type flux for the reinitialization [18]. Both Hamilton–Jacobi equations are integrated in pseudo time with an explicit Runge–Kutta scheme [64]. In case of the velocity extrapolation, scalar equations are solved for every space dimension. To obtain initial data for the velocity extrapolation, the velocities provided by the two-phase Riemann solver at the surrogate phase boundary are copied to adjacent sub-cells. If a sub-cell is affected by multiple two-phase Riemann problems, an averaging is performed. From these pointwise data, the velocity extrapolation routine constructs a velocity field in the narrow band region. For a detailed discussion of reinitialization and velocity extrapolation, we refer the reader to [18, 50, 58].
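
As a concrete illustration of the pseudo-time iteration (6), the following one-dimensional sketch restores the signed distance property with first-order Godunov upwinding and a smoothed sign function; this is a simplified stand-in for the fifth-order WENO discretization used in the framework:

```python
import numpy as np

def reinitialize_1d(phi, dx, n_steps=50, cfl=0.5):
    """Drive |dphi/dx| -> 1 by iterating Eq. (6) in pseudo time."""
    dtau = cfl * dx
    s = phi / np.sqrt(phi**2 + dx**2)            # smoothed sign(phi)
    for _ in range(n_steps):
        dm = np.diff(phi, prepend=phi[0]) / dx   # backward differences
        dp = np.diff(phi, append=phi[-1]) / dx   # forward differences
        grad = np.where(                         # Godunov upwind |grad(phi)|
            s > 0,
            np.sqrt(np.maximum(np.maximum(dm, 0.0)**2, np.minimum(dp, 0.0)**2)),
            np.sqrt(np.maximum(np.minimum(dm, 0.0)**2, np.maximum(dp, 0.0)**2)),
        )
        phi = phi - dtau * s * (grad - 1.0)
    return phi

# example: a kinked field with the correct root at x = 0 but wrong slopes
x = np.linspace(-1.0, 1.0, 201)
phi = reinitialize_1d(3.0 * np.abs(x) - 0.5 * x**2, x[1] - x[0])
```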

3.5 Calculation of Level-Set Normals and Curvature

The evaluation of the two-phase Riemann problem requires the normal vectors and the curvature of the level-set field at the phase boundary. The normals (7) and the curvature (8) are expressed in terms of first and second derivatives of the level-set field. These derivatives are approximated by applying a central second-order finite difference scheme to the level-set field in FV representation. The second derivatives are obtained by reapplying the same operator to the first derivatives.
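
A minimal sketch of this evaluation on a Cartesian sub-cell grid follows; the curvature is computed here as the divergence of the unit normal field, which is equivalent to Eq. (8), with the second derivatives obtained by re-applying the first-derivative operator:

```python
import numpy as np

def normals_and_curvature(phi, dx):
    """Normals (7) and curvature (8) via central second-order differences."""
    grad = np.gradient(phi, dx)                    # [phi_x1, phi_x2, phi_x3]
    mag = np.sqrt(sum(g**2 for g in grad)) + 1e-14
    n_ls = [g / mag for g in grad]                 # Eq. (7)
    kappa = sum(np.gradient(n_ls[i], dx)[i] for i in range(3))
    return n_ls, kappa

# sanity check on a sphere of radius 0.25: kappa -> 2/r = 8 at the interface
x = np.linspace(-0.5, 0.5, 64)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
n_ls, kappa = normals_and_curvature(np.sqrt(X**2 + Y**2 + Z**2) - 0.25, x[1] - x[0])
```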

3.6 Ghost-Fluid Method and Two-Phase Riemann Solver

The ghost-fluid method of Fedkiw et al. [22] is a common tool in sharp interface models to couple the bulk phases at the phase boundary. It is based on the idea of defining ghost states next to the phase interface that serve as boundary conditions for the bulk phases. With these boundary conditions, the bulk phases can be treated with standard methods for single phase flows. In the present framework, ghost states \({\varvec{Q}}_{l}^{{\mathcal {G}}}\) and \({\varvec{Q}}_{v}^{{\mathcal {G}}}\) are obtained as the inner states of a two-phase Riemann problem [43], evaluated at the surrogate phase boundary. Subsequently, ghost fluxes across the phase boundary are computed with the ghost states as \(\varvec{{\mathcal {F}}}_l^{{\mathcal {G}}}=\varvec{{\mathcal {F}}}_l^{{\mathcal {G}}}({\varvec{Q}}_{l},{\varvec{Q}}_{l}^{{\mathcal {G}}})\) and \(\varvec{{\mathcal {F}}}_v^{{\mathcal {G}}}=\varvec{{\mathcal {F}}}_v^{{\mathcal {G}}}({\varvec{Q}}_{v}^{{\mathcal {G}}},{\varvec{Q}}_{v})\), respectively. The two-phase Riemann problem and the ghost-fluid method are visualized with an exemplary sketch in Fig. 4. As a consequence of the non-unique flux across the phase boundary, the ghost-fluid method is not conservative. An additional result of the two-phase Riemann solver is the wave speed \({\varvec{s}}\) of the material interface, which serves as pointwise data for the extrapolation of the level-set velocity field, see Sect. 3.4.

Fig. 4 Illustration of an exemplary two-phase Riemann problem at the phase interface (left) with the intermediate states \({\varvec{Q}}_{l}^{{\mathcal {G}}}\) and \({\varvec{Q}}_{v}^{{\mathcal {G}}}\). The intermediate states of the Riemann problem serve as ghost states to define boundary conditions and to compute the ghost fluxes \(\varvec{{\mathcal {F}}}_l^{{\mathcal {G}}}({\varvec{Q}}_{l},{\varvec{Q}}_{l}^{{\mathcal {G}}})\) and \(\varvec{{\mathcal {F}}}_v^{{\mathcal {G}}}({\varvec{Q}}_{v}^{{\mathcal {G}}},{\varvec{Q}}_{v})\) across the phase boundary (right)

If the sign of the level-set changes within a sub-cell, i.e. the phase interface has moved from one sub-cell to the next, the respective sub-cell changes its phase. In this case, the state vector \({\varvec{Q}}\) of the sub-cell is populated by the ghost state \({\varvec{Q}}^{{\mathcal {G}}}={\varvec{Q}}^{{\mathcal {G}}}(x=0,t)\), given as the inner state of the two-phase Riemann problem. For a detailed discussion of the ghost-fluid method and the two-phase Riemann solver, the reader is referred to [18, 21, 34].
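
Schematically, the coupling at one interface side reads as follows; two_phase_riemann and flux are hypothetical stand-ins for the framework's two-phase Riemann solver and single-phase numerical flux function, not its actual API:

```python
def ghost_fluxes(Q_l, Q_v, two_phase_riemann, flux):
    """Sketch of the ghost-fluid coupling at one interface side."""
    # inner states of the two-phase Riemann problem serve as ghost states
    Q_l_ghost, Q_v_ghost, s_interface = two_phase_riemann(Q_l, Q_v)
    F_l = flux(Q_l, Q_l_ghost)        # boundary flux for the liquid phase
    F_v = flux(Q_v_ghost, Q_v)        # boundary flux for the vapor phase
    return F_l, F_v, s_interface      # s_interface feeds the level-set transport
```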

3.7 Assembly of the Level-Set Ghost-Fluid Framework

Finally, we can assemble the introduced building blocks to obtain an adaptive level-set ghost-fluid framework. Figure 5 illustrates the interaction of the different operators for one time step \([t_n,t_{n+1}]\). Within one time step, the following steps are performed:

  1. Based on the sign of the level-set scalar, the domain \(\Omega \) is divided into \(\Omega _{l}\) and \(\Omega _{v}\).

  2. At the phase boundary, DG elements are switched to the FV sub-cell representation to provide a stable discretization of the sharp interface and to improve the approximation by the surrogate phase boundary.

  3. The level-set field is switched to the FV representation for the subsequent WENO and FV operators.

  4. Before evaluating geometric properties of the level-set, a WENO-based reinitialization of the level-set field is performed to restore the signed distance property.

  5. Level-set normals and curvature are calculated by a second-order FV method.

  6. With the geometric properties of the level-set, a two-phase Riemann problem is solved at the surrogate phase boundary. It provides ghost states to couple the bulk phases as well as the local velocity of the phase interface.

  7. The pointwise information of the interface velocity at the surrogate phase boundary is copied to adjacent sub-cells and extrapolated with a WENO routine to obtain a velocity field for the level-set transport.

  8. Based on the indicators of Sect. 3.3, the DG/FV distributions for the bulk flow and level-set transport are determined. Furthermore, the polynomial degree for DG elements is chosen based on the indicator \({\mathcal {I}}_{\textrm{modal}}\).

  9. Finally, the adaptive hybrid DG/FV operators are advanced for the bulk flow and the level-set.

Fig. 5 Assembly of the level-set ghost-fluid framework from the main building blocks and chronological interaction within one time step. Operators applied in every Runge–Kutta stage are colored light gray. Steps that are only performed once per time step are highlighted in dark gray. The novelty is the hp-refinement step prior to the application of the spatial operators for the bulk flow and level-set transport

4 Parallelization Concept

4.1 MPI Communication

In general, the hp-adaptive scheme employed for the bulk phase (see Sect. 3.1) is formulated for unstructured, curved grids, as demonstrated in [46]. While the WENO scheme discretizing the Hamilton–Jacobi equations (Sect. 3.4) restricts the presented multiphase framework to Cartesian grids, it internally still operates on an unstructured grid representation. The grid representation and the associated parallelization concept are inherited from the work of Hindenlang [30] and are optimized towards massively parallelized computations, as shown in [4, 39].

The parallelization concept exploits that the DG operator can be split into two building blocks, cf. Eqs. (12) and (13): the volume integral, which depends exclusively on the element's inner nodes, and the surface integral, which requires information from the neighboring elements. In parallel simulations, this information exchange implies a communication operation if the two elements sharing one side reside on different partitions. The induced idle time can conveniently be hidden by computing the element-local volume integral while exchanging the side data for the surface integral through non-blocking communication operations. This latency hiding approach increases parallel performance as long as the network communication is faster than the memory buffer operations. The communication routines are provided by the widely used MPI standard [44], which is commonly available on HPC systems in architecture-optimized implementations. Moreover, the latency hiding approach is applied to the surface integral itself. To this end, the data structure separates volume and surface data, with the solution \({\varvec{u}}\) stored at the corresponding interpolation nodes. Every element face, except for those at the domain boundaries, forms a pair (side) with the face of the adjacent element. As the numerical flux evaluated from this data pair is unique, it needs to be computed only once. We therefore label the element faces as major or minor and evaluate the numerical flux solely on the major side. If two elements share a common side but reside on different partitions, the surface data of the minor side need to be sent to the major side prior to the flux computation, and the resulting flux values are sent back afterwards. Following the aforementioned latency hiding approach, this communication pattern suggests first computing the flux on these MPI sides and using the idle time to compute the flux on the processor-local inner sides.

Fig. 6 Flowchart of a major/minor side communication for the hp-adaptive data structure

In the present hp-adaptive framework, the recurring send and receive operations are accomplished efficiently by collecting the surface data per polynomial degree N in a dedicated array \({\varvec{u}}_{\textrm{recv}}\), as depicted in Fig. 6. Once the communication has finished, \({\varvec{u}}_{\textrm{recv}}\) is unpacked by copying the received data back to the DG side array \({\varvec{u}}_{\textrm{minor}}\). These additional buffer operations account for the fact that the DG side arrays are only sparsely filled in terms of one specific degree N. The same strategy is applied to the computed fluxes. That is, transforming \({\varvec{u}}_{\textrm{minor}}\) to the common polynomial degree as well as projecting the resulting flux back to N (cf. Sect. 3.1) are both performed locally on the major side. Thus, only surface data of the actual degree N are exchanged, which minimizes the communication volume and further enhances the scalability of the hp-adaptive scheme.
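
The packing step can be sketched as follows; the names and array layout are illustrative assumptions. Each minor side stores its trace in an array sized for \(N_{\textrm{max}}\), of which only the leading \((N+1)\times (N+1)\) block is filled:

```python
import numpy as np

def pack_by_degree(u_minor, side_degrees, nvar=5):
    """Gather the degree-N part of every minor side into one contiguous
    buffer per polynomial degree, so each send transfers only (N+1)^2
    values per side and variable."""
    buffers = {}
    for N in np.unique(side_degrees):
        sides = np.flatnonzero(side_degrees == N)
        buf = np.empty((sides.size, nvar, N + 1, N + 1))
        for b, s in enumerate(sides):
            buf[b] = u_minor[s, :, : N + 1, : N + 1]
        buffers[N] = (sides, buf)        # one non-blocking send per degree
    return buffers

# example: 6 sides with mixed degrees, stored in N_max-sized arrays
N_max = 4
u_minor = np.random.rand(6, 5, N_max + 1, N_max + 1)
bufs = pack_by_degree(u_minor, side_degrees=np.array([2, 4, 2, 3, 4, 2]))
```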

4.2 Domain Decomposition

As indicated, the parallelization strategy of the present framework relies on splitting the computational domain into separate subdomains (partitions) of one or more grid cells, which are each assigned to one processor unit. For the given unstructured mesh representation, Hindenlang [30] suggested the use of space-filling curves (SFCs) for the domain decomposition. Compared to graph partitioning approaches, they were reported to provide a similar surface-to-volume ratio, which correlates with the communication amount per data operation, while greatly simplifying the data structures (cf. Sect. 4.1), parallel I/O, and dynamic load balancing (Sect. 4.3). Specifically, we employ the Hilbert SFC [29], which yields highly compact subdomains and has thus proved well suited for massively parallel simulations.

4.3 Dynamic Load Balancing Scheme

One of the major challenges to the efficient utilization of current and future HPC systems is load balance, i.e. the even distribution of the application workload across the processor units [12, 40]. Temporal changes in the workload at runtime, in particular, necessitate a dynamic load balancing (DLB) approach.

In the presented hp-adaptive multiphase framework, two main factors contribute to a workload imbalance: First, the element-local choice between a p-adaptive DGSEM discretization and the FV sub-cell scheme, and second, the narrow band approach of the interface tracking algorithm [2], which confines the costly solution of the level-set equations to a small band of 2–3 elements adjacent to the surrogate phase boundary. Since both the element-local discretization and the position of the phase interface may change over time, the processor workload needs to be balanced dynamically.

In this work, we extend and improve the DLB scheme of [2] to fit our adaptive discretization. It relies on a domain decomposition through SFCs, which conveniently reduces the partitioning problem to one dimension. The partitioning problem seeks to distribute K elements with the weights \(w_i,\; i=1,\ldots ,K\) among P available processors, such that the maximum processor load, also referred to as bottleneck B, is minimized:

$$\begin{aligned} B = \max \limits _{1\le k \le P} \{L_k\},\quad \text {with} \quad L_k = \sum _{i=s_{k-1}+1}^{s_k}w_i. \end{aligned}$$
(31)

Here \(s_k\) denotes the index of the last element associated with processor k, and \(L_k\) the total processor workload. The separator indices \(s_0=0\le s_1 \le \cdots \le s_k\le \cdots \le s_P = K\) subdivide the domain of K elements into P partitions of \(N_{k}= ( s_k-s_{k-1} ) \) elements each. The task of the partitioning algorithm is thus to find the sequence of separator indices \((s_0,\ldots ,s_P)\) with the lowest feasible bottleneck \(B^{\textrm{opt}}\). The value of \(B^{\textrm{opt}}\) is unknown and bounded below by the ideal bottleneck

$$\begin{aligned} B^* = \frac{1}{P} \sum _{i=1}^{K} w_i \end{aligned}$$
(32)

which assumes identical loads among all partitions, i.e. \(B^{\textrm{opt}}\ge B^*\).

The partitioning problem, commonly known as the chains-on-chains partitioning problem [53], has been addressed by a variety of algorithms. Algorithms that always return an optimal partitioning are called exact, as opposed to heuristics, which, however, typically execute faster and can be parallelized. An extensive overview of both exact methods and heuristics is given in [53]. In our work, we adopt the bisection algorithm “ExactBS+PI” recently proposed by [40]. Bisection algorithms in general use binary search to find a good bottleneck: they repeatedly call a probing function that checks a given bottleneck value for feasibility, adjusting the probed bottleneck iteratively through bisection. ExactBS+PI specifically employs an improved probing function and a more accurate initial search interval for the bisection. It thus features quick execution times while remaining exact. For details, the reader is referred to the original work, where ExactBS+PI is embedded in a hierarchical method to design a highly scalable, near-exact partitioning algorithm.
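
A simplified sketch of the bisection idea with a greedy probing function is given below; this stands in for, and is not identical to, the exact algorithm ExactBS+PI:

```python
import numpy as np

def probe(weights, P, B):
    """Greedy feasibility check: can the weight chain be cut into at most
    P consecutive partitions, each with load <= B?"""
    parts, load = 1, 0.0
    for w in weights:
        if w > B:
            return False
        if load + w > B:
            parts, load = parts + 1, 0.0
        load += w
    return parts <= P

def bisect_bottleneck(weights, P, tol=1e-9):
    """Binary search for the smallest feasible bottleneck B."""
    lo = max(np.sum(weights) / P, np.max(weights))   # ideal bottleneck B*
    hi = np.sum(weights)                             # trivially feasible
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if probe(weights, P, mid):
            hi = mid
        else:
            lo = mid
    return hi

w = np.array([1.0, 3.0, 1.0, 1.0, 4.0, 1.0, 1.0, 2.0])   # SFC-ordered weights
print(bisect_bottleneck(w, P=3))   # ~5.0, e.g. partitions [1,3,1], [1,4], [1,1,2]
```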

A crucial building block of our load balancing scheme is the determination of the weights \(w_i\). Following [2], we perform element-local wall time measurements. The code instrumentation exploits that the narrow band approach decomposes the set \({\mathcal {D}}\), which contains all elements in \(\Omega \), into three nested subsets: the bulk elements \({\mathcal {D}}_{\textrm{bulk}}\equiv {\mathcal {D}}\), where only the Euler equations (1) are solved, the elements inside the narrow band \({\mathcal {D}}_{\textrm{NB}}\subset {\mathcal {D}}_{\textrm{bulk}}\), where the interface tracking is required additionally, and the elements directly at the interface \({\mathcal {D}}_{\Gamma }\subset {\mathcal {D}}_{\textrm{NB}}\), where the two-phase Riemann problem is evaluated. Evidently, the computational cost of an individual grid element rises in the listed order of these subsets.

In [2], the measured time was distributed evenly among the elements within each subset, which directly defined the element weight \(w_i\). This averaging ansatz was plausible since all elements used the same discretization and could therefore be expected to contribute equally to the cost of the considered subset.

To account for the variable element cost in the bulk due to the hp-adaptivity, the cost share of the individual discretization schemes needs to be determined. To this end, a calibration run at the beginning of the simulation calls the bulk operator once for every degree N and once for a pure FV sub-cell discretization in order to measure the average wall times per element \(\tau ^{N_{\textrm{min}}},\tau ^{N_{\textrm{min}}+1},\ldots ,\tau ^{N_{\textrm{max}}},\tau ^{N_{\textrm{FV}}}\).

At runtime, the measured time \(\tau ^{\textrm{bulk}}\) of subset \({\mathcal {D}}_{\textrm{bulk}}\) is distributed among the elements according to the relative cost of the different discretizations. This relative cost is evaluated by first calculating the expected wall time \(\tau ^{\textrm{cal}}\) of each subdomain based on the calibrated values \(\tau _i \in \{\tau ^{N_{\textrm{min}}},\ldots ,\tau ^{N_{\textrm{FV}}}\}\)

$$\begin{aligned} \tau ^{\textrm{cal}} = \sum _{i=s_{k-1}+1}^{s_k} \tau _i. \end{aligned}$$
(33)

This expected wall time, together with the measured wall time \(\tau ^{\textrm{bulk}}\), defines a dimensionless factor \(\chi \),

$$\begin{aligned} \chi = \frac{\tau ^{\textrm{bulk}}}{\tau ^{\textrm{cal}}}, \end{aligned}$$
(34)

which allows the weight \(w_i\) of each bulk element to be inferred by scaling the calibrated weight

$$\begin{aligned} w_i = \tau _i \cdot \chi . \end{aligned}$$
(35)

For now, the imbalance due to p-refinement in the discretization of the level-set advection (5) is neglected since here both the FV and DG operator are evaluated for every element. Thus, the variable degree N has only a minor effect on the computation time per element. In the subsets \({\mathcal {D}}_{\textrm{NB}}\) and \({\mathcal {D}}_{\Gamma }\), the average costs of the additional interface-related operations are added to \(w_i\).
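
In compact form, the scaling of Eqs. (33)–(35) amounts to the following; the calibrated costs are illustrative numbers:

```python
import numpy as np

def bulk_weights(tau_cal_per_elem, tau_bulk_measured):
    """Distribute the measured bulk wall time among the elements in
    proportion to their calibrated per-discretization cost."""
    chi = tau_bulk_measured / np.sum(tau_cal_per_elem)   # Eq. (34)
    return tau_cal_per_elem * chi                        # Eq. (35)

# elements with calibrated costs for, e.g., N=2, N=4 and FV discretizations
tau_cal = np.array([1.0, 2.5, 3.0, 1.0])                 # illustrative values
w = bulk_weights(tau_cal, tau_bulk_measured=9.0)         # chi = 1.2
```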

With the element-local weights known, the current workload imbalance

$$\begin{aligned} I = \frac{B}{B^*} - 1 \quad \in [0,\infty ) \end{aligned}$$
(36)

can be evaluated. It assumes zero for identical workload on all processors, \(I(B=B^*)=0\), and triggers a repartitioning upon exceeding a user-defined threshold. To redistribute the elements between the processors, we first store the current solution of the bulk flow and the level-set field. Then, the new partitioning is computed globally, allowing each processor to determine which elements it needs to send and receive by comparing its old partition to the new one. During the communication, each processor recomputes the mesh metrics for its new elements, which hides most of the communication time behind this computation. Once the communication is completed, the computation resumes. This repartitioning procedure is summarized in Fig. 7.

Fig. 7 Overview of the dynamic load balancing scheme

4.4 Parallel Performance Analysis

In this section, we investigate the influence of the hp-adaptive discretization on the strong scaling behavior. We chose a setup similar to the one used by Appel et al. [2], where the parallel performance of the previous, non-adaptive code version was analyzed. All computations are run on the HPE Apollo System Hawk at the High Performance Computing Center Stuttgart (HLRS). For the scaling tests, we used up to 64 compute nodes, each with two 64-core AMD EPYC 7742 CPUs and 256 GB RAM. The test case consists of a resting droplet in a three-dimensional domain \(\Omega \), discretized by \(30\times 30\times 30\) elements. To study the effects of the adaptive scheme on the performance, we enforced a checkerboard-like distribution of the local polynomial degree. For the bulk, the degree varies between \(N\in [2,4]\) and for the level-set, the degree changes between \(N\in [2,8]\). FV elements use \(N_{\textrm{FV}}=9\) sub-cells per spatial direction, and the standard smoothness indicator presented in Sect. 3.3 is used. Figure 8 visualizes the discretization for a slice through the \(z=0\) plane.

Fig. 8 Setup for the strong scaling test sliced at \(z=0\). The local polynomial degree N and the elements discretized by FV sub-cells are indicated with different colors

It should be noted that the checkerboard-like discretization is the most expensive scenario for an adaptive scheme, since all element interfaces are of mixed type. To isolate the effects of the repartitioning and the novel hp-adaptive data structure on the parallel performance, we perform a single load balancing step after 5 time steps, followed by a further 44 time steps. During these 44 time steps, we measure the performance index PID, which expresses the wall time required by a single processor to advance one DOF for one stage of the Runge–Kutta time integration scheme

$$\begin{aligned} \text {PID} = \frac{\text {total wall time} \cdot \#\text {cores } P}{\#\text {DOFs} \cdot \#\text {time steps} \cdot \#\text {RK-stages}}. \end{aligned}$$
(37)

The second metric to analyze the parallel performance is the parallel efficiency \(\eta \) which relates the current PID to the PID obtained with one node (\(P=128\))

$$\begin{aligned} \eta = \frac{\text {PID}_{P=128}}{\text {PID}} \end{aligned}$$
(38)

We compute the setup with a range of \(1,2,4,\ldots ,64\) compute nodes corresponding to \(P=128,256,\ldots ,8192\) cores and supply the results in Fig. 9.

Fig. 9 Strong scaling behavior for the benchmark setup of Fig. 8

Up to 32 nodes, a parallel efficiency of about 50% is achieved. At 64 nodes, a sharp decrease of parallel efficiency can be observed. It can be attributed to the limit of the load balancing capabilities when the cost of the most expensive elements approaches the target weight of a single processor unit, \(B^*\). There are two main reasons for the decreased parallel efficiency: First, for processors containing very few elements, the latency hiding of the communication becomes less efficient and, secondly, the partitioning of loads among processors becomes less accurate, i.e. \(B^{\textrm{opt}}\gg B^*\).

It should be noted that the overhead due to the load balancing is not included in the measured PID. For the present setup, one repartitioning and data redistribution together have a maximum cost equivalent to 17.2 time steps at 16 nodes and a minimum cost of 1.4 time steps at 64 nodes. The overhead due to the time measurements for the weight computation can be neglected.

5 Numerical Results

To assess the performance of the presented adaptive hybrid DG/FV sharp interface framework, we simulate a wide range of benchmark problems. The considered test cases are compressible, inviscid two-phase flows with the gas modeled as an ideal gas and the liquid modeled by the stiffened gas equation of state. The following list provides an overview of the test problems and motivates their choice:

  • The 1D Gas–liquid Riemann problem allows the validation of the scheme against an exact solution for a two-phase shock tube.

  • The 2D Air-helium shock-bubble interaction demonstrates the scheme's ability to cope with severe interface deformations in the limit of an infinite Weber number.

  • The 2D Bubble collapse in water is commonly applied to test the robustness of a two-phase scheme in the presence of strong shocks during the bubble collapse.

  • The 2D Shock-droplet interaction is chosen to validate the framework against the results of Xiang and Wang [24]. It is further used as a benchmark case to study the scaling properties of the developed method.

  • The 3D Shock-droplet interaction is simulated to demonstrate the applicability of our method to three-dimensional problems.

For all setups, a hybrid discretization is used with a p-adaptive DG scheme with \(N\in [2,4]\) and a FV resolution of \(N_{\textrm{FV}}=9\) sub-cells per direction in the bulk flow. If not stated otherwise, the level-set transport is discretized with DG elements of \(N\in [2,8]\) and a FV resolution of \(N_{\textrm{FV}}=9\). The computations were again run on the Hawk system and used up to 32 nodes.
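
For reference, the stiffened gas equation of state used for the liquid phases is commonly written as

$$\begin{aligned} p = (\gamma -1)\,\rho e - \gamma \,p_\infty , \end{aligned}$$

with the heat capacity ratio \(\gamma \) and the pressure-like stiffening constant \(p_\infty \); for \(p_\infty =0\), the ideal gas law is recovered. This standard form is stated here for convenience, the case-specific material parameters are given with the individual setups.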

5.1 1D Gas–Liquid Riemann Problem

In a first step, the novel scheme is applied to the well-known one-dimensional gas–liquid shock tube problem of Cocchi et al. [8]. It considers two initially constant states inside a computational domain \(\Omega =[-1,1]\). The left side of the shock tube, \(x<0\), contains highly compressed air, while the right side contains liquid at ambient pressure. The initial conditions are provided in non-dimensionalized form in Table 1.

Table 1 Initial conditions of the gas–liquid shock tube problem in non-dimensionalized form, following Cocchi et al. [8]
Fig. 10 Density, velocity and pressure distribution of the gas–liquid Riemann problem at the final time \(t=0.2\). The exact solution is compared against a numerical solution computed with a grid of 40 elements. Smooth regions are discretized by a DG scheme (blue) with \(N\in [2,4]\), whereas a FV scheme (red) with \(N_{\textrm{FV}}=9\) sub-cells per element is applied at discontinuities (Color figure online)

The computational domain \(\Omega \) is discretized by 40 elements. The simulation runs until the final time \(t=0.2\). In Fig. 10, the numerical results of our approach are compared against the reference solution, which relies on the exact Riemann solver devised by Cocchi et al. [8] for this setup. As expected, the FV sub-cell scheme is only applied in regions of sharp gradients, that is, at the right-moving shock in the liquid phase, the material interface between the phases and the left-moving rarefaction wave in the gas. Since the remaining regions contain nearly constant states, p-refinement is not required there and the DG discretization always uses the lowest possible degree \(N=2\). Despite the coarse discretization, the numerical result is in good agreement with the exact solution and only a minor undershoot can be observed at the material interface.

5.2 2D Air-Helium Shock-Bubble Interaction

As a first two-dimensional test case, we consider the interaction of a \(\textrm{Ma}=6\) shock wave in air with a planar, two-dimensional helium bubble [28, 31]. The setup results in severe deformations of the phase interface during the breakup of the helium bubble and is thus a challenging benchmark for a sharp interface framework. At time \(t=0\,\textrm{s}\), the computational domain \(\Omega =[0.0,0.356]\,\textrm{m}\times [-0.0445,0.0445]\,\textrm{m}\) contains a shock at \(x_1=0.1\,\textrm{m}\), and a helium bubble of radius \(r=0.025\,\textrm{m}\) is centered at \(x_1=0.15\,\textrm{m}\). The setup is sketched in Fig. 11 and the initial conditions for the three constant regions \(\Omega _{v_s}\) before the shock, \(\Omega _{v}\) after the shock and \(\Omega _{v_b}\) inside the helium bubble are provided in Table 2.

Fig. 11 Initial setup for the two-dimensional air-helium shock-bubble interaction

Table 2 Initial conditions of the two-dimensional air-helium shock-bubble interaction

For the discretization of the physical domain \(\Omega \), a resolution of \(512\times 64\) elements is chosen. Since the phase interface is always discretized by FV sub-cells, this results in an effective resolution of 647 DOFs per bubble diameter. Due to the symmetry of the setup, we only compute half of the domain \(\Omega \) and impose symmetry boundary conditions at \(x_2=0.0\,\textrm{m}\). For the remaining boundaries, we impose inflow boundary conditions on the left, non-reflecting boundary conditions on the right and wall boundary conditions on the top side. In the bulk flow, an HLLE approximate Riemann solver is used. The setup is computed until the final time \(t=1\cdot 10^{-4}\,\textrm{s}\). The current workload imbalance I is evaluated every 20 time steps and triggers a repartitioning if \(I>1.0\). The simulation was computed on 4 nodes of the cluster Hawk and consumed a total of 743 CPU hours for 18,861 time steps. During the computation, 112 DLB steps were performed and required a share of 3.87% of the total wall time. In Fig. 12, the density fields at the non-dimensional times \(t^*=(t u_1)/(2r)=\{0.6, 1.2, 1.8, 2.4\}\) are visualized. Our results are in good agreement with those of Han et al. [28]. An initial deformation followed by a roll-up of the bubble can be observed. Complex shock patterns develop and vortical structures evolve at entropy shear layers. Finally, the separation of helium droplets from the initial bubble and the enclosure of air filaments can be seen.
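
The quoted effective resolution is straightforward bookkeeping: with a uniform mesh, the FV sub-cell size is the element size divided by \(N_{\textrm{FV}}\). A short sketch (the function name is ours):

```python
# Hedged sketch: effective FV sub-cell resolution per bubble diameter.
def dofs_per_diameter(domain_length, n_elems, n_fv, diameter):
    """Sub-cells across one diameter for a uniform mesh with n_elems
    elements over domain_length and n_fv sub-cells per element."""
    subcell_size = domain_length / (n_elems * n_fv)
    return diameter / subcell_size

# Air-helium case: 512 elements over 0.356 m, N_FV = 9, radius 0.025 m
print(dofs_per_diameter(0.356, 512, 9, 2 * 0.025))  # ~647
```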

Fig. 12 Density plots of the \(\textrm{Ma}=6\) air-helium shock-bubble interaction at the time instances \(t^*=0.6\), \(t^*=1.2\), \(t^*=1.8\) and \(t^*=2.4\)

The local polynomial degree N of DG elements and the areas where the FV sub-cell limiting is applied are visualized at \(t^*=2.4\) in Fig. 13 for both the bulk flow and the level-set field. In the bulk flow, shocks and the phase interface are detected correctly and discretized by the FV scheme on a refined sub-cell grid. An increased polynomial degree is applied around vortical structures. To improve the accuracy of the geometry computation, elements containing the level-set root are discretized with the highest possible degree for the level-set transport equation. FV sub-cells are used mainly at the level-set cut-off and at sharp kinks of the level-set contour or at merging phase boundaries. All in all, the p-refinement and the FV sub-cell limiting were adaptively controlled as expected, resulting in an accurate and stable representation of shocks and the phase interface as well as a high resolution of vortical structures. The proposed sharp interface framework thereby demonstrates its ability to cope with complex flow patterns and pronounced interface deformations.

Fig. 13 Discretization of the bulk flow (left) and the level-set field (right) for the \(\textrm{Ma}=6\) air-helium shock-bubble interaction at \(t^*=2.4\). The element-local degree N of DG elements is indicated by color, whereas FV sub-cell elements are marked in gray. Detail plots highlight the refined resolution of the FV sub-cell grid compared to the underlying DG mesh (Color figure online)

5.3 2D Bubble Collapse in Water

As a second two-dimensional test case, we study the interaction of a planar \(\textrm{Ma}=1.72\) shock wave in water with a gas bubble. During the shock-induced collapse of the gas bubble, a shock wave and a high-speed liquid jet are emitted. This mechanism is relevant for hydraulic pumps and turbomachinery, fuel injectors, naval propulsion systems and medical applications like lithotripsy. Since the setup involves strong pressure gradients, high-speed fluid dynamics and a complex deformation of the phase interface, it has been frequently used to assess the efficiency of numerical schemes [26, 28, 49, 63].

We consider the computational domain \(\Omega =[0.0,0.024]\,\textrm{m}\times [-0.012,0.012]\,\textrm{m}\) initialized at time \(t=0.0\,\textrm{s}\) with a planar shock in the liquid at \(x_1=0.0066\,\textrm{m}\) and a gas bubble of radius \(r=0.003\,\textrm{m}\) at \(x_1=0.012\,\textrm{m}\). The setup is sketched in Fig. 14, with the pre- and post-shock states as well as the state of the bubble provided in Table 3.

Exploiting the symmetry of the setup, we consider only half of the domain by imposing symmetry boundary conditions along the \(x_2=0.0\,\textrm{m}\) axis. Inflow boundary conditions are applied on the left side and the remaining sides are treated as non-reflecting boundaries. The computational domain \(\Omega \) is discretized by \(360\times 360\) elements, resulting in an effective resolution of 810 FV sub-cells per bubble diameter. For the bulk flow, we employ an HLLC approximate Riemann solver. The computation is run until a final time \(t=4.5\,\upmu \textrm{s}\), using the same load balancing parameters as above. On 2 nodes of the Hawk cluster, 105 CPU hours were spent on the 3514 time steps required for this test case. A fraction of 5.7% of the wall time was spent on 75 DLB steps. Figure 15 visualizes the non-dimensionalized density and pressure distributions \(\rho ^*=\rho /\rho _0\) and \(p^*=p/p_0\), with the reference density \(\rho _0\) and pressure \(p_0\) set to the pre-shock state of the water.

Fig. 14 Initial setup of the two-dimensional shock-induced bubble collapse

Table 3 Initial conditions of the two-dimensional shock-induced bubble collapse
Fig. 15 Non-dimensionalized density and pressure fields of the shock-induced bubble collapse at various time instances

The simulation reproduces the essential flow phenomena like the reflected rarefaction wave at \(t=1.9\,\upmu \textrm{s}\), the kidney-shaped deformation of the bubble at \(t=3.7\,\upmu \textrm{s}\), the high-speed water jet piercing the bubble at \(t=3.8\,\upmu \textrm{s}\) and the shock wave resulting from the bubble collapse at \(t=4.0\,\upmu \textrm{s}\). Additionally, very delicate flow features like secondary jets, which further divide the remnants of the initial bubble, can be observed at \(t=4.0\,\upmu \textrm{s}\) and \(t=4.2\,\upmu \textrm{s}\). Our numerical results agree very well with the simulations of Nourgaliev et al. [49]. To provide a quantitative comparison with literature, we evaluate the pressure field along the \(x_2=0\) line in Fig. 16. Until the final simulation time of \(t=4.5\,\upmu \textrm{s}\), a maximum pressure peak of \(5.1\,\textrm{GPa}\) can be observed, which is in good agreement with the results reported by Tsoutsanis et al. [63].

Fig. 16 Pressure distribution along the \(x_2=0\) line for various time instances

The adaptive discretization is analyzed for the time instance \(t=3.8\,\upmu \textrm{s}\) in Fig. 17. In the bulk flow, shocks and the phase interface are discretized by the FV sub-cell scheme, while acoustic waves as well as the direct vicinity of shocks are captured with an increased polynomial degree. Since the gas bubble is displaced by the significantly heavier water at high speed, the level-set transport velocity is of the order of magnitude of the incident shock wave. Therefore, the level-set field is only p-refined up to polynomial degree \(N=4\) to avoid a dominant time step constraint due to the level-set transport equation. The maximum degree is again applied at the level-set root to improve the accuracy of the geometry computation. The FV sub-cell scheme of the level-set field captures not only the cut-off region at the narrow band edge, but also the tips of the separated bubbles due to their curvature.
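
This degree cap reflects the usual explicit time step scaling of DG schemes. A common estimate for a transport equation, stated here as a rule of thumb rather than the exact constraint implemented in the code, is

$$\begin{aligned} \Delta t \lesssim \textrm{CFL}\,\frac{\Delta x}{(2N+1)\,|u|}, \end{aligned}$$

so p-refining a field that is advected at shock-level speeds would directly tighten the global time step.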

Fig. 17 Discretization of the bulk flow (left) and the level-set field (right) for the shock-induced bubble collapse at \(t=3.8\,\upmu \textrm{s}\). The element-local degree N is indicated by color, whereas FV sub-cell elements are marked in gray. Detail plots highlight the refined resolution of the FV sub-cell grid compared to the underlying DG mesh (Color figure online)

Fig. 18 Initial setup of the 2D shock-droplet interaction without cavity

5.4 2D Shock-Droplet Interaction

In this section, we study the interaction of a planar shock wave with a water column, using the setups investigated by Xiang and Wang [24] and Tsoutsanis et al. [63]. We consider two configurations: a regular liquid column and a liquid column with an enclosed cavity. Both configurations are computed with a Weber number of \(\textrm{We}=1000\) to analyze the breakup in the shear-induced entrainment (SIE) regime [59].

Table 4 Initial conditions of the 2D shock-droplet interaction without cavity

5.4.1 Water Column

First, we consider the interaction of a \(\textrm{Ma}=2.4\) shock wave in gas with a pure liquid column. The domain \(\Omega =[0.0,0.06]\,\textrm{m}\times [-0.02,0.02]\,\textrm{m}\) contains a planar shock at \(x_1=0.0074\,\textrm{m}\) and a water column of radius \(r=0.0048\,\textrm{m}\) at \(x_1=0.015\,\textrm{m}\) at the initial time \(t=0.0\,\textrm{s}\). The setup is illustrated in Fig. 18, with the initial conditions provided by Table 4. The surface tension is set to \(\sigma =11.90\,\mathrm {N/m}\) to obtain \(\textrm{We}=1000\).
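
The stated value of \(\sigma \) follows from the targeted Weber number. Assuming the common definition \(\textrm{We}=\rho u^2 D/\sigma \) with post-shock reference values, which is our reading of the setup, the surface tension can be backed out as in the following sketch:

```python
# Hedged sketch: surface tension from a target Weber number, assuming
# We = rho_ref * u_ref**2 * D / sigma. The reference state (post-shock
# gas values from Table 4) is a placeholder here, not reproduced data.
def sigma_from_weber(we, rho_ref, u_ref, diameter):
    return rho_ref * u_ref**2 * diameter / we
```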

The computational domain \(\Omega \) is discretized by \(360\times 240\) elements, which corresponds to a FV sub-cell resolution of 518 DOFs per droplet diameter. We again impose symmetry boundary conditions along the \(x_2=0\) axis, inflow boundary conditions on the left side and non-reflecting boundary conditions at the right and top. For the flux computation in the bulk, an approximate HLLC Riemann solver is chosen. The setup is advanced until a final computation time of \(t=2.0\cdot 10^{-4}\,\textrm{s}\). The workload distribution is evaluated every 50 time steps, with the imbalance threshold again set to \(I>1\). The computational cost on 4 nodes of the Hawk system amounts to 694.8 CPU hours for the 19,510 time steps required for the described setup. In total, 160 DLB steps were performed, taking up 6.0% of the overall wall time. Figure 19 presents the numerical results at the non-dimensionalized time instances \(t^*=(t u_1)/(2r)=\{0.8, 1.62, 5.5, 11.8\}\). Schlieren plots visualize the vortical structures and shocks, while the pressure field is given in non-dimensionalized form \(p^*=p/p_0\), with \(p_0\) being the pre-shock pressure in the gas at \(t=0.0\).

Fig. 19 Schlieren images and pressure fields of the 2D \(\textrm{Ma}=2.4\) shock-droplet interaction for a Weber number of \(\textrm{We}=1000\) (SIE regime)

In particular, the wave pattern at time \(t^*=0.8\) has been thoroughly studied in literature, e.g. by Meng and Colonius [42] and Xiang and Wang [24], and the reported numerical results are supported by experimental investigations of Sembian et al. [57]. Our computation reproduces all relevant flow features: the reflected incident shock wave, the weak transmitted shock wave, the reflected rarefaction inside the water column and the Mach stem and slip surfaces.

To support our results quantitatively, we compare the pressure distribution along the \(x_2=0\) axis at time \(t^*=0.8\) with the results of Xiang and Wang [24] in Fig. 20. A near perfect agreement can be observed.

Fig. 20 Pressure distribution along the symmetry line \(x_2=0\) of the shock-droplet interaction without cavity and \(\textrm{We}=1000\) at \(t^*=0.8\). The results are compared against the data from Xiang and Wang [24]

During the later stages of the simulation, the water column is flattened and complex recirculation zones and vortex patterns develop in the wake. With \(\textrm{We}=1000\), the droplet breakup occurs in the SIE regime and the formation of filaments that are stripped from the main liquid body can be observed. The final column shape at \(t^*=11.8\) agrees well with the findings of Xiang and Wang [24] and Tsoutsanis et al. [63]. Computations with the previous code framework, published by Jöns et al. [35], applied a constant number of DOFs per element. To demonstrate the efficiency of the novel adaptive scheme, we compare the adaptive computation with \(N\in [2,4]\) and \(N_{\textrm{FV}}=9\) against non-adaptive computations with \(N=4\) and \(N_{\textrm{FV}}=5\). Table 5 lists the average number of DOFs and the computation time for all setups. Setup 1 uses the same mesh as the adaptive computation, which results in a lower effective number of DOFs per droplet diameter due to the coarser FV sub-cell grid. To achieve the same resolution at the phase interface as the adaptive scheme, setup 2 uses a finer mesh. In Fig. 21, the temperature fields and element-local discretizations are compared for the three configurations.

Table 5 Efficiency comparison of the non-adaptive discretization (setups 1 and 2) with the proposed adaptive scheme (setup 3)
Fig. 21 Temperature field (top) and bulk discretization (bottom) of the 2D shock-droplet interaction with \(\textrm{We}=1000\) at \(t^*=11.8\), comparing a static scheme with \(N=4\) and \(N_{\textrm{FV}}=5\) (left, middle) to the proposed adaptive scheme with \(N\in [2,4]\) and \(N_{\textrm{FV}}=9\) (right). The element-local degree N is indicated by color, whereas FV sub-cell elements are marked in gray (Color figure online)

Shocks and the phase interface are detected well for all setups and discretized by FV sub-cells. Setups 2 and 3 both achieve the same FV resolution at shocks and the phase interface. Therefore, shocks appear sharper and more intricate phase interface geometries can be captured compared to setup 1. In setup 3, p-adaptivity allows a reduced degree of \(N=2\) to be applied in most of the domain, while intricate vortical structures in the wake are discretized with the highest possible degree. Since setup 2 has a finer DG mesh, slightly more detailed vortical structures in the recirculation zone can be observed. Overall, however, setups 2 and 3 produce comparable results. This is significant, since the adaptive scheme required on average 5.8 times fewer DOFs than setup 2. Compared to setup 1, the adaptive scheme still requires 1.8 times fewer DOFs, even though a higher resolution is achieved at shocks and the phase boundary. Due to the increased number of DOFs in the expensive narrow band, setup 3 requires 727 CPU hours and is thus more expensive than setup 1 with 240 CPU hours. However, compared with setup 2, requiring 1561 CPU hours, the adaptive scheme achieves a similar result in less than half the computation time.

Fig. 22 Evolution of the load imbalance (left), the average number of DOFs per element and the amount of FV sub-cell elements (right) during the simulation time

To study the evolution of the adaptive discretization over time, the average number of DOFs per element and the relative amount of FV sub-cell elements are plotted in Fig. 22 for setup 3. Starting from a pure FV sub-cell discretization to avoid oscillatory initial states, the amount of required FV sub-cells increases gradually from a minimum of 1.7% to a maximum of 7.1% at the end of the computation. This increase is mainly attributed to the shock patterns forming in the wake of the droplet and the increased surface of the deformed water column. Likewise, a steady increase in the average number of DOFs per element from 10.3 to 16.1 can be observed. This can be explained by the increased number of FV sub-cells and by the p-refinement of structures in the wake of the column. In order to assess the scalability of the framework in practical applications, we finally investigate its parallel performance for the \(\textrm{We}=1000\) setup. To this end, we study the strong scaling behavior in a range of \(P=32,\ldots ,1024\) cores on the cluster Hawk. Figure 23 evaluates the parallel efficiency \(\eta \) and the parallel speedup using the total runtime, as opposed to the PID-based results for the generic setup in Fig. 9.
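
The DOF averages quoted above are consistent with simple bookkeeping: in 2D, a DG element of degree N carries \((N+1)^2\) DOFs and a FV element \(N_{\textrm{FV}}^2=81\) DOFs. A minimal sketch, with illustrative rather than measured degree fractions:

```python
# Hedged sketch: average DOFs per element in 2D from element fractions.
def avg_dofs_2d(frac_fv, dg_degree_fracs, n_fv=9):
    """dg_degree_fracs: {N: fraction of DG elements with degree N},
    summing to one over the DG elements."""
    dg_avg = sum(f * (N + 1) ** 2 for N, f in dg_degree_fracs.items())
    return frac_fv * n_fv ** 2 + (1.0 - frac_fv) * dg_avg

# 1.7% FV elements with (illustratively) all DG elements at N=2
# roughly reproduces the reported initial average of 10.3:
print(avg_dofs_2d(0.017, {2: 1.0}))  # ~10.2
```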

Fig. 23 Parallel efficiency (left) and parallel speedup (right) for the 2D shock-droplet interaction with \(\textrm{We}=1000\)

Up to 512 cores, a parallel efficiency beyond 50% is achieved. A significant drop in the parallel efficiency occurs for 1024 cores, which is caused by the target weight falling below the cost of the most expensive element, \(B^* < w_{\textrm{max}}\), as discussed in Sect. 4.4. The imperfect strong scaling behavior can in general be attributed to two main factors: the overhead of the dynamic load balancing procedure and a remaining load imbalance due to the fast change in element-local loads. To support this explanation, the imbalance I over time was evaluated for the computation with \(P=512\) cores. Starting from an initial imbalance of more than \(I=15\), the dynamic load balancing scheme reduced the imbalance to \(I=0.7\) on average. Thus, the load of the slowest processor still exceeded the target load by a factor of 1.7.
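
The quoted values are consistent with measuring the imbalance as the relative excess of the slowest rank over the ideal load, which we assume here as the working definition:

```python
# Hedged sketch: load imbalance as the relative excess of the slowest
# rank over the ideal (average) load. I = 0.7 then means the slowest
# rank carries 1.7 times the target load, matching the text above.
def imbalance(rank_loads):
    target = sum(rank_loads) / len(rank_loads)  # ideal load per rank
    return max(rank_loads) / target - 1.0
```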

Fig. 24 Workload distribution and domain decomposition (top) for the bulk discretization (bottom) of the state given in Fig. 21 using \(P=512\) processor units

Furthermore, a partitioning with \(P=512\) or more cores entails that an increasing number of partitions in the narrow band region host only a single element due to their high computational cost. As the partitioning operates on the element level, no further decomposition is possible for these partitions, marking the natural limit of parallelization. A few of these one-element partitions can be seen in Fig. 24, which shows the workload distribution \(\{w_i\}\) for the bulk discretization of the state given in Fig. 21. The domain decomposition also illustrates the general working principle of the dynamic load balancing scheme: small partitions cluster in regions of high computational cost, while cheaper elements of \(N=2\) form larger partitions. As practical computations deploy fewer cores and thus evade the depicted granularity limit, an overall acceptable parallel efficiency is achieved with the proposed hp-adaptive level-set ghost-fluid framework.

5.4.2 Water Column with Cavity

As seen in the previous setup, a local pressure minimum can be observed inside the water column due to the reflected rarefaction wave. While phase transitions are not included in the present model, in reality the low pressure could lead to cavitation, i.e. the formation of a vapor bubble inside the water column. Therefore, Xiang and Wang [24] proposed a setup to study the interaction between a planar incident shock wave and a water column with a gas-filled cavity. Hence, we extend the previous scenario by introducing a circular cavity of radius \(r=0.0036\,\textrm{m}\) inside the water column. All other parameters are adopted from above, resulting in the setup sketched in Fig. 25. The adapted initial conditions are summarized in Table 6.

Fig. 25 Initial setup of the 2D shock-droplet interaction with cavity

Table 6 Initial conditions of the 2D shock-droplet interaction with a cavity
Fig. 26 Schlieren images and pressure fields of the 2D \(\textrm{Ma}=2.4\) shock-droplet interaction with a gas-filled cavity for a Weber number of \(\textrm{We}=1000\) (SIE regime)

Fig. 27 Evolution of the phase boundary for the 2D \(\textrm{Ma}=2.4\) shock-droplet interaction with a gas-filled cavity for a Weber number of \(\textrm{We}=1000\). The separation of the initial gas bubble into four bubbles by liquid jets and the formation of filaments can be observed

Again, a Weber number of \(\textrm{We}=1000\) is considered to simulate a droplet breakup in the SIE regime. The computational setup is identical to that of Sect. 5.4 and consumed 865 CPU hours on 4 compute nodes of Hawk. During 19,437 time steps, the domain was repartitioned 155 times by the DLB scheme, requiring 4.9% of the total wall time. Schlieren images and non-dimensional pressure fields are depicted in Fig. 26 for the time instances \(t^*=(t u_1)/(2r) = \{0.8, 1.62, 5.9, 6.5, 9.0, 11.8\}\). At \(t^*=1.62\), the reflected incident shock wave, the Mach stem and the slip line develop similarly to the simulation without a cavity. However, the transmitted shock wave is reflected and retransmitted at the surface of the water ring, creating a more complex pattern of reflected and transmitted waves inside the water ring and the gas cavity. Due to the curvature of the bubble, the transmitted waves inside the bubble appear as circular shapes. Later on, the water column is flattened and a high-speed water jet causes the bubble to collapse, similar to the phenomena observed in Sect. 5.3. Our numerical results agree well with the simulations presented by Xiang and Wang [24] and Tsoutsanis et al. [63]. During the final phase of the simulation, the gas bubble is further divided by secondary jets until four separate bubbles are formed. At \(t^*=9.0\), thin filaments have developed at the surface of the water column, which have detached from the main liquid body by \(t^*=11.8\). The evolution of the phase boundary in time is visualized in detail in Fig. 27. A well-known issue of the level-set interface tracking method is the appearance of conservation errors. On the one hand, this is due to the non-conservative level-set transport equation (5). On the other hand, the level-set reinitialization procedure is notorious for affecting the interface position and thus causes further conservation errors [25]. Finally, the ghost-fluid method is non-conservative, as discussed in [34]. Ultimately, the impact of these conservation errors depends on the resolution at the phase interface. Therefore, they appear most prominent if interfacial structures are under-resolved.
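
To quantify these conservation properties, a relative mass error of the liquid phase is evaluated below. A minimal sketch of such an evaluation, assuming a cell-wise integration in which the liquid is flagged by a non-negative level-set value (our sign convention, not necessarily that of the code):

```python
# Hedged sketch: relative mass conservation error of the liquid phase.
# Convention assumed: the level-set phi is non-negative in the liquid.
import numpy as np

def liquid_mass(rho, phi, cell_volumes):
    """Integrate the density over all cells flagged as liquid."""
    return float(np.sum(rho * cell_volumes * (phi >= 0.0)))

def mass_error(m_t, m_0):
    """E_m(t) = (m(t) - m(0)) / m(0)."""
    return (m_t - m_0) / m_0
```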

Fig. 28 Mass conservation error \({\mathcal {E}}_m(t)\) over time for the shock water column interaction test case with an air-filled cavity

Fig. 29 Discretization of the bulk flow (left) and the level-set field (right) for the 2D shock-droplet interaction with cavity for \(\textrm{We}=1000\) at \(t^*=5.9\). The element-local degree N is indicated by color, whereas FV sub-cell elements are marked in gray. Detail plots highlight the refined resolution of the FV sub-cell grid compared to the underlying DG mesh (Color figure online)

To investigate the conservation properties of the novel hp-adaptive framework, we analyzed the mass conservation error \({\mathcal {E}}_m(t)=\left( m(t)-m(0)\right) /m(0)\) for the challenging test case at hand. Figure 28 shows the mass conservation error over the simulation time in percent. The conservation error remains below 2% for most of the simulation. In the later stages of the droplet breakup, a conservation error of approximately 3% is encountered. Given the extreme interface deformation of the droplet and the complex topological changes during the collapse of the air cavity, the authors consider this conservation error acceptable. Finally, we look into the element-local discretization of the bulk flow and the level-set transport.

Figure 29 visualizes the FV sub-cells and the element-local polynomial degree of the DG elements at time \(t^*=5.9\). Shocks and the phase boundary are successfully captured by FV sub-cells, and the wake is resolved with a higher polynomial degree to improve the resolution of vortices and weak acoustics. The level-set field is discretized with the highest possible degree at the level-set zero position, and oscillatory solutions at filaments or merging phase boundaries are avoided by locally applying the robust and accurate FV sub-cell discretization.

5.5 3D Shock-Droplet Interaction

As a final test case, we extend the two-dimensional shock-water-column setups from the previous sections to a three-dimensional shock-droplet interaction to evaluate the performance of our scheme in three dimensions. We consider a \(\textrm{Ma}=2.4\) shock wave and a Weber number of \(\textrm{We}=100\). A similar setup with a lower Mach number of \(\textrm{Ma}=1.47\) was used by Winter et al. [65] to investigate the RTP and SIE breakup regimes in three space dimensions. In analogy to the two-dimensional setup, the computational domain \(\Omega =[0.0,0.06]\,\textrm{m}\times [-0.02,0.02]\,\textrm{m}\times [-0.02,0.02]\,\textrm{m}\) contains a shock at \(x_1=0.0074\,\textrm{m}\) and a droplet of radius \(r=0.0048\,\textrm{m}\) centered at \(x_1=0.015\,\textrm{m}\). The surface tension is chosen as \(\sigma =119.0\,\mathrm {N/m}\) and the initial conditions are listed in Table 7. To achieve an effective resolution of 138 DOFs per droplet diameter, the domain is discretized by \(96\times 48\times 48\) elements. Due to the symmetry of the setup, it is sufficient to compute a quarter of the domain and impose symmetry boundary conditions at the \(x_2=0\) and \(x_3=0\) planes. The left boundary is defined as an inflow plane, while all remaining boundaries are non-reflecting. For the flux computation, an approximate HLLC Riemann solver is applied. The setup is advanced in time until the final time \(t=2.0\cdot 10^{-4}\,\textrm{s}\).

Table 7 Initial conditions of the three-dimensional shock-droplet interaction
Fig. 30 Pressure field of the 3D \(\textrm{Ma}=2.4\) shock-droplet interaction with a Weber number of 100 at different time instances

Fig. 31 Three-dimensional \(\textrm{Ma}=2.4\) shock-droplet interaction at \(t^*=7.0\). The velocity magnitude in the \(x_2=0\) plane is visualized at the bottom and the element-local bulk discretization in the \(x_2=0\) plane is projected to the background. To visualize the recirculation zone in the wake of the droplet, isocontours of the velocity magnitude are included

The workload distribution is evaluated every 200 time steps and triggers a repartitioning if \(I>1.0\). We deployed 32 compute nodes of Hawk and spent 7221 CPU hours on the simulation. During 7231 time steps, DLB was applied 36 times and required a share of 7.0% of the total wall time. Figure 30 depicts the evolution of the droplet geometry and the non-dimensionalized pressure field \(p^*=p/p_0\) at the time instances \(t^*=(t u_1)/(2r) = \{0.8,2.4,4.7,7.0,9.4,11.8\}\). At \(t^*=2.4\), the development of interfacial waves can be observed. Kaiser and Winter [36] suggested that these waves are triggered by the interaction of pressure fluctuations with the phase interface. From \(t^*=0.8\) until \(t^*=9.4\), the interface waves grow in size and merge into larger liquid sheets. At the final simulation time \(t^*=11.8\), the stripping of the water sheets from the main liquid body can be observed. Figure 31 visualizes the velocity magnitude and the element-local bulk discretization at time \(t^*=7.0\). Due to the significantly coarser resolution than in the two-dimensional setups, FV sub-cells are used to discretize the recirculation zone in the wake of the droplet.

Fig. 32 Mass conservation error \({\mathcal {E}}_m(t)\) of the 3D shock-droplet interaction

This is necessary to avoid oscillations caused by the aliasing error of a DG discretization in under-resolved regions. Finally, the mass conservation error \({\mathcal {E}}_m\) of the three-dimensional test case is evaluated, analogous to Sect. 5.4.2. Figure 32 depicts the conservation error over time. Despite the coarser resolution compared to the two-dimensional setups, the conservation error remains below 2% throughout the simulation. Given the significant deviation from the initial droplet shape and the coarse resolution, the mass loss is in an acceptable range when compared to 3D sharp interface simulations reported in literature, see e.g. [66].

6 Conclusion

We have introduced an hp-adaptive discretization into a level-set ghost-fluid framework for compressible multiphase flows by combining a p-adaptive DGSEM scheme with a FV sub-cell method. The hybrid discretization is applied to the continuum model of the bulk fluids as well as to the level-set transport equation. It relies on an indicator that evaluates the modal decay of the polynomial solution representation and additionally takes into account the position and topology of the phase interface. The element-local discretization can thus automatically be adapted at runtime to provide high-order accuracy in smooth regions while offering stable and accurate results in the presence of shocks and severely deformed phase boundaries. The adaptive discretization, together with applying the interface tracking algorithm only in a narrow band around the phase interface, causes pronounced variations in the element-local computational costs throughout the domain. In parallel simulations, these translate to significant imbalances in the processor workloads, which necessitate a dynamic load balancing scheme to ensure parallel scalability. The proposed scheme determines the current workload distribution through element-local wall time measurements and repartitions the elements along a space-filling curve. Strong scaling tests show acceptable results, with a parallel efficiency of 50% on up to 4096 cores. We apply our level-set ghost-fluid framework to a wide range of inviscid, compressible multiphase flows and obtain results in good agreement with numerical and experimental findings from literature. When compared to non-adaptive computations, the presented adaptive scheme produced comparable results with far fewer DOFs and a significantly reduced wall time. The results confirm that the proposed hybrid discretization is well suited for the multi-scale nature of multiphase flows. While the FV sub-cell grid enables a high resolution and a good localization at the phase interface, the p-adaptive DG scheme produces accurate results on a relatively coarse mesh. In the future, we plan to include viscous effects to capture the impact of viscosity on the droplet breakup regimes. Furthermore, we seek to include phase transition to model evaporation and cavitation.