1 Introduction

Mathematical models in many areas of science and engineering are described by parametric ordinary differential equations (ODEs) or partial differential equations (PDEs). The analytical solution of such ODEs and PDEs is usually intractable; therefore, engineers resort to detailed numerical simulations of the models to obtain useful insights into the underlying processes. However, a detailed simulation of the model necessitates a very fine discretization of the governing ODEs or PDEs, both in space and time. The resulting discretized system of equations is referred to as the high-fidelity model or the full-order model (FOM). Numerical solutions of the FOM are computationally expensive to obtain, especially when computing resources are limited or when repeated simulations of the FOM are needed. The latter scenario is common in multi-query tasks such as uncertainty quantification and optimization. With the goal of speeding up simulations, the area of model order reduction (MOR) has gained popularity in the last decades [3, 4, 7, 8, 9]. Different MOR algorithms have been proposed to obtain surrogate models for the FOMs. These surrogates are often referred to as reduced-order models (ROMs), since they have a reduced number of degrees of freedom compared to the FOMs. The simulation results of a suitably developed ROM are indistinguishable from those of the FOM, yet the ROM incurs only a fraction of the computational cost spent on the FOM. This is a big advantage in real-time simulation or multi-query scenarios. Generally, there exist two classes of ROMs: projection-based ROMs [3], which are obtained by projecting the original model onto an approximation subspace, and data-driven ROMs, which are obtained through a data fit [47]. In the following, we limit ourselves to projection-based ROMs.

The Reduced Basis Method (RBM) is a popular, projection-based technique to obtain ROMs of parametric PDEs. The ansatz underlying the RBM is that the solution manifold of the parametric FOM can be well-approximated by a low-dimensional subspace. To achieve efficient online simulation, considerable efforts are made to explore the parameter space and collect solution snapshots in the offline stage. The greedy algorithm explores the parameter space based on suitable a posteriori error estimators. At each iteration of the greedy algorithm, the parameter maximizing the estimated error over a training set is selected and the corresponding FOM is solved to obtain new snapshots. The training set is, in essence, a discrete representation of the parameter domain. The snapshots are used to construct a basis for the low-dimensional approximation space, followed by a (Petrov-)Galerkin projection to obtain the ROM. We refer to the recent books [32, 49] and the survey works [19, 50, 53] for a detailed exposition of the theoretical framework underpinning the RBM and also for applications of the RBM in a variety of problems.

The choice of the training set is critical for the success of the RBM. A poorly-sampled training set can result in an inadequate representation of all the solution modes, causing the ROM to fail to meet the desired tolerance criterion for a parameter not present in the training set. Therefore, it is common practice to adopt a finely-sampled training set. However, the computational cost of the offline stage scales with the cardinality of the training set, which becomes high for problems with high-dimensional parameter space. Therefore, a more efficient sampling strategy is desired.

Many works have attempted to address the issue of optimal training set sampling. Notable among them are: the Multi-Stage Greedy algorithm from [55] and the Adaptively Enriching Greedy algorithm in [33]. In the former, the author suggests performing a set of greedy algorithms over randomly sampled training sets; then the resulting ROM is tested over a much larger random training set and the greedy algorithm is re-run on those points failing to meet the tolerance criterion in the larger training set. In the latter work, the authors propose a saturation criterion which is used to systematically remove parameters from a randomly-sampled training set. New random parameters are then added to the current training set. A larger training set serves as a safety check mechanism at every iteration. However, it may not be efficient, in general, to estimate a robust saturation criterion. The authors of [23, 30] propose a localized RBM approach, where a hierarchical tree-based partitioning of the parameter domain is done and separate ROMs for each partition are generated. In [34], the authors consider a two-stage approach that uses the ANOVA expansion together with parameter domain decomposition to address training set complexity. The work [37] considers an anisotropic sampling of the parameter domain using an empirical norm derived from the truncated Hessian of the solution vector with respect to the parameter. No explicit partition of the parameter domain is considered. However, the basis vectors are determined in the online stage. Moreover, the method needs to compute the Hessian at each point in the training set in order to define a distance metric which is subsequently used to add more samples to the training set. The calculation of the Hessian can be very expensive, especially for non-stationary problems. The method proposed in [36] makes only a subset of the finely sampled training set active at a given iteration of the greedy algorithm. 
A recent extension of this work [35] proposes hybrid strategies combining the ideas from [33, 55]. Different strategies are proposed to identify the set of active parameters. The works [17, 43] propose a cheap surrogate model for a certain error estimator, based on Kriging and radial basis functions, respectively, and use it to obtain the estimated error for any parameter in a fine training set. A sparse grid-based construction for the training set is suggested in [48].

The authors of [18] perform an eigendecomposition of the Hessian matrix of the output variable with respect to the parameter to identify a small subspace of the high-dimensional parameter space by truncation. The parameters that constitute the training set for the RBM are then sampled from the identified eigenspace. In [56], the so-called active subspace [20] of the parameter space is identified by relying on gradient information of the output with respect to the parameter. Both these works are limited to scalar-valued outputs and steady-state problems.

Most of the existing work related to adaptive training set sampling focuses on steady-state or quasi steady-state problems. To the best of our knowledge, only [17, 23, 30, 43] address training set adaptivity for time-dependent problems. The works [23, 30] propose a localization strategy that involves constructing multiple ROMs over local parameter domains, while [17, 43] consider adaptively enriching a coarse training set by observing a cheap error surrogate over a fine training set.

In this work, we present a goal-oriented training set sampling strategy that relies on the output quantity of interest (QoI). We aim to identify the structure of the parameter dependency of the output through the empirical interpolation algorithm [5, 15] or a pivoted QR decomposition, and utilize this information to infer the parameter importance. Our proposed method is applicable to both steady-state and time-dependent problems with vector-valued outputs. Our central contribution is a two-stage algorithm to control the cardinality of the training set. In the first stage, a coarse RB approximation of the problem is obtained using a fine training set. Then, an approximate output snapshot matrix is derived by time integrating the coarse ROM at all the parameter samples in the fine training set. We apply the pivoted QR decomposition or, alternatively, the discrete empirical interpolation method (DEIM) (or its variants) to the approximate output snapshot matrix. This procedure identifies regions of the parameter space that have a greater contribution to the current RB approximation space. In the second stage, the fine training set is subsampled based on the parameter distribution identified using the pivoted QR decomposition or the DEIM algorithm, leading to a subsampled coarse training set. The RBM is continued over the coarse training set, until a targeted error tolerance is met.

The paper is organized as follows. In Sect. 2 we describe the preliminaries including the problem setting for the proposed methodology, the RBM and the related hyper-reduction techniques. In Sect. 3 we detail the issue of training set sampling for the RBM and present our main algorithm for efficient training set subsampling. Section 4 is dedicated to numerical examples through which we illustrate various aspects of the proposed subsampling strategy and demonstrate the speedup it offers for two numerical benchmark problems. We conclude by summarizing the proposed method and highlighting possible research directions for the near future. Throughout this work, we have used \({\text {MATLAB}}^{\textregistered }\) notation in the presentation of algorithms and numerical experiments.

2 Preliminaries

In this section, we present the continuous problem and the discretized system that the proposed subsampling strategy is valid for. Then, we briefly review the RBM and the associated issue of training set sampling. Afterwards, some hyper-reduction algorithms are reviewed in order to introduce our proposed subsampling algorithms.

2.1 High-fidelity Models

A wide variety of physical and engineering phenomena are modelled via PDEs. Consider the spatial domain \(\varOmega \subset {\mathbb {R}}^{d}\), \(d = 1, 2, 3\). Let a model of PDEs defined in \(\varOmega \) be denoted by

$$\begin{aligned} {\mathcal {L}}\left( \mathbf {v}, \mathbf {w}, t, \varvec{\mu } \right) = \varvec{0}, \end{aligned}$$
(2.1)

where \(\mathbf {v}\) is the (vector-valued) state variable and describes the particular physical quantity the PDE models, \(\mathbf {w} \in \varOmega \) is the spatial variable, \(0 \le t \le T\) denotes the time and \(\varvec{\mu } \in {\mathcal {P}} \subset {\mathbb {R}}^{p}\) defines the parameters. The above form is a general description of any time-dependent or steady-state problem with or without parameter variations. The output of the model is usually a function of the solution \(\mathbf {v}\) and the parameter \(\varvec{\mu }\). After numerical discretization in space and time, we get

$$\begin{aligned} \begin{aligned} \mathbf {E}\, \mathbf {x}(t^{k},\varvec{\mu })&= \mathbf {A}\, \mathbf {x}(t^{k-1},\varvec{\mu }) + \mathbf {f}\left( \mathbf {x}(t^{k-1},\varvec{\mu }), \varvec{\mu }\right) + \mathbf {B}\, \mathbf {u}(t^{k-1},\varvec{\mu }),\\ \mathbf {y}\left( \mathbf {x}(t^{k},\varvec{\mu }), \varvec{\mu }\right)&= \mathbf {C}\, \mathbf {x}(t^{k},\varvec{\mu }). \end{aligned} \end{aligned}$$
(2.2)

Here, \(\mathbf {x}(t^{k}, \varvec{\mu }) \in {\mathbb {R}}^{n}\) is the state vector, \(\mathbf {u}(t^{k}, \varvec{\mu }) \in {\mathbb {R}}^{m}\) is the input vector and \(\mathbf {y}\left( \mathbf {x}(t^{k},\varvec{\mu }), \varvec{\mu }\right) \in {\mathbb {R}}^{q}\) is the output or quantity of interest. Further, \(\mathbf {E}, \mathbf {A} \in {\mathbb {R}}^{n \times n}\) are the system matrices, \(\mathbf {B} \in {\mathbb {R}}^{n \times m}\) is the input matrix, \(\mathbf {C} \in {\mathbb {R}}^{q \times n}\) is the output matrix and \(\mathbf {f}\left( \mathbf {x}(t^{k},\varvec{\mu }), \varvec{\mu }\right) \in {\mathbb {R}}^{n}\) models the nonlinearity associated with the system.
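For illustration, the time-marching of a system of the form (2.2) can be sketched as below. This is a minimal sketch on a toy problem: the helper `simulate_fom` and all matrix sizes are hypothetical and not from the paper, and a sparse factorization of \(\mathbf {E}\) would replace the dense inverse in practice.

```python
import numpy as np

def simulate_fom(E, A, B, C, f, u, x0, K):
    """Time-march the discretized FOM (2.2):
    E x^k = A x^{k-1} + f(x^{k-1}) + B u^{k-1},   y^k = C x^k."""
    n = x0.size
    X = np.empty((n, K + 1))
    X[:, 0] = x0
    Einv = np.linalg.inv(E)  # toy sizes only; use a (sparse) LU factorization in practice
    for k in range(1, K + 1):
        rhs = A @ X[:, k - 1] + f(X[:, k - 1]) + B @ u(k - 1)
        X[:, k] = Einv @ rhs
    Y = C @ X                # outputs at all time instances
    return X, Y

# toy FOM: heat-equation-like tridiagonal system (all sizes hypothetical)
n, m, q, K = 20, 1, 2, 50
E = np.eye(n)
A = np.eye(n) - 0.01 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = 0.01 * np.ones((n, m))
C = np.zeros((q, n)); C[0, 0] = 1.0; C[1, -1] = 1.0
X, Y = simulate_fom(E, A, B, C, lambda x: 0.01 * np.tanh(x),
                    lambda k: np.ones(1), np.zeros(n), K)
```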

Remark 2.1

The system matrices (\(\mathbf {E}, \mathbf {A}, \mathbf {B} \, \text {and}\, \mathbf {C}\)) can also be time- and/or parameter-dependent. However, we have not made this dependence explicit, to keep the notation concise. For steady-state problems, the time dependence of the state, input and output vectors and of the system matrices vanishes, and the system simply reads

$$\begin{aligned} \begin{aligned} \mathbf {E}\, \mathbf {x}(\varvec{\mu })&= \mathbf {f}\left( \mathbf {x}(\varvec{\mu }), \varvec{\mu }\right) + \mathbf {B}\, \mathbf {u}(\varvec{\mu }),\\ \mathbf {y}(\mathbf {x}(\varvec{\mu }), \varvec{\mu })&= \mathbf {C}\, \mathbf {x}(\varvec{\mu }). \end{aligned} \end{aligned}$$
(2.3)

The discretized system in (2.2) or (2.3) is also called the FOM and has a large number of degrees of freedom, i.e., n is very large. The proposed subsampling strategy is applicable to both Eqs. (2.2) and (2.3).

2.2 Reduced Basis Method and the Training Set

The Reduced Basis Method relies on the observation that the solution manifold can be well-approximated by a low-dimensional subspace \({\mathcal {V}}\). Let \(\left[ \mathbf {v}_{1}, \, \mathbf {v}_{2}, \, \ldots , \, \mathbf {v}_{r} \right] =: \mathbf {V} \in {\mathbb {R}}^{n \times r}\) be a basis of the subspace \({\mathcal {V}}\). The approximated solution for any parameter is obtained by considering the ansatz \(\mathbf {x}(t^{k}, \varvec{\mu }) \approx \widetilde{\varvec{x}}(t^{k}, \varvec{\mu }) = \sum _{i = 1}^{r} z_{i}(t^{k}, \varvec{\mu }) \mathbf {v}_{i}\). The time- and parameter-dependent coefficients \(\mathbf {z} = [z_{1}, \, z_{2}, \, \ldots , \, z_{r}]^{\textsf {T}}\) are obtained by solving the ROM

$$\begin{aligned} \begin{aligned} \mathbf {E}_{r}\, \mathbf {z}(t^{k},\varvec{\mu })&= \mathbf {A}_{r}\, \mathbf {z}(t^{k-1},\varvec{\mu }) + \mathbf {f}_{r}\left( \mathbf {V} \mathbf {z}(t^{k-1},\varvec{\mu }), \varvec{\mu }\right) + \mathbf {B}_{r}\, \mathbf {u}(t^{k-1},\varvec{\mu }),\\ \widetilde{\mathbf {y}}\left( \mathbf {z}(t^{k},\varvec{\mu }), \varvec{\mu }\right)&= \mathbf {C}_{r}\, \mathbf {z}(t^{k},\varvec{\mu }), \end{aligned} \end{aligned}$$
(2.4)

derived through Galerkin projection of the FOM onto the low-dimensional subspace \({\mathcal {V}}\). The reduced system matrices \(\mathbf {E}_{r},\, \mathbf {A}_{r} \in {\mathbb {R}}^{r \times r}\), \(\mathbf {B}_{r} \in {\mathbb {R}}^{r \times m}\) and \(\mathbf {C}_{r} \in {\mathbb {R}}^{q \times r}\) are obtained as \(\mathbf {E}_{r} := \mathbf {V}^{\textsf {T} } \mathbf {E} \mathbf {V}\), \(\mathbf {A}_{r} := \mathbf {V}^{\textsf {T} } \mathbf {A} \mathbf {V}\), \(\mathbf {B}_{r} := \mathbf {V}^{\textsf {T} } \mathbf {B}\) and \(\mathbf {C}_{r} := \mathbf {C} \mathbf {V}\), respectively. Finally, \(\mathbf {f}_{r} := \mathbf {V}^{\textsf {T}} \mathbf {f}\left( \mathbf {V} \mathbf {z}(t^{k-1},\varvec{\mu }), \varvec{\mu } \right) \).
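The assembly of the reduced operators amounts to a few matrix products. The following is a minimal sketch with a random orthonormal basis; the helper `galerkin_project` and all sizes are illustrative, not from the paper.

```python
import numpy as np

def galerkin_project(E, A, B, C, V):
    """Form the reduced operators of (2.4) by Galerkin projection onto range(V)."""
    Er = V.T @ E @ V
    Ar = V.T @ A @ V
    Br = V.T @ B
    Cr = C @ V
    return Er, Ar, Br, Cr

rng = np.random.default_rng(0)
n, m, q, r = 100, 2, 3, 5
E, A = np.eye(n), rng.standard_normal((n, n)) / n
B, C = rng.standard_normal((n, m)), rng.standard_normal((q, n))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of the subspace
Er, Ar, Br, Cr = galerkin_project(E, A, B, C, V)
```

Note that with an orthonormal \(\mathbf {V}\) and \(\mathbf {E} = \mathbf {I}\), the reduced matrix \(\mathbf {E}_{r}\) is simply the identity.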

The greedy algorithm and the POD-greedy (Proper Orthogonal Decomposition-greedy) algorithm are the most popular techniques for constructing the RB approximation space for the steady-state system (2.3) and the time-dependent system (2.2), respectively. In order to initialize the greedy algorithm, a training set \(\varXi _{\text {train}}\) is given a priori, from which parameter samples are iteratively selected, so that the corresponding solution vector is iteratively added to the basis matrix \(\mathbf {V}\). We summarize the (POD-)greedy algorithm in Algorithm 2.1, which takes both the steady-state case and the time-dependent case into consideration. In Step 8 of Algorithm 2.1, the snapshot matrix \(\varvec{\mathfrak {X}}\) for the time-dependent case consists of the solution vector at discretized time instances \(\{t^{i}\}_{i=0}^{K}\) given by:

$$\begin{aligned} \varvec{\mathfrak {X}}(\varvec{\mu }) = \left[ \mathbf {x}(t^{0}, \varvec{\mu }) \cdots \mathbf {x}(t^{K}, \varvec{\mu }) \right] \in {\mathbb {R}}^{n \times N_{t}}, \end{aligned}$$

where \(N_{t} := (K + 1)\). \(\mathbf {U}_{\bar{\varvec{\mathfrak {X}}}}\) in Step 10 is the left singular vector matrix obtained from the singular value decomposition (SVD) of \(\bar{\varvec{\mathfrak {X}}}\), i.e., \(\bar{\varvec{\mathfrak {X}}} = \mathbf {U}_{\bar{\varvec{\mathfrak {X}}}} \varSigma _{\bar{\varvec{\mathfrak {X}}}} \mathbf {V}_{\bar{\varvec{\mathfrak {X}}}}^{\textsf {T}}\). Here, \(\varSigma _{\bar{\varvec{\mathfrak {X}}}}\) contains all the non-zero singular values of \(\bar{\varvec{\mathfrak {X}}}\): \(\sigma _{1} \ge \sigma _{2} \ge \cdots \ge \sigma _{r_{X}} > 0\). If no alternative decision criterion is used, \(r_{\text {POD}}\) is usually taken to be 1. In Step 13, \(\varDelta (\varvec{\mu })\) denotes an error estimator for the error in approximating the state variable or the output variable. The error estimator should be much cheaper to compute than the true error. For the sake of clarity, the sketched algorithm is the basic version of the RBM. Several enhancements in the form of primal-dual error estimation, hyper-reduction, adaptive basis construction, etc. exist [6, 16, 26, 28, 58].
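The basis enrichment of Steps 8–10 (deflate the new snapshots by the current basis, then append the leading POD mode(s)) can be sketched as follows; `pod_greedy_update` is a hypothetical helper, with a re-orthonormalization added for numerical stability.

```python
import numpy as np

def pod_greedy_update(V, X_mu, r_pod=1):
    """One POD-greedy enrichment step: deflate the snapshot matrix X_mu of the
    selected parameter by the current basis V, then append the leading
    r_pod POD mode(s) of the deflated snapshots."""
    if V.shape[1] > 0:
        Xbar = X_mu - V @ (V.T @ X_mu)          # \bar{X} = X - V V^T X
    else:
        Xbar = X_mu
    U, s, _ = np.linalg.svd(Xbar, full_matrices=False)
    V_new = np.hstack([V, U[:, :r_pod]])
    Q, _ = np.linalg.qr(V_new)                  # re-orthonormalize for stability
    return Q

rng = np.random.default_rng(1)
n, Nt = 50, 11
V = np.empty((n, 0))
V = pod_greedy_update(V, rng.standard_normal((n, Nt)))  # first greedy iteration
V = pod_greedy_update(V, rng.standard_normal((n, Nt)))  # second greedy iteration
```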

Algorithm 2.1 The (POD-)greedy algorithm

Note that the standard greedy algorithm (Algorithm 2.1) does not address the issue of properly choosing \(\varXi _{\text {train}}\). When the parameter space dimension is high (\(p \gg 2\)) or when the number of time steps \(N_{t}\) is large, the overall computational cost for the greedy algorithm to construct \(\mathbf {V}\) can be substantial. The technique considered in this work aims at reducing the offline cost by subsampling a fine training set. We detail this in Sect. 3. Before that, since they will be used later, we briefly review two hyper-reduction procedures: the DEIM algorithm (and its variants) and the Gappy-POD method.

2.3 Discrete Empirical Interpolation Method

The DEIM algorithm [5, 15, 26] was introduced in the context of MOR, for efficient calculation of nonlinear or nonaffine terms of the ROM. The algorithm proceeds by collecting snapshots of the nonlinear vector in the FOM in (2.2) or (2.3) for different values of \(\varvec{\mu } \in \varXi _{\text {train}}\) given by

$$\begin{aligned} \varvec{\mathfrak {F}} = \left[ \mathbf {f}\left( \mathbf {x}, \varvec{\mu }_{1}\right) \cdots \mathbf {f}\left( \mathbf {x}, \varvec{\mu }_{N_{\text {train}}}\right) \right] \in {\mathbb {R}}^{n \times N_{\text {train}}}. \end{aligned}$$

In order to avoid any n-dependent operations to evaluate the nonlinear vector involved in simulating the ROM, DEIM considers an approximation of the nonlinear vector given by

$$\begin{aligned} \mathbf {f}\left( \mathbf {x}, \varvec{\mu } \right) \approx \widetilde{\mathbf {f}}\left( \mathbf {x}, \varvec{\mu } \right) = \mathbf {U} \varvec{\alpha }, \end{aligned}$$
(2.5)

where \(\varvec{\alpha } \in {\mathbb {R}}^{\ell }\). The columns of \(\mathbf {U} \in {\mathbb {R}}^{n \times \ell }\) are the interpolation basis vectors obtained via SVD of \(\varvec{\mathfrak {F}}\). The number of basis vectors \(\ell \) can be determined through a truncation criterion based on the relative energy content of the singular values \(\{\sigma _{i}\}_{i=1}^{r_{X}}\), which reads \(\frac{\sum \limits _{i=\ell +1}^{r_{X}} \sigma _i}{\sum \limits _{i=1}^{r_{X}} \sigma _i}~<~\epsilon _{\text {SVD}}\), where \(\epsilon _{\text {SVD}}\) is a user-defined tolerance. Since (2.5) is an overdetermined system with \(n \gg \ell \), DEIM solves for \(\varvec{\alpha }\) by selecting \(\ell \) rows of \(\mathbf {U}\) and enforcing interpolation there:

$$\begin{aligned} \mathbf {P}^{\textsf {T}} \mathbf {f}\left( \mathbf {x}, \varvec{\mu } \right) = \mathbf {P}^{\textsf {T}} \mathbf {U} \varvec{\alpha }. \end{aligned}$$
(2.6)
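For illustration, the SVD truncation criterion above can be evaluated in a few lines. This is a minimal sketch; `truncation_rank` is a hypothetical helper, not from the paper.

```python
import numpy as np

def truncation_rank(sigma, eps_svd):
    """Smallest l such that sum_{i>l} sigma_i / sum_i sigma_i < eps_svd."""
    total = float(np.sum(sigma))
    tail = total
    for l, s in enumerate(sigma, start=1):
        tail -= s
        if tail / total < eps_svd:
            return l
    return len(sigma)

# example: the first two singular values carry enough relative energy
ell = truncation_rank([1.0, 0.1, 0.01, 0.001], 0.05)  # -> 2
```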

The indices of the rows where interpolation is enforced are given by \(\{p_{1}, \ldots , p_{\ell }\}\). The \(i^{\text {th}}\) column of the matrix \(\mathbf {P} \in {\mathbb {R}}^{n \times \ell }\), denoted \(e_{p_{i}} \in {\mathbb {R}}^{n}\), is the \(p_{i}^{\text {th}}\) canonical unit vector, with zeros at all but the \(p_{i}^{\text {th}}\) entry. A greedy procedure, shown in Algorithm 2.2, is used to identify \(\mathbf {P}\). The algorithm ensures that \(\left( \mathbf {P}^{\textsf {T}} \mathbf {U} \right) \) is nonsingular, so that \(\widetilde{\mathbf {f}}\left( \mathbf {x}, \varvec{\mu } \right) \) at any parameter \(\varvec{\mu }\) is given by

$$\begin{aligned} \widetilde{\mathbf {f}}\left( \mathbf {x}, \varvec{\mu } \right) = \mathbf {U} \left( \mathbf {P}^{\textsf {T}} \mathbf {U} \right) ^{-1} \mathbf {P}^{\textsf {T}} \mathbf {f}\left( \mathbf {x}, \varvec{\mu } \right) , \end{aligned}$$

where \(\mathbf {U} \left( \mathbf {P}^{\textsf {T}} \mathbf {U} \right) ^{-1} \in {\mathbb {R}}^{n \times \ell }\) can be precomputed and stored. Moreover, the original nonlinear vector needs to be evaluated only at \(\ell \) points, limiting the cost of evaluating the nonlinear vector to \({\mathcal {O}}(\ell )\), independent of n. The error in approximating the nonlinear vector is quantified as

$$\begin{aligned} \left\Vert \mathbf {f}(\mathbf {x}, \varvec{\mu }) - \widetilde{\mathbf {f}}(\mathbf {x}, \varvec{\mu }) \right\Vert _{2} \le \Vert \left( \mathbf {P}^{\textsf {T}} \mathbf {U}\right) ^{-1} \Vert _{2} \Vert \mathbf {f}(\mathbf {x}, \varvec{\mu }) - \mathbf {U} \mathbf {U}^{\textsf {T}} \mathbf {f}(\mathbf {x}, \varvec{\mu }) \Vert _{2}. \end{aligned}$$
(2.7)

The greedy choice of the interpolation indices in the DEIM algorithm is geared towards minimizing the term \(\Vert \left( \mathbf {P}^{\textsf {T}} \mathbf {U}\right) ^{-1} \Vert _{2}\) appearing in the error bound. At each iteration, the new sampling point is chosen as the one resulting in the maximum reduction in \(\Vert \left( \mathbf {P}^{\textsf {T}} \mathbf {U}\right) ^{-1} \Vert _{2}\).

Algorithm 2.2 The DEIM algorithm
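A compact sketch of the greedy index selection in Algorithm 2.2 is given below; the helper name `deim_indices` is illustrative. At each step, the residual of the next basis vector after interpolation at the current points is formed, and the entry of largest magnitude yields the new index.

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM point selection on the columns of U (left singular vectors)."""
    n, ell = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]       # first point: largest entry of u_1
    for j in range(1, ell):
        # interpolate u_j at the current points and form the residual
        c = np.linalg.solve(U[np.ix_(idx, np.arange(j))], U[idx, j])
        r = U[:, j] - U[:, :j] @ c                # vanishes at the chosen indices
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

rng = np.random.default_rng(2)
F = rng.standard_normal((200, 30))                # toy snapshot matrix
U, s, _ = np.linalg.svd(F, full_matrices=False)
ell = 5
P_idx = deim_indices(U[:, :ell])
PtU = U[P_idx, :ell]                              # (P^T U), guaranteed nonsingular
```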

Several variants of the DEIM algorithm have been proposed [21, 41, 44, 45, 46]. Among those, the QDEIM algorithm from [21] and the DEIM-based oversampling strategies proposed in [45] shall be of particular interest to our discussion in Sect. 3.

2.3.1 QDEIM

In contrast to the original DEIM algorithm, the QDEIM approach from [21] relies on a column-pivoted QR decomposition to identify the interpolation points. This is different from the sequential, greedy choice of interpolation points in DEIM (see Steps 4–9 in Algorithm 2.2). QDEIM is proven to have a sharper error bound and is also computationally more efficient and straightforward to implement.
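The QDEIM selection can be sketched in a few lines using SciPy's column-pivoted QR; the helper name `qdeim_indices` is illustrative.

```python
import numpy as np
from scipy.linalg import qr

def qdeim_indices(U):
    """QDEIM: interpolation points from a column-pivoted QR decomposition of U^T.
    The first ell pivot columns of U^T give the ell interpolation indices."""
    _, _, piv = qr(U.T, pivoting=True, mode='economic')
    return piv[:U.shape[1]]

rng = np.random.default_rng(3)
U, _, _ = np.linalg.svd(rng.standard_normal((100, 20)), full_matrices=False)
idx = qdeim_indices(U[:, :6])   # 6 interpolation points from a 6-column basis
```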

2.3.2 KDEIM

The K in the KDEIM algorithm refers to the k-means clustering algorithm. The k-means algorithm is applied to the matrix \(\mathbf {U}\) of (truncated) left singular vectors obtained from the SVD of the snapshot matrix \(\varvec{\mathfrak {F}}\), so that rows with a similar response are assigned to the same cluster. The standard k-means objective function is recast as a relaxed trace maximization problem, which is then solved using the QR decomposition. We refer the interested reader to [45] for a deeper discussion. Another early work to consider QR-decomposition-based clustering in the context of MOR was [39], which used it for reducing networked multi-agent systems.
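The clustering idea can be loosely sketched with a plain k-means on the rows of \(\mathbf {U}\), picking one representative row per cluster. Note this is only a caricature of KDEIM: the actual method in [45] solves the relaxed trace maximization via a QR decomposition rather than running Lloyd iterations; `kdeim_like_indices` is a hypothetical helper.

```python
import numpy as np

def kdeim_like_indices(U, n_iter=50, seed=0):
    """Caricature of the KDEIM idea: k-means on the rows of U (k = number of
    basis vectors), then one representative row per cluster (closest to its
    centroid)."""
    n, ell = U.shape
    rng = np.random.default_rng(seed)
    centroids = U[rng.choice(n, size=ell, replace=False)].copy()
    for _ in range(n_iter):                              # plain Lloyd iterations
        d = np.linalg.norm(U[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(ell):
            members = U[labels == c]
            if members.size:
                centroids[c] = members.mean(axis=0)
    idx = []
    for c in range(ell):                                 # representative per cluster
        rows = np.where(labels == c)[0]
        if rows.size:
            dist = np.linalg.norm(U[rows] - centroids[c], axis=1)
            idx.append(int(rows[dist.argmin()]))
    return np.array(idx)

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((100, 4)))       # toy left singular vectors
idx = kdeim_like_indices(U)
```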

2.3.3 Gappy-POD

The number of interpolation points in the DEIM algorithm and its variants discussed so far equals the number \(\ell \) of interpolation basis vectors. However, in many cases it is beneficial to consider \(m > \ell \) interpolation points. The Gappy-POD method and other related approaches fall into this category [12, 24]. The coefficient matrix \(\mathbf {P}^{\textsf {T}}\mathbf {U}\) of the linear system in (2.6) is no longer square and, therefore, does not possess a unique inverse; instead, the system is solved using the pseudoinverse. The Gappy-POD approximation of the nonlinear vector is given as

$$\begin{aligned} \widetilde{\mathbf {f}}\left( \mathbf {x}, \varvec{\mu } \right) = \mathbf {U} \left( \mathbf {P}^{\textsf {T}} \mathbf {U} \right) ^{\dagger } \mathbf {P}^{\textsf {T}} \mathbf {f}\left( \mathbf {x}, \varvec{\mu } \right) , \end{aligned}$$
(2.8)

where the matrix \(\mathbf {P}\) now has \(m > \ell \) columns and we have \(\mathbf {P}^{\textsf {T}} \mathbf {U} \in {\mathbb {R}}^{m \times \ell }\). In [45], the authors discuss two different oversampling strategies. The first approach, called Gappy-POD Eigenvector, takes an optimization point of view: newly added interpolation points are those that lead to the largest decrease of \(\Vert \left( \mathbf {P}^{\textsf {T}} \mathbf {U} \right) ^{\dagger } \Vert \). The second oversampling strategy, Gappy-POD Clustering, can be viewed as Gappy-POD based on interpreting the QR decomposition as a clustering algorithm. The additional samples from \(\ell + 1\) up to m are identified based on the mutual entropy of the columns. For a more elaborate discussion we refer to [45, 57]. In the next section, we make use of DEIM and its variants, as well as Gappy-POD, to select important parameter samples from the parameter domain.
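The oversampled reconstruction (2.8) amounts to a least-squares solve with the sampled rows of \(\mathbf {U}\). A minimal sketch, with random sampling points standing in for the oversampling strategies of [45] and the helper name `gappy_pod_reconstruct` being illustrative:

```python
import numpy as np

def gappy_pod_reconstruct(U, idx, f_at_idx):
    """Gappy-POD reconstruction (2.8): f ~ U (P^T U)^+ P^T f, solved in the
    least-squares sense with m = len(idx) > ell sampling points."""
    alpha, *_ = np.linalg.lstsq(U[idx, :], f_at_idx, rcond=None)
    return U @ alpha

rng = np.random.default_rng(4)
n, ell, m = 300, 4, 10
U, _ = np.linalg.qr(rng.standard_normal((n, ell)))   # interpolation basis
f = U @ rng.standard_normal(ell)                     # a vector exactly in range(U)
idx = rng.choice(n, m, replace=False)                # oversampled: m > ell points
f_tilde = gappy_pod_reconstruct(U, idx, f[idx])      # recovers f exactly here
```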

For the pseudocode and the details, we refer the reader to the work [21] for the QDEIM algorithm and the work [45] in case of the KDEIM and Gappy-POD algorithms.

3 Proposed Subsampling Strategy for the Training Set

A representative training set \(\varXi _{\text {train}}\) is crucial for RBM to obtain a ROM that satisfies the required tolerance. While a densely-sampled \(\varXi _{\text {train}}\) is needed to accurately represent the parameter space, it incurs high computational cost. In contrast, a randomly sampled coarse training set may fail to capture all the variations of the solution over the parameter space and result in a ROM that fails to meet the tolerance. Therefore, a wisely sampled coarse training set is desired to make the greedy algorithm efficient while retaining the required accuracy of the ROM. We now discuss two observations that motivate our proposed approach for training set sampling.

3.1 Motivating Observations

We detail two observations pertaining to the greedy algorithm in the RBM, the DEIM algorithm and QR pivoting. These two observations motivate our subsampling strategy for the RBM training set.

3.1.1 Greedy Parameters, QR Pivots, and DEIM Interpolation Points

Our first observation concerns the parameters \(\varvec{\mu }^{*}\) selected by the greedy algorithm; the second concerns their resemblance to the QR pivots and the DEIM interpolation points.

From our experience, the greedy algorithm tends to repeatedly pick parameter samples from a small subset of the training set, especially for time-dependent problems. The same phenomenon has been reported in other existing works [27, 29, 31, 37, 42, 50, 58]. The solution (or output) vectors at these parameter values usually exhibit large variability. While the greedy algorithm scans through all the parameter samples in the training set, a majority of those samples are never picked. The fact that a few parameters get repeatedly picked reveals that there are still unresolved modes, and hence more POD modes, corresponding to the selected parameter, are needed to get a good approximation. These few parameters picked by the greedy algorithm usually represent solutions that are less smooth and hence more difficult to approximate. An example of this phenomenon occurs in fluid dynamics problems, where low-viscosity solutions develop shocks and need a large number of POD modes to approximate. We illustrate this observation through the standard greedy algorithm applied to the discretized 1-D viscous Burgers' equation with \(n = 1000\). The detailed description of the model is presented in Sect. 4. The parameter considered is the viscosity. We use 100 equispaced parameter samples from the domain \({\mathcal {P}} = [0.005 \, , \, 1]\) to form the training set \(\varXi _{\text {train}}\). A ROM with error below the tolerance \(\texttt {tol} = 10^{-6}\) is requested from the greedy algorithm. In Table 1a, we provide the parameters picked by the greedy algorithm at each iteration from the training set. Noticeably, among the 100 parameter samples in the training set, only 6 contribute to generating the basis \(\mathbf {V}\) that approximates the solution manifold. Of these 6 samples, the sample \(\mu = 0.005\) is picked fourteen times. This is not surprising, since this parameter corresponds to the solution vector with the smallest viscosity, which is the most difficult to approximate.

Next, we make an important connection between the parameters selected by the greedy algorithm and the pivots obtained through a pivoted QR decomposition of the transpose of the output snapshot matrix defined in (3.1).

For the same viscous Burgers’ equation, we collect the snapshots of the scalar-valued outputs \(\mathbf {y}\) at all the parameters in \(\varXi _{\text {train}}\) into a snapshot matrix given by

$$\begin{aligned} {\mathfrak {Y}} := \begin{bmatrix} \mathbf {y}(\mathbf {x}(t^{0},\varvec{\mu }_{1}), \varvec{\mu }_{1})^{\textsf {T}} & \cdots & \mathbf {y}(\mathbf {x}(t^{K}, \varvec{\mu }_{1}), \varvec{\mu }_{1})^{\textsf {T}} \\ \vdots & \ddots & \vdots \\ \mathbf {y}(\mathbf {x}(t^{0},\varvec{\mu }_{N_{\text {train}}}), \varvec{\mu }_{N_{\text {train}}})^{\textsf {T}} & \cdots & \mathbf {y}(\mathbf {x}(t^{K}, \varvec{\mu }_{N_{\text {train}}}), \varvec{\mu }_{N_{\text {train}}})^{\textsf {T}} \end{bmatrix}. \end{aligned}$$
(3.1)

Each row of the matrix consists of the snapshots of the output at \(K+1\) time instances corresponding to a given parameter. Consider first the well-known pivoted QR decomposition of a matrix \(\mathbf {D}\) given by

$$\begin{aligned} \mathbf {D} \varPi = \mathbf {Q} \mathbf {R} = \mathbf {Q} \begin{bmatrix} \mathbf {R}_{11} & \mathbf {R}_{12} \\ 0 & \mathbf {R}_{22} \end{bmatrix}, \end{aligned}$$
(3.2)

where \(\mathbf {Q}\) is an orthogonal matrix and \(\mathbf {R}\) is upper triangular. The pivots are given by the column permutation matrix \(\varPi \). We apply the QR decomposition to \({\mathfrak {Y}}^{\textsf {T}}\) and identify the pivots. A comparison of the parameters corresponding to the first ten pivots and the parameters selected in the greedy algorithm is shown in Table 1 and Fig. 1. Of the ten pivots, six are identical to the greedy parameters. This close connection between the pivots of the QR decomposition and the greedy parameters chosen in the RBM has been, to the best of our knowledge, discussed only in [2, 40]. It is based on the interpretation of the QR decomposition as a greedy column selection procedure. Note that the application of a QR decomposition assumes the existence of the FOM solution for all the parameters in the training set. In practice, we do not have this information. Instead, we propose to apply the pivoted QR decomposition to the transpose of an approximate output snapshot matrix, in order to identify important parameters, which can then be used to subsample the fine training set in the RBM.
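The parameter selection via pivoted QR can be sketched as follows. The output model \(y(t;\mu ) = e^{-t/\mu }\) used here is a toy surrogate for the rows of (3.1), not the Burgers' outputs of the paper, and `important_parameters` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg import qr

def important_parameters(Y, mu_samples, k):
    """Pivoted QR of the transposed output snapshot matrix Y (rows: parameters,
    columns: time instances). The pivot columns of Y^T correspond to parameter
    rows of Y; the first k pivots give the k most important parameters."""
    _, _, piv = qr(Y.T, pivoting=True, mode='economic')
    return mu_samples[piv[:k]], piv[:k]

# toy stand-in for (3.1): scalar outputs y(t; mu) = exp(-t/mu) on a time grid
mu_samples = np.linspace(0.005, 1.0, 100)
t = np.linspace(0.0, 2.0, 51)
Y = np.exp(-t[None, :] / mu_samples[:, None])      # N_train x (K+1)
picked, piv = important_parameters(Y, mu_samples, k=5)
```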

Table 1 Greedy parameters and QR Pivots for the Burgers’ equation
Fig. 1 Greedy parameters for the Burgers' equation and QR pivots of the true output snapshot matrix \({\mathfrak {Y}}\)

In Sect. 2.3, the usage of the DEIM and QDEIM algorithms was discussed in the context of MOR. The DEIM algorithm uses a greedy sparse sampling of the left singular vector matrix (\(\mathbf {U}\)) of the snapshots to identify interpolation points, whereas QDEIM performs a QR decomposition with column pivoting on \(\mathbf {U}^{\textsf {T}}\). This suggests a phenomenon similar to the one observed above: QR with pivoting can select points of importance for different purposes. Moreover, QR with pivoting connects the greedy algorithm with DEIM (QDEIM), which indicates that DEIM and QDEIM could also be used to select representative parameter samples if either is applied to the output snapshot matrix.

Figure 1 shows that the pivots of the QR decomposition of \({\mathfrak {Y}}^{\textsf {T}}\) give similar points to those selected by the greedy algorithm. This indicates that the sample points selected by the greedy algorithm are closely related to the interpolation points of QDEIM, provided the same snapshot matrix is considered by both the greedy algorithm and QDEIM, which is, in our case, the output snapshot matrix \({\mathfrak {Y}}\). By exploiting this interpretation, we propose to use DEIM or its variants in order to adapt the training set during the greedy algorithm.

To further support and motivate the proposed scheme, we show in the next subsection that DEIM shares with the greedy algorithm of the RBM the capability to identify the parameter samples that are most representative of the dynamics.

3.1.2 DEIM and Parametric Anisotropy

For a function of two variables \(\mathbf {f}(\mathbf {x}, \varvec{\mu }) : {\mathbb {R}}^{n} \times {\mathbb {R}}^{p} \rightarrow {\mathbb {R}}^{n}\), the DEIM algorithm first identifies a linear subspace \(\mathbf {U}\) and a small subset of points in the \(\mathbf {x}\) variable, based on the snapshot matrix \({\mathfrak {F}}\). One can analogously consider the mapping \(\mathbf {f}(\varvec{\mu }, \mathbf {x}) : {\mathbb {R}}^{p} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n}\) by working with the transpose of \({\mathfrak {F}}\); the DEIM algorithm then identifies a small subset of points in the \(\varvec{\mu }\) variable instead. We illustrate this on a toy example from [1]. Consider the following nonlinear, two-parameter function:

$$\begin{aligned} \mathbf {f}(\mathbf {x}_{1}, \mathbf {x}_{2}, \varvec{\mu }) = \dfrac{1 + \frac{\pi ^{2}}{4}(\mu _{2} - \mu _{1} - (\mu _{1} + \mu _{2}) \mathbf {x}_{2})^{2} \sin ^{2}(\frac{\pi }{2}(\mathbf {x}_{1} + 1) )}{1 + (\mu _{1} + \mu _{2}) \cos (\frac{\pi }{2}(\mathbf {x}_{1} + 1)) }, \end{aligned}$$
(3.3)

where \(\mathbf {x} := [\mathbf {x}_{1}, \mathbf {x}_{2}] \in {\mathbb {R}}^{n \times 2}\) is the spatial variable obtained from the discretization of the two-dimensional domain \(\varOmega := [-1 \,, 1] \times [-1 \,, 1]\) with 50 points in each spatial direction, resulting in \(n = 2500\). The parameter \(\varvec{\mu } := (\mu _{1}, \mu _{2}) \in {\mathbb {R}}^{2}\) belongs to the domain \({\mathcal {P}} := [-0.4 \,, 0.4] \times [-0.4 \,, 0.4]\). We collect 1600 snapshots of the function in the snapshot matrix \({\mathfrak {F}} \in {\mathbb {R}}^{2500 \times 1600}\), based on uniform, equally spaced samples of the parameter. In Fig. 2a, the singular values of the snapshot matrix \({\mathfrak {F}}\) are plotted. The rapid decay clearly demonstrates the reducibility of this function. We apply a cut-off of \(\epsilon _{\text {SVD}} = 10^{-10}\) for Algorithm 2.2 applied to both \({\mathfrak {F}}\) and \({\mathfrak {F}}^{\textsf {T}}\). The spatial greedy interpolation points in Fig. 2b are those corresponding to the indices in the row vector \(\mathbf {I}\), obtained from applying Algorithm 2.2 to \({\mathfrak {F}}\). The greedy points in Fig. 2c are those \(\varvec{\mu }\) corresponding to the indices stored in \(\mathbf {I}\), obtained from applying Algorithm 2.2 to \({\mathfrak {F}}^{\textsf {T}}\). The distribution of the points determined by DEIM, for both the spatial and the parameter variable, has a characteristic structure. The number of interpolation points with SVD truncation tolerance \(\epsilon _{\text {SVD}} = 10^{-10}\) was 48, a mere \(3\%\) of the total points. The spatial greedy points illustrate that while the variable \(x_{1}\) is equally important over the entire range of \([-1 \,, 1]\), the \(x_{2}\) variable has almost all its variation concentrated at \(x_{2} \in \{-1, 0, 1\}\). For the parameter variable, the greedy algorithm picks most of the samples from the boundary of the domain and from the diagonal going from the lower left to the upper right.
There is a dense concentration of points around the corners \((-0.4, -0.4)\) and \((0.4, 0.4)\). The choice of the greedy points is closely related to the structure of the function \(\mathbf {f}\) being approximated. In most of the existing MOR literature, the DEIM algorithm has been used mainly as a tool to speed up evaluations of nonlinear or nonaffine (parametric) functions in a ROM. However, through this toy example, we have demonstrated its capability to expose the nature of the parametric dependence of a function. As seen in Fig. 2c, it is able to identify the regions in the parameter space where the function has large variations.
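The toy experiment above can be reproduced in a few lines; the sketch below uses a textbook DEIM index selection (not the authors' code) and grid sizes reduced from the paper's 50 × 50 spatial grid and 40 × 40 parameter grid for brevity:

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM point selection from a basis matrix U (n x m)."""
    n, m = U.shape
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, m):
        # Interpolate the j-th basis vector at the points chosen so far...
        c = np.linalg.solve(U[np.ix_(idx, list(range(j)))], U[idx, j])
        # ...and add the point where the interpolation residual is largest.
        r = U[:, j] - U[:, :j] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

# Snapshots of the two-parameter function (3.3) on a coarse grid.
x1, x2 = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))
mu = [(m1, m2) for m1 in np.linspace(-0.4, 0.4, 15)
               for m2 in np.linspace(-0.4, 0.4, 15)]

def f(m1, m2):
    num = 1 + (np.pi**2 / 4) * (m2 - m1 - (m1 + m2) * x2)**2 \
            * np.sin(np.pi / 2 * (x1 + 1))**2
    den = 1 + (m1 + m2) * np.cos(np.pi / 2 * (x1 + 1))
    return (num / den).ravel()

F = np.column_stack([f(m1, m2) for (m1, m2) in mu])   # 400 x 225

# DEIM on F selects spatial points; DEIM on F^T selects parameter samples.
U, s, _ = np.linalg.svd(F, full_matrices=False)
r = int(np.sum(s / s[0] > 1e-10))
spatial_pts = deim_indices(U[:, :r])

W, t, _ = np.linalg.svd(F.T, full_matrices=False)
rp = int(np.sum(t / t[0] > 1e-10))
param_pts = deim_indices(W[:, :rp])
selected_mu = [mu[i] for i in param_pts]
```

Plotting `selected_mu` over \({\mathcal {P}}\) reproduces the anisotropic pattern discussed above: samples cluster on the boundary and along the main diagonal.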

Fig. 2

Toy problem demonstrating anisotropic choice of interpolation points. The colourbars indicate the order of selection of the parameters. Points in the blue end of the spectrum were selected earlier while those in the green/yellow regions of the spectrum were picked later during the course of the algorithm (Color figure online)

3.2 Subsampling the Training Set

Based on the observations in Sects. 3.1.1 and 3.1.2, it is evident that a substantial computational effort can be saved in the offline stage of the RBM if we appropriately (optimally) sample the training set. The rationale for the proposed approach is the following: the standard greedy algorithm scans through the entire training set at each iteration and evaluates the error estimator at each parameter. This approach can incur a significant computational cost for training sets with a large number of parameters. The proposed algorithm aims at picking out a small subset of the training set containing the most informative parameters. As will be demonstrated numerically, these parameters closely match those chosen by the standard greedy algorithm.

Based on our observation of Fig. 1 in Sect. 3.1.1 and Fig. 2 in Sect. 3.1.2, we propose to apply the pivoted QR decomposition and the DEIM algorithm (and its variants) to the snapshot matrix of the approximate output vector \(\widetilde{\mathbf {y}}\left( \mathbf {z}(t^{k},\varvec{\mu }), \varvec{\mu }\right) \). More specifically, we consider the output snapshot matrix given by

$$\begin{aligned} \widetilde{{\mathfrak {Y}}} := \begin{bmatrix} {[}\widetilde{\mathbf {y}}(\mathbf {z}(t^{0},\varvec{\mu }_{1}), \varvec{\mu }_{1})]^{\textsf {T}} &{} \cdots &{} [\widetilde{\mathbf {y}}(\mathbf {z}(t^{K}, \varvec{\mu }_{1}), \varvec{\mu }_{1})]^{\textsf {T}} \\ \vdots &{} \ddots &{} \vdots \\ {[}\widetilde{\mathbf {y}}(\mathbf {z}(t^{0},\varvec{\mu }_{N_{\text {train}}}), \varvec{\mu }_{N_{\text {train}}})]^{\textsf {T}} &{} \cdots &{} [\widetilde{\mathbf {y}}(\mathbf {z}(t^{K}, \varvec{\mu }_{N_{\text {train}}}), \varvec{\mu }_{N_{\text {train}}})]^{\textsf {T}} \end{bmatrix} \in {\mathbb {R}}^{N_{\text {train}} \times q N_{t}} \end{aligned}$$
(3.4)

with each row containing the snapshots of the approximated output quantity at \(K+1\) time instances corresponding to a given parameter sample.

Remark 3.1

In the case of steady-state systems with a single output, we apply the proposed subsampling approach on the approximate state snapshots. For this, we define \(\widetilde{{\mathfrak {Y}}} := \widetilde{\varvec{\mathfrak {X}}}^{\textsf {T}}\), \(\widetilde{\varvec{\mathfrak {X}}}\) being the snapshot matrix of the approximate state vector (\(\widetilde{\varvec{x}}(\varvec{\mu }) = \mathbf {V} \mathbf {z}(\varvec{\mu })\)) such that

$$\begin{aligned} \widetilde{\varvec{\mathfrak {X}}} := \left[ \widetilde{\varvec{x}}(\varvec{\mu }_{1}), \cdots , \widetilde{\varvec{x}}(\varvec{\mu }_{N_{\text {train}}})\right] \in {\mathbb {R}}^{n \times N_{\text {train}}}. \end{aligned}$$

For steady-state systems with multiple outputs, we define \(\widetilde{{\mathfrak {Y}}} := \left[ \widetilde{\mathbf {y}}(\varvec{\mu }_{1}), \cdots , \widetilde{\mathbf {y}}(\varvec{\mu }_{N_{\text {train}}}) \right] ^{\textsf {T}} \in ~{{\mathbb {R}}^{N_{\text {train}} \times q}}\).

Note that \(\widetilde{{\mathfrak {Y}}}\) can be obtained from a coarse or low-fidelity ROM of the original system without doing FOM simulation at all the parameter samples. We propose two sampling strategies: (i) apply pivoted QR decomposition to \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\), (ii) apply DEIM or its variants to \(\widetilde{{\mathfrak {Y}}}\), in order to identify the structure of the parametric dependence of the output variable. Once the distribution of the interpolation points is identified, we can then adapt the training set for subsequent iterations of the greedy algorithm. We now outline the proposed approach and discuss different computational strategies.

[Algorithm 3.1: training set subsampling, Scheme 1]
[Algorithm 3.2: training set subsampling, Scheme 2]

The proposed sampling procedure consists of two stages. The first stage is identical to the standard RBM procedure outlined in Algorithm 2.1, using a finely sampled training set \(\varXi _{\text {train}}^{f}\). We consider two different stopping criteria for the first stage: (a) the first stage runs until the maximum estimated error falls below a coarse tolerance \(\texttt {tol}^{c} > \texttt {tol}\), where tol is the desired error tolerance for the final ROM, or (b) the number of DEIM interpolation points or QR pivots does not change at two successive iterations. The value of \(\texttt {tol}^{c}\) is user-defined and is of order \({\mathcal {O}}(1)\) in this work. Based on these two stopping criteria, two schemes of training set subsampling are presented in Algorithms 3.1 and 3.2, respectively. For both algorithms, we do not reset the value of iter at the end of Stage 1, so the final value of iter upon convergence is the total number of iterations required by either algorithm to converge to the desired tolerance. It is worth pointing out that an a posteriori error estimator [16] is used in both stages of the proposed algorithms, so that the parameter sample picked at each iteration is the one at which the estimated output error is the largest. Furthermore, the adaptive basis enrichment technique proposed in [16] is implemented and serves as a possible remedy for the issue of repeated parameter selection. The number of POD modes corresponding to a selected parameter sample is decided adaptively: when the estimated error is large, more POD modes (\(r_{\text {POD}}\)) are added for the selected parameter; otherwise, fewer are added. This reduces the chance of the same parameter sample being repeatedly chosen at subsequent iterations. This adaptive basis enrichment is implemented in both stages of our proposed method.
We now discuss several practical computational strategies in connection with Steps 11 and 12 in Algorithm 3.1 and Steps 12, 13 and 16 in Algorithm 3.2.

Remark 3.2

The active subspaces method (ASM) is an approach for parameter space reduction that has been recently applied in the context of MOR [20, 52, 56] mainly for the case of scalar valued outputs. The ASM identifies a set of important directions in the parameter space onto which the parameter vectors are projected. It does this by means of Monte Carlo sampling of the gradients (with respect to the parameter) of the scalar-valued output quantity at a selection of parameter samples. The active subspaces are the eigenspaces of the (truncated) covariance matrix of the gradients. Compared to ASM, our approach differs in two significant ways. Firstly, the proposed subsampling strategy is applicable to vector-valued output quantities. Secondly, ASM requires calculation of the gradient of the output. Moreover, the user still has to define the training set over which the gradient samples are acquired. Our proposed approach does not require the calculation of any additional quantity.

Remark 3.3

Our proposed subsampling strategy occurring in Step 11 of Algorithm 3.1 and Step 10 of Algorithm 3.2 shares similarity with the column subset selection problem (CSSP) in the fields of numerical linear algebra and data mining [38]. For some general data matrix \(\mathbf {D} \in {\mathbb {R}}^{N \times M}\), the CSSP aims to identify \(h < M\) independent columns of the matrix \(\mathbf {D}\) such that the residual \(\Vert \mathbf {D} - \mathbf {P}_{h} \mathbf {D} \Vert \) is minimized. Here, \(\mathbf {P}_{h} = \mathbf {S} \mathbf {S}^{\dagger }\) is a projection matrix and \(\mathbf {S} \in {\mathbb {R}}^{N \times h}\) consists of the h extracted columns from \(\mathbf {D}\). A number of algorithms, both deterministic and randomized, have been proposed to solve the CSSP [10, 11, 13]. One popular approach is to apply some variant of the QR decomposition (column-pivoted, rank-revealing, hybrid, etc.) either to the data matrix \(\mathbf {D}\) or to the transpose of the (truncated) left (or right) singular matrix \(\mathbf {U}\) (or \(\mathbf {W}\)) of \(\mathbf {D}\). If we consider \(\mathbf {D}\) as the approximate output snapshot matrix, then our proposed algorithm using pivoted QR (or QDEIM) can be seen as a special case of the CSSP.
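The CSSP residual just defined can be illustrated numerically (an assumed toy setup, with pivoted QR as the column-selection heuristic and a pseudoinverse-based projector):

```python
import numpy as np
from scipy.linalg import qr

# Random data matrix of exact rank 6, standing in for D.
rng = np.random.default_rng(1)
D = rng.standard_normal((50, 6)) @ rng.standard_normal((6, 40))

# Select h columns via column-pivoted QR, then measure ||D - P_h D||_F
# with P_h = S S^+ (S holds the h extracted columns).
_, _, piv = qr(D, mode='economic', pivoting=True)
h = 6
S = D[:, piv[:h]]
P = S @ np.linalg.pinv(S)
residual = np.linalg.norm(D - P @ D)
# For h equal to the numerical rank, the residual is (numerically) zero.
```

Choosing h below the numerical rank leaves a nonzero residual, which is exactly the quantity the CSSP seeks to minimize over all column subsets.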

3.2.1 Training Set Subsampling: Scheme 1

We describe the proposed approach for the first scheme detailed in Algorithm 3.1. In the first stage, a low-fidelity ROM is built with a coarse tolerance \(\texttt {tol}^{c}\), over a finely sampled training set \(\varXi _{\text {train}}^{f}\). The intuition is that a low-fidelity approximation is sufficient to discover the parametric dependence of the output variable. At the end of the first stage, DEIM (or one of its variants) is applied to \(\widetilde{{\mathfrak {Y}}}\) to identify the interpolation points or a pivoted QR decomposition of \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\) is used to identify the pivots. Once the set of interpolation points (or pivots) \(p_{i}\) is identified, we proceed to suitably subsample the finely sampled training set based on the distribution of the identified interpolation points or pivots. Different possibilities exist to achieve this. A simple approach is to consider the training set for the second stage \(\varXi _{\text {train}}\) populated only by the identified interpolation points or pivots. Consider the fine training set \(\varXi _{\text {train}}^{f} := \{\varvec{\mu }_{1}, \varvec{\mu }_{2}, \ldots , \varvec{\mu }_{N_{\text {train}}^{f}}\}\) with the subscript denoting the index corresponding to the position of a parameter in the set. Let \(\mathbf {I}\) be the vector whose elements are the indices of the chosen interpolation points or pivots. We define the subsampled training set \(\varXi _{\text {train}}\) as the one consisting of all those parameters \(\varvec{\mu }_{z}\) from \(\varXi _{\text {train}}^{f}\) such that their indices are present in \(\mathbf {I}\), i.e., \(\varXi _{\text {train}} := \{\varvec{\mu }_{z} : z \in \mathbf {I}\}\). If there are only a few interpolation points, this approach would lead to a rapid second stage of the algorithm.
However, this approach carries the risk of overfitting, by which we mean that the resulting ROM after Stage 2 satisfies the desired tolerance tol only over the subsampled training set and does not generalize to other parameters in the parameter domain. We illustrate this phenomenon in the numerical tests. Another possible approach is to define a training budget m for the second stage and use an oversampling strategy such as Gappy-POD to ensure that the training set for the second stage consists of a total of m parameter samples.
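The simple index-based construction of \(\varXi _{\text {train}}\) amounts to a single fancy-indexing step; the sketch below uses hypothetical names (`Xi_f`, `I`) and the one-parameter Burgers' setting of Sect. 4.1 for concreteness:

```python
import numpy as np

# Fine training set for a single parameter (the viscosity range of
# Sect. 4.1) and illustrative DEIM/QR indices; both are assumed values.
Xi_f = np.linspace(0.005, 1.0, 100)   # fine training set Xi_train^f
I = np.array([0, 3, 17, 42, 99])      # indices of DEIM points / QR pivots
Xi_train = Xi_f[I]                    # subsampled training set for Stage 2
```

With the budget-based variant, one would instead keep growing `I` via an oversampling strategy until it holds m indices before forming `Xi_train`.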

3.2.2 Training Set Subsampling: Scheme 2

The first scheme of our proposed training set adaptation method requires a user-defined coarse tolerance. Choosing such a tolerance is rather heuristic. For some problems, a rough approximation may be enough to capture all the parametric dependences, whereas a finer approximation may be needed for others. Therefore, for the second scheme we define a heuristic criterion that triggers the switch to the second stage automatically. To achieve this, we apply the DEIM approximation (or a variant of DEIM) to the matrix \(\widetilde{{\mathfrak {Y}}}\), or the pivoted QR factorization to \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\), at each iteration of the greedy algorithm. Whenever the DEIM interpolation indices or the QR pivots at two successive iterations turn out to be equal, we terminate the first stage. This can easily be checked by comparing the interpolation indices or the pivot indices at two successive iterations. The subsampling of the training set for the second stage is then carried out with the approaches discussed above. Following this, the second stage is run until the required tolerance tol is met. In Algorithm 3.2, \(\mathbf {I}_{\texttt {iter}}\) and \(\mathbf {I}_{\texttt {iter - 1}}\) in Stage 1 refer to the vectors containing the interpolation indices identified by DEIM or its variants, or the indices of the QR pivots, at the current and the previous iteration, respectively.
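The switching test of Scheme 2 reduces to comparing two index vectors; a minimal sketch (the function name is ours, and we compare the sorted index sets, a slightly stricter check than comparing only their lengths):

```python
import numpy as np

def stage1_converged(I_prev, I_curr):
    # Scheme 2 switching criterion (a sketch): end Stage 1 once the
    # DEIM/QR index sets at two successive greedy iterations coincide.
    if I_prev is None:          # first iteration: nothing to compare against
        return False
    return np.array_equal(np.sort(I_prev), np.sort(I_curr))
```

For example, `stage1_converged(np.array([7, 1, 4]), np.array([1, 4, 7]))` signals the switch, while index sets of different sizes do not.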

Each of the two schemes has its own benefits and shortcomings. For Scheme 1, the burden of choosing an appropriate coarse tolerance lies with the user, and this choice is highly problem-dependent. In the limit \(\texttt {tol}^{c} = \texttt {tol}\), Scheme 1 is just the standard greedy algorithm (Algorithm 2.1) with a fixed training set. If \(\texttt {tol}^{c} \gg \texttt {tol}\), a very fast second stage can be ensured, leading to considerable speedup of the offline stage of the RBM. However, it is not known a priori whether the chosen coarse tolerance is good. Scheme 2, on the other hand, automates the switch from Stage 1 to Stage 2 of the subsampling strategy by considering a heuristic criterion for the subspace approximation of the snapshot matrix \(\widetilde{{\mathfrak {Y}}}\). But this comes at the additional cost of repeatedly performing the DEIM algorithm or the pivoted QR decomposition. The success of both schemes is also highly dependent on the strategy adopted to construct the subsampled training set. In the numerical tests, we shall consider two approaches. In the first approach, we include as many parameters in the subsampled training set as there are DEIM interpolation points or QR pivots. In the second approach, we fix the cardinality of the subsampled training set to be m and then use oversampling strategies based on the Gappy-POD method to choose those m parameters in a principled way.

3.2.3 Complexity Analysis

The fine training set is used in Stage 1 of both Algorithms 3.1 and 3.2. However, the computational complexity is not high for Algorithm 3.1, since we use a coarse tolerance \(\texttt {tol}^{c}\) in Stage 1, so that the greedy algorithm converges within far fewer iterations than when using the user-desired tolerance in Stage 2. The computational complexity grows as the coarse tolerance used in Stage 1 decreases. However, as we have observed, a moderate value of \(\texttt {tol}^{c}\) is enough to capture the parameter dependence of the output. The number of FOM simulations is indeed independent of the size of the fine training set, since the FOM simulation is carried out only at the “selected” parameter samples. The situation for Algorithm 3.2 is different, since it involves the DEIM or QR algorithms at each iteration (see Step 10) to compute the interpolation points. A fine training set will indeed increase the cost of this step. Nevertheless, one can readily use recent techniques based on randomized linear algebra (such as randomized SVD, randomized QR, etc.) to keep the costs under control, see [22, 54].

In this section, we roughly compare the computational complexity of the standard RBM without training set subsampling in Algorithm 2.1 to that of RBM using the two subsampling strategies introduced in Algorithms 3.1 and 3.2. Our complexity analysis considers the worst-case costs involved with each algorithm and is meant as a simplified illustration of the benefits of subsampling the training set. For simplicity, we only count the dominant costs in each algorithm.

We begin by introducing some notation:

  • Let \({\mathcal {C}}_{\text {FOM}} := O(n^{2})\cdot N_{t}\) denote the cost of solving the FOM (e.g., (2.2)) using some iterative scheme (GMRES, etc.), at a fixed parameter to obtain the snapshots.

  • Let \({\mathcal {C}}_{\text {SVD}} := O(n^{3})\) be the worst-case cost for performing the SVD.

  • Let \({\mathcal {C}}_{\text {Err}} := O(s)\) be the cost associated with evaluating the error estimator \(\varDelta (\varvec{\mu })\) for a fixed parameter.

  • Note that the dimension of the snapshot matrix in (3.4) is \(N_{\text {train}}^{f} \times q N_{t}\). For large-scale systems, we usually have \(N_{\text {train}}^{f} \le n\) if using, e.g., a sparse grid sampling, and often \(q N_{t} \le n\). Therefore, the cost of the DEIM algorithm (or its variants) or of the pivoted QR decomposition is less than \(O(n^{3})\); we conservatively bound it as \({\mathcal {C}}_{\text {ss}} := O(n^{3})\).

The cost associated with Algorithm 2.1, viz., \({\mathcal {C}}_{I}\) is:

$$\begin{aligned} {\mathcal {C}}_{{I}} := \left( {\mathcal {C}}_{\text {FOM}} + {\mathcal {C}}_{\text {SVD}} + {\mathcal {C}}_{\text {Err}} \cdot N_{\text {train}}^{f} \right) \cdot N_{I} \end{aligned}$$

with \(N_{\text {train}}^{f}\) being the cardinality of the fine training set and \(N_{I}\) the number of iterations taken by the greedy algorithm to converge to the desired tolerance. The cost incurred by Algorithm 3.1, \({\mathcal {C}}_{{II}}\), is divided between Stage 1 and Stage 2. This is given by:

$$\begin{aligned} {\mathcal {C}}_{{II}} :=&\overbrace{\big ( {\mathcal {C}}_{\text {FOM}} + {\mathcal {C}}_{\text {SVD}} + {\mathcal {C}}_{\text {Err}} \cdot N_{\text {train}}^{f} \big ) \cdot N_{II,1}}^{\text {Stage 1 cost}} \\&+ \underbrace{{\mathcal {C}}_{\text {ss}} + \big ( {\mathcal {C}}_{\text {FOM}} + {\mathcal {C}}_{\text {SVD}} + {\mathcal {C}}_{\text {Err}} \cdot N_{\text {train}} \big ) \cdot N_{II,2}}_{\text {Stage 2 cost}}. \end{aligned}$$

In the above expression, \(N_{II,1}, N_{II,2}\) are, respectively, the number of iterations of the greedy algorithm in Stage 1 and Stage 2 of Algorithm 3.1. Typically, \(N_{II,2} > N_{II,1}\) since we use a coarse tolerance for Stage 1. Moreover, \(N_{\text {train}}\) is the cardinality of the subsampled training set \(\varXi _{\text {train}}\). Finally, the computational cost of Algorithm 3.2, \({\mathcal {C}}_{{III}}\), is as follows:

$$\begin{aligned} {\mathcal {C}}_{{III}} :=&\overbrace{\big ( {\mathcal {C}}_{\text {FOM}} + {\mathcal {C}}_{\text {SVD}} + {\mathcal {C}}_{\text {Err}} \cdot N_{\text {train}}^{f} + {\mathcal {C}}_{\text {ss}} \big ) \cdot N_{III,1}}^{\text {Stage 1 cost}} \\&+ \underbrace{\big ( {\mathcal {C}}_{\text {FOM}} + {\mathcal {C}}_{\text {SVD}} + {\mathcal {C}}_{\text {Err}} \cdot N_{\text {train}} \big ) \cdot N_{III,2}}_{\text {Stage 2 cost}} \end{aligned}$$

with \(N_{III,1}, N_{III,2}\) being, respectively, the number of iterations of the greedy algorithm in Stage 1 and Stage 2 of Algorithm 3.2. For Algorithm 3.1 or Algorithm 3.2 to be computationally better alternatives to Algorithm 2.1, we should have \({\mathcal {C}}_{I} > {\mathcal {C}}_{II}\) and \({\mathcal {C}}_{I} > {\mathcal {C}}_{III}\).

Cost benefit of Algorithm 3.1: To compare the costs of Algorithms 2.1 and 3.1, we look at the expression \({\mathcal {C}}_{I} - {\mathcal {C}}_{II}\).

$$\begin{aligned} {\mathcal {C}}_{I} - {\mathcal {C}}_{II} :=&\, {\mathcal {C}}_{\text {FOM}} (N_{I} - N_{II,1} - N_{II,2})\\&+ {\mathcal {C}}_{\text {SVD}} (N_{I} - N_{II,1} - N_{II,2})\\&+ {\mathcal {C}}_{\text {Err}} (N_{\text {train}}^{f} \cdot N_{I} - N_{\text {train}}^{f} \cdot N_{II,1} - N_{\text {train}} \cdot N_{II,2})\\&- {\mathcal {C}}_{\text {ss}}. \end{aligned}$$

By assuming \(N_{I} \approx N_{II,1} + N_{II,2}\) we have for the above expression

$$\begin{aligned} {\mathcal {C}}_{I} - {\mathcal {C}}_{II} \approx&\, {\mathcal {C}}_{\text {Err}} (N_{\text {train}}^{f} - N_{\text {train}}) N_{II,2} - {\mathcal {C}}_{\text {ss}}. \end{aligned}$$

For \(N_{\text {train}}^{f} \gg N_{\text {train}}\), the first term dominates, leading to \({\mathcal {C}}_{I} > {\mathcal {C}}_{II}\). Thus, the computational cost incurred by Algorithm 3.1 is less than that of Algorithm 2.1. In our analysis, we have assumed that \(N_{I} \approx N_{II,1} + N_{II,2}\). While this helps to simplify the analysis, it does not always hold. As we shall see in Sect. 4, this assumption is true for the Burgers’ equation example, while it does not hold for the thermal block example. A more relaxed assumption is \(N_{I} \gtrapprox N_{II,1} + N_{II,2}\), under which it can still be seen that \({\mathcal {C}}_{I} > {\mathcal {C}}_{II}\).
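Plugging illustrative (entirely made-up) numbers into the cost model makes the comparison concrete; only the relative magnitudes matter, and the simplified difference formula is exact whenever \(N_{I} = N_{II,1} + N_{II,2}\):

```python
# Illustrative constants for the cost model; all values are assumptions,
# not measurements from the paper.
n, N_t, s = 1000, 100, 10_000
Ntrain_f, Ntrain = 10_000, 50          # fine vs. subsampled training set
C_FOM = n**2 * N_t                     # FOM solve over all time steps
C_SVD = n**3                           # worst-case SVD cost
C_Err = s                              # one error-estimator evaluation
C_ss  = Ntrain_f * N_t**2              # SVD/QR of the Ntrain_f x qN_t matrix (q = 1)

N_I, N_II1, N_II2 = 20, 5, 15          # assumed iteration counts

C_I  = (C_FOM + C_SVD + C_Err * Ntrain_f) * N_I
C_II = (C_FOM + C_SVD + C_Err * Ntrain_f) * N_II1 \
     + C_ss + (C_FOM + C_SVD + C_Err * Ntrain) * N_II2

# Simplified estimate of the savings, as derived above.
diff = C_Err * (Ntrain_f - Ntrain) * N_II2 - C_ss
```

With these numbers, `diff` is positive and coincides exactly with `C_I - C_II`, since the iteration counts were chosen with \(N_{I} = N_{II,1} + N_{II,2}\).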

Cost benefit of Algorithm 3.2:  We look at the difference in costs \({\mathcal {C}}_{I} - {\mathcal {C}}_{III}\).

$$\begin{aligned} {\mathcal {C}}_{I} - {\mathcal {C}}_{III} :=&\, {\mathcal {C}}_{\text {FOM}} (N_{I} - N_{III,1} - N_{III,2})\\&+ {\mathcal {C}}_{\text {SVD}} (N_{I} - N_{III,1} - N_{III,2})\\&+ {\mathcal {C}}_{\text {Err}} (N_{\text {train}}^{f} \cdot N_{I} - N_{\text {train}}^{f} \cdot N_{III,1} - N_{\text {train}} \cdot N_{III,2})\\&- {\mathcal {C}}_{\text {ss}} N_{III,1}. \end{aligned}$$

By making the assumption \(N_{I} \approx N_{III,1} + N_{III,2}\), we have for the above expression

$$\begin{aligned} {\mathcal {C}}_{I} - {\mathcal {C}}_{III} \approx&\, {\mathcal {C}}_{\text {Err}} (N_{\text {train}}^{f} - N_{\text {train}}) N_{III,2} - {\mathcal {C}}_{\text {ss}} N_{III,1}. \end{aligned}$$

Usually \(N_{\text {train}}^{f} \gg N_{\text {train}}\) and, moreover, \(N_{III,1} < N_{III,2}\). Therefore, the first term dominates, leading to \({\mathcal {C}}_{I} > {\mathcal {C}}_{III}\). Once again, we see that the RBM using the subsampling strategy (Algorithm 3.2) incurs a smaller cost than Algorithm 2.1. As with Algorithm 3.1, the assumption \(N_{I} \approx N_{III,1} + N_{III,2}\) need not always hold; from the numerical tests in the next section, we actually have \(N_{I} \gtrapprox N_{III,1} + N_{III,2}\). The numerical results in Tables 2 and 3 show that Algorithms 3.1 and 3.2 indeed achieve speedups, consistent with their lower computational complexity compared to Algorithm 2.1.

Remark 3.4

(High-dimensional parameter spaces) Both Algorithms 3.1 and 3.2 can be used when the parameter space is high-dimensional. The cost in Stage 2 will not be affected much, since a small training set identified from Stage 1 is used. The main increase in cost is due to the need to solve additional ROMs and estimate the error in Stage 1 of the proposed algorithms (Step 8 in Algorithms 3.1 and 3.2). If sparse grid sampling and a cheap error estimator are used, the cost increase remains moderate. In fact, we can go a step further and make use of cheaply computable surrogate models of the error estimator, as done in [17]. In that work, we considered a radial basis surrogate for the error estimator that is adaptively updated during the greedy algorithm. We only evaluate the actual error estimator at a few parameter samples; this data is then used to build a surrogate model, which can evaluate the error for the other parameter samples in the fine training set in Stage 1.

Remark 3.5

(Role of output quantity of interest) In many applications, a good approximation of the entire state vector is required. In this work, however, we have specifically focussed on a goal-oriented approach. For several applications, such as control systems or fluid dynamics, only a small number of state variables may be of interest. By focussing on those states alone, the resulting ROM dimension can be considerably lower than when the entire state needs to be well approximated. It is indeed true that different output QoIs may lead to different parameter samples being chosen. However, if two QoIs emphasize similar features of the solution, they may result in similar parameter samples. For example, a QoI defined by the mean of the solution over the spatial domain and a QoI defined by the sum of the solution over the spatial domain should result in similar samples, whereas a QoI defined by the mean of the solution over space will probably give different samples from a QoI defined by the maximum of the state over the spatial domain.

4 Numerical Results

We test the proposed adaptive training set subsampling algorithm on two examples. They are:

  1. Viscous Burgers’ equation with one parameter,

  2. thermal block with four parameters.

The first example is a nonlinear system, while the other is linear. All numerical tests are carried out in \({\text {MATLAB}}^{\textregistered }\) 8.5.0.197613 (R2015a) on a laptop with \({\text {Intel}} ^{\textregistered } {\text {Core}} ^{\mathrm{TM}}\) i5-7200U @ 2.5 GHz and 8 GB of RAM. Next, we describe the metrics used in all the numerical tests:

  • The results of the proposed algorithms (Algorithms 3.1 and 3.2) are compared against a standard implementation of the POD-Greedy algorithm (Algorithm 2.1) with a fixed training set. The implementation adopts Galerkin projection. The numbers of POD modes \(r_{\text {POD}}\) and DEIM basis vectors \(r_{\text {EI}}\) used to enrich the RB and DEIM bases are determined at each iteration based on the adaptive approach proposed in [16]. We have also used the primal-dual error estimator proposed by the authors of [16] for our implementation of Algorithms 2.1, 3.1 and 3.2. The dual RB basis required for the error estimator is generated separately.

  • The fixed training set used for Algorithm 2.1 and the initial fine training set used for Algorithms 3.1 and 3.2 are the same.

  • For Algorithms 3.1 and 3.2, we apply (i) the pivoted QR decomposition to the transposed approximate output snapshot matrix \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\) and, (ii) the DEIM variants on the approximate output snapshot matrix \(\widetilde{{\mathfrak {Y}}}\) as two approaches to subsample the fine training set.

  • The cut-off criterion to determine the number of pivots (h) for the pivoted QR decomposition in (3.2) of the approximate output snapshot matrix in (3.4) is based on the magnitude of the diagonal elements of the upper triangular matrix \(\mathbf {R}\): we choose the smallest q such that \(|\mathbf {R}(q+1,q+1)|/|\mathbf {R}(1,1)| < \epsilon _{\text {QR}}\) and set \(h = q\), with \(q \in \{1, 2, \ldots , \min (N_{\text {train}}, N_{t})\}\). The pivoted QR decomposition can effectively reveal the numerical rank of a matrix through a small diagonal element \(\mathbf {R}(q+1,q+1)\). Although there are cases in which the column-pivoted QR decomposition fails, they are rare in practice [14]. A more robust rank-revealing QR factorization [14] could also be straightforwardly applied in our proposed subsampling algorithm; in this work, however, we simply use column-pivoted QR. The intrinsic \({\text {MATLAB}}^{\textregistered }\) command qr is used with the option \(\texttt {vector}\) enabled, i.e., we call \([\mathbf {Q}, \mathbf {R}, p_{\text {qr}}] = \texttt {qr}(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}, \texttt {'vector'})\). It returns the pivot indices \(p_{\text {qr}}\) as a vector, from which we select the first h as our subsampling indices, i.e., \(\mathbf {I} = p_{\text {qr}}(1:h)\). Here, \(\mathbf {I}\) is the vector whose elements are the indices of the QR pivots; it determines the subsampled training set \(\varXi _{\text {train}}\) for Stage 2 of our proposed method, based on the fine training set \(\varXi _{\text {train}}^{f}\) from Stage 1.

  • Our implementation of the k-means algorithm for KDEIM is based on the intrinsic \({\text {MATLAB}}^{\textregistered }\) function kmeans. We use five different initializations and pick the best configuration among the five.

  • The maximum true error over the test set is defined as

    $$\begin{aligned} \epsilon _{\text {t}}^{\text {max}} := \max \limits _{\varvec{\mu } \in \varXi _{\text {test}}} \left( \frac{1}{K+1} \sum _{k = 0}^{K} \Vert \mathbf {y}\left( \mathbf {x}(t^{k},\varvec{\mu })\right) - \widetilde{\mathbf {y}}\left( \mathbf {z}(t^{k},\varvec{\mu })\right) \Vert \right) . \end{aligned}$$
  • Reported runtimes for all the algorithms are obtained by considering the median value of five independent runs.

  • The quantity Iterations reported in Tables 2 to 6 refers to the total number of iterations (iter) of the corresponding greedy algorithm (Algorithms 2.1, 3.1 and 3.2) to converge to the desired tolerance.
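The QR pivot cut-off rule described in the list above can be sketched as follows (a NumPy analogue of the MATLAB `qr(...,'vector')` call, with a random rank-5 matrix standing in for \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\); counting diagonal entries above the threshold gives the same h as the smallest-q formulation, shifted to 0-based indexing):

```python
import numpy as np
from scipy.linalg import qr

# Random rank-5 stand-in for the transposed output snapshot matrix,
# with a tiny perturbation playing the role of numerical noise.
rng = np.random.default_rng(2)
Yt = rng.standard_normal((80, 5)) @ rng.standard_normal((5, 120))
Yt += 1e-14 * rng.standard_normal(Yt.shape)

_, R, piv = qr(Yt, mode='economic', pivoting=True)
d = np.abs(np.diag(R))

# Keep pivots while |R(q+1,q+1)| / |R(1,1)| stays above eps_QR
# (d[q] / d[0] in 0-based indexing).
eps_QR = 1e-10
h = int(np.sum(d / d[0] >= eps_QR))   # number of retained pivots
I = piv[:h]                           # subsampling indices
```

Here `h` recovers the numerical rank of the stand-in matrix, and `I` plays the role of \(\mathbf {I} = p_{\text {qr}}(1:h)\) above.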

4.1 Viscous Burgers’ Equation

We consider the following viscous Burgers’ equation defined on a 1-D domain \(\varOmega := [ 0 \,, 1]\):

$$ \begin{aligned} \begin{aligned} \frac{\partial x}{\partial t} + x \frac{\partial x}{\partial w}&= \mu \frac{\partial ^{2} x}{\partial w^{2}} + s(x,t), \,\,\, x(0, t) = 0 \,\, \& \,\, \frac{\partial x(1,t)}{\partial w} = 0,\\ y&= x(1,t), \end{aligned} \end{aligned}$$
(4.1)

where \(s(x,t)\) denotes a forcing term defined below. The domain is discretized using the finite difference method with step size \(\varDelta w = 0.001\). We make use of a second-order central difference discretization for the diffusion term and a first-order upwind scheme for the convection term. The resulting FOM is of dimension \(n = 1000\). For the time variable \(t \in [0 \,, 2]\), we make use of a first-order implicit-explicit (IMEX) scheme with the diffusion term discretized implicitly and the nonlinear convection term discretized explicitly. The step size is \(\varDelta t = 0.001\). The constant input to the system is set to be \(s(x,t) \equiv 1\) and the initial condition is \(\mathbf {x}_{0} := {\mathbf {0}} \in {\mathbb {R}}^{n}\). The parameter domain of the viscosity \(\mu \) is \({\mathcal {P}} := [0.005, \, 1]\). The training set \(\varXi _{\text {train}}\) consists of 100 equally spaced samples in \({\mathcal {P}}\). For the RBM, we fix the tolerance to be \(\texttt {tol} = 1\cdot 10^{-6}\). To validate the ROM, we use a test set \(\varXi _{\text {test}}\) containing 300 randomly sampled parameters, different from those in the training set.
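The discretization just described can be sketched as follows. This is a simplified illustration rather than the authors' implementation: a coarser grid is used for brevity, and the boundary treatment is one plausible first-order realization of the stated Dirichlet and Neumann conditions.

```python
import numpy as np
from scipy.sparse import identity, diags
from scipy.sparse.linalg import splu

def burgers_fom(mu, n=100, T=2.0, dt=0.001, s=1.0):
    """IMEX time stepping for viscous Burgers: implicit diffusion (central
    differences), explicit first-order upwind convection, constant forcing s."""
    dw = 1.0 / n
    # Diffusion matrix with x(0) = 0 (Dirichlet) and zero-Neumann at w = 1.
    main = -2.0 * np.ones(n)
    main[-1] = -1.0
    A = mu / dw**2 * diags([np.ones(n - 1), main, np.ones(n - 1)], [-1, 0, 1],
                           format='csc')
    lu = splu((identity(n, format='csc') - dt * A).tocsc())  # factor once
    x = np.zeros(n)                                          # initial condition
    for _ in range(int(round(T / dt))):
        # Upwind derivative for x * dx/dw (flow left-to-right for x >= 0).
        dxdw = np.empty(n)
        dxdw[0] = x[0] / dw          # uses the boundary value x(0) = 0
        dxdw[1:] = (x[1:] - x[:-1]) / dw
        x = lu.solve(x + dt * (s - x * dxdw))
    return x[-1]                     # output y = x(1, t)

y = burgers_fom(mu=0.5)
```

Since the implicit diffusion operator is constant in time, factoring it once with `splu` and reusing the factorization at every step keeps the per-step cost to a single triangular solve.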

4.1.1 Greedy Algorithm with Fixed Training Set

We begin by applying the standard greedy algorithm (Algorithm 2.1) to the discretized model of the Burgers’ equation. The greedy algorithm requires \(t_{\text {greedy}} = 505.47\) seconds and 19 iterations to converge to the defined tolerance of \(1\cdot 10^{-6}\). The resulting ROM has RB dimension \(r_{\text {POD}} = 32\) along with \(r_{\text {EI}} = 33\) basis vectors for the DEIM projection matrix. The maximum true error over the test set is \(\epsilon _{\text {t}}^{\text {max}} = 2.07\cdot 10^{-8}\). Although the POD-Greedy algorithm with a fixed training set results in a ROM that meets the specified tolerance, its offline time is high and there is scope for improvement by considering a subsampled training set.

Table 2 Results of Algorithm 3.1 with varying \(\epsilon _{\text {SVD}}\), \(\epsilon _{\text {QR}}\) for Burgers’ equation
Table 3 Results of Algorithm 3.2 with varying \(\epsilon _{\text {SVD}}\), \(\epsilon _{\text {QR}}\) for Burgers’ equation

4.1.2 Greedy Algorithm Schemes 1 and 2

We apply Algorithms 3.1 and 3.2 to the Burgers’ equation, making use of both pivoted QR and the two DEIM variants (QDEIM and KDEIM) to identify the interpolation points \(\mathbf {I}\). Further, we consider three different SVD and QR cut-off tolerances (\(\epsilon _{\text {SVD}}\), \(\epsilon _{\text {QR}}\)) for the pivoted QR and DEIM variants, viz. \(\{1\cdot 10^{-4}, 1\cdot 10^{-6}, 1\cdot 10^{-8} \}\), to highlight the progressive refinement of the adapted training set in ‘difficult regions’ of the parameter space. The results are summarized in Tables 2 and 3 for Algorithms 3.1 and 3.2, respectively. The matrix \(\widetilde{{\mathfrak {Y}}} \in {\mathbb {R}}^{100 \times 81}\) was, for either algorithm, assembled by collecting the snapshots of the output vector at every \(25^{\text {th}}\) time step. The training set in Stage 2 of both algorithms consists of interpolation points identified by QR, QDEIM or KDEIM. As the results reveal, this choice is sufficient to produce ROMs that meet the required tolerance over the test set. For this example, there is no significant difference between the results of the two algorithms: both schemes produce ROMs with almost identical RB and DEIM basis sizes (\(r_{\text {POD}},\,r_{\text {EI}}\)) and nearly the same maximum error over the test set.

Fig. 3
figure 3

Algorithm 3.2 for the Burgers’ Equation with SVD, QR tolerance \(\epsilon _{\text {SVD}}, \epsilon _{\text {QR}} = 1\cdot 10^{-4}\). The crossmarks denote the parameters in the subsampled training set. For KDEIM each colour represents one cluster; the centroids of each of the clusters make up the subsampled training set (Color figure online)

Fig. 4
figure 4

Algorithm 3.2 for the Burgers’ Equation with SVD, QR tolerance \(\epsilon _{\text {SVD}}, \epsilon _{\text {QR}} = 1\cdot 10^{-6}\). The crossmarks denote the parameters in the subsampled training set. For KDEIM each colour represents one cluster; the centroids of each of the clusters make up the subsampled training set (Color figure online)

We show the subsampled training sets resulting from Algorithm 3.2 using the pivoted QR, QDEIM and KDEIM variants, with different SVD, QR tolerances in Figs. 3, 4 and 5. The black crosses denote those samples from the fine training set which were retained in Stage 2 of the algorithm. For the QDEIM variant, it is clear that the subsampled parameters are concentrated more around the lower viscosity regions of the parameter space. Thus, the method is able to successfully identify the physically more relevant points. Moreover, the parameter samples identified by QDEIM are very close to the ones identified by the method using a pivoted QR decomposition. This is not surprising since the former determines the interpolation points through a pivoted QR decomposition of \(\mathbf {U}^{\textsf {T}}\) (\(\mathbf {U}\) is the left singular matrix of \(\widetilde{{\mathfrak {Y}}}\)) whereas the latter applies the pivoted QR decomposition directly to \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\). We also show the results of KDEIM, where the subsampled (selected) parameter samples and their corresponding clusters are presented. The subsampled points in this case are the centroids of the clusters. The clusters are smaller in size for the low viscosity regions, while they are comparatively larger in the high viscosity zone. The resulting subsampled training sets from Algorithm 3.1 display a similar trend.
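The closeness of the QDEIM and pivoted-QR selections can be explored on synthetic data. The sketch below contrasts the two routes, pivoted QR applied directly to \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\) versus pivoted QR applied to \(\mathbf {U}^{\textsf {T}}\) after a truncated SVD; the rank-truncation rule and the random test matrix are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import qr, svd

rng = np.random.default_rng(42)
# Synthetic rank-4 snapshot matrix: 100 parameters, 81 time samples.
Y = rng.standard_normal((100, 4)) @ rng.standard_normal((4, 81))

# Route 1: pivots of Ytilde^T directly.
_, _, piv_direct = qr(Y.T, mode='economic', pivoting=True)

# Route 2 (QDEIM-style): truncated SVD, then pivots of U^T.
U, S, _ = svd(Y, full_matrices=False)
ell = int(np.sum(S / S[0] > 1e-10))   # effective rank of the snapshot matrix
_, _, piv_qdeim = qr(U[:, :ell].T, mode='economic', pivoting=True)

# Compare the leading ell pivot sets; they typically overlap substantially.
print(sorted(piv_direct[:ell]), sorted(piv_qdeim[:ell]))
```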

For a given \(\epsilon _{\text {SVD}}\) or \(\epsilon _{\text {QR}}\), the runtimes for the QDEIM- and KDEIM-based training set adaptation are very close. Using the pivoted QR leads to a subsampled training set with one sample more than that generated by QDEIM or KDEIM, which results in a marginally higher offline time for this method. One observation worth noting is that, in some instances, using the KDEIM approach to identify the adapted training set leads to the greedy algorithm converging in fewer iterations. This is most likely because the identified parameters in this case represent cluster centroids and are more representative of the average behaviour. This yields a more uniform approximation throughout the parameter domain, with each cluster average being well-represented. The QDEIM version, on the other hand, identifies points based on the SVD of the output snapshot matrix and tends to favour points away from the mean behaviour. We illustrate this in Fig. 6 for the case of \(\epsilon _{\text {SVD}} = 10^{-8}\) for Algorithm 3.1. It is evident that while the subsampling strategy using QDEIM results in a smaller maximum error over the test set (\(4.55\cdot 10^{-8}\) vs. \(6.88\cdot 10^{-8}\)), the KDEIM-based sampling leads to a more uniform distribution of the error over the test set.

On average, for a given SVD tolerance, Algorithm 3.1 is faster than Algorithm 3.2. This can be attributed to the fact that, for the latter, the DEIM variant or the pivoted QR decomposition needs to be performed repeatedly to check the criterion in Step 12 of Algorithm 3.2. Since this involves performing an SVD, the associated costs are higher. The proposed subsampling algorithms result in a noticeable speedup of the POD-Greedy algorithm. For Algorithm 3.1, the maximum achieved speedup was 5.6, for \(\epsilon _{\text {SVD}} = 1\cdot 10^{-4}\) using QDEIM, while the smallest observed speedup was 4.0, for \(\epsilon _{\text {SVD}} = 1\cdot 10^{-8}\) using QDEIM. For Algorithm 3.2, the maximum achieved speedup was 5.7, for \(\epsilon _{\text {SVD}} = 1\cdot 10^{-4}\) using KDEIM, while the minimum speedup was 3.5, for \(\epsilon _{\text {SVD}} = 1\cdot 10^{-8}\) using QDEIM. Finally, we see from Tables 2 and 3 that the number of iterations to converge for Algorithms 2.1, 3.1 and 3.2 is nearly the same. Thus, our assumptions in the analysis from Sect. 3.2.3 that \(N_{I} \approx N_{II,1} + N_{II,2}\) and \(N_{I} \approx N_{III,1} + N_{III,2}\) hold true.

Fig. 5
figure 5

Algorithm 3.2 for the Burgers’ Equation with SVD, QR tolerance \(\epsilon _{\text {SVD}}, \epsilon _{\text {QR}} = 1\cdot 10^{-8}\). The crossmarks denote the parameters in the subsampled training set. For KDEIM each colour represents one cluster; the centroids of each of the clusters make up the subsampled training set (Color figure online)

Fig. 6
figure 6

Error plot for Algorithm 3.1 with tolerance \(\epsilon _{\text {SVD}} = 10^{-8}\) applied to the Burgers’ equation. The error between the true and reduced outputs \(\Vert \mathbf {y}\left( \mathbf {x}(t^{k},\varvec{\mu })\right) - \widetilde{\mathbf {y}}\left( \mathbf {z}(t^{k},\varvec{\mu })\right) \Vert \) is plotted over the duration of the simulation for all parameters in the test set

4.2 Thermal Block

The second example is a benchmark model of the time-dependent heat transfer in a thermal block. The governing PDE is given by

$$\begin{aligned} \begin{aligned} \frac{\partial \theta (\mathbf {w}, t, \varvec{\mu })}{\partial t} + \nabla \cdot (-\gamma (\mathbf {w}, \varvec{\mu })\nabla \theta (\mathbf {w}, t, \varvec{\mu }))&= 0, \qquad t \in [0,\,T]. \end{aligned} \end{aligned}$$
(4.2)

The domain \(\varOmega := (0 , 1) \times (0 , 1) \subset {\mathbb {R}}^{2}\) is partitioned into five regions, \(\varOmega = \varOmega _{0} \cup \varOmega _{1} \cup \varOmega _{2} \cup \varOmega _{3} \cup \varOmega _{4}\), as shown in Fig. 7. The left boundary of the domain (\(\varGamma _{\text {in}}\)) is associated with an input heat flux of unit magnitude, the top and bottom boundaries (\(\varGamma _{\text {N}}\)) carry a homogeneous Neumann (zero-flux) condition and the right boundary (\(\varGamma _{\text {D}}\)) is held at a fixed temperature of zero. The state variable is the temperature \(\theta (\mathbf {w}, t)\) at a given spatial location \(\mathbf {w} \in \varOmega \) and time t. The initial condition is \(\theta (\mathbf {w},0) = 0\). The output is the average temperature measured over \(\varOmega _{2}\). The problem is parametrized by the heat conductivity \(\gamma \) in the subdomains \(\varOmega _{0}, \varOmega _{1}, \varOmega _{2}, \varOmega _{3}\) and \(\varOmega _{4}\): \(\gamma (\mathbf {w}, \varvec{\mu }) = 1\) when \(\mathbf {w} \in \varOmega _{0}\) and \(\gamma (\mathbf {w}, \varvec{\mu }) = \kappa _{i}\) whenever \(\mathbf {w} \in \varOmega _{i}\), \(i = 1,2,3,4\). We define the parameter vector \(\varvec{\mu } = [\kappa _{1}, \kappa _{2}, \kappa _{3}, \kappa _{4}]\). The governing PDE is discretized in space using linear finite elements with respect to a simplicial triangulation of the domain \(\varOmega \) obtained via the software gmsh [25]. It is further discretized in time using the implicit Euler scheme for \(t \in [0 \,, 1]\), with step size \(\varDelta t = 0.01\). The spatially discretized system has dimension \(n = 7488\). For more details on the model and the spatial discretization, the reader is referred to [51]. The discretized heat equation can be written in the form of (2.2). Since the problem is linear, we have \(\mathbf {f} \equiv 0\). 
For the numerical results, the parameter \(\varvec{\mu }\) is sampled from the domain \({\mathcal {P}} := [{1\cdot 10^{-5}} \,, {1\cdot 10^{-2}}] \times [{1\cdot 10^{-5}} \,, {1\cdot 10^{-2}}] \times [{1\cdot 10^{-4}} \,, 1] \times [{1\cdot 10^{-1}} \,, 1]\). For purposes of illustration, we consider the three parameter version of the thermal block problem by fixing \(\kappa _{4}\) to its mean value, i.e., \(\kappa _{4} = 0.5\). The training set \(\varXi _{\text {train}}\) consists of a tensor grid of \(N_{\text {train}} = 6^{3} = 216\) parameters, with 6 parameters sampled for each \(\kappa _{i}, \,\, i = 1,\, 2,\, 3\). The test parameter set consists of 100 parameters, randomly sampled from \({\mathcal {P}}\). The tolerance for the greedy algorithm is set to be \(\texttt {tol} = 1\cdot 10^{-3}\).
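The tensor-grid training set described above can be assembled as in the following sketch. Note that the placement of the 6 samples per conductivity (logarithmic here, given the wide parameter ranges) is an assumption made for illustration, since the spacing is not specified in the text.

```python
import numpy as np
from itertools import product

# 6 samples per conductivity kappa_1..kappa_3; kappa_4 fixed at its mean 0.5.
k1 = np.logspace(-5, -2, 6)   # kappa_1 in [1e-5, 1e-2]
k2 = np.logspace(-5, -2, 6)   # kappa_2 in [1e-5, 1e-2]
k3 = np.logspace(-4,  0, 6)   # kappa_3 in [1e-4, 1]

# Tensor grid: N_train = 6^3 = 216 parameter vectors.
Xi_train = np.array([(a, b, c, 0.5) for a, b, c in product(k1, k2, k3)])
print(Xi_train.shape)  # (216, 4)
```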

Fig. 7
figure 7

Thermal Block Example: Spatial domain and boundaries

4.2.1 Greedy Algorithm with Fixed Training Set

Applying Algorithm 2.1 with a fixed training set to the thermal block example results in a ROM of dimension \(r_{\text {POD}} = 74\), taking 53 iterations to converge in \(t_{\text {greedy}} = 694.73\) seconds. The maximum error over the test set is \(\epsilon _{t}^{\text {max}} = 9.78 \cdot 10^{-4}\). In Fig. 8, the training set \(\varXi _{\text {train}}\) and the greedy parameters identified by Algorithm 2.1 are shown. Of the 216 parameters in the training set, only 44 are chosen. The greedy parameters are concentrated at and around (0.01, 0.01, 0.0001), the upper right corner of the figure. In fact, many of the greedy samples lie in the vicinity of the upper and right walls of the grid.

4.2.2 Greedy Algorithm Schemes 1 and 2

Similar to the Burgers’ equation, we now apply the proposed training set subsampling schemes to the thermal block example. We shall also illustrate the advantages of using oversampling. For both Algorithms 3.1 and 3.2, we consider \(\epsilon _{\text {QR}}, \epsilon _{\text {SVD}} = 1\cdot 10^{-10}\) and a coarse tolerance \(\texttt {tol}^{c} = 1\). The approximation to \({\mathfrak {Y}}\) is obtained by taking snapshots at every time step of the implicit Euler scheme. The results are summarized in Tables 4 and 5 for Algorithms 3.1 and 3.2, respectively. The first scheme does not lead to a successful ROM for either QDEIM or KDEIM, whereas using the pivoted QR decomposition on \(\widetilde{{\mathfrak {Y}}}^{\textsf {T}}\) to identify the subsampled training set produces a successful ROM. For the second scheme, both QDEIM and KDEIM result in a subsampled training set of cardinality \(N_{c} = 19\), while the pivoted QR approach gives \(N_{c} = 20\). However, both QR and QDEIM fail to meet the required ROM tolerance on the test set. On the other hand, KDEIM results in a ROM satisfying the tolerance, taking a significantly smaller number of iterations (40) to converge. These results indicate that the subsampling approach is not entirely able to capture the full range of features over the training set. This is mainly due to the small number of parameters (\(N_{c} = 19\)) that the algorithm retains; recall that the standard greedy approach selected 44 unique greedy parameters. Nevertheless, the ROMs resulting from either scheme still perform reasonably on the test set: the maximum error is only slightly higher than the desired tolerance of \(\texttt {tol} = 1\cdot 10^{-3}\).

Fig. 8
figure 8

Thermal Block: Fine training set with 216 parameters and the 44 greedy parameters picked by Algorithm 2.1

Table 4 Thermal Block Results for Algorithm 3.1 for QDEIM, KDEIM and QR

Next, we perform oversampling to identify more parameters. For the standard DEIM approach, the number of interpolation points is equal to the rank \(\ell \) of the truncated left singular vectors of the snapshot matrix. For oversampling, we set the number of interpolation points to be \(m = 2 \ell \) and test the approach based on maximizing the smallest singular value (Gappy-POD Eigenvector) and the approach based on clustering (Gappy-POD Clustering), both originally proposed in [45]. The results are given in Table 6. We see that both oversampling approaches result in ROMs that are validated to be accurate over the test set. The Gappy-POD Clustering method results in the smaller test error of the two and takes 40 iterations to converge. Notice that, compared to the previous two approaches based on QDEIM and KDEIM, the oversampling approach requires more time. This is not surprising, since a larger number of parameters is included in the coarse training set at Stage 2 of Algorithms 3.1 and 3.2. The speedup of the Gappy-POD Eigenvector variant is 3.9, while a speedup of 4.6 is achieved by the Gappy-POD Clustering variant. We show the subsampled training sets of both approaches in Fig. 9. In particular, the parameter samples identified by the Gappy-POD Clustering variant bear a close resemblance to the greedy parameter distribution in Fig. 8. In Fig. 10, we plot the mean error over time for each parameter in the test set. It is evident that both proposed oversampling strategies, Gappy-POD Eigenvector (Fig. 10a) and Gappy-POD Clustering (Fig. 10b), succeed in keeping the error below the desired tolerance, uniformly for all parameters in the test set. Lastly, for the thermal block example, the total number of iterations to converge for Algorithms 2.1, 3.1 and 3.2 is not equal and varies with the particular subsampling strategy used (see Tables 4, 5 and 6). Our initial assumptions in Sect. 3.2.3 that \(N_{I} \approx N_{II,1} + N_{II,2}\), \(N_{I} \approx N_{III,1} + N_{III,2}\) do not hold; instead, we observe that \(N_{I} \gtrapprox N_{II,1} + N_{II,2}\), \(N_{I} \gtrapprox N_{III,1} + N_{III,2}\) in all cases.
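For intuition, eigenvector-style oversampling can be mimicked by a simple greedy loop that, starting from \(\ell \) QDEIM-style points, repeatedly appends the index maximizing the smallest singular value of the sampled rows of \(\mathbf {U}\) until \(m = 2\ell \) points are chosen. The sketch below is a hedged, simplified stand-in for the Gappy-POD Eigenvector approach, not the exact algorithm of [45].

```python
import numpy as np
from scipy.linalg import qr, svd

def oversample(U, m):
    """Greedily grow a point set so the sampled basis stays well-conditioned."""
    ell = U.shape[1]
    _, _, piv = qr(U.T, mode='economic', pivoting=True)
    S = list(piv[:ell])                       # start from QDEIM-style points
    candidates = set(range(U.shape[0])) - set(S)
    while len(S) < m:
        # Add the row index that maximizes sigma_min of the sampled rows.
        best = max(candidates,
                   key=lambda i: svd(U[S + [i], :], compute_uv=False)[-1])
        S.append(best)
        candidates.discard(best)
    return np.array(S)

rng = np.random.default_rng(3)
# Orthonormal basis with ell = 5 columns over 60 candidate parameters.
U, _, _ = np.linalg.svd(rng.standard_normal((60, 5)), full_matrices=False)
idx = oversample(U, m=10)                     # m = 2 * ell sampled parameters
```

Keeping the smallest singular value of the sampled basis large is what makes the resulting least-squares (gappy) reconstruction well-posed, which is the rationale behind this family of oversampling strategies.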

Table 5 Thermal Block Results for Algorithm 3.2 for QDEIM, KDEIM and QR
Table 6 Thermal Block Results for Algorithm 3.1 with oversampling
Fig. 9
figure 9

Subsampling strategy using Gappy-POD with oversampling for the Thermal Block

Fig. 10
figure 10

Error plot for Algorithm 3.1 with coarse tolerance \(\texttt {tol}^{c} = 1\) and subsampling based on Gappy-POD Eigenvector and Gappy-POD Clustering applied to the thermal block example. The mean error over time between the true and reduced outputs, \(\frac{1}{K+1} \sum _{k = 0}^{K} \Vert \mathbf {y}\left( \mathbf {x}(t^{k},\varvec{\mu })\right) - \widetilde{\mathbf {y}}\left( \mathbf {z}(t^{k},\varvec{\mu })\right) \Vert \), is plotted for all parameters in the test set

5 Conclusions

We presented an efficient method to subsample the training set in the offline stage of the RBM. The proposed two-stage strategy is goal-oriented. It uses the pivoted QR decomposition, the DEIM algorithm or its variants to approximate the parameter-to-output map for time-dependent problems, taking advantage of the information from the pivots or the interpolation points to generate the subsampled training set. Different strategies to identify the interpolation points based on variants of the DEIM algorithm were discussed. The strategy of retaining as many parameters as the number of DEIM interpolation points is simple and leads to considerable speedup. However, there may be occasions where it is not robust, as demonstrated in the thermal block example. For such scenarios, we propose a principled oversampling strategy based on the Gappy-POD approach, to add additional parameters to the subsampled training set. This approach, while slightly more expensive, is robust and leads to reliable ROMs. The different subsampling strategies were tested on two numerical examples and were shown to yield speedups of up to a factor of 5 for the offline stage of the RBM, without compromising the quality of the generated ROMs.

There exist several promising possibilities for further applications and improvements. One possible idea is in relation to the KDEIM-based subsampling strategy. It is worthwhile to consider the development of localized RBMs for each cluster identified by the algorithm. Such an approach could help overcome existing challenges related to training set adaptation methods that partition the parameter domain based on binary trees [23, 30]. Another promising idea is related to data assimilation within the model-based RBM framework. Assuming output data is available by means of sensors or other measurements, it should be possible to consider a QR decomposition of this parametric data matrix to identify a good initial training set. Such an approach is capable of incorporating data in a natural fashion, to aid the development of problem-tailored ROMs.