Learning multivariate functions with low-dimensional structures using polynomial bases

https://doi.org/10.1016/j.cam.2021.113821

Abstract

In this paper we propose a method for the approximation of high-dimensional functions over finite intervals with respect to complete orthonormal systems of polynomials. An important tool for this is the multivariate classical analysis of variance (ANOVA) decomposition. For functions with a low-dimensional structure, i.e., a low superposition dimension, we are able to achieve a reconstruction from scattered data and simultaneously understand relationships between different variables.

Introduction

The approximation of high-dimensional functions is an active research topic and of high relevance in numerous applications. We assume a setting where we are given scattered data about an unknown function. The related approximation problem is generally referred to as scattered data approximation. Classical methods suffer from the curse of dimensionality in this setting, i.e., the amount of required data increases exponentially with the spatial dimension. Finding ways to circumvent the curse poses the main challenge in this high-dimensional setting. Besides finding an approximation, there is the increasingly important question of interpretability: in many applications one wishes to understand how important the different dimensions and dimension interactions are in order to interpret the results.

In this paper we consider functions $f\colon[-1,1]^d\to\mathbb{R}$ defined over the cube with a high spatial dimension $d\in\mathbb{N}$. Given scattered data about $f$, i.e., a finite sampling set $\mathcal{X}\subset[-1,1]^d$ and evaluations $\mathbf{y}=(f(\mathbf{x}))_{\mathbf{x}\in\mathcal{X}}$, we aim to construct an approximation of $f$ and simultaneously understand its structure, i.e., how important the individual variables and their interactions are. As opposed to black-box approximation or active learning, we may not choose the locations of the nodes in $\mathcal{X}$. This prohibits us from using well-established spatial discretizations such as sparse grids, see [1], [2], or rank-1 lattices, see [3], [4], [5], that exploit low-dimensional structures in the node set. Our approach to circumventing the curse of dimensionality is to assume sparsity in the analysis of variance (ANOVA) decomposition of the function, i.e., we assume that $f$ is dominated by a small number of low-complexity interactions. This may also be referred to as sparsity-of-effects, see e.g. [6].

We focus on complete orthonormal systems $\{\varphi_{\mathbf{k}}\}_{\mathbf{k}\in\mathbb{N}_0^d}$ in $L_2([-1,1]^d,\omega)$ where the basis functions are tensor products of univariate polynomials, e.g., the Chebyshev polynomials. Any function from the weighted Lebesgue space $L_2([-1,1]^d,\omega)$ can then be written as a series $f(\mathbf{x})=\sum_{\mathbf{k}\in\mathbb{N}_0^d} c_{\mathbf{k}}\,\varphi_{\mathbf{k}}(\mathbf{x})$ with coefficients $c_{\mathbf{k}}\in\mathbb{R}$, $\mathbf{k}\in\mathbb{N}_0^d$. Our method focuses on approximations using partial sums of the type $S_I f(\mathbf{x})=\sum_{\mathbf{k}\in I} c_{\mathbf{k}}\,\varphi_{\mathbf{k}}(\mathbf{x})$, with grouped finite index sets $I\subset\mathbb{N}_0^d$ that reflect the low-dimensional structure of $f$. Determining a frequency index set $I$ that yields a good approximation while not scaling exponentially in $d$ poses one of the main challenges.
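As an illustration, the following minimal Julia sketch evaluates such a partial sum naively for the Chebyshev system (the names are ours and the implementation is illustrative; it does not use the fast transforms discussed below):

    # Normalized univariate Chebyshev polynomials: phi_0 = 1 and
    # phi_k(x) = sqrt(2) * cos(k * acos(x)) for k >= 1.
    phi(k::Int, x::Real) = k == 0 ? 1.0 : sqrt(2) * cos(k * acos(x))

    # Tensor-product basis function phi_k(x) = prod_j phi_{k_j}(x_j).
    phi(k::Vector{Int}, x::Vector{<:Real}) =
        prod(phi(k[j], x[j]) for j in eachindex(k))

    # Partial sum S_I f(x) = sum over the index set I of c_k * phi_k(x).
    partialsum(I::Vector{Vector{Int}}, c::Vector{Float64}, x) =
        sum(c[i] * phi(I[i], x) for i in eachindex(I))

For instance, I = [[0,0],[1,0],[2,1]] and c = [1.0, 0.5, 0.25] define a bivariate polynomial that can be evaluated at any point in $[-1,1]^2$.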

The method presented here uses the classical ANOVA decomposition, see [2], [7], [8], [9], as its main tool. The decomposition is important in the analysis of the dimensions of multivariate, high-dimensional functions. It has also been used to understand the success of certain quadrature methods for high-dimensional integration [10], [11], [12] as well as infinite-dimensional integration [13], [14], [15]. The ANOVA decomposition is unique and orthogonal; it decomposes a $d$-variate function into $2^d$ ANOVA terms, each associated with a subset of $\{1,2,\dots,d\}$. Each term depends only on the variables in the corresponding subset, and the number of these variables is the order of the ANOVA term.
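Written out, the decomposition takes the form
\[
  f(\mathbf{x}) \;=\; \sum_{\mathbf{u} \subseteq D} f_{\mathbf{u}}(\mathbf{x}_{\mathbf{u}}),
  \qquad D = \{1,2,\dots,d\},
\]
where each ANOVA term $f_{\mathbf{u}}$ depends only on the variables $\mathbf{x}_{\mathbf{u}} = (x_i)_{i\in\mathbf{u}}$ and the terms are mutually orthogonal in $L_2([-1,1]^d,\omega)$.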

Our method assumes sparsity by restricting the number of possible simultaneous dimension interactions. The knowledge that the function $f$ has a structure such that it can be well approximated under this sparsity assumption is the only information we require a priori. The approach allows us to learn the basis coefficients by solving a least-squares problem. The problem is hard to solve in general since we are dealing with a large system matrix, but we are able to apply the concept of grouped transformations, see [16], to tackle this issue. In summary, we present a method for the approximation of high-dimensional functions with a low-dimensional structure from possibly noisy scattered data.
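In the notation above, the coefficient-learning step is the standard least-squares formulation (a sketch; the exact regularization used in the paper may differ):
\[
  \hat{\mathbf{c}} \;=\; \operatorname*{arg\,min}_{\mathbf{c} \in \mathbb{R}^{|I|}}
  \big\| \mathbf{A}\mathbf{c} - \mathbf{y} \big\|_2^2,
  \qquad
  \mathbf{A} = \big( \varphi_{\mathbf{k}}(\mathbf{x}) \big)_{\mathbf{x} \in \mathcal{X},\, \mathbf{k} \in I},
\]
where the grouped structure of $I$ allows the matrix–vector products with $\mathbf{A}$ to be realized by fast transforms instead of assembling $\mathbf{A}$ explicitly.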

The outline of the paper is as follows. In Section 2 we introduce the necessary preliminaries on weighted Lebesgue spaces with complete orthonormal systems of polynomials. Moreover, we discuss the non-equispaced fast cosine transform for the evaluation of Chebyshev partial sums and the fast polynomial transform for computing the basis exchange from other polynomial bases to the Chebyshev system. In Section 3 we consider the properties of the ANOVA decomposition in the previously described setting of weighted Lebesgue spaces. The approximation method itself is discussed in Section 4, with numerical examples in Section 5.

Section snippets

Prerequisites, notation and orthogonal polynomials

Let $\tilde\omega\colon(-1,1)\to\mathbb{R}$ be a non-negative weight function with $\int_{-1}^{1}\tilde\omega(x)\,\mathrm{d}x=1$. Then we define the weighted Lebesgue space
\[
  L_2([-1,1],\tilde\omega) \;=\; \Big\{ f\colon[-1,1]\to\mathbb{R} \,:\, \|f\|_{L_2([-1,1],\tilde\omega)}^2 = \int_{-1}^{1} |f(x)|^2\,\tilde\omega(x)\,\mathrm{d}x < \infty \Big\}
\]
with the inner product $\langle f,g\rangle = \int_{-1}^{1} f(x)\,g(x)\,\tilde\omega(x)\,\mathrm{d}x$. Moreover, we consider a complete orthonormal system of polynomials $\{\varphi_k\}_{k\in\mathbb{N}_0}$ in $L_2([-1,1],\tilde\omega)$. Here, we have $\varphi_k\in\Pi_k$ with $\Pi_k$ denoting the set of polynomials of degree at most $k$. Taking the products $\varphi_{\mathbf{k}}(\mathbf{x}) = \prod_{j=1}^{d}\varphi_{k_j}(x_j)$ we find that the system $\{\varphi_{\mathbf{k}}\}_{\mathbf{k}\in\mathbb{N}_0^d}$ is an orthonormal basis in the tensor-product space $L_2([-1,1]^d,\omega)$ with product weight $\omega(\mathbf{x}) = \prod_{j=1}^{d}\tilde\omega(x_j)$. …
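For the Chebyshev weight $\tilde\omega(x) = (\pi\sqrt{1-x^2})^{-1}$, the orthonormal system is $\varphi_0 \equiv 1$ and $\varphi_k(x) = \sqrt{2}\cos(k\arccos x)$ for $k\geq 1$. A small Julia sketch (our own illustration) verifies the orthonormality numerically via Gauss–Chebyshev quadrature, which integrates against exactly this weight:

    # Gauss-Chebyshev nodes: the n-point rule integrates g against the
    # weight (pi * sqrt(1 - x^2))^(-1) as the plain mean of g at the nodes.
    chebnodes(n) = [cos((2i - 1) * pi / (2n)) for i in 1:n]

    phi(k::Int, x::Real) = k == 0 ? 1.0 : sqrt(2) * cos(k * acos(x))

    # Approximate <phi_k, phi_l>; this should be 1 for k == l and 0
    # otherwise (exact here as long as k + l <= 2n - 1).
    inner(k, l; n = 64) = sum(phi(k, x) * phi(l, x) for x in chebnodes(n)) / n

For example, inner(3, 3) returns 1.0 up to rounding, while inner(2, 5) is numerically zero.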

Classical analysis of variance decomposition on the interval

In this section we introduce the ANOVA decomposition in the setting of weighted Lebesgue spaces with orthonormal polynomial bases; see also [2], [7], [9], [20], [21]. For a given spatial dimension $d$ we denote by $D=\{1,2,\dots,d\}$ the set of coordinate indices and write subsets as bold lowercase letters, e.g., $\mathbf{u}\subseteq D$. The complement of such a subset is always taken with respect to $D$, i.e., $\mathbf{u}^{\mathrm{c}} = D\setminus\mathbf{u}$. For a vector $\mathbf{x}\in\mathbb{R}^d$ we define $\mathbf{x}_{\mathbf{u}} = (x_i)_{i\in\mathbf{u}} \in \mathbb{R}^{|\mathbf{u}|}$; for example, for $d=3$ and $\mathbf{u}=\{1,3\}$ we have $\mathbf{u}^{\mathrm{c}}=\{2\}$ and $\mathbf{x}_{\mathbf{u}}=(x_1,x_3)$. Furthermore, we use the $\ell_p$-norm (or quasi-norm) of a vector, which is $\|\mathbf{x}\|_p = \big(\sum_{i=1}^{d}|x_i|^p\big)^{1/p}$ for $0<p<\infty$. …

Approximation method

In this section, we present a method for the approximation of functions $f\colon[-1,1]^d\to\mathbb{R}$ with a high spatial dimension $d\in\mathbb{N}$ such that $f\in L_2([-1,1]^d,\omega)$. In scattered data approximation, the data consists of a finite set of sampling nodes $\mathcal{X}=\{\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_M\}\subset[-1,1]^d$ and a vector of values $\mathbf{y}\in\mathbb{R}^M$. Now, we assume that $y_i \approx f(\mathbf{x}_i)$, i.e., the entries of $\mathbf{y}$ are noisy evaluations of the function. Here, it is especially important that we cannot choose the locations of the nodes $\mathbf{x}_i$. The space $L_2([-1,1]^d,\omega)$ and the …
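A dense Julia sketch of the resulting coefficient fit (illustrative only; the actual method replaces the explicit matrix with the grouped transformations of [16], and phi is the Chebyshev helper from the sketch in the introduction):

    using IterativeSolvers  # provides the iterative solver lsqr

    phi(k::Int, x::Real) = k == 0 ? 1.0 : sqrt(2) * cos(k * acos(x))
    phi(k::Vector{Int}, x::Vector{Float64}) =
        prod(phi(k[j], x[j]) for j in eachindex(k))

    # Least-squares fit of the coefficients c from nodes X, values y, and
    # a finite index set I: minimize ||A*c - y||_2 with A[i,j] = phi_{k_j}(x_i).
    function fitcoefficients(X::Vector{Vector{Float64}}, y::Vector{Float64},
                             I::Vector{Vector{Int}})
        A = [phi(k, x) for x in X, k in I]  # M x |I| system matrix
        return lsqr(A, y)
    end

Assembling the $M\times|I|$ matrix explicitly is only feasible for small examples; the point of the grouped transformations is to avoid exactly this step.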

Numerical experiments

In this section we apply the proposed approximation method to high-dimensional benchmark functions. We start in Section 5.1 with an 8-dimensional function that is a sum of products of B-splines; a similar function has been considered in [4]. In Section 5.2 we consider the well-known Friedman benchmark functions, which have previously been used as examples for synthetic regression problems, cf. [32], [33], [34], [35]. The method has been implemented as a Julia package [36]. The padding …
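For orientation, the first Friedman function is commonly stated as in the sketch below (our illustration of the standard benchmark; the noise level and node count are arbitrary choices, and the nodes live in the usual domain $[0,1]^{10}$, which an affine map takes to $[-1,1]^{10}$):

    # Friedman-1 benchmark: ten uniformly distributed variables of which
    # only the first five enter the function.
    friedman1(x) = 10 * sin(pi * x[1] * x[2]) + 20 * (x[3] - 0.5)^2 +
                   10 * x[4] + 5 * x[5]

    # Synthetic scattered data: M random nodes with additive Gaussian noise.
    M = 1_000
    X = [rand(10) for _ in 1:M]
    y = [friedman1(x) + randn() for x in X]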

Summary

In this paper we considered the classical ANOVA decomposition for functions $f$ in weighted Lebesgue spaces $L_2([-1,1]^d,\omega)$ with orthogonal polynomials as bases. Specifically, we proved relations between the basis coefficients of the projections $P_{\mathbf{u}} f$, the ANOVA terms $f_{\mathbf{u}}$, and the function $f$. Furthermore, we considered sensitivity analysis and the truncation of the ANOVA decomposition to a certain subset of terms.
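To give the flavor of these relations: for product bases with $\varphi_0 \equiv 1$, the projections and ANOVA terms collect exactly the coefficients whose index support matches $\mathbf{u}$ (a sketch of the statement; see the paper for the precise formulation):
\[
  P_{\mathbf{u}} f(\mathbf{x}) = \sum_{\operatorname{supp} \mathbf{k} \subseteq \mathbf{u}} c_{\mathbf{k}}\,\varphi_{\mathbf{k}}(\mathbf{x}),
  \qquad
  f_{\mathbf{u}}(\mathbf{x}) = \sum_{\operatorname{supp} \mathbf{k} = \mathbf{u}} c_{\mathbf{k}}\,\varphi_{\mathbf{k}}(\mathbf{x}),
\]
where $\operatorname{supp}\mathbf{k} = \{ j : k_j \neq 0 \}$.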

We introduced a method to determine important ANOVA terms, i.e., terms with a high global sensitivity index. …

Acknowledgments

We thank Tino Ullrich and Toni Volkmer for fruitful discussions on the contents of this paper. Daniel Potts acknowledges funding by the Deutsche Forschungsgemeinschaft (German Research Foundation) – Project-ID 416228727 – SFB 1410. Michael Schmischke is supported by the BMBF (Germany) grant 01IS20053A.

References (36)

  • Sloan, I.H., et al.
  • Potts, D., et al., Multivariate sparse FFT based on rank-1 Chebyshev lattice sampling.
  • Plonka, G., et al.
  • Wu, C.F.J., et al., Experiments: Planning, Analysis, and Optimization (2011).
  • Caflisch, R., et al., Valuation of mortgage-backed securities using Brownian bridges to reduce effective dimension, J. Comput. Finance (1997).
  • Rabitz, H., et al., General foundations of high dimensional model representations, J. Math. Chem. (1999).
  • Liu, R., et al., Estimating mean dimensionality of analysis of variance decompositions, J. Amer. Statist. Assoc. (2006).
  • Niederreiter, H.