1 Introduction

Many nonlinear programs (NLPs), in particular in engineering design, are nonconvex and multimodal. Their global optimization typically relies on the construction of converging convex/concave relaxations, i.e., convex/concave functions that under-/over-estimate the objective function and the constraints. In principle it would be desirable to construct the convex/concave envelopes, but this is typically not practical.

One approach to constructing convex relaxations is the so-called \(\alpha \)BB and \(\gamma \)BB relaxations, developed by Floudas and coworkers [1, 2, 21]. These methods are applicable to twice continuously differentiable functions and rely on an estimation of the Hessian of the original functions. For elementary functions, convex/concave envelopes are known or it is possible to calculate tight relaxations; see for instance the construction of envelopes of univariate functions described by Maranas and Floudas [22], the work by Liberti and Pantelides [20] for monomials of odd degree, and the work of Tawarmalani and Sahinidis [46, 47] for a class of fractional and lower semi-continuous functions.

McCormick [23, 24] provided a framework for the convex/concave relaxation of factorable functions, i.e., functions that can be represented as a finite recursive composition of binary sums, binary products and a given library of univariate intrinsic functions. The relaxations of the univariate intrinsic functions are propagated based on two main theorems, which essentially allow the relaxation of expressions of the form \(F_1\circ f_1 + F_2 \circ f_2 \cdot F_3\circ f_3\). These relaxations are in general nonsmooth [30]. If all functions involved are smooth and the convex/concave envelopes of the functions are used in the composition theorem, then the convergence order is at least quadratic [7], even if natural interval extensions with linear convergence order are used for the enclosures of the functions.

An alternative to McCormick’s relaxations is the auxiliary variable method (AVM), which employs auxiliary variables for each factor involved [42, 48–50]. More precisely, instead of relaxing the functions, the nonconvex optimization problem is relaxed, i.e., the nonconvex problem is reformulated by introducing auxiliary variables in such a way that the intrinsic functions are decoupled and can be relaxed one by one. A lower bound to the nonconvex problem is calculated via a relaxed NLP or linear program (LP).

Mitsos et al. [30] proposed the propagation of relaxations and their subgradients through procedures, thus extending McCormick relaxations to global optimization with algorithms embedded; examples in [30] demonstrate that optimizing in the original dimensional space can, for a class of problems, result in drastic computational savings compared to the AVM. The nonsmoothness of the relaxations necessitates nonsmooth optimization methods [16] for the calculation of lower bounds to the nonconvex optimization problem. The McCormick relaxations can be generalized in other ways [38], allowing also the relaxation of NLPs with dynamics (i.e., an ordinary differential equation or differential-algebraic system) embedded [11, 12, 39, 40] as well as the relaxation of implicit functions [43]. Recently, Sahlodin and Chachuat [37] also proposed the so-called McCormick–Taylor models, whereby McCormick relaxations are propagated in addition to interval bounds for enclosing the remainder term. Two implementations of McCormick’s relaxations are MC++ [10] and modMC [13]; the former is freely available and used herein to calculate the McCormick relaxations.

While McCormick relaxations are clearly a very important tool, they have the limitation of allowing only univariate composition, i.e., a univariate outer function. Herein, a generalization to multivariate outer functions is proposed via a reformulation of McCormick’s composition theorem in terms of a simple optimization problem. The new theorem directly allows the propagation of relaxations and subgradients through procedures similar to [30] under mild assumptions. It even gives rules for the propagation of the subdifferential.

The auxiliary variable method has two clear advantages compared to McCormick’s relaxations, namely that the relaxations are \((i)\) at least as tight and in some cases tighter and \((ii)\) differentiable for a larger class of functions [48]. On the other hand, McCormick relaxations have the advantage that they are constructed in the original space and allow for several generalizations. The generalization of McCormick relaxations presented here makes the relationship between the two approaches explicit, yielding a, to the best of our knowledge, previously unknown interpretation of McCormick relaxations as a decomposition method to solve the relaxed NLP constructed by the AVM. We note that such decomposition methods for the AVM have not been implemented by the global optimization community.

The proposed generalization allows a more direct relaxation of the product of functions, which proves to be at least as tight and in some cases tighter than McCormick’s product rule. It also allows the direct relaxation of multilinear products of functions, i.e., without resorting to recursive application of the bilinear rule. Similarly, the proposed theorem results in at least as tight and often tighter relaxations for the minimum/maximum and the division of two functions.

The rest of the paper is organized as follows. In Sect. 2 we review McCormick’s Composition Theorem and we give its generalization to multivariate outer functions, while in Sect. 3 we provide a way to propagate subgradient information. In Sect. 4 we discuss the relationship with AVM. We apply our results to compute relaxations of the product of two functions in Sect. 5, the minimum/maximum of two functions in Sect. 6 and the division of two functions in Sect. 7. We conclude and discuss future directions in Sect. 8.

2 Convex underestimator theorems

Theorem 1 is the main result in McCormick [23] and constructs convex/concave relaxations of composite functions where the outer function is univariate. Therein, \(\mathrm{mid}(\alpha ,\beta ,\gamma )\) denotes the median of three real numbers: in the trivial case \(\alpha =\beta =\gamma \) we have \(\mathrm{mid}(\alpha ,\beta ,\gamma )=\alpha \); otherwise it is the value that lies between the other two.

Theorem 1

(McCormick composition theorem [23]) Let \(Z\subset \mathbb {R}^n\) and \(X \subset \mathbb {R}\) be nonempty compact convex sets. Consider the composite function \(g=F \circ f(\cdot )\) where \(f:Z \rightarrow \mathbb {R}\), \(F:X\rightarrow \mathbb {R}\) and let \(f(Z)\subset X\). Suppose that convex/concave relaxations \(f^{cv},f^{cc}: Z\rightarrow \mathbb {R} \) of \(f\) on \(Z\) are known. Let \(F^{cv}: X \rightarrow \mathbb {R}\) be a convex relaxation of \(F\) on \(X\) and let \(x^{\min } \in X\) be a point where \(F^{cv}\) attains its minimum on \(X\). Then \(\bar{g}^{cv}: Z\rightarrow \mathbb {R}\),

$$\begin{aligned} \bar{g}^{cv}(\mathbf{z})=F^{cv}\left( \mathrm{mid} \{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}\right) , \end{aligned}$$
(1)

is a convex relaxation of \(g\) on \(Z\).

A similar theorem exists for the concave relaxation. Below we give an equivalent definition of the McCormick relaxation that is more convenient to generalize.

Proposition 1

Let \(g^{cv}: Z\rightarrow \mathbb {R}\) be defined by

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x \in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\} \end{aligned}$$
(2)

For the function \(\bar{g}^{cv}\) defined by (1) there holds

$$\begin{aligned} \bar{g}^{cv}(\mathbf{z})=g^{cv}(\mathbf{z}) \end{aligned}$$

for all \(\mathbf{z}\in Z\).

Proof

Note that we clearly have \(f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})\) for all \(\mathbf{z} \in Z\) and that, since \(f(Z)\subset X\), there holds \([f^{cv}(\mathbf{z}),f^{cc}(\mathbf{z})]\cap X\ne \emptyset \) for all \(\mathbf{z}\in Z\). Furthermore, let \(x^{\min }\) be a point where \(F^{cv}\) attains its minimum on \(X\).

We consider all three cases. If

$$\begin{aligned} \text {mid} \{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}=x^{\min } \end{aligned}$$

then we have

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x\in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\}=F^{cv}(x^{\min })=\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

If on the other hand

$$\begin{aligned} \text {mid}\{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}=f^{cv}(\mathbf{z} ) \end{aligned}$$

we note that \(f^{cv}(\mathbf{z} ) \le f^{cc}(\mathbf{z} )\) and thus \(x^{\min }\le f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})\). Since \(F^{cv}\) is convex it must be nondecreasing for \(x\ge x^{\min }\) [30]. In addition we have \(x^{\min } \le f^{cv}(\mathbf{z})\le f(\mathbf{z})\) and therefore \(f^{cv}(\mathbf{z})\in X\). Thus, we have

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x\in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\}=F^{cv}\circ f^{cv}(\mathbf{z})=\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

Similarly, if

$$\begin{aligned} \text {mid}\{f^{cv}(\mathbf{z} ),f^{cc}(\mathbf{z}),x^{\min }\}=f^{cc}(\mathbf{z} ) \end{aligned}$$

we have \(f^{cv}(\mathbf{z})\le f^{cc}(\mathbf{z})\le x^{\min }\). Since \(F^{cv}\) is convex it must be nonincreasing for \(x\le x^{\min }\). In addition we have \(f(\mathbf{z}) \le f^{cc}(\mathbf{z})\le x^{\min }\) and therefore \(f^{cc}(\mathbf{z})\in X\). Thus, we have

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{x\in X}\{ F^{cv}(x)| f^{cv}(\mathbf{z})\le x\le f^{cc}(\mathbf{z})\}=F^{cv}(f^{cc}(\mathbf{z}))=\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$

\(\square \)
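As a numerical illustration of this equivalence, the following minimal sketch (the instance \(g(z)=F(f(z))\) with \(F(x)=x^2\) and \(f(z)=z^2-1\) on \(Z=[-1,2]\) is our own choice, not part of the original development) evaluates the mid-based formula (1) and the box-constrained minimization (2) and checks that they agree to solver tolerance:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mid(a, b, c):
    """Median of three scalars."""
    return sorted((a, b, c))[1]

# Illustrative instance (our own choice): F(x) = x**2 on X = [-1, 3] and
# f(z) = z**2 - 1 on Z = [-1, 2], so f(Z) = [-1, 3] = X.  F is convex,
# hence F^cv = F with minimizer x_min = 0 on X; f is convex, hence
# f^cv = f, and f^cc is the secant through (-1, 0) and (2, 3).
f_cv = lambda z: z**2 - 1.0
f_cc = lambda z: z + 1.0
F_cv = lambda x: x**2
x_min = 0.0

def g_cv_mid(z):
    """Theorem 1, Eq. (1)."""
    return F_cv(mid(f_cv(z), f_cc(z), x_min))

def g_cv_opt(z):
    """Proposition 1, Eq. (2); here [f^cv(z), f^cc(z)] lies inside X."""
    return minimize_scalar(F_cv, bounds=(f_cv(z), f_cc(z)),
                           method="bounded").fun

for z in np.linspace(-0.9, 1.9, 7):
    assert abs(g_cv_mid(z) - g_cv_opt(z)) < 1e-3      # Proposition 1
    assert g_cv_mid(z) <= (z**2 - 1.0)**2 + 1e-12     # underestimates g
```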

Theorem 2 gives a generalization of Theorem 1 for multivariate outer functions. Its proof makes use of Lemma 1, which we will also use in the development of subgradient propagation in Sect. 3. Note that \(\partial g(x)\) denotes the subdifferential of \(g\) at \(x\), i.e., the set of all subgradients.

Lemma 1

[17] Let \(f_1,\ldots ,f_m\) be \(m\) convex functions from \(\mathbb {R}^n \rightarrow \mathbb {R}\) and let \(F\) be a convex and non-decreasing function from \(\mathbb {R}^m \rightarrow \mathbb {R}\). Then \(g(x)=F(f_1(x),\ldots ,f_m(x))\) is a convex function. Furthermore

$$\begin{aligned} \partial g(x)=\left\{ \sum _{i=1}^m \rho _i s_i: (\rho _1,\ldots ,\rho _m) \in \partial F(f_1(x),\ldots ,f_m(x)),s_i \in \partial f_i(x) \quad \forall i=1,\ldots ,m \right\} . \end{aligned}$$

Theorem 2

Let \(Z\subset \mathbb {R}^n\) and \(X \subset \mathbb {R}^m\) be nonempty compact convex sets. Consider the composite function \(g=F(f_1(\mathbf z),\ldots ,f_m(\mathbf{z}))\), where \(F:X\rightarrow \mathbb {R}\) and for \(i\in I= \{1,\ldots ,m\}\), \(f_i:Z \rightarrow \mathbb {R}\) are continuous functions, and let

$$\begin{aligned} \{\left( f_1(\mathbf{z}),\ldots ,f_m(\mathbf{z})\right) | {\mathbf{z}} \in Z\}\subset X. \end{aligned}$$
(3)

Suppose that convex relaxations \(f^{cv}_i: Z\rightarrow \mathbb {R} \) and concave relaxations \(f^{cc}_i: Z\rightarrow \mathbb {R} \) of \(f_i\) on \(Z\) are known for every \(i\in I\). Let \(F^{cv}: X \rightarrow \mathbb {R}\) be a convex relaxation of \(F\) on \(X\) and \(F^{cc}: X \rightarrow \mathbb {R}\) be a concave relaxation of \(F\) on \(X\). Then \(g^{cv}: Z\rightarrow \mathbb {R}\),

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \end{aligned}$$

is a convex relaxation of \(g\) on \(Z\) and \(g^{cc}: Z\rightarrow \mathbb {R}\),

$$\begin{aligned} g^{cc}(\mathbf{z})=\max _{\mathbf{x} \in X} \left\{ F^{cc}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \end{aligned}$$

is a concave relaxation of \(g\) on \(Z\).

Proof

First we prove that \(g^{cv}\) underestimates \(g\) on \(Z\). Using (3) and the fact that \(f^{cv}_i(\mathbf{z})\le f_i(\mathbf{z}) \le f^{cc}_i(\mathbf{z}) \) we obtain

$$\begin{aligned} g^{cv}(\mathbf{z})&= \min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \\&\le \min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| x_i = f_i(\mathbf{z}), \quad \forall i\in I \right\} \\&= F^{cv} \left( f_1(\mathbf{z}),\ldots ,f_m(\mathbf{z}) \right) \\&\le F \left( f_1(\mathbf{z}),\ldots ,f_m(\mathbf{z}) \right) =g(\mathbf{z}). \end{aligned}$$

Next we prove that \(g^{cv}\) is convex. Consider the function \(h\) defined on \(X \times X\) by

$$\begin{aligned} h({\varvec{\chi }}^{cv},{\varvec{\chi }}^{cc})=\min _{\mathbf{x} \in X \subset \mathbb {R}^m}\{ F^{cv}(\mathbf{x})| -\mathbf{x}\le -{\varvec{\chi }}^{cv},\mathbf{x} \le -{\varvec{\chi }}^{cc} \} \end{aligned}$$

with \(F^{cv}\) convex.

From convexity of \(F^{cv}\), the function \(h\) is increasing and convex as a perturbation function of a convex problem [8]. Observing that

$$\begin{aligned} g^{cv}(\mathbf{z})=h\left( f_1^{cv}(\mathbf z),\ldots ,f_m^{cv}(\mathbf{z}),-f_1^{cc}(\mathbf z),\ldots ,-f_m^{cc}(\mathbf{z})\right) \end{aligned}$$

and applying Lemma 1 we obtain convexity of \(g^{cv}\). Note that the negative sign on the right-hand side of the second constraint in the definition of \(h\) was chosen to negate the concave terms \(f_i^{cc}\) and decompose \(g^{cv}\) into a convex function of convex functions. The proof for \(g^{cc}\) is analogous.\(\square \)

We note that \(g^{cv}/g^{cc}\) is not in general the convex/concave envelope of \(g\), even if \(F^{cv},f_i^{cv}\) are the convex envelopes and \(F^{cc},f_i^{cc}\) the concave envelopes of \(F,f_i\), respectively; see, e.g., Fig. 1.

Fig. 1 Take \(Z=[-2,2]\) and \(g,f_1,f_2: Z \rightarrow \mathbb {R}\), such that \(f_1(z)=z^2\), \(f_2(z)=z\), \(g(z)=f_1(z)\cdot f_2(z)\). The convex relaxation \(g^{cv}\) of \(g\) on \([-2,2]\) proposed in (20) is tighter than McCormick’s relaxation \(\bar{g}^{cv}\) given by Eq. (22)

The definitions of \(g^{cv}/g^{cc}\) at a point \(\mathbf{z}\) involve the minimization/maximization of the convex/concave relaxations \(F^{cv},F^{cc}\) of \(F\), where the relaxations are computed over \(X\) and the optimization is over \(X\cap B\), where \(B\) is the box defined by \(f_i^{cv}(\mathbf{z})\), \(f_i^{cc}(\mathbf{z})\). This is typically a relatively easy convex problem to solve, as \(F^{cv}\), \(F^{cc}\) are usually simple functions. In many cases, including the binary product of functions (Sect. 5), the solution can be described as a function of \(\mathbf{z}\) in closed form.
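For instance (an illustrative case of our own choosing, not from the original development), for \(F(x_1,x_2)=(x_1-x_2)^2\) the outer function is convex, so \(F^{cv}=F\), and the inner minimization over the box reduces to the squared distance between the intervals \([f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z})]\) and \([f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z})]\):

```python
def g_cv(z, f1_cv, f1_cc, f2_cv, f2_cc):
    """Convex relaxation of g(z) = (f1(z) - f2(z))**2 via Theorem 2.

    The inner problem min (x1 - x2)**2 over f1_cv(z) <= x1 <= f1_cc(z),
    f2_cv(z) <= x2 <= f2_cc(z) equals the squared gap between the two
    intervals (zero whenever they overlap).
    """
    gap = max(0.0, f1_cv(z) - f2_cc(z), f2_cv(z) - f1_cc(z))
    return gap**2
```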

Similarly to McCormick’s relaxations, nested functions can be handled by recursive application of the theorem and do not present any difficulty. The only requirement is the availability of closed form solutions or reliable algorithms to solve the convex problems.

For the rest of the paper, unless otherwise stated, we assume that

$$\begin{aligned} \left[ f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z})\right] \times \cdots \times \left[ f_m^{cv}(\mathbf{z}),f_m^{cc}(\mathbf{z})\right] \subset X. \end{aligned}$$
(4)

Note that this is without loss of generality as we can always take

$$\begin{aligned} \bar{f}_i^{cv}(\mathbf{z})=\max \left\{ f_i^{cv}(\mathbf{z}),\min _{\mathbf{x} \in X}x_i \right\} ,\quad \bar{f}_i^{cc}(\mathbf{z})=\min \left\{ f_i^{cc}(\mathbf{z}),\max _{\mathbf{x}\in X}x_i \right\} . \end{aligned}$$

More specifically, if we assume that \(X\) is a box defined as \(\left[ f_1^L,f_1^U\right] \times \cdots \times \left[ f_m^L,f_m^U\right] \) where \(\left[ f_i^L,f_i^U\right] \) is an inclusion function of \(f_i\) on \(Z\) we can take

$$\begin{aligned} \bar{f}_i^{cv}(\mathbf{z})=\max \left\{ f_i^{cv}(\mathbf{z}),f_i^L \right\} ,\quad \bar{f}_i^{cc}(\mathbf{z})=\min \left\{ f_i^{cc}(\mathbf{z}),f_i^U \right\} . \end{aligned}$$
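In an implementation, this tightening is a one-line guard applied to the relaxation values before they enter the optimization problem (a trivial sketch; the function name is ours):

```python
def clip_to_bounds(f_cv_val, f_cc_val, f_L, f_U):
    """Tighten relaxation values by the inclusion function, enforcing (4)."""
    return max(f_cv_val, f_L), min(f_cc_val, f_U)
```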

Corollary 3 gives a simplified version of Theorem 2 in the case of monotonicity, which we also utilize to compute convex/concave relaxations for the minimum/maximum of two functions, Sect. 6.

Corollary 3

If, in addition to the assumptions of Theorem 2, Assumption (4) holds, then:

  1. If \(F^{cv}\) is monotonically increasing, then

     $$\begin{aligned} g^{cv}(\mathbf{z})=F^{cv}\left( f_1^{cv}(\mathbf{z}),\ldots ,f_m^{cv}(\mathbf{z})\right) \end{aligned}$$

     is a convex relaxation of \(g\).

  2. If \(F^{cv}\) is monotonically decreasing, then

     $$\begin{aligned} g^{cv}(\mathbf{z})=F^{cv}\left( f_1^{cc}(\mathbf{z}),\ldots ,f_m^{cc}(\mathbf{z})\right) \end{aligned}$$

     is a convex relaxation of \(g\).

  3. If \(F^{cc}\) is monotonically increasing, then

     $$\begin{aligned} g^{cc}(\mathbf{z})=F^{cc}\left( f_1^{cc}(\mathbf{z}),\ldots ,f_m^{cc}(\mathbf{z})\right) \end{aligned}$$

     is a concave relaxation of \(g\).

  4. If \(F^{cc}\) is monotonically decreasing, then

     $$\begin{aligned} g^{cc}(\mathbf{z})=F^{cc}\left( f_1^{cv}(\mathbf{z}),\ldots ,f_m^{cv}(\mathbf{z})\right) \end{aligned}$$

     is a concave relaxation of \(g\).

The convexity of \(g^{cv}\) and concavity of \(g^{cc}\) in this case is well known, e.g., [8].
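As a sketch of case 1 (the instance is our own choice): \(F(x_1,x_2)=\log (e^{x_1}+e^{x_2})\) is convex and increasing in both arguments, hence its own convex relaxation, and Corollary 3 yields a one-line relaxation of the composition:

```python
import numpy as np

# F(x1, x2) = log(exp(x1) + exp(x2)) is convex and componentwise
# increasing, hence F^cv = F; by case 1 of Corollary 3, composing it
# with convex underestimators f1_cv, f2_cv gives a convex relaxation
# of g(z) = F(f1(z), f2(z)).
def g_cv(z, f1_cv, f2_cv):
    return np.logaddexp(f1_cv(z), f2_cv(z))
```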

3 Subgradient propagation

Theorem 2 allows the evaluation of the convex/concave relaxation of an arbitrary composite function at a point \(\mathbf{z}\), provided that convex/concave relaxations of the intrinsic functions are available. As demonstrated in Mitsos et al. [30], the calculation of subgradients of the convex/concave relaxations is useful. In this section the results of Mitsos et al. [30] are generalized to multivariate outer functions and to the entire subdifferential.

Lemma 2

(Adapted from the strong duality theorem) Consider the problem

$$\begin{aligned} h({\varvec{\chi }}^{cv},{\varvec{\chi }}^{cc})=\min _{\mathbf{x} \in X \subset \mathbb {R}^m}\{ F^{cv}(\mathbf{x})| -\mathbf{x}\le -{\varvec{\chi }}^{cv}, \mathbf{x} \le -{\varvec{\chi }}^{cc} \} \end{aligned}$$

with \(F^{cv}\) convex. Then \(\mathbf{u}=\left[ \begin{array}{c} \hat{\varvec{\lambda }}^{cv}\\ \hat{\varvec{\lambda }}^{cc} \end{array}\right] \) is an optimal solution of the dual problem \(D(\hat{{\varvec{\chi }}}^{cv},\hat{{\varvec{\chi }}}^{cc})\) given by

$$\begin{aligned} \max _{({{\varvec{\lambda }}^{cv}},{\varvec{\lambda }}^{cc})} \min _{\mathbf{x} \in X}\{ F^{cv}(\mathbf{x})+{({\varvec{\lambda }}^{cv})}^{T} (-\mathbf{x}+\hat{\varvec{\chi }}^{cv}) + ({\varvec{\lambda }}^{cc})^{T}(\mathbf{x} + \hat{\varvec{\chi }}^{cc}) \} \end{aligned}$$

at \(\hat{{\varvec{\chi }}}= \left[ \begin{array}{c} \hat{\varvec{\chi }}^{cv}\\ \hat{\varvec{\chi }}^{cc} \end{array}\right] \) with \(\hat{{\varvec{\chi }}}^{cv}\le -\hat{{\varvec{\chi }}}^{cc}\), if and only if \(\mathbf{u}\in \partial h(\hat{\varvec{\chi }}^{cv},\hat{\varvec{\chi }}^{cc}).\)

Proof

The proof is based on the proof of the strong duality theorem in [14]. If \( \mathbf{u}=\left[ \begin{array}{c} \hat{\varvec{\lambda }}^{cv}\\ \hat{\varvec{\lambda }}^{cc} \end{array}\right] \ge 0 \) solves the dual then

$$\begin{aligned} \min _{\mathbf{x} \in X} F^{cv}(\mathbf{x})+\mathbf{u}^T \left[ \begin{array}{c} -\mathbf{x}+\hat{\varvec{\chi }}^{cv} \\ \mathbf{x} + \hat{\varvec{\chi }}^{cc} \end{array} \right] =h(\hat{{\varvec{\chi }}}^{cv},\hat{{\varvec{\chi }}}^{cc}). \end{aligned}$$

For any fixed \(\bar{{\varvec{\chi }}}=\left[ \begin{array}{c} \bar{\varvec{\chi }}^{cv}\\ \bar{\varvec{\chi }}^{cc} \end{array}\right] \in \mathbb {R}^{2m}\), if \(\mathbf{x} \in X\) with \(\left[ \begin{array}{c} -\mathbf{x} \\ \mathbf{x} \end{array} \right] \le \left[ \begin{array}{c} -\bar{\varvec{\chi }}^{cv}\\ -\bar{\varvec{\chi }}^{cc} \end{array} \right] \), then there holds

$$\begin{aligned} F^{cv}(\mathbf{x})-\mathbf{u}^T (\bar{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})\ge F^{cv}(\mathbf{x})+\mathbf{u}^T \left[ \begin{array}{c} -\mathbf{x}+\hat{\varvec{\chi }}^{cv} \\ \mathbf{x} + \hat{\varvec{\chi }}^{cc} \end{array} \right] \ge h(\hat{{\varvec{\chi }}}^{cv},\hat{{\varvec{\chi }}}^{cc}). \end{aligned}$$

Thus, for fixed \(\bar{{\varvec{\chi }}}\)

$$\begin{aligned} -\mathbf{u}^T (\bar{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})+h(\bar{{\varvec{\chi }}})= \begin{array}{cc} \displaystyle \min _{\mathbf{x} \in X} &{} F^{cv}(\mathbf{x})-\mathbf{u}^T (\bar{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})\\ \text {s.t.} &{} -\mathbf{x}\le -\bar{{\varvec{\chi }}}^{cv}\\ &{} \mathbf{x}\le -\bar{{\varvec{\chi }}}^{cc} \end{array}\ge h(\hat{{\varvec{\chi }}}) \end{aligned}$$

and \(\mathbf{u}\in \partial h(\hat{\varvec{\chi }}^{cv},\hat{\varvec{\chi }}^{cc}).\)

For the converse assume \(\mathbf{u}=\left[ \begin{array}{c} \hat{\varvec{\lambda }}^{cv}\\ \hat{\varvec{\lambda }}^{cc} \end{array}\right] \in \partial h(\hat{\varvec{\chi }}^{cv},\hat{\varvec{\chi }}^{cc}).\) Noting that \(h\) is non-decreasing, let \(\mathbf{e}^i\) denote the \(i\)th unit vector and observe

$$\begin{aligned} h(\hat{\varvec{\chi }})\ge h(\hat{\varvec{\chi }}-\mathbf{e}^i)\ge h(\hat{\varvec{\chi }}) -u_i \end{aligned}$$

for all \(i\), and thus \(\mathbf{u}\ge \mathbf{0}\) and \(\mathbf{u}\) is dual feasible. For any \(\tilde{\mathbf{x}}\in X\) let \(\tilde{{\varvec{\chi }}}=\left[ \begin{array}{c} \tilde{\mathbf{x}} \\ -\tilde{\mathbf{x}} \end{array} \right] \). We have

$$\begin{aligned} F^{cv}(\tilde{\mathbf{x}})= h(\tilde{{\varvec{\chi }}})\ge h(\hat{{\varvec{\chi }}}) +\mathbf{u}^T (\tilde{{\varvec{\chi }}}-\hat{{\varvec{\chi }}})=h(\hat{{\varvec{\chi }}}) +\mathbf{u}^T \left[ \begin{array}{c} \tilde{\mathbf{x}}-\hat{{\varvec{\chi }}}^{cv}\\ -\tilde{\mathbf{x}}-\hat{{\varvec{\chi }}}^{cc} \end{array} \right] , \end{aligned}$$

or, rearranging,

$$\begin{aligned} h(\hat{{\varvec{\chi }}})\le F^{cv}(\tilde{\mathbf{x}}) +\mathbf{u}^T \left[ \begin{array}{c} -\tilde{\mathbf{x}}+\hat{{\varvec{\chi }}}^{cv}\\ \tilde{\mathbf{x}}+\hat{{\varvec{\chi }}}^{cc} \end{array} \right] , \end{aligned}$$

and therefore

$$\begin{aligned} h(\hat{{\varvec{\chi }}})\le \min _{\mathbf{x}\in X} F^{cv}(\mathbf{x}) +\mathbf{u}^T \left[ \begin{array}{c} -{\mathbf{x}}+\hat{{\varvec{\chi }}}^{cv}\\ {\mathbf{x}}+\hat{{\varvec{\chi }}}^{cc} \end{array} \right] . \end{aligned}$$

On the other hand, weak duality yields the opposite inequality and thus equality holds and \(\mathbf{u}\) is optimal.\(\square \)

Theorem 4

The subdifferential of \(g^{cv}\) at \(\hat{\mathbf{z}}\) is given by

$$\begin{aligned} \partial g^{cv}(\hat{\mathbf{z}})= \left\{ \begin{array}{c} \sum \limits _{i=1}^m \rho _i^{cv} s_i^{cv}-\rho _i^{cc} s_i^{cc}| \end{array} \begin{array}{c} \left( \rho _1^{cv},\ldots ,\rho _m^{cv},\rho _1^{cc},\ldots ,\rho _m^{cc}\right) \in \Lambda (\hat{\mathbf{z}}),\\ s_i^{cv} \in \partial f_i^{cv}(\hat{\mathbf{z}}),s_i^{cc} \in \partial f_i^{cc}(\hat{\mathbf{z}}) \quad \forall i=1,\ldots ,m \end{array} \right\} , \end{aligned}$$

where

$$\begin{aligned} \Lambda (\hat{\mathbf{z}})&= \mathop {\mathrm{arg\,max}}\limits _{({\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc})} \left\{ \min _{\mathbf{x} \in X} L(\mathbf{x},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc},\hat{\mathbf{z}})\right\} ,\\ L(\mathbf{x},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc},\hat{\mathbf{z}})&= F^{cv}(\mathbf{x})+\sum _{i=1}^m {\lambda }_i^{cv} \left( -\mathbf{x}+f^{cv}_i(\hat{\mathbf{z}})\right) + \lambda _i^{cc}\left( \mathbf{x} - f^{cc}_i(\hat{\mathbf{z}})\right) \end{aligned}$$

Proof

The subdifferential of the convex function \(-f_i^{cc}\) is given by

$$\begin{aligned} \partial (-f_i^{cc})(\hat{\mathbf{z}})=\left\{ \mathbf{s}: -\mathbf{s}\in \partial (f_i^{cc})(\hat{\mathbf{z}}) \right\} . \end{aligned}$$

Since

$$\begin{aligned} g^{cv}(\hat{\mathbf{z}})=h(f_1^{cv}(\hat{\mathbf{z}}),\ldots ,f_m^{cv}(\hat{\mathbf{z}}),-f_1^{cc}(\hat{\mathbf{z}}),\ldots ,-f_m^{cc}(\hat{\mathbf{z}})) \end{aligned}$$

the result follows from Lemmata 1 and 2.\(\square \)
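To make the mechanics concrete, the following sketch (our own illustration, anticipating the product rule of Sect. 5) evaluates \(g^{cv}\) for the instance of Fig. 1 as a small epigraph LP and recovers the multipliers of Theorem 4 from the bound duals; reading them off SciPy's HiGHS attributes res.lower.marginals/res.upper.marginals is an assumption of this sketch, not part of the original development:

```python
import numpy as np
from scipy.optimize import linprog

# Instance of Fig. 1: f1(z) = z**2, f2(z) = z on Z = [-2, 2], so
# f1 in [0, 4] and f2 in [-2, 2]; g = f1*f2.  F^cv is the bilinear
# convex envelope max{f2U*x1 + f1U*x2 - f1U*f2U,
#                     f2L*x1 + f1L*x2 - f1L*f2L}.
f1L, f1U, f2L, f2U = 0.0, 4.0, -2.0, 2.0

def relax_and_subgradient(z):
    # Epigraph LP over (x1, x2, t): min t s.t. t >= each envelope piece,
    # with x restricted to the box of relaxation values at z.
    A_ub = [[f2U, f1U, -1.0], [f2L, f1L, -1.0]]
    b_ub = [f1U * f2U, f1L * f2L]
    bounds = [(z**2, 4.0),      # f1^cv(z) <= x1 <= f1^cc(z) (secant = 4)
              (z, z),           # f2^cv(z) = f2^cc(z) = z
              (None, None)]
    res = linprog([0, 0, 1], A_ub=A_ub, b_ub=b_ub, bounds=bounds,
                  method="highs")
    # Net bound duals; the positive part plays the role of lambda^cv,
    # the negative part that of lambda^cc in Theorem 4 (assumed sign
    # convention of the HiGHS marginals).
    net = (np.asarray(res.lower.marginals[:2])
           + np.asarray(res.upper.marginals[:2]))
    lam_cv, lam_cc = np.maximum(net, 0.0), np.maximum(-net, 0.0)
    s_cv = np.array([2.0 * z, 1.0])   # subgradients of f1^cv, f2^cv at z
    s_cc = np.array([0.0, 1.0])       # subgradients of f1^cc, f2^cc at z
    return res.fun, lam_cv @ s_cv - lam_cc @ s_cc

g0, s0 = relax_and_subgradient(0.0)     # g^cv(0) = -4
for z in np.linspace(-2.0, 2.0, 9):     # subgradient inequality holds
    gz, _ = relax_and_subgradient(z)
    assert gz >= g0 + s0 * z - 1e-8
```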

We note that in some cases, including fractional (Sect. 7) and multilinear (Sect. 5) terms, a tighter convex relaxation \(F^{cv}\) of \(F\) than the ones available in closed form can be calculated through a convex optimization problem of the form

$$\begin{aligned} F^{cv}(\mathbf{x})=\min _{\mathbf{w}\in W} \left\{ r_1(\mathbf{x},\mathbf{w})| \mathbf{r}_2(\mathbf{x},\mathbf{w})\le 0 \right\} , \end{aligned}$$

with \(W\subset \mathbb {R}^{n_w}\), \(r_1:X \times W \rightarrow \mathbb {R}\), \(\mathbf{r}_2:X \times W \rightarrow \mathbb {R}^{n_r}\). The convex underestimator of \(g\) will be given by

$$\begin{aligned} \begin{array}{ccc} g^{cv}(\mathbf{z})=&{} \displaystyle \min _{\mathbf{x} \in X,\mathbf{w} \in W} &{} r_1(\mathbf{x},\mathbf{w})\\ &{}\text {s.t.}&{} \mathbf{r}_2(\mathbf{x},\mathbf{w})\le 0 \\ &{} &{}f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z}), \quad \forall i \end{array}, \end{aligned}$$
(5)

where the defining problem has Lagrangian

$$\begin{aligned} \bar{L}(\mathbf{x}, {\mathbf{w}},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc},{\varvec{\mu }})= r_1(\mathbf{x},\mathbf{w})+ \sum _{i\in I} \lambda ^{cv}_i \left( f^{cv}_i({\mathbf{z}}){-}x_i\right) +\sum _{i\in I} \lambda ^{cc}_i \left( x_i{-}f^{cc}_i({\mathbf{z}})\right) +{\varvec{\mu }}^T \mathbf{r}_2(\mathbf{x},\mathbf{w}). \end{aligned}$$
(6)

We can calculate a subgradient of \(g^{cv}\) at \(\mathbf{z}\) via Theorem 4, using the Lagrangian multipliers associated with the constraints \(f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z})\). This is formalized in Proposition 2.

Proposition 2

If strong duality holds for (5) and \(\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\left( \hat{\varvec{\lambda }}^{cv},\hat{\varvec{\lambda }}^{cc},\hat{\varvec{\mu }}\right) \right) \) is an optimal primal dual pair of (5) then \(\left( \hat{\mathbf{x}},\left( \hat{\varvec{\lambda }}^{cv},\hat{\varvec{\lambda }}^{cc}\right) \right) \) is an optimal primal dual pair for the problem

$$\begin{aligned} g^{cv}(\mathbf{z})=\min _{\mathbf{x} \in X} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} \end{aligned}$$
(7)

with Lagrangian

$$\begin{aligned} L(\mathbf{x},{\varvec{\lambda }}^{cv},{\varvec{\lambda }}^{cc})= F^{cv}(\mathbf{x})+ \sum _{i\in I} \lambda ^{cv}_i \left( f^{cv}_i({\mathbf{z}})-x_i\right) +\sum _{i\in I} \lambda ^{cc}_i \left( x_i-f^{cc}_i({\mathbf{z}})\right) , \end{aligned}$$
(8)

where the constraints defining \(X\) are not dualized.

Proof

From strong duality we have

$$\begin{aligned}&\displaystyle \bar{L}\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) =r_1(\hat{\mathbf{x}},\hat{\mathbf{w}}),\end{aligned}$$
(9)
$$\begin{aligned}&\displaystyle \mathbf{r}_2(\hat{\mathbf{x}},\hat{\mathbf{w}})\le 0,\nonumber \\&\displaystyle f^{cv}_i( {\mathbf{z}})\le \hat{x}_i \le f^{cc}_i( {\mathbf{z}}) \quad \text {for all } i,\end{aligned}$$
(10)
$$\begin{aligned}&\displaystyle \hat{{\varvec{\mu }}}^T \mathbf{r}_2(\hat{\mathbf{x}},\hat{\mathbf{w}})=0,\end{aligned}$$
(11)
$$\begin{aligned}&\displaystyle \hat{\lambda }^{cv}_i \left( f^{cv}_i({\mathbf{z}})-\hat{x}_i\right) =0, \quad \hat{\lambda }^{cc}_i \left( \hat{x}_i-f^{cc}_i({\mathbf{z}})\right) =0\quad \text {for all } i. \end{aligned}$$
(12)

Using (12), (9) we obtain

$$\begin{aligned} L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) =\bar{L}\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) . \end{aligned}$$

Keeping in mind (12) and (10), to show that \(\left( \hat{\mathbf{x}},\left( \hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \right) \) is an optimal point of (7) with its corresponding Lagrangian multipliers, we only need

$$\begin{aligned} \hat{\mathbf{x}} \in \arg \min _{\mathbf{x}\in X} L\left( \mathbf{x},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) . \end{aligned}$$

Assume to the contrary that there exists an \(\bar{\mathbf{x}}\in X\) with

$$\begin{aligned} L\left( \bar{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) <L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \end{aligned}$$

and let

$$\begin{aligned} \bar{\mathbf{w}}\in \begin{array}{cc} \displaystyle \arg \min _{\mathbf{w}} &{} r_1(\bar{\mathbf{x}},\mathbf{w})\\ \text {s.t.}&{} \mathbf{r}_2(\bar{\mathbf{x}},\mathbf{w})\le 0 \\ \end{array}. \end{aligned}$$

We have \(r_1(\bar{\mathbf{x}},\bar{\mathbf{w}})=F^{cv}(\bar{\mathbf{x}})\) and \(\mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\le 0\). Then we have

$$\begin{aligned} \bar{L}\left( \bar{\mathbf{x}},\bar{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right)&= r_1(\bar{\mathbf{x}},\bar{\mathbf{w}}) + \sum _{i\in I} \left( \hat{\lambda }^{cv}_i \left( f^{cv}_i({\mathbf{z}})-\bar{x}_i\right) \right) \\&+\sum _{i\in I} \left( \hat{\lambda }^{cc}_i \left( \bar{x}_i-f^{cc}_i({\mathbf{z}})\right) \right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&= F^{cv}(\bar{\mathbf{x}})+\sum _{i\in I} \left( \hat{\lambda }^{cv}_i \left( f^{cv}_i({\mathbf{z}})-\bar{x}_i\right) \right) \\&+\sum _{i\in I} \left( \hat{\lambda }^{cc}_i \left( \bar{x}_i-f^{cc}_i({\mathbf{z}})\right) \right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&= L\left( \bar{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&< L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) +\hat{{\varvec{\mu }}}^T \mathbf{r}_2(\bar{\mathbf{x}},\bar{\mathbf{w}})\\&\le L\left( \hat{\mathbf{x}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \\&= \bar{L}\left( \hat{\mathbf{x}},\hat{\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) , \end{aligned}$$

which is a contradiction since

$$\begin{aligned} (\hat{\mathbf{x}},\hat{\mathbf{w}}) \in \arg \min _{\mathbf{x} \in X,\mathbf{w\in W}} \bar{L}\left( {\mathbf{x}}, {\mathbf{w}},\hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc},\hat{{\varvec{\mu }}}\right) . \end{aligned}$$

\(\square \)

4 McCormick relaxations and the auxiliary variable method

In this section we revisit the relationship between McCormick relaxations and the AVM [41, 42]. The AVM lies at the heart of the state-of-the-art software BARON [35, 36], and handles composite functions implicitly by substituting argument functions with auxiliary variables.

While it is well known that both methods provide lower bounding mechanisms for factorable functions, the restatement of McCormick relaxations in Proposition 1 and the subsequent generalization make the relationship between the two approaches explicit and the occasional gap between the relaxations smaller.

As mentioned in the introduction, an advantage of the AVM compared to McCormick’s approach is the potentially tighter bounds due to repeated terms. While multivariate McCormick relaxations can provide better bounds than univariate ones, they can still be weaker than the AVM for the same reason. A case where multivariate McCormick relaxations can provide tighter bounds than the AVM is when tighter convex relaxations can be made practically available through optimization problems, as is the case for the fractional terms discussed in Sect. 7.

McCormick’s approach allows for optimization of the bounding problem in the original space. While there is no general rule dictating that a smaller number of variables will lead to superior performance, it has been demonstrated that for a class of problems with few variables and complex expressions, operating in the original space can give a drastic improvement in CPU time [30].

To illustrate the relationship of the two methodologies, consider the functions \(\displaystyle f_1: Z_1\subset \mathbb {R}^2\rightarrow \mathbb {R}\), \(\displaystyle f_2: Z_2\subset \mathbb {R}\rightarrow \mathbb {R}\), \(\displaystyle f_3: Z_3\subset \mathbb {R}\rightarrow \mathbb {R}\), \(\displaystyle f_4: Z_4\subset \mathbb {R}\rightarrow \mathbb {R}\), and the composite function

$$\begin{aligned} g(z)=f_1(f_3(z),f_4(z))+f_2(f_3(z)), \end{aligned}$$

\(z\in Z\subset \mathbb {R}.\) Assume that for all intrinsic functions \(f_i\), convex and concave relaxations \(f_i^{cv}\), \(f_i^{cc}\) on \(Z_i\) are available. Furthermore assume that \(Z_i\) are boxes and that \(Z \subset Z_3\), \(Z \subset Z_4\), \(f_3(Z_3)\times f_4(Z_4)\subset Z_1\), \(f_3(Z_3) \subset Z_2\). Note that the univariate McCormick theorem cannot handle this directly.

To solve \(\min _{z\in Z} g(z)\), the AVM could formulate the problem in two different ways, depending on whether it recognizes the common term \(f_3(z)\):

$$\begin{aligned} \begin{array}{cc|cc} \text {Formulation 1} &{} &{}\text {Formulation 2}&{}\\ \displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1\\ w_2\in Z_2,w_3\in Z_3\\ w_3'\in Z_3,w_4\in Z_4 \end{array}}&{} w_1+w_2 &{}\displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1 \\ w_2\in Z_2,w_3\in Z_3 \\ w_4\in Z_4 \end{array}}&{} w_1+w_2\\ s.t. &{} \displaystyle w_1=f_1(w_3,w_4) &{}s.t. &{} \displaystyle w_1=f_1(w_3,w_4)\\ &{}\displaystyle w_2=f_2(w_3')&{} &{}\displaystyle w_2=f_2(w_3)\\ &{}\displaystyle w_4=f_4(z)&{} &{}\displaystyle w_4=f_4(z)\\ &{}\displaystyle w_3=f_3(z)&{} &{}\displaystyle w_3=f_3(z)\\ &{}\displaystyle w'_3=f_3(z) &{} &{} \end{array}. \end{aligned}$$
(13)

with corresponding convex relaxations

$$\begin{aligned} \begin{array}{cc|cc} \text {Formulation 1} &{} &{}\text {Formulation 2}&{}\\ \displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1 \\ w_2\in Z_2,w_3\in Z_3 \\ w_3'\in Z_3,w_4\in Z_4 \end{array}}&{} w_1+w_2 &{} \displaystyle \min _{\begin{array}{c} z\in Z,w_1\in Z_1 \\ w_2\in Z_2,w_3\in Z_3 \\ w_4\in Z_4 \end{array}}&{} w_1+w_2\\ s.t. &{} f_1^{cv}(w_3,w_4)\le w_1\le f_1^{cc}(w_3,w_4) &{}s.t. &{} f_1^{cv}(w_3,w_4)\le w_1\le f_1^{cc}(w_3,w_4)\\ &{} f_2^{cv}(w_3')\le w_2\le f_2^{cc}(w_3')&{} &{} f_2^{cv}(w_3)\le w_2\le f_2^{cc}(w_3)\\ &{} f_4^{cv}(z)\le w_4\le f_4^{cc}(z)&{} &{} f_4^{cv}(z)\le w_4\le f_4^{cc}(z)\\ &{} f_3^{cv}(z)\le w_3\le f_3^{cc}(z)&{} &{} f_3^{cv}(z)\le w_3\le f_3^{cc}(z)\\ &{} f_3^{cv}(z)\le w_3'\le f_3^{cc}(z)&{}&{} \end{array}. \end{aligned}$$
(14)

Formulation 2 is tighter and will likely give a better bound. It is not hard to see that multivariate McCormick will give the same bound as Formulation 1 by solving the problem

$$\begin{aligned} \min _z\left\{ \begin{array}{cc} \displaystyle \min _{\hat{w}_1,\hat{w}_2}&{} \hat{w}_1+\hat{w}_2 \\ s.t.&{} \begin{array}{ccc} \left( \begin{array}{cc} \displaystyle \min _{\hat{w}_3,\hat{w}_4} &{} f_1^{cv}(\hat{w}_3,\hat{w}_4)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}_3 \le f_3^{cc}(z) \\ &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) &{}\le \hat{w}_1 \le &{} \left( \begin{array}{cc} \displaystyle \max _{\hat{w}_3,\hat{w}_4} &{} f_1^{cc}(\hat{w}_3,\hat{w}_4)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}_3 \le f_3^{cc}(z) \\ &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) \\ \left( \begin{array}{cc} \displaystyle \min _{\hat{w}'_3} &{} f_2^{cv}(\hat{w}'_3)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}'_3 \le f_3^{cc}(z) \\ \end{array} \right) &{}\le \hat{w}_2 \le &{} \left( \begin{array}{cc} \displaystyle \max _{\hat{w}'_3} &{} f_2^{cc}(\hat{w}'_3)\\ s.t. &{} f_3^{cv}(z)\le \hat{w}'_3 \le f_3^{cc}(z) \\ \end{array} \right) \end{array}\\ \end{array} \right\} . \end{aligned}$$
(15)

Equation (15) can be interpreted as a decomposition method for the first formulation in (14). In general all inner problems will be easy to solve analytically, and a numerical algorithm will only be needed to minimize the resulting relaxation with respect to the original variable \(z\).

In the multivariate McCormick framework it is possible to introduce just sufficiently many artificial variables to improve the resulting relaxation and match the AVM. In our example this would yield the problem

$$\begin{aligned} \min _{z,w_3}\left\{ \begin{array}{cc} \displaystyle \min _{\hat{w}_1,\hat{w}_2}&{} \hat{w}_1+\hat{w}_2 \\ s.t.&{} \begin{array}{ccc} \left( \begin{array}{cc} \displaystyle \min _{\hat{w}_4} &{} f_1^{cv}(w_3,\hat{w}_4)\\ s.t. &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) &{}\le \hat{w}_1 \le &{} \left( \begin{array}{cc} \displaystyle \max _{\hat{w}_4} &{} f_1^{cc}(w_3,\hat{w}_4)\\ s.t. &{} f_4^{cv}(z)\le \hat{w}_4 \le f_4^{cc}(z)\\ \end{array} \right) \\ f_2^{cv}(w_3)&{}\le \hat{w}_2 \le &{} f_2^{cc}(w_3)\\ f_3^{cv}(z)&{}\le w_3 \le &{} f_3^{cc}(z) \\ \end{array}\\ \end{array} \right\} \!. \end{aligned}$$
(16)

This yields the same bound as the AVM while increasing the optimization space by only a single variable.

It is instructive to compare the two methodologies for only one level of composition, i.e., a function \(F(f(\mathbf{z}))\): as it turns out, using a cutting-plane algorithm to minimize McCormick relaxations with the subgradient propagation mechanism of Sect. 3 is strongly related to applying generalized Benders decomposition [15] to the lower bounding problem defined by the AVM.

To minimize \(F(f(\mathbf{z}))\) the AVM would formulate the problem

$$\begin{aligned} \min _{\mathbf{x} \in X,\mathbf{z}\in Z} \left\{ F^{cv}(\mathbf{x})| f^{cv}_i(\mathbf{z})\le x_i \le f^{cc}_i(\mathbf{z}), \quad \forall i\in I \right\} . \end{aligned}$$
(17)

If we apply (generalized) Benders decomposition to (17), treating \({\mathbf{z}}\) as the “complicating” variables, the master problem is

$$\begin{aligned}&\min _{\mathbf{z}\in Z} V(\mathbf{z}),\\&\text {where}\quad V(\mathbf{z})= \max _{{\varvec{\lambda }}^{cv}\ge 0,{\varvec{\lambda }}^{cc}\ge 0}\min _{\mathbf{x}\in X}\left\{ F^{cv}(\mathbf{x})+\sum _i\lambda ^{cv}_i \left( f^{cv}_i(\mathbf{z})-x_i\right) +\sum _i \lambda ^{cc}_i \left( x_i-f^{cc}_i(\mathbf{z})\right) \right\} \end{aligned}$$

for all \(\mathbf{z}\).

The restricted master \(V_r(\mathbf{z})\) employs a subset \(\Lambda \) of multipliers. A \(\hat{\mathbf{z}}\) obtained by solving the restricted master will be suboptimal if

$$\begin{aligned} V_r(\hat{\mathbf{z}})< \max _{{\varvec{\lambda }}^{cv}\ge 0,{\varvec{\lambda }}^{cc}\ge 0} \min _{\mathbf{x}\in X} \left\{ F^{cv}(\mathbf{x})+\sum _i \lambda ^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})-x_i\right) +\sum _i \lambda ^{cc}_i \left( x_i-f^{cc}_i(\hat{\mathbf{z}})\right) \right\} . \end{aligned}$$
(18)

The cut obtained is

$$\begin{aligned} V(\mathbf{z})\ge \min _{\mathbf{x} \in X} F^{cv}( \mathbf{x})+\sum _i \hat{\lambda }^{cv}_i \left( f^{cv}_i(\mathbf{z})-x_i\right) +\sum _i \hat{\lambda }^{cc}_i \left( x_i-f^{cc}_i(\mathbf{z})\right) , \end{aligned}$$
(19)

where

$$\begin{aligned} \left( \hat{{\varvec{\lambda }}}^{cv},\hat{{\varvec{\lambda }}}^{cc}\right) \in \mathop {\mathrm{arg\,max}}\limits _{{\varvec{\lambda }}^{cv}\ge 0,{\varvec{\lambda }}^{cc}\ge 0} \min _{\mathbf{x}\in X} \left\{ F^{cv}(\mathbf{x})+\sum _i \lambda ^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})-x_i\right) +\sum _i \lambda ^{cc}_i \left( x_i-f^{cc}_i(\hat{\mathbf{z}})\right) \right\} , \end{aligned}$$

cutting off \(\hat{\mathbf{z}}\). The generalized-Benders cut (19) can be further relaxed by linearization around \(\hat{\mathbf{z}}\), yielding,

$$\begin{aligned} V(\mathbf{z})&\ge \min _{\mathbf{x} \in X} F^{cv}( \mathbf{x})+\sum _i \hat{\lambda }^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})+(\mathbf{s}^{cv}_i)^T(\mathbf{z}-\hat{\mathbf{z}})-x_i \right) -\sum _i \hat{\lambda }^{cc}_i\left( f^{cc}_i(\hat{\mathbf{z}})+(\mathbf{s}^{cc}_i)^T(\mathbf{z}-\hat{\mathbf{z}})-x_i \right) \\&= \sum _i \left( \hat{\lambda }^{cv}_i \mathbf{s}^{cv}_i-\hat{\lambda }^{cc}_i \mathbf{s}^{cc}_i\right)^T (\mathbf{z}-\hat{\mathbf{z}})+\min _{\mathbf{x} \in X}\left\{ F^{cv}( \mathbf{x})+\sum _i \hat{\lambda }^{cv}_i \left( f^{cv}_i(\hat{\mathbf{z}})-x_i\right) +\sum _i\hat{\lambda }^{cc}_i \left( x_i-f^{cc}_i(\hat{\mathbf{z}})\right) \right\} \\&=g^{cv}(\hat{\mathbf{z}})+\sum _i \left( \hat{\lambda }^{cv}_i \mathbf{s}^{cv}_i-\hat{\lambda }^{cc}_i \mathbf{s}^{cc}_i\right)^T (\mathbf{z}-\hat{\mathbf{z}}), \end{aligned}$$

where \(\mathbf{s}^{cv}_i\), \(\mathbf{s}^{cc}_i\) are subgradients of \(f^{cv}_i\), \(f^{cc}_i\) at \(\hat{\mathbf{z}}\). It can be seen that the linearized cut is equivalent to a subgradient inequality for \(g^{cv}\) as obtained by Theorem 4. Note that the generalized Benders subproblem (18) generating the cut is identical to (the dual of) the problem solved to provide a function evaluation and generate the subgradient.

Therefore, for a single level of composition, applying generalized Benders decomposition to the AVM is equivalent to minimizing \(g^{cv}(\mathbf{z})\) through a first-order algorithm which at iteration \(k+1\) chooses the point \(\mathbf{z}_{k+1}\) for evaluation by solving the linear relaxation

$$\begin{aligned} \min _{w,\mathbf{z}} \left\{ w: w\ge g^{cv}(\mathbf{z}_i)+\mathbf{s}_i^T (\mathbf{z}-\mathbf{z}_i), \quad \forall i\le k \right\} , \end{aligned}$$

where \(g^{cv}(\mathbf{z}_i)\), \(\mathbf{s}_i\) are the function evaluation and subgradient returned by the oracle at iteration \(i\). This is not a very efficient algorithm to minimize \(g^{cv}(\mathbf{z})\), and we use it here only to illustrate the equivalence. More efficient first-order methods can be found, for example, in [31].
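A minimal sketch of such a loop (Kelley's cutting-plane method) on the relaxation of the Fig. 1 instance is given below; the closed-form oracle is derived from (20) for that instance, and all implementation choices are ours:

```python
from scipy.optimize import linprog

def oracle(z):
    """g^cv of Fig. 1 (f1 = z**2, f2 = z on Z = [-2, 2]) and a subgradient:
    g^cv(z) = 2z - 4 for z <= 1 and 2z**2 + 4z - 8 for 1 <= z <= 2."""
    if z <= 1.0:
        return 2.0 * z - 4.0, 2.0
    return 2.0 * z**2 + 4.0 * z - 8.0, 4.0 * z + 4.0

zL, zU, z = -2.0, 2.0, 2.0          # start anywhere in Z
cuts = []                           # collected (g^cv(z_i), s_i, z_i)
for _ in range(20):
    g, s = oracle(z)
    cuts.append((g, s, z))
    # Restricted master: min_{z, w} w  s.t.  w >= g_i + s_i (z - z_i).
    A_ub = [[s_i, -1.0] for (_, s_i, _) in cuts]
    b_ub = [s_i * z_i - g_i for (g_i, s_i, z_i) in cuts]
    res = linprog([0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(zL, zU), (None, None)], method="highs")
    z_new, lower_bound = res.x
    if g - lower_bound < 1e-8:      # no remaining gap: done
        break
    z = z_new

print(z, lower_bound)               # -> -2.0 -8.0, the minimum of g^cv
```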

We note that, in the (univariate and multivariate) McCormick relaxation framework, it is straightforward to apply Theorem 4 recursively to generate subgradients for nested compositions of functions. This is not to say that it would be impossible to construct equivalent nested decomposition schemes for the AVM, which has a staircase structure. In the context of stochastic programming, Birge [6] explores nested decomposition schemes for LPs. For NLPs with a staircase structure, O’Neill [32] proposes a decomposition framework combining primal and dual decomposition ideas, which is, however, rather involved.

Note also that the advantage of retaining the original variable space is important only for expressions so complex that the AVM equivalent would require the introduction of a great number of variables. Thus, the above observations on the equivalence of the two approaches for a single composition are mainly of theoretical interest.

5 Product rule

An interesting example of multivariate composition is the product of functions. Note that bilinear terms and bilinear products of functions are very important in applications; see for instance the recent articles [19, 28]. Let \(\mathrm{mult}(x_1,x_2)=x_1 x_2\). As given in [3, 23], the convex/concave envelopes of \(\mathrm{mult}(\cdot ,\cdot )\) on \(\left[ x_1^L,x_1^U\right] \times \left[ x_2^L,x_2^U\right] \) are

$$\begin{aligned} \mathrm{mult}^{cv}&= \max \left\{ x_2^U x_1+x_1^U x_2- x_1^U x_2^U, x_2^L x_1+x_1^L x_2- x_1^L x_2^L \right\} ,\\ \mathrm{mult}^{cc}&= \min \left\{ x_2^L x_1+x_1^U x_2- x_1^U x_2^L, x_2^U x_1+x_1^L x_2- x_1^L x_2^U \right\} . \end{aligned}$$

Theorem 2 directly gives convex/concave relaxations for the product of two functions.

Corollary 5

Let \(\displaystyle g(\mathbf{z})=\mathrm{mult}(f_1(\mathbf{z}),f_2(\mathbf{z}))\), with \(\displaystyle f_1:Z\subset \mathbb {R}^n \rightarrow \mathbb {R}\), \(\displaystyle f_2:Z \subset \mathbb {R}^n \rightarrow \mathbb {R}\). Let also \(f_i^L\), \(f_i^U\) denote bounds for \(f_i\), i.e., \( f_i^L\le f_i(\mathbf{z})\le f_i^U\), and \(\displaystyle f^{cv}_i\), \(f^{cc}_i\) convex and concave relaxations of \(f_i\) on \(Z\), respectively. Then

$$\begin{aligned} \begin{array}{ccc}g^{cv}(\mathbf{z})=&{} \displaystyle \min _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle \max \left\{ f_2^U x_1+f_1^U x_2- f_1^U f_2^U, f_2^L x_1+f_1^L x_2- f_1^L f_2^L \right\} \\ &{}\text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{}&{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \end{aligned}$$
(20)

is a convex relaxation of \(g\) on \(Z\) and

$$\begin{aligned} \begin{array}{ccc}g^{cc}(\mathbf{z})=&{} \displaystyle \max _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle \min \left\{ f_2^L x_1+f_1^U x_2- f_1^U f_2^L, f_2^U x_1+f_1^L x_2- f_1^L f_2^U \right\} \\ &{}\text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{}&{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \end{aligned}$$
(21)

is a concave relaxation of \(g\) on \(Z\).

The convex relaxation for \(g\) that McCormick proposed [23] is

$$\begin{aligned} \bar{g}^{cv}(\mathbf{z})=\max \left\{ \alpha _1(\mathbf{z})+\alpha _2(\mathbf{z})-f_1^L f_2^L,\beta _1(\mathbf{z})+\beta _2(\mathbf{z})-f_1^U f_2^U \right\} \end{aligned}$$
(22)

where

$$\begin{aligned} \begin{array}{cc} \alpha _1(\mathbf{z})= \min \left\{ f_2^L f_1^{cv}(\mathbf{z}),f_2^L f_1^{cc}(\mathbf{z})\right\} ,&{} \alpha _2(\mathbf{z})= \min \left\{ f_1^L f_2^{cv}(\mathbf{z}),f_1^L f_2^{cc}(\mathbf{z})\right\} ,\\ \beta _1(\mathbf{z})= \min \left\{ f_2^U f_1^{cv}(\mathbf{z}),f_2^U f_1^{cc}(\mathbf{z})\right\} ,&{} \beta _2(\mathbf{z})= \min \left\{ f_1^U f_2^{cv}(\mathbf{z}),f_1^U f_2^{cc}(\mathbf{z})\right\} . \end{array} \end{aligned}$$

The equivalent concave relaxation is

$$\begin{aligned} \bar{g}^{cc}(\mathbf{z})=\min \left\{ \gamma _1(\mathbf{z})+\gamma _2(\mathbf{z})-f_1^U f_2^L,\delta _1(\mathbf{z})+\delta _2(\mathbf{z})-f_1^L f_2^U \right\} \end{aligned}$$
(23)

where

$$\begin{aligned} \begin{array}{cc} \gamma _1(\mathbf{z})= \max \left\{ f_2^L f_1^{cv}(\mathbf{z}),f_2^L f_1^{cc}(\mathbf{z})\right\} ,&{} \gamma _2(\mathbf{z})= \max \left\{ f_1^U f_2^{cv}(\mathbf{z}),f_1^U f_2^{cc}(\mathbf{z})\right\} ,\\ \delta _1(\mathbf{z})= \max \left\{ f_2^U f_1^{cv}(\mathbf{z}),f_2^U f_1^{cc}(\mathbf{z})\right\} ,&{} \delta _2(\mathbf{z})= \max \left\{ f_1^L f_2^{cv}(\mathbf{z}),f_1^L f_2^{cc}(\mathbf{z})\right\} . \end{array} \end{aligned}$$

Proposition 3 shows that the proposed relaxations \(g^{cv}/g^{cc}\) are always at least as tight as McCormick’s rule \(\bar{g}^{cv}/\bar{g}^{cc}\), while Fig. 1 shows that they can be tighter.

Proposition 3

\(g^{cv}(\mathbf{z}) \ge \bar{g}^{cv}(\mathbf{z})\) for all \(\mathbf{z} \in Z\) and \(g^{cc}(\mathbf{z}) \le \bar{g}^{cc}(\mathbf{z})\) for all \(\mathbf{z} \in Z\).

Proof

Using the well-known fact, e.g., [51], that for any function \(\phi (\mathbf{x},\mathbf{y})\) defined on \(\mathcal {X} \times \mathcal {Y}\) there holds

$$\begin{aligned} \min _{\mathbf{x}\in \mathcal {X}} \max _{\mathbf{y} \in \mathcal {Y}} \phi (\mathbf{x},\mathbf{y}) \ge \max _{\mathbf{y} \in \mathcal {Y}} \min _{\mathbf{x}\in \mathcal {X}} \phi (\mathbf{x},\mathbf{y}), \end{aligned}$$

by interchanging the minimization and maximization operators we obtain

$$\begin{aligned} g^{cv}(\mathbf{z})&\ge \max \left\{ \begin{array}{cc} \displaystyle \min _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle f_2^U x_1+f_1^U x_2- f_1^U f_2^U \\ \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ \displaystyle &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array}, \begin{array}{cc} \displaystyle \min _{x_i\in [f_i^L,f_i^U]} &{} \displaystyle f_2^L x_1+f_1^L x_2- f_1^L f_2^L \\ \displaystyle \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \right\} \nonumber \\ \end{aligned}$$
(24)
$$\begin{aligned}&\ge \max \left\{ \begin{array}{cc} \displaystyle \min _{x_i\in \mathbb {R}} &{} \displaystyle f_2^U x_1+f_1^U x_2- f_1^U f_2^U \\ \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array}, \begin{array}{cc} \displaystyle \min _{x_i\in \mathbb {R}} &{} \displaystyle f_2^L x_1+f_1^L x_2- f_1^L f_2^L \\ \text {s.t.} &{} \displaystyle f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\\ &{} \displaystyle f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{array} \right\} \nonumber \\&= \quad \max \left\{ \beta _1(\mathbf{z})+\beta _2(\mathbf{z})-f_1^U f_2^U,\alpha _1(\mathbf{z})+\alpha _2(\mathbf{z})-f_1^L f_2^L \right\} =\bar{g}^{cv}(\mathbf{z}). \end{aligned}$$
(25)

The proof that \(g^{cc}(\mathbf{z}) \le \bar{g}^{cc}(\mathbf{z})\) for all \(\mathbf{z} \in Z\) is similar and is omitted for brevity.\(\square \)

Note that the first inequality in (24) can be strict only if \(f_1^L<0<f_1^U\) or \(f_2^L<0<f_2^U\), and the second only if \(\left[ f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z})\right] \not \subset \left[ f_1^L,f_1^U\right] \) or \(\left[ f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z})\right] \not \subset \left[ f_2^L,f_2^U\right] \), that is, only if Assumption (4) does not hold. Scott and Barton [38] have observed that \(\bar{g}^{cv}\) can be tightened by intersecting with the interval bounds. However, from the definition of \(g^{cv}\) we have that \(g^{cv}\) is at least as tight as, and in some cases tighter than, the result by Scott and Barton. If \(f_1^U=f_1^L\) or \(f_2^U=f_2^L\), at least one of the functions is constant and the computation of the convex and concave envelopes of their product is trivial.
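The following sketch (our own; it assumes SciPy's HiGHS LP solver) evaluates (20) as a small epigraph LP and compares it with McCormick's rule (22) on the instance of Fig. 1, confirming Proposition 3 numerically:

```python
import numpy as np
from scipy.optimize import linprog

# Fig. 1 instance: f1(z) = z**2, f2(z) = z on Z = [-2, 2], hence
# f1 in [0, 4] with f1^cv(z) = z**2, f1^cc(z) = 4 (the secant), and
# f2 in [-2, 2] with f2^cv(z) = f2^cc(z) = z.
f1L, f1U, f2L, f2U = 0.0, 4.0, -2.0, 2.0

def g_cv(z):
    """Relaxation (20) as an epigraph LP in (x1, x2, t)."""
    A_ub = [[f2U, f1U, -1.0],      # f2U*x1 + f1U*x2 - f1U*f2U <= t
            [f2L, f1L, -1.0]]      # f2L*x1 + f1L*x2 - f1L*f2L <= t
    b_ub = [f1U * f2U, f1L * f2L]
    bounds = [(z**2, 4.0), (z, z), (None, None)]
    return linprog([0, 0, 1], A_ub=A_ub, b_ub=b_ub, bounds=bounds,
                   method="highs").fun

def g_cv_mccormick(z):
    """McCormick's product rule (22)."""
    f1cv, f1cc, f2cv, f2cc = z**2, 4.0, z, z
    a1 = min(f2L * f1cv, f2L * f1cc)
    a2 = min(f1L * f2cv, f1L * f2cc)
    b1 = min(f2U * f1cv, f2U * f1cc)
    b2 = min(f1U * f2cv, f1U * f2cc)
    return max(a1 + a2 - f1L * f2L, b1 + b2 - f1U * f2U)

for z in np.linspace(-2.0, 2.0, 9):
    assert g_cv(z) >= g_cv_mccormick(z) - 1e-9   # Proposition 3
print(g_cv(0.0), g_cv_mccormick(0.0))            # -4.0 vs -8.0 (Fig. 1)
```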

The relaxations obtained by Eqs. (20), (21) can be represented in closed form. If \(f_1^U>f_1^L\) and \(f_2^U>f_2^L\), \(g^{cv}(\mathbf{z})\) can be shown to be given by

$$\begin{aligned} g^{cv}(\mathbf{z})=\min \left\{ \begin{array}{cc} \max &{} \left\{ f_2^U f_1^{cv}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cv}(\mathbf{z})+\zeta )-f_1^U f_2^U,\right. \\ &{}\left. f_2^L f_1^{cv}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}) ,\kappa f_1^{cv}(\mathbf{z})+\zeta ) -f_1^L f_2^L \right\} ,\\ \max &{} \left\{ f_2^U f_1^{cc}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta )-f_1^U f_2^U,\right. \\ &{}\left. f_2^L f_1^{cc}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta ) -f_1^L f_2^L \right\} ,\\ \max &{} \left\{ f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cv}(\mathbf{z}) -f_1^U f_2^U,\right. \\ &{}\left. f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cv}(\mathbf{z}) -f_1^L f_2^L \right\} ,\\ \max &{} \left\{ f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cc}(\mathbf{z}) -f_1^U f_2^U,\right. \\ &{}\left. f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cc}(\mathbf{z}) -f_1^L f_2^L \right\} \end{array} \right\} \end{aligned}$$

where

$$\begin{aligned} \kappa =\frac{f_2^L-f_2^U}{f_1^U-f_1^L},\quad \zeta =\frac{f_1^U f_2^U-f_1^L f_2^L}{f_1^U-f_1^L}. \end{aligned}$$

Similarly, if \(f_1^U>f_1^L\) and \(f_2^U>f_2^L\) then \(g^{cc}(\mathbf{z})\) is given by

$$\begin{aligned} g^{cc}(\mathbf{z})=\max \left\{ \begin{array}{cc} \min &{} \left\{ f_2^L f_1^{cv}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cv}(\mathbf{z})+\zeta )-f_1^U f_2^L,\right. \\ &{}\left. f_2^U f_1^{cv}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cv}(\mathbf{z})+\zeta ) -f_1^L f_2^U \right\} ,\\ \min &{} \left\{ f_2^L f_1^{cc}(\mathbf{z}) +f_1^U \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta )-f_1^U f_2^L,\right. \\ &{}\left. f_2^U f_1^{cc}(\mathbf{z}) +f_1^L \text {mid}(f_2^{cv}(\mathbf{z}),f_2^{cc}(\mathbf{z}), \kappa f_1^{cc}(\mathbf{z})+\zeta ) -f_1^L f_2^U \right\} ,\\ \min &{} \left\{ f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cv}(\mathbf{z}) -f_1^U f_2^L,\right. \\ &{}\left. f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cv}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cv}(\mathbf{z}) -f_1^L f_2^U \right\} ,\\ \min &{} \left\{ f_2^L \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^U f_2^{cc}(\mathbf{z}) -f_1^U f_2^L,\right. \\ &{}\left. f_2^U \text {mid}\left( f_1^{cv}(\mathbf{z}),f_1^{cc}(\mathbf{z}), \frac{f_2^{cc}(\mathbf{z})-\zeta }{\kappa }\right) +f_1^L f_2^{cc}(\mathbf{z}) -f_1^L f_2^U \right\} \end{array} \right\} \end{aligned}$$

where

$$\begin{aligned} \kappa =\frac{f_2^U-f_2^L}{f_1^U-f_1^L},\quad \zeta =\frac{f_1^U f_2^L-f_1^L f_2^U}{f_1^U-f_1^L}. \end{aligned}$$
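As a sanity check (our own sketch, again on the instance of Fig. 1), the closed form above can be cross-checked against the LP form of (20); it assumes \(f_2^L\ne f_2^U\) so that the division by \(\kappa \) is well defined:

```python
import numpy as np
from scipy.optimize import linprog

def mid(a, b, c):
    return sorted((a, b, c))[1]

def g_cv_closed(f1cv, f1cc, f2cv, f2cc, f1L, f1U, f2L, f2U):
    """Closed form of (20); requires f1U > f1L and f2U > f2L."""
    kappa = (f2L - f2U) / (f1U - f1L)
    zeta = (f1U * f2U - f1L * f2L) / (f1U - f1L)
    def envelope(x1, x2):   # convex envelope of the bilinear term
        return max(f2U * x1 + f1U * x2 - f1U * f2U,
                   f2L * x1 + f1L * x2 - f1L * f2L)
    return min(
        envelope(f1cv, mid(f2cv, f2cc, kappa * f1cv + zeta)),
        envelope(f1cc, mid(f2cv, f2cc, kappa * f1cc + zeta)),
        envelope(mid(f1cv, f1cc, (f2cv - zeta) / kappa), f2cv),
        envelope(mid(f1cv, f1cc, (f2cc - zeta) / kappa), f2cc))

f1L, f1U, f2L, f2U = 0.0, 4.0, -2.0, 2.0
for z in np.linspace(-2.0, 2.0, 17):
    f1cv, f1cc, f2cv, f2cc = z**2, 4.0, z, z
    lp = linprog([0, 0, 1],
                 A_ub=[[f2U, f1U, -1.0], [f2L, f1L, -1.0]],
                 b_ub=[f1U * f2U, f1L * f2L],
                 bounds=[(f1cv, f1cc), (f2cv, f2cc), (None, None)],
                 method="highs")
    closed = g_cv_closed(f1cv, f1cc, f2cv, f2cc, f1L, f1U, f2L, f2U)
    assert abs(closed - lp.fun) < 1e-7
```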

In addition to bilinear products of functions, multilinear products of functions often arise in applications. The class of functions considered herein can be summarized as \(G(\mathbf{z})=\sum _{t \in T} c_t \Pi _{i \in I_t} f_i(\mathbf{z})\), where \(T\) and \(I_t \subset I\) are index sets and \(c_t\) are constants. Such functions can be handled by recursive application of McCormick’s product rule, but these approaches give weaker relaxations than possible; compare for instance [4, 5, 9, 25]. In contrast, Theorem 2 provides the framework to directly handle such terms and provide tighter relaxations. Herein, only the convex relaxations are discussed; the concave relaxations are analogous.

Rikun [34] considers \(F: \mathbb {R}^n \rightarrow \mathbb {R}\), \(F(\mathbf{x})=\sum _{t \in T} c_t \Pi _{i \in I_t} x_i\) on a hypercube \(X=X_1 \times X_2 \times \cdots \times X_n\), where \(X_i=\left[ x_i^L,x_i^U\right] \). He proves that the convex envelope \(F^{cv,env}\) at a point \(\mathbf{x}\) can be evaluated by the following optimization problem

$$\begin{aligned} F^{cv,env}(\mathbf{x})=\min _{{\varvec{\lambda }}\ge 0} \left\{ \sum _k \lambda _k F(\mathbf{x}^{k}) \,\Big |\, \sum _k \lambda _k \mathbf{x}^{k}=\mathbf{x},\ \sum _k \lambda _k=1 \right\} , \end{aligned}$$

where the \(\mathbf{x}^k\) denote the vertices of \(X\). Note that this is an LP, albeit with one variable per vertex of \(X\). Note also that explicit representations exist for subclasses of this function, such as the explicit facets of trilinear terms by Meyer and Floudas [25].

By Theorem 2 a convex relaxation of \(G\) on \(Z\) can be constructed as

$$\begin{aligned} \begin{array}{ccc} g^{cv}(\mathbf{z})=&{} \displaystyle \min _{\mathbf{x}} &{} F^{cv,env}(\mathbf{x})\\ &{}\text {s.t.} &{} f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z}), \quad i \in I. \end{array} \end{aligned}$$

Noting that \(\displaystyle \min _{y} \min _{w} h(y,w)=\min _{y,w} h(y,w)\) we obtain

$$\begin{aligned} \begin{array}{ccc} g^{cv}(\mathbf{z})=&{} \displaystyle \min _{\mathbf{x},{\varvec{\lambda }}\ge 0} &{} \sum _k \lambda _k F(\mathbf{x}^{k})\\ &{}\text {s.t.} &{} \mathbf{x}=\sum _k \lambda _k \mathbf{x}^{k}\\ &{}&{} \sum _k \lambda _k=1\\ &{}&{} f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z}), \quad i \in I, \end{array} \end{aligned}$$

which is still an LP of similar size.
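A sketch of this LP for a trilinear term (the instance, \(F(\mathbf{x})=x_1x_2x_3\) on \([0,1]^3\), and all names are ours), with the vertex set enumerated explicitly:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def g_cv_trilinear(box_X, f_cv, f_cc):
    """Vertex LP above for F(x) = x1*x2*x3.

    box_X: list of (x_i^L, x_i^U); f_cv, f_cc: relaxation values of the
    f_i at the current z, entering as bounds on x."""
    verts = list(itertools.product(*box_X))        # the 2^n vertices x^k
    F_verts = [v[0] * v[1] * v[2] for v in verts]  # F at each vertex
    n, K = len(box_X), len(verts)
    # Decision variables: (x_1, ..., x_n, lambda_1, ..., lambda_K).
    c = np.concatenate([np.zeros(n), F_verts])     # min sum_k lam_k F(x^k)
    A_eq = np.zeros((n + 1, n + K))
    A_eq[:n, :n] = np.eye(n)                       # x = sum_k lam_k x^k
    A_eq[:n, n:] = -np.array(verts).T
    A_eq[n, n:] = 1.0                              # sum_k lam_k = 1
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    bounds = list(zip(f_cv, f_cc)) + [(0, None)] * K
    return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds,
                   method="highs").fun

# X = [0,1]^3; at some z the relaxations give these bounds on x:
print(g_cv_trilinear([(0, 1)] * 3, [0.2, 0.5, 0.0], [0.8, 1.0, 0.4]))
```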

By Proposition 2, Theorem 4 can still be used for the computation of subgradients if, in the construction of the subgradient, we take into account only the Lagrangian multipliers associated with the constraints \(f_i^{cv}(\mathbf{z})\le x_i \le f_i^{cc}(\mathbf{z})\).

6 Convex/concave envelopes and relaxations of min/max operators

The operators \(\min \) and \(\max \) often arise in engineering optimization formulations. It is well known that the minimum of concave functions is a concave function, but the same does not hold in general for convex functions. To the authors’ best knowledge, relaxations for such functions are not available in the literature or in most numerical codes. For instance, the operators \(\min /\max \) are currently not handled by the state-of-the-art general-purpose solvers BARON [36, 49] and ANTIGONE [27–29], while in MC++ [10] they are handled using the well-known reformulation

$$\begin{aligned} \min \left( f_1(\mathbf{z}),f_2(\mathbf{z})\right) =\frac{1}{2}\left( f_1(\mathbf{z})+f_2(\mathbf{z})-|f_1(\mathbf{z})-f_2(\mathbf{z})| \right) \end{aligned}$$
(26)

and applying the univariate McCormick composition theorem to the negative absolute value. However, the constructed relaxations are not as tight as the ones proposed here.

Calculating interval enclosures for the function \(\min (f_1(x),f_2(x))\) given interval enclosures for \(f_1\) and \(f_2\) is straightforward and is also done in MC++ [10].

Proposition 4

Consider \(Z \subset \mathbb {R}^n\) and \(f_1,f_2:Z \rightarrow \mathbb {R}\). Suppose that interval enclosures are given for \(f_1\) and \(f_2\) on \(Z\), i.e., bounds \(f_1^L,f_1^U\), \(f_2^L,f_2^U\) such that

$$\begin{aligned} f_1^L \le f_1(\mathbf{z}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

Then we have

$$\begin{aligned} \min \left( f_1^L,f_2^L\right) \le \min (f_1(\mathbf{z}),f_2(\mathbf{z})) \le \min \left( f_1^U,f_2^U\right) \end{aligned}$$

It is noteworthy that these bounds are not exact, as shown in the next example.

Example 1

Take \(\min (z,-z)\) with \(z \in [-1,1]\). The range is clearly \([-1,0]\), yet the rule given in Proposition 4 yields the valid but overestimated enclosure \([-1,1]\).

We can utilize Corollary 3 to compute convex/concave relaxations for the minimum/maximum of two functions. The computation of convex/concave relaxations of the minimum/maximum of two functions by Theorem 2 requires the convex/concave envelopes of \(\min (x_1,x_2)/\max (x_1,x_2)\) on an arbitrary rectangle, which is easy to derive but to the authors' best knowledge is not explicitly available in the literature.

Lemma 3

Consider \(Z=X_1 \times X_2\subset \mathbb {R}^2\) with \(X_1=\left[ x_1^L,x_1^U\right] \) and \(X_2=\left[ x_2^L,x_2^U\right] \) and let \(\mathbf{z} =(x_1,x_2)\). The convex envelope of \(\min (x_1,x_2)\) on \(Z\) is given by \(\min ^{cv}:Z \rightarrow \mathbb {R}\),

$$\begin{aligned} \mathrm{min}^{cv}(x_1,x_2)&= \max \left( \mathrm{min}^{cv,1}(x_1,x_2),\mathrm{min}^{cv,2}(x_1,x_2)\right) \quad \text {with} \\ \mathrm{min}^{cv,1}(x_1,x_2)&= \min \left( x_1^L,x_2^L\right) + \frac{x_1-x_1^L}{x_1^U-x_1^L} \left( \min \left( x_1^U, x_2^L\right) -\min \left( x_1^L, x_2^L\right) \right) \\&\quad + \frac{x_2-x_2^L}{x_2^U-x_2^L} \left( \min \left( x_1^L, x_2^U\right) -\min \left( x_1^L, x_2^L\right) \right) \\ \mathrm{min}^{cv,2}(x_1,x_2)&= \min \left( x_1^U,x_2^U\right) +\frac{x_1-x_1^U}{x_1^L-x_1^U} \left( \min \left( x_1^L, x_2^U\right) -\min \left( x_1^U, x_2^U\right) \right) \\&\quad +\frac{x_2-x_2^U}{x_2^L-x_2^U} \left( \min \left( x_1^U, x_2^L\right) -\min \left( x_1^U, x_2^U\right) \right) \end{aligned}$$

and the concave envelope of \(\max (x_1,x_2)\) on \(Z\) is given by \(\max ^{cc}:Z \rightarrow \mathbb {R}\),

$$\begin{aligned} \mathrm{max}^{cc}(x_1,x_2)&= \min \left( \mathrm{max}^{cc,1}(x_1,x_2),\mathrm{max}^{cc,2}(x_1,x_2)\right) \quad \text {with} \\ \mathrm{max}^{cc,1}(x_1,x_2)&= \max \left( x_1^L,x_2^L\right) + \frac{x_1-x_1^L}{x_1^U-x_1^L} \left( \max \left( x_1^U, x_2^L\right) -\max \left( x_1^L, x_2^L\right) \right) \\&\quad + \frac{x_2-x_2^L}{x_2^U-x_2^L} \left( \max \left( x_1^L, x_2^U\right) -\max \left( x_1^L, x_2^L\right) \right) \\ \mathrm{max}^{cc,2}(x_1,x_2)&= \max \left( x_1^U,x_2^U\right) +\frac{x_1-x_1^U}{x_1^L-x_1^U} \left( \max \left( x_1^L, x_2^U\right) -\max \left( x_1^U, x_2^U\right) \right) \\&\quad +\frac{x_2-x_2^U}{x_2^L-x_2^U} \left( \max \left( x_1^U, x_2^L\right) -\max \left( x_1^U, x_2^U\right) \right) \end{aligned}$$

Proof

The proof is in the Appendix.\(\square \)

A convex relaxation of the maximum of two functions is trivially given by the maximum of the convex relaxations of the two functions, and a concave relaxation of the minimum of two functions is trivially given by the minimum of the concave relaxations of the two functions.

Proposition 5

Consider \(Z \subset \mathbb {R}^n\) and \(g_1,g_2,f_1,f_2:Z \rightarrow \mathbb {R}\) such that \(g_1(\mathbf{z})=\min \left( f_1(\mathbf{z}),f_2(\mathbf{z}) \right) \), \(g_2(\mathbf{z})=\max \left( f_1(\mathbf{z}),f_2(\mathbf{z}) \right) \). Suppose that interval enclosures are given for \(f_1\) and \(f_2\) on \(Z\), i.e., bounds \(f_1^L,f_1^U\), \(f_2^L,f_2^U\) such that

$$\begin{aligned} f_1^L \le f_1({\mathbf{z}}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

and convex and concave relaxations such that

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le f_1(\mathbf{z}) \le f_1^{cc}(\mathbf{z}) \qquad f_2^{cv}(\mathbf{z}) \le f_2(\mathbf{z})\le f_2^{cc}(\mathbf{z}). \end{aligned}$$

Recall that Proposition 4 gives interval enclosures for \(g_1\) (and analogously for \(g_2\)) on \(Z\). The following procedure defines a convex relaxation \(g_1^{cv}:Z \rightarrow \mathbb {R}\) of \(g_1\) on \(Z\).

If \(f_1^U \le f_2^L\) then \(g_1^{cv}(\mathbf{z})=f_1^{cv}(\mathbf{z})\). Similarly, if \(f_2^U \le f_1^L\) then \(g_1^{cv}(\mathbf{z})=f_2^{cv}(\mathbf{z})\). Otherwise

$$\begin{aligned} g_1^{cv}(\mathbf{z})=\max \left( g_1^{cv,1}(\mathbf{z}),g_1^{cv,2}(\mathbf{z})\right) \end{aligned}$$

where

$$\begin{aligned} g_1^{cv,1}(\mathbf{z})&= \min \left( f_1^L,f_2^L\right) + \frac{f_1^{cv}(\mathbf{z})-f_1^L}{f_1^U-f_1^L} \left( \min \left( f_1^U, f_2^L\right) -\min \left( f_1^L, f_2^L\right) \right) \\&+\,\, \frac{f_2^{cv}(\mathbf{z})-f_2^L}{f_2^U-f_2^L} \left( \min \left( f_1^L, f_2^U\right) -\min \left( f_1^L, f_2^L\right) \right) \\ g_1^{cv,2}(\mathbf{z})&= \min \left( f_1^U,f_2^U\right) +\frac{f_1^{cv}(\mathbf{z})-f_1^U}{f_1^L-f_1^U} \left( \min \left( f_1^L, f_2^U\right) -\min \left( f_1^U, f_2^U\right) \right) \\&+\,\,\frac{f_2^{cv}(\mathbf{z})-f_2^U}{f_2^L-f_2^U} \left( \min \left( f_1^U, f_2^L\right) -\min \left( f_1^U, f_2^U\right) \right) . \end{aligned}$$

Furthermore, the following procedure defines a concave relaxation \(g_2^{cc}:Z \rightarrow \mathbb {R}\) of \(g_2\) on \(Z\). If \(f_1^U \le f_2^L\) then \(g_2^{cc}(\mathbf{z})=f_1^{cc}(\mathbf{z})\). Similarly, if \(f_2^U \le f_1^L\) then \(g_2^{cc}(\mathbf{z})=f_2^{cc}(\mathbf{z})\). Otherwise \(g_2^{cc}(\mathbf{z})=\min \left( g_2^{cc,1}(\mathbf{z}),g_2^{cc,2}(\mathbf{z})\right) \) where

$$\begin{aligned} g_2^{cc,1}(\mathbf{z})&= \max \left( f_1^L,f_2^L\right) + \frac{f_1^{cc}(\mathbf{z})-f_1^L}{f_1^U-f_1^L} \left( \max \left( f_1^U, f_2^L\right) -\max \left( f_1^L, f_2^L\right) \right) \\&+\,\, \frac{f_2^{cc}(\mathbf{z})-f_2^L}{f_2^U-f_2^L} \left( \max \left( f_1^L, f_2^U\right) -\max \left( f_1^L, f_2^L\right) \right) \\ g_2^{cc,2}(\mathbf{z})&= \max \left( f_1^U,f_2^U\right) +\frac{f_1^{cc}(\mathbf{z})-f_1^U}{f_1^L-f_1^U} \left( \max \left( f_1^L, f_2^U\right) -\max \left( f_1^U, f_2^U\right) \right) \\&+\,\,\frac{f_2^{cc}(\mathbf{z})-f_2^U}{f_2^L-f_2^U} \left( \max \left( f_1^U, f_2^L\right) -\max \left( f_1^U, f_2^U\right) \right) \end{aligned}$$

Proof

Since \(\min (\cdot ,\cdot )\) and \(\max (\cdot ,\cdot )\) are monotonically increasing, the result follows by Corollary 3.\(\square \)
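The procedure of Proposition 5 is straightforward to implement. The following sketch (our own illustrative code, with hypothetical names) evaluates \(g_1^{cv}\) at a point \(\mathbf{z}\) from the bound and relaxation values of the factors, and reproduces the example of Fig. 2 below.

```python
# Convex relaxation of min(f1, f2) at a point z, per Proposition 5.
# Inputs are the values f1^cv(z), f2^cv(z) and the bounds f_i^L, f_i^U;
# non-degenerate intervals (f1L < f1U, f2L < f2U) are assumed in the
# overlapping case, so the divisions below are well defined.
def min_cv(f1cv, f1L, f1U, f2cv, f2L, f2U):
    if f1U <= f2L:          # f1 dominates: min(f1, f2) = f1 on Z
        return f1cv
    if f2U <= f1L:          # f2 dominates
        return f2cv
    # Overlapping case: evaluate the envelope of min(x1, x2) from Lemma 3
    # at (f1^cv(z), f2^cv(z)), cf. Corollary 3.
    g1 = (min(f1L, f2L)
          + (f1cv - f1L) / (f1U - f1L) * (min(f1U, f2L) - min(f1L, f2L))
          + (f2cv - f2L) / (f2U - f2L) * (min(f1L, f2U) - min(f1L, f2L)))
    g2 = (min(f1U, f2U)
          + (f1cv - f1U) / (f1L - f1U) * (min(f1L, f2U) - min(f1U, f2U))
          + (f2cv - f2U) / (f2L - f2U) * (min(f1U, f2L) - min(f1U, f2U)))
    return max(g1, g2)

# Example of Fig. 2: f1(z) = z^2, f2(z) = z on [0,1], both with range [0,1].
z = 0.8
print(min_cv(z**2, 0.0, 1.0, z, 0.0, 1.0))  # max(0, z^2 + z - 1) = 0.44
```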

Note that there is no guarantee that the proposed relaxation is the envelope even if the estimators of the factors are, as shown in Fig. 2.

Fig. 2

Take \(Z=[0,1]\) and \(g_1,f_1,f_2: Z \rightarrow \mathbb {R}\), such that \(f_1(z)=z^2\), \(f_2(z)=z\), \(g_1(z)=\min (f_1(z),f_2(z))\). Note that both \(f_1\) and \(f_2\) have range \([0,1]\) and both are convex, so that \(f_1^{cv}=f_1\) and \(f_2^{cv}=f_2\) are the convex envelopes. We have \(g_1(z)=z^2\), and this is its own convex envelope. It follows from Proposition 5 that \(g_1^{cv}: Z\rightarrow \mathbb {R}\), \(g_1^{cv}(z)=\max (0,z^2+z-1)\) is a convex relaxation of \(g_1\) on \(Z\). Note that it is not the envelope, although envelopes are used for the factors. The reformulation via the absolute value furnishes a valid but less tight relaxation \(\bar{g}_1^{cv,abs}\)

Reformulating the \(\min (\cdot ,\cdot )\) and \(\max (\cdot ,\cdot )\) operators using the absolute value of the difference results in weaker natural interval extensions and also weaker McCormick relaxations, as shown in Proposition 6. Figure 2 shows that the inequality in Proposition 6 can be strict.

Proposition 6

Consider \(Z \subset \mathbb {R}^n\) and \(f_1,f_2,g_1:Z \rightarrow \mathbb {R}\) such that \(g_1(\mathbf{z})=\min \left( f_1(\mathbf{z}),f_2(\mathbf{z}) \right) \). Suppose that interval enclosures are given for \(f_1\) and \(f_2\) on \(Z\), i.e., bounds \(f_1^L,f_1^U\), \(f_2^L,f_2^U\) such that

$$\begin{aligned} f_1^L \le f_1(\mathbf{z}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

and convex/concave relaxations such that

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le f_1(\mathbf{z}) \le f_1^{cc}(\mathbf{z}) \qquad f_2^{cv}(\mathbf{z}) \le f_2(\mathbf{z}) \le f_2^{cc}(\mathbf{z}). \end{aligned}$$

For the overlapping case \(f_1^L< f_2^U\), \(f_2^L < f_1^U\), the convex/concave relaxations for \(\min /\max \) proposed in Proposition 5 are at least as tight as the ones obtained by McCormick's composition theorem applied to the reformulation via the absolute value.

Proof

The proof is given in the Appendix.\(\square \)

Relaxations of \(\min \left( f_1,\ldots ,f_m \right) \) can be computed either recursively, or by direct application of Theorem 2 if an envelope/relaxation of \(\min \left( x_1,\ldots ,x_m \right) \) is available on the appropriate domain. On \([0,1]^m\), for example, it can be shown that the convex envelope of \(\min \left( x_1,\ldots ,x_m \right) \) is \(\displaystyle \max \left( 0,\sum \nolimits _i x_i-m+1\right) \).
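A quick numerical sanity check (ours, purely illustrative) of the validity of this underestimator on \([0,1]^m\):

```python
# Check that max(0, sum_i x_i - m + 1) underestimates min(x_1, ..., x_m)
# on [0,1]^m; the two coincide at every vertex of the box.
import random

m = 4
for _ in range(10000):
    x = [random.random() for _ in range(m)]
    assert max(0.0, sum(x) - m + 1) <= min(x) + 1e-12
```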

If the relaxation used for the multiterm operator is the envelope, direct application of the multivariate composition theorem results in relaxations that are at least as tight. If, in contrast, the relaxations for the multiterm operator are weak, it may be advisable to use the bivariate composition recursively.

7 Fractional terms

Fractional terms \(f_1(\mathbf{z})/f_2(\mathbf{z})\) often arise in engineering optimization formulations. In the McCormick relaxation framework, e.g., in MC++ [10], they are handled rigorously using the representation \(f_1(\mathbf{z}) \times \left( f_2(\mathbf{z})\right) ^{-1}\), i.e., as a bilinear product with the inverse function embedded. The multivariate composition theorem can handle fractional terms more naturally and yields at least as tight and often tighter relaxations. For the rest of this section we assume that \(f_2^L>0\) or \(f_2^U<0\), so that the division is well defined.

Consider the fractional term \(\frac{x_1}{x_2}\) on \(X_1\times X_2=\left[ x_1^L,x_1^U\right] \times \left[ x_2^L,x_2^U\right] \), which we will denote via the division function \(\mathrm{div}(\cdot ,\cdot )\). Tawarmalani and Sahinidis [47] discuss convex relaxations and the envelope for the positive orthant, i.e., for \(x_1^L>0\), \(x_2^L>0\). One relaxation, by Zamora and Grossmann [53, 54], is given by

$$ \begin{aligned} \mathrm{div}^{cv,Z \& G}(x_1,x_2)=\frac{1}{x_2}\left( \frac{x_1+\sqrt{x_1^L x_1^U}}{\sqrt{x_1^L}+\sqrt{x_1^U}} \right) ^2. \end{aligned}$$
(27)

The function \( \mathrm{div}^{cv,Z \& G}\) is the convex envelope when \(x_2^L\rightarrow 0\) and \(x_2^U\rightarrow \infty \). A piecewise linear relaxation of \(\mathrm{div}\) [33, 47] is given by

$$\begin{aligned} \mathrm{div}^{cv,lin}(x_1,x_2)=\max \left\{ \frac{x_1 x_2^U-x_1^Lx_2+x_1^Lx_2^U}{(x_2^U)^2}, \frac{x_1 x_2^L-x_1^U x_2+x_1^Ux_2^L}{(x_2^L)^2} \right\} . \end{aligned}$$
(28)

Another method to obtain a valid convex relaxation of \(\mathrm{div}(\cdot ,\cdot )\) on \(X_1\times X_2\), assuming that either \(x_2^U<0\) or \(x_2^L>0\), is to apply the product rule of McCormick [23] (defined in Eq. (22)) using the representation \(x_1 \times \mathrm{Inv}(x_2)\), where \(\mathrm{Inv}(\cdot )=(\cdot )^{-1}\). Let \(\mathrm{Inv}^L\), \(\mathrm{Inv}^U\) denote the implied bounds, and \(\mathrm{Inv}^{cv}(x_2)\), \(\mathrm{Inv}^{cc}(x_2)\) the convex and concave relaxations of \(\mathrm{Inv}\) on \(X_2\). It is easy to verify that the result is

$$\begin{aligned} \mathrm{div}^{cv,mc}(x_1,x_2)=\max \left\{ \begin{array}{c} \mathrm{Inv}^L x_1 + \min \left\{ x_1^L \mathrm{Inv}^{cv}(x_2),x_1^L \mathrm{Inv}^{cc}(x_2) \right\} -x_1^L \mathrm{Inv}^L,\\ \mathrm{Inv}^U x_1 + \min \left\{ x_1^U \mathrm{Inv}^{cv}(x_2),x_1^U \mathrm{Inv}^{cc}(x_2) \right\} -x_1^U \mathrm{Inv}^U \end{array} \right\} , \end{aligned}$$
(29)

which for the positive orthant reduces to

$$\begin{aligned} \mathrm{div}^{cv,mc,+}(x_1,x_2)=\max \left\{ \frac{x_1}{x_2^U} + \frac{x_1^L}{x_2} -\frac{x_1^L}{x_2^U}, \frac{x_1}{x_2^L} + \frac{x_1^U}{x_2} -\frac{x_1^U}{x_2^L} \right\} , \end{aligned}$$
(30)

as computed by Quesada and Grossmann [33] following the same procedure. It is shown in [33] that \(\mathrm{div}^{cv,lin}\) is a linearization of \(\mathrm{div}^{cv,mc,+}\) at \(x_2^L\) and \(x_2^U\), and thus

$$\begin{aligned} \mathrm{div}^{cv,lin}(x_1,x_2)\le \mathrm{div}^{cv,mc,+}(x_1,x_2). \end{aligned}$$

The concave envelope for the positive orthant is computed in [46] to be

$$\begin{aligned} \mathrm{div}^{cc,mc,+}(x_1,x_2)=\frac{1}{x_2^L x_2^U}\min \left\{ x_2^U x_1-x_1^L x_2+x_1^L x_2^L,x_2^L x_1-x_1^U x_2+x_1^U x_2^U \right\} . \end{aligned}$$
(31)

Finally, Tawarmalani and Sahinidis [46, 47] prove that the convex envelope at a point can be evaluated by solving an optimization problem

$$\begin{aligned} \mathrm{div}^{cv,env}(x_1,x_2)=&\min \limits _{y_p,z_p,z_c^e,\lambda } z_c^e \nonumber \\&\text {s.t.} \quad z_p y_p \ge x_1^L (1-\lambda )^2 \nonumber \\&(z_c^e-z_p)(x_2-y_p)=x_1^U \lambda ^2 \nonumber \\&y_p \ge x_2^L (1-\lambda ) \nonumber \\&y_p \ge x_2- x_2^U \lambda \nonumber \\&y_p \le x_2^U (1-\lambda ) \nonumber \\&y_p \le x_2- x_2^L \lambda \nonumber \\&x_1=x_1^L +\left( x_1^U-x_1^L\right) \lambda \nonumber \\&z_c^e \ge z_p \nonumber \\&\lambda \in [0,1], \quad z_p \ge 0 \end{aligned}$$
(32)

which can be reformulated as a semi-definite program.

In MC++ [10] a relaxation for \(G(\mathbf{z})=\frac{f_1(\mathbf{z})}{f_2(\mathbf{z})}\) is obtained using the representation \(f_1(\mathbf{z}) \times \left( \mathrm{Inv} \circ f_2(\mathbf{z})\right) \) and applying first McCormick's composition theorem to obtain \((\mathrm{Inv} \circ f_2)^{cv}(\mathbf{z})\), \((\mathrm{Inv} \circ f_2)^{cc}(\mathbf{z})\) and then McCormick's product rule (22). The resulting relaxation is given by

$$\begin{aligned}&\!\!\!\bar{g}^{cv,MC++}(\mathbf{z})\\&~ =\max \left\{ \begin{array}{c} \min \left\{ \frac{1}{f_2^U} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^U} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^L \left( {\mathrm{Inv}}\circ f_2\right) ^{cv}(\mathbf{z}),f_1^L \left( {\mathrm{Inv}}\circ f_2\right) ^{cc}(\mathbf{z}) \right\} -\frac{f_1^L}{f_2^U},\\ \min \left\{ \frac{1}{f_2^L} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^L} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^U \left( {\mathrm{Inv}}\circ f_2\right) ^{cv}(\mathbf{z}),f_1^U \left( {\mathrm{Inv}}\circ f_2\right) ^{cc}(\mathbf{z}) \right\} -\frac{f_1^U}{f_2^L} \end{array} \right\} . \end{aligned}$$

Since both \(\mathrm{Inv}^{cv}\) and \(\mathrm{Inv}^{cc}\) are decreasing on \((-\infty , 0)\) and on \((0, \infty )\), by Corollary 3 we have

$$\begin{aligned} \left( \mathrm{Inv}\circ f_2\right) ^{cv}(\mathbf{z}) =\mathrm{Inv}^{cv} ( f_2^{cc}(\mathbf{z})), \quad \left( \mathrm{Inv}\circ f_2\right) ^{cc}(\mathbf{z})=\mathrm{Inv}^{cc} (f_2^{cv}(\mathbf{z})), \end{aligned}$$

and thus

$$\begin{aligned}&\!\!\!\bar{g}^{cv,MC++}(\mathbf{z})\nonumber \\&~ = \max \left\{ \begin{array}{l} \min \left\{ \frac{1}{f_2^U} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^U} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^L \mathrm{Inv}^{cv}( f_2^{cc}(\mathbf{z})),f_1^L \mathrm{Inv}^{cc}( f_2^{cv}(\mathbf{z})) \right\} -\frac{f_1^L}{f_2^U},\\ \min \left\{ \frac{1}{f_2^L} f_1^{cv}(\mathbf{z}), \frac{1}{f_2^L} f_1^{cc}(\mathbf{z}) \right\} + \min \left\{ f_1^U \mathrm{Inv}^{cv}(f_2^{cc}(\mathbf{z})),f_1^U \mathrm{Inv}^{cc}( f_2^{cv}(\mathbf{z})) \right\} -\frac{f_1^U}{f_2^L} \end{array} \right\} \!.\nonumber \\ \end{aligned}$$
(33)
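For reference, the following minimal sketch (our own code, not MC++'s API) implements Eq. (33) for a positive denominator; `inv_cc` uses the secant, which is the concave envelope of \(1/x\) for \(0 < x_2^L \le x_2^U\).

```python
# Univariate-McCormick (MC++-style) convex relaxation of f1/f2 per Eq. (33),
# assuming 0 < f2L (positive denominator); arguments are values at a point z.
def inv_cv(x):
    return 1.0 / x  # 1/x is convex for x > 0, hence its own convex envelope

def inv_cc(x, x2L, x2U):
    # Secant through (x2L, 1/x2L) and (x2U, 1/x2U): concave envelope of 1/x.
    return 1.0 / x2L + (x - x2L) * (1.0 / x2U - 1.0 / x2L) / (x2U - x2L)

def gbar_cv_mcpp(f1cv, f1cc, f2cv, f2cc, f1L, f1U, f2L, f2U):
    icv = inv_cv(min(max(f2cc, f2L), f2U))             # Inv^cv(f2^cc(z)), Cor. 3
    icc = inv_cc(min(max(f2cv, f2L), f2U), f2L, f2U)   # Inv^cc(f2^cv(z))
    t1 = min(f1cv / f2U, f1cc / f2U) + min(f1L * icv, f1L * icc) - f1L / f2U
    t2 = min(f1cv / f2L, f1cc / f2L) + min(f1U * icv, f1U * icc) - f1U / f2L
    return max(t1, t2)

# z/z = 1 on [0.1, 1] at z = 0.5 (cf. Fig. 3): gives 0.6, underestimating 1.
print(gbar_cv_mcpp(0.5, 0.5, 0.5, 0.5, 0.1, 1.0, 0.1, 1.0))
```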

The multivariate composition theorem provides a direct method to calculate convex relaxations:

Corollary 6

Consider \(Z \subset \mathbb {R}^n\) and \(G,f_1,f_2:Z \rightarrow \mathbb {R}\) such that \(G(\mathbf{z})=\frac{f_1(\mathbf{z})}{f_2(\mathbf{z})}\). Suppose that interval enclosures are given for \(f_1\) and \(f_2\) on \(Z\), i.e., bounds \(f_1^L,f_1^U\), \(f_2^L,f_2^U\) such that

$$\begin{aligned} f_1^L \le f_1(\mathbf{z}) \le f_1^U \qquad f_2^L \le f_2(\mathbf{z}) \le f_2^U \end{aligned}$$

and convex/concave relaxations such that

$$\begin{aligned} f_1^{cv}(\mathbf{z}) \le f_1(\mathbf{z}) \le f_1^{cc}(\mathbf{z}) \qquad f_2^{cv}(\mathbf{z}) \le f_2(\mathbf{z}) \le f_2^{cc}(\mathbf{z}). \end{aligned}$$

A valid convex relaxation for \(G\) on \(Z\) is given by \(g^{cv}\)

$$\begin{aligned} g^{cv}(\mathbf{z})= \min _{\begin{array}{c} x_1\in X_1\\ x_2 \in X_2 \end{array}} \quad&\mathrm{div}^{cv}_{X_1\times X_2}(x_1,x_2) \\ \text {s.t.} \quad&f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\&f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}), \end{aligned}$$
(34)

where \(\mathrm{div}^{cv}_{X_1\times X_2}(x_1,x_2)\) is any valid convex relaxation of \(\mathrm{div}(\cdot ,\cdot )\) on \(X_1\times X_2\).

Similarly, a concave relaxation is obtained by

$$\begin{aligned} g^{cc}(\mathbf{z})= \max _{\begin{array}{c} x_1\in X_1\\ x_2 \in X_2 \end{array}} \quad&\mathrm{div}^{cc}_{X_1\times X_2}(x_1,x_2) \\ \text {s.t.} \quad&f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\&f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}), \end{aligned}$$
(35)

Proposition 7

Consider the relaxation \(g^{cv}\) constructed in Corollary 6 for \(G(\mathbf{z})=f_1(\mathbf{z})/f_2(\mathbf{z})\) and suppose that the relaxations for \(\mathrm{div}(\cdot ,\cdot )\) on \(X_1\times X_2\) are at least as tight as \(\mathrm{div}^{cv,mc}\) and \(\mathrm{div}^{cc,mc}\). Then \(g^{cv}\) is at least as tight as \(\bar{g}^{cv,MC++}\) as defined in Eq. (33).

Proof

The proof is given in the Appendix.\(\square \)

Figure 3 shows that the proposed relaxations can be substantially tighter than those obtained via the univariate McCormick relaxations. Moreover, it shows that if weak relaxations are used for the outer function in the multivariate composition theorem, the resulting relaxations can be weaker than the univariate McCormick relaxations.

Fig. 3

Consider the trivial function \(z/z=1\) on \([0.1,1]\). The inequality in Proposition 7 can be strict. The multivariate McCormick relaxations (by Theorem 6) are the same when the outer function is relaxed via \( \mathrm{div}^{cv,Z \& G}\) (by Zamora and Grossmann [53]) or via \(\mathrm{div}^{cv,env}\) (by Tawarmalani and Sahinidis [47]); the convex relaxation obtained is the tightest among the relaxations considered. The univariate McCormick relaxation \({\bar{g}}^{cv,MC++}\) (by Theorem 1) is the same as the multivariate one using the bilinear relaxation \(\mathrm{div}^{cv,mc}\) and is weaker than the previous ones. The loosest relaxation is obtained when the linear relaxation \(\mathrm{div}^{cv,lin}\) (by Tawarmalani and Sahinidis [47]) is used for the outer function in the multivariate McCormick relaxations

Implementing Theorem 2 for the division of two functions is straightforward if the outer function is relaxed via (27), (28) or (29), as these relaxations are given in closed form. Subgradients are also straightforward to compute, and Theorem 4 can be utilized to propagate them further to outer functions.
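As an illustration, the following sketch (ours, under the stated positive-orthant assumption) implements Eq. (34) with the closed-form outer relaxation \(\mathrm{div}^{cv,Z \& G}\) of Eq. (27). Since this outer relaxation is increasing in \(x_1\) and decreasing in \(x_2\) on the positive orthant, the minimization in Eq. (34) is attained at \(x_1=f_1^{cv}(\mathbf{z})\), \(x_2=f_2^{cc}(\mathbf{z})\), clipped to \(X_1\times X_2\).

```python
import math

# div^{cv,Z&G} of Eq. (27), valid on the positive orthant (x1L > 0, x2 > 0).
def div_cv_zg(x1, x2, x1L, x1U):
    s = math.sqrt(x1L * x1U)
    return (1.0 / x2) * ((x1 + s) / (math.sqrt(x1L) + math.sqrt(x1U))) ** 2

def frac_cv_multivariate(f1cv, f2cc, x1L, x1U, x2L, x2U):
    # Eq. (34): by monotonicity of div^{cv,Z&G}, the minimum is attained at
    # x1 = f1^cv(z), x2 = f2^cc(z), clipped to the factor ranges X1, X2.
    x1 = min(max(f1cv, x1L), x1U)
    x2 = min(max(f2cc, x2L), x2U)
    return div_cv_zg(x1, x2, x1L, x1U)

# z/z = 1 on [0.1, 1] at z = 0.5: ~0.77, tighter than the 0.6 obtained by the
# univariate relaxation sketched after Eq. (33), as Fig. 3 suggests.
print(frac_cv_multivariate(0.5, 0.5, 0.1, 1.0, 0.1, 1.0))
```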

The use of the convex envelope defined in (32) is more involved but we can use it by solving

$$\begin{aligned} \min _{\begin{array}{c} x_1\in X_1\\ x_2 \in X_2 \end{array}} \left\{ \mathrm{div}^{cv,env}(x_1,x_2) \,:\, f_i^{cv}(\mathbf{z}) \le x_i \le f_i^{cc}(\mathbf{z}),\ i=1,2 \right\} = \min _{y_p,z_p,z_c^e,\lambda ,x_1,x_2} \quad&z_c^e \\ \text {s.t.} \quad&z_p y_p \ge x_1^L (1-\lambda )^2 \\&(z_c^e-z_p)(x_2-y_p)=x_1^U \lambda ^2 \\&y_p \ge x_2^L (1-\lambda ) \\&y_p \ge x_2- x_2^U \lambda \\&y_p \le x_2^U (1-\lambda ) \\&y_p \le x_2- x_2^L \lambda \\&x_1=x_1^L +\left( x_1^U-x_1^L\right) \lambda \\&z_c^e \ge z_p \\&\lambda \in [0,1], \quad z_p \ge 0 \\&f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z}) \\&f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z}) \end{aligned}$$

which for a given \(\mathbf{z}\) can be written as a semi-definite program. Similarly to the discussion of multilinear products, we can obtain subgradients by using the Lagrange multipliers associated with the constraints \(f_1^{cv}(\mathbf{z}) \le x_1 \le f_1^{cc}(\mathbf{z})\), \(f_2^{cv}(\mathbf{z}) \le x_2 \le f_2^{cc}(\mathbf{z})\).

If the envelope is not used, one can easily take the maximum of \( \mathrm{div}^{cv,Z \& G}\) and \(\mathrm{div}^{cv,mc,+}\). Tawarmalani and Sahinidis [47] show that this can be beneficial compared to either of the two relaxations alone.

8 Concluding remarks

We presented a multivariate generalization of McCormick's composition theorem [23]. McCormick's results for the relaxation of composite functions with univariate outer function are the basis of the so-called McCormick relaxations, which are one of the key ideas in constructing convex relaxations in deterministic global optimization. Our generalization to multivariate outer functions results in tighter relaxations for important classes of functions, including binary products of functions, the division of functions, and the minimum/maximum of functions. Similarly to McCormick's composition and product theorems, the multivariate composition theorem can be applied recursively, and in fact the implementation of our result is very similar to that of McCormick's relaxations; many of our improvements have been implemented in both MC++ [10] and modMC [13]. In contrast to the univariate McCormick relaxations, our result also enables the direct relaxation of classes of functions such as multilinear products of functions. This is particularly important since in recent years many relaxations have been proposed for relatively complicated expressions, and it has been shown that using these directly is advantageous compared to the recursive application of simple rules. For instance, an important class of functions are the so-called edge-concave functions treated in [26, 44, 45]; the work presented herein can be used to obtain tight relaxations for functions that are a composition of an edge-concave outer function and an arbitrary inner function; the relaxation can be achieved via reasoning similar to our theorems for the relaxations of bilinear, multilinear and fractional terms. It would be very useful to collect all these rules, implement them in the proposed multivariate McCormick relaxations, and then perform a thorough computational comparison of the advances obtained. Moreover, it would be interesting to consider other important functions found in applications, such as \(|f_1(\mathbf{z})-f_2(\mathbf{z})|\) and \((f_1(\mathbf{z})-f_2(\mathbf{z}))^2\), which are found for instance in parameter estimation. Also, it would be interesting to consider discontinuous functions, as done in [52].

Similarly to univariate McCormick relaxations, our result is also applicable to functions calculated by algorithms [30]. It is well-known that univariate McCormick relaxations are nonsmooth, and recently subgradient propagation has been proposed for them [30]. For the proposed multivariate framework it is also possible to propagate subgradients, and in fact we provide the framework to obtain, at least in principle, the entire subdifferential.

An alternative to McCormick relaxations is the AVM. Our reformulation and generalization of McCormick's composition theorem makes the connection with this method more explicit. In particular, it illustrates that the McCormick relaxation framework can be interpreted as a decomposition method for the AVM. It would be of interest to utilize such decomposition methods in the AVM. Moreover, we discussed the tightness of relaxations of the AVM compared to the multivariate McCormick relaxations. In cases where common subexpressions are recognized in the AVM, this can result in tighter relaxations than the McCormick relaxations [48]; the same holds relative to the simple recursive application of the proposed multivariate McCormick relaxations. In some cases, it is possible to introduce just enough auxiliary variables to close this gap, and it would be interesting to explore this opportunity computationally. Moreover, the proposed multivariate relaxations can result in tighter relaxations in specific cases by enabling the use of complicated but tight relaxations of some functions. It would be interesting to computationally compare the two methods.