1 Introduction

Global optimization is an important area of modern research, as such problems occur very frequently in applications (extensive surveys on global optimization can be found in Neumaier [27], Floudas [10, 11], Hansen [13], and Kearfott [18]). An efficient way of solving global optimization problems is a branch and bound algorithm (as, e.g., BARON by Sahinidis [31, 32], the COCONUT environment by Schichl [33–35], or LINGO by Schrage [36]), which divides the feasible set into smaller regions and then tries to exclude regions that cannot contain a global optimizer. Therefore, it is important to have tools which allow such regions to be identified. In this paper we present a method which is able to find such regions for a CSP (constraint satisfaction problem), i.e. for a global optimization problem with a constant objective function, by generalizing the approach from Fendl [6].

For the following we introduce some notation that we will use throughout the paper: We denote the non-negative real numbers by \({\mathbb {R}}_{\ge 0}:=\lbrace x\in {\mathbb {R}}:~x\ge 0\rbrace \) (and analogously for \(\le \) as well as for >). Furthermore, we denote the p-norm of \(x\in {\mathbb {R}}^n\) by \(|x|_{_p}\) for \(p\in \lbrace 1,2,\infty \rbrace \). The set of extended real intervals \(\varvec{x}=[{\underline{x}},{\overline{x}}]:=\{x\in {\mathbb {R}}\mid {\underline{x}}\le x\le {\overline{x}}\}\) with \({\underline{x}},{\overline{x}}\in {\mathbb {R}}\cup \{-\infty ,\infty \}\) and \({\underline{x}}\le {\overline{x}}\) will be denoted by \({{\mathbb {I}}}{{\mathbb {R}}}\). Then \({{\mathbb {I}}}{{\mathbb {R}}}^{m}\) is the set of all m-dimensional boxes in \({\mathbb {R}}^{m}\).

Certificate of infeasibility. For this purpose we consider the CSP

$$\begin{aligned} \begin{aligned} F(x)&\in \varvec{F}\\ x&\in \varvec{x} \end{aligned} \end{aligned}$$
(1)

with \(F:{\mathbb {R}}^n\longrightarrow {\mathbb {R}}^m\), \(\varvec{x}\in {{\mathbb {I}}}{{\mathbb {R}}}^n\), \(\varvec{F}\in {{\mathbb {I}}}{{\mathbb {R}}}^m\), and we assume that a solver, which is able to solve a CSP, takes the box \(\varvec{u}:=[{\underline{u}},{\overline{u}}]\subseteq \varvec{x}\) into consideration during the solution process. We construct a certificate of infeasibility f, which is a nondifferentiable and nonconvex function in general, with the following property: If there exists a vector y with

$$\begin{aligned} f(y,{\underline{u}},{\overline{u}})<0 ~, \end{aligned}$$
(2)

then the CSP (1) has no feasible point in \(\varvec{u}\) and consequently this box can be excluded for the rest of the solution process. Therefore, a box \(\varvec{u}\) for which (2) holds is called an exclusion box.

Easy examples (see Example 3 below) immediately show that there exist CSPs which have boxes satisfying (2), so it is worth pursuing this approach further.

Exclusion boxes. The obvious way of finding an exclusion box for the CSP (1) is to minimize f

$$\begin{aligned} \begin{aligned} \min _{y}{f(y,{\underline{u}},{\overline{u}})} \end{aligned} \end{aligned}$$
(3)

and stop the minimization as soon as a negative function value occurs. Since modern solvers offer many other possibilities for treating a box, we do not want to spend too much time on this minimization problem. Therefore, the idea is to let a nonsmooth solver perform only a few steps for solving (3).
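The following minimal sketch (in Python, with a hypothetical `solver_step` routine standing in for one iteration of a nonsmooth solver; all names are ours and not part of any existing package) illustrates this early-stopping strategy for (3):

```python
def try_exclude(f, y0, u_lo, u_hi, solver_step, max_steps=5):
    """Run a few nonsmooth solver steps on (3) and stop early on success.

    f           -- certificate, called as f(y, u_lo, u_hi)
    y0          -- starting point for the certificate variables
    solver_step -- hypothetical routine returning the next iterate
    Returns (excluded, y): excluded is True as soon as (2) is verified,
    in which case the box [u_lo, u_hi] contains no feasible point of (1).
    """
    y = y0
    for _ in range(max_steps):
        if f(y, u_lo, u_hi) < 0:           # certificate (2) holds
            return True, y
        y = solver_step(f, y, u_lo, u_hi)  # one step of the nonsmooth solver
    return f(y, u_lo, u_hi) < 0, y
```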

To find an exclusion box \(\varvec{v}:=[{\underline{v}},{\overline{v}}]\subseteq \varvec{u}\) of at least a prescribed size, i.e. with \({\underline{v}}+r\le {\overline{v}}\), where \(r\in (0,{\overline{u}}-{\underline{u}})\) is fixed, we can try to solve the linearly constrained problem

$$\begin{aligned} \begin{aligned}&\min _{y,{\underline{v}},{\overline{v}}}{f(y,{\underline{v}},{\overline{v}})}\\&\text { s.t. }[{\underline{v}}+r,{\overline{v}}]\subseteq \varvec{u} ~. \end{aligned} \end{aligned}$$

Another important aspect in this context is to enlarge an exclusion box \(\varvec{v}\) by solving

$$\begin{aligned} \begin{aligned}&\max _{y,{\underline{v}},{\overline{v}}}{\mu ({\underline{v}},{\overline{v}})}\\&\text { s.t. }f(y,{\underline{v}},{\overline{v}})\le \delta ~, ~[{\underline{v}},{\overline{v}}]\subseteq \varvec{u} ~, \end{aligned} \end{aligned}$$
(4)

where \(\delta <0\) is given and \(\mu \) measures the magnitude of the box \(\varvec{v}\) (e.g., \(\mu ({\underline{v}},{\overline{v}}):=|{\overline{v}}-{\underline{v}}|_{_1}\)). Since only feasible points of (4) are useful for enlarging an exclusion box and since, as before, we only want to perform a few steps of a nonsmooth solver, we expect benefits from a nonsmooth solver that only creates feasible iterates, because then the current best point can always be used for our purpose. For proofs in explicit detail we refer the reader to Fendl [6, p. 147 ff, Chapter 5].

The paper is organized as follows: In Sect. 2 we first recall the basic facts of interval analysis which are necessary for introducing the certificate of infeasibility, which is done afterwards. Then we discuss some important properties of the certificate and we explain in detail how the certificate is used for obtaining exclusion boxes in a CSP by applying a nonsmooth solver. In Sect. 3 we explain how we obtain a starting point for the optimization problems to which we apply the nonsmooth solver. In Sect. 4 we compare numerical results of different nonsmooth solvers on these problems, and Sect. 5 concludes.

2 Presentation of the application

After summarizing the most basic facts of interval analysis, we construct the certificate of infeasibility in this section. Furthermore, we discuss how a nonsmooth solver can use this certificate to obtain an exclusion box in a CSP.

2.1 Interval arithmetic

Interval arithmetic is an extension of real arithmetic defined on the set \({{\mathbb {I}}}{{\mathbb {R}}}\) of real intervals, rather than on the set of real numbers. According to a survey paper by Kearfott [17], a form of interval arithmetic perhaps first appeared in Burkill [4]. Modern interval arithmetic was originally invented independently in the late 1950s by several researchers, including Warmus [40], Sunaga [38] and finally Moore [24], who set firm foundations for the field in his many publications, including the foundational book Moore [25]. Since then, interval arithmetic has been used to rigorously solve numerical problems.

Let \(\varvec{x}, \varvec{y}\in {{\mathbb {I}}}{{\mathbb {R}}}\). The width of \(\varvec{x}\) is defined as \({\text {wid}}(\varvec{x}) := \overline{x}-\underline{x}\), the radius of \(\varvec{x}\) as \({\text {rad}}(\varvec{x}) := \tfrac{1}{2}{\text {wid}}(\varvec{x})\), the magnitude of \(\varvec{x}\) as \(|\varvec{x}| := \max (-\underline{x},\overline{x})\), and the mignitude of \(\varvec{x}\) as \(\langle \varvec{x}\rangle := \min \{|x|\mid x\in \varvec{x}\}\). For bounded \(\varvec{x}\) we define the midpoint of \(\varvec{x}\) as \({\check{x}} := \tfrac{1}{2}(\underline{x}+\overline{x})\). We define the elementary operations of interval arithmetic by the rule \(\varvec{x}\diamond \varvec{y}= \square \{x \diamond y \mid x \in \varvec{x}, y \in \varvec{y}\}\) for all \(\diamond \in \{+, -, \times , \div , \hat{}\,\,\}\), where \(\square S\) denotes the smallest interval containing the set S. Thus, the ranges of the four elementary interval arithmetic operations are exactly the ranges of their real-valued counterparts. Although this rule characterizes these operations mathematically, the usefulness of interval arithmetic is due to the operational definitions based on interval bounds [14]. For example, for \(\varvec{x}= [\underline{x}, \overline{x}]\) and \(\varvec{y}= [\underline{y}, \overline{y}]\) it can easily be proved that

$$\begin{aligned} \varvec{x}+ \varvec{y}&= [\underline{x}+\underline{y},\overline{x}+\overline{y}], \\ \varvec{x}- \varvec{y}&= [\underline{x}-\overline{y},\overline{x}-\underline{y}], \\ \varvec{x}\times \varvec{y}&= \left[ \min \{\underline{x}\underline{y}, \underline{x}\overline{y}, \overline{x}\underline{y}, \overline{x}\overline{y} \}, \max \{\underline{x}\underline{y}, \underline{x}\overline{y}, \overline{x}\underline{y}, \overline{x}\overline{y} \}\right] , \\ \varvec{x}\div \varvec{y}&= \varvec{x}\times (1/\varvec{y}) \quad \text {if } 0 \notin \varvec{y}, \text { where } 1/\varvec{y}= [1/\overline{y}, 1/\underline{y}]. \end{aligned}$$
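As a concrete illustration of these operational definitions, the following sketch implements the four elementary operations on closed bounded intervals in Python; it ignores outward rounding and unbounded endpoints, which a rigorous implementation (such as INTLAB) has to handle.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("0 is contained in the divisor interval")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)
```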

In addition, for an elementary function \(\varphi :{\mathbb {R}}\rightarrow {\mathbb {R}}\) and an interval \(\varvec{x}\) we define

$$\begin{aligned} \varphi (\varvec{x}) := \Box \{\varphi (x)\mid x\in \varvec{x}\}. \end{aligned}$$
(5)

Moreover, given a factorable function f, that is, a function defined by an expression composed of these arithmetic operations and elementary functions \(\varphi \in \{\sin ,\cos ,\exp ,\log ,\ldots \}\), bounds on the range of f can be obtained by replacing the real arithmetic operations and the real functions by their corresponding interval arithmetic counterparts. We will usually write \({\mathfrak {f}}\) for the natural interval extension of f defined in this way. The strength of interval arithmetic also lies in its containment property, i.e.,

$$\begin{aligned} \left[ \inf _{x\in \varvec{x}}f(x),\sup _{x\in \varvec{x}}f(x)\right] \subseteq {\mathfrak {f}}(\varvec{x}). \end{aligned}$$
(6)

This is also called the fundamental theorem of interval arithmetic.

For example, given the function \(f(x) = x(x-1)\), bounds on the range of f over [0, 1] can be obtained from its natural interval extension \({\mathfrak {f}}(\varvec{x}) = \varvec{x}(\varvec{x}-1)\), namely \({\mathfrak {f}}([0, 1]) = [0, 1]([0, 1] - 1) = [0, 1][-1, 0] = [-1, 0]\), which contains the exact range \([-0.25, 0]\). In fact, bounds from interval arithmetic are often sharper and simpler to derive than bounds obtained by other techniques.
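With the Interval class sketched above, this example evaluates as follows (the expected result is shown in the comment):

```python
x = Interval(0.0, 1.0)
one = Interval(1.0, 1.0)
fx = x * (x - one)   # natural interval extension of f(x) = x(x-1) over [0, 1]
print(fx)            # Interval(lo=-1.0, hi=0.0), which contains [-0.25, 0]
```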

Interval arithmetic can be carried out for virtually every expression that can be evaluated with floating-point arithmetic. However, an important point has to be considered: interval arithmetic is only subdistributive, so expressions that are equivalent in real arithmetic usually differ in interval arithmetic, giving different amounts of overestimation (the amount by which the exact range of the function over an interval and the result computed by interval arithmetic differ). Therefore, computations should be arranged so that the overestimation of ranges is minimized. Readers are referred to [1, 14, 15, 28] for more details on basic interval methods.

2.2 Certificate of infeasibility

Let \(s:{\mathbb {R}}^m\times {\mathbb {R}}^n\times {\mathbb {R}}^{q}\times {{\mathbb {I}}}{{\mathbb {R}}}^n\longrightarrow {{\mathbb {I}}}{{\mathbb {R}}}\) be a function. We assume that \(Z:{\mathbb {R}}^m\times {\mathbb {R}}^n\times {\mathbb {R}}^{q}\times {\mathbb {R}}^n\times {\mathbb {R}}^n\longrightarrow {\mathbb {R}}\)

$$\begin{aligned} Z(y,z,w,{\underline{x}},{\overline{x}}) := \sup {s(y,z,w,\varvec{x})} ~, \end{aligned}$$
(7)

where \(\varvec{x}=[{\underline{x}},{\overline{x}}]\in {{\mathbb {I}}}{{\mathbb {R}}}^n\), satisfies

$$\begin{aligned} Z(y,z,w,{\underline{x}},{\overline{x}}) \ge \sup _{x\in \varvec{x}}{y^T\big (F(x)-F(z)\big )} ~. \end{aligned}$$
(8)

Example 1

If we set \(w=(R,S)\in {\mathbb {R}}_{\mathrm {triu}}^{n\times n}\times {\mathcal {R}}_0(n)\), where we denote the linear space of the upper triangular \(n\times n\)-matrices by \({\mathbb {R}}_{\mathrm {triu}}^{n\times n}\cong {\mathbb {R}}^{n_1}\) with \(n_1:=\tfrac{1}{2}n(n+1)\) and the linear space of the strictly upper triangular \(n\times n\)-matrices by \({\mathcal {R}}_0(n)\cong {\mathbb {R}}^{n_0}\) with \(n_0:=\tfrac{1}{2}(n-1)n\), which implies \(q=n_0+n_1=n^2\), and if we define

$$\begin{aligned} s_1(y,z,R,S,\varvec{x}) := \Big (\sum _{k=1}^m{y_k{\mathfrak {F}}_k[z,\varvec{x}]+(\varvec{x}-z)^T(R^TR+S^T-S)}\Big )(\varvec{x}-z) \end{aligned}$$
(9)

then the corresponding Z satisfies (8) because: Due to the antisymmetry of \(S^T-S\) and the positive semidefiniteness of \(R^TR\), we have

$$\begin{aligned} y^T\big (F(x)-F(z)\big ) +(x-z)^T(R^TR+S^T-S)(x-z) \ge y^T\big (F(x)-F(z)\big ) ~. \end{aligned}$$
(10)

for all \(x\in \varvec{x}\). Since the slope expansion

$$\begin{aligned} F_k(x) = F_k(z)+F_k[z,x](x-z) ~, \end{aligned}$$
(11)

where the slope \(F_k[z,x]\in {\mathbb {R}}^{1\times n}\), holds for all \(x,z\in {\mathbb {R}}^n\) (cf., e.g., Neumaier [26]), we obtain for all \(x\in \varvec{x}\)

$$\begin{aligned} y^T\big (F(x)-F(z)\big ) +(x-z)^T(R^TR+S^T-S)(x-z) \subseteq s_1(y,z,R,S,\varvec{x}) \end{aligned}$$
(12)

due to (11), (5), (6), and (9). Now we obtain (8) due to (10), (12), and (7).

Proposition 1

It holds for all \(z\in \varvec{x}\)

$$\begin{aligned} Z(y,z,w,{\underline{x}},{\overline{x}})\ge 0 ~. \end{aligned}$$
(13)

Proof

(13) follows from (8) and the assumption that \(z\in \varvec{x}\), since then \(Z(y,z,w,{\underline{x}},{\overline{x}})\ge \sup _{x\in \varvec{x}}{y^T\big (F(x)-F(z)\big )}\ge y^T\big (F(z)-F(z)\big )=0\). \(\square \)

Definition 1

We define \(Y:{\mathbb {R}}^m\times {\mathbb {R}}^n\longrightarrow {\mathbb {R}}\) and \(f:{\mathbb {R}}^m\times {\mathbb {R}}^n\times {\mathbb {R}}^{q}\times {\mathbb {R}}^n\times {\mathbb {R}}^n\longrightarrow {\mathbb {R}}\) by

$$\begin{aligned} Y(y,z)&:= \inf {y^T\big (\varvec{F}-F(z)\big )} \end{aligned}$$
(14)
$$\begin{aligned} f(y,z,w,{\underline{x}},{\overline{x}})&:=\frac{Z(y,z,w,{\underline{x}},{\overline{x}})-\max {\big (0,Y(y,z)\big )}}{T(y,z,w,{\underline{x}},{\overline{x}})} ~, \end{aligned}$$
(15)

where \(T:{\mathbb {R}}^m\times {\mathbb {R}}^n\times {\mathbb {R}}^{q}\times {\mathbb {R}}^n\times {\mathbb {R}}^n\longrightarrow {\mathbb {R}}_{>0}\) is positive, continuous and differentiable almost everywhere.

Remark 1

The function f from (15) depends on \(N=m+3n+q\) variables and is, in general, neither everywhere differentiable nor convex.
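In code, once the three ingredients Z, Y, and T have been evaluated, the certificate (15) is a single expression; the following one-liner (with names of our choosing) makes this explicit.

```python
def certificate(Z_val, Y_val, T_val):
    """Certificate f from (15), given the values of Z, Y, and T > 0."""
    return (Z_val - max(0.0, Y_val)) / T_val
```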

Now we state the main theorem for our application.

Theorem 1

If there exist \(y\in {\mathbb {R}}^m\), \({\underline{x}}\le z\le {\overline{x}}\in {\mathbb {R}}^n\), and \(w\in {\mathbb {R}}^{q}\) with \(f(y,z,w,{\underline{x}},{\overline{x}})<0\), then for all \(x\in \varvec{x}\) there exists \(k\in \lbrace 1,\dots ,m\rbrace \) with \(F_k(x)\not \in \varvec{F}_k\), i.e. there is no \(x\in \varvec{x}\) with \(F(x)\in \varvec{F}\), i.e. there is no feasible point.

Proof

(by contradiction) Suppose that there exists \({\hat{x}}\in \varvec{x}:=[{\underline{x}},{\overline{x}}]\) with \(F({\hat{x}})\in \varvec{F}\). By assumption there exist \(y\in {\mathbb {R}}^m\), \(z\in \varvec{x}\), and \(w\in {\mathbb {R}}^{q}\) with \(f(y,z,w,{\underline{x}},{\overline{x}})<0\), which is equivalent to \(Z(y,z,w,{\underline{x}},{\overline{x}})<Y(y,z)\) due to (15), (13), and the positivity of T. Since

$$\begin{aligned} Z(y,z,w,{\underline{x}},{\overline{x}}) \ge \sup _{x\in \varvec{x}}{y^T\big (F(x)-F(z)\big )} \ge y^T\big (F(x)-F(z)\big ) \end{aligned}$$

for all \(x\in \varvec{x}\) due to (8) and

$$\begin{aligned} Y(y,z) = \inf _{{\tilde{F}}\in \varvec{F}}{y^T\big ({\tilde{F}}-F(z)\big )} \le y^T\big ({\tilde{F}}-F(z)\big ) \end{aligned}$$

for all \({\tilde{F}}\in \varvec{F}\) due to (14) and (6), we obtain for all \(x\in \varvec{x}\) and for all \({\tilde{F}}\in \varvec{F}\)

$$\begin{aligned} y^T\big (F(x)-F(z)\big ) \le Z(y,z,w,{\underline{x}},{\overline{x}}) < Y(y,z) \le y^T\big ({\tilde{F}}-F(z)\big ) ~, \end{aligned}$$

which implies that we have \( y^TF(x) < y^T{\tilde{F}} \) for all \(x\in \varvec{x}\) and for all \({\tilde{F}}\in \varvec{F}\). Now, choosing \(x={\hat{x}}\in \varvec{x}\) and \({\tilde{F}}=F({\hat{x}})\in \varvec{F}\) in the last inequality yields a contradiction. \(\square \)

The following proposition gives in particular a hint as to how the y-component of a starting point should be chosen (cf. (33)).

Proposition 2

We have for all \(y\in {\mathbb {R}}^m\) and \(z\in {\mathbb {R}}^n\)

$$\begin{aligned} Y(y,z) = \sum _{k=1}^m \left\{ \begin{array}{ll} y_k\big ({\underline{F}}_k-F_k(z)\big ) &{} \text {for }y_k\ge 0\\ y_k\big ({\overline{F}}_k-F_k(z)\big ) &{} \text {for }y_k<0 ~. \end{array} \right. \end{aligned}$$
(16)

Furthermore, if \(I,J\subseteq \lbrace 1,\dots ,m\rbrace \) satisfy \(I\not =\emptyset ~\vee ~J\not =\emptyset \), \({\underline{F}}_i=-\infty ~\wedge ~y_i>0\) for all \(i\in I\), and \({\overline{F}}_j=\infty ~\wedge ~y_j<0\) for all \(j\in J\), then we have for all \(z\in {\mathbb {R}}^n\)

$$\begin{aligned} Y(y,z)=-\infty ~. \end{aligned}$$
(17)

Proof

(16) holds because of (14). For obtaining (17), consider without loss of generality \({\underline{F}}_1=-\infty ~\wedge ~y_1>0\) and \({\overline{F}}_2=\infty ~\wedge ~y_2<0\), then the desired result follows from (16). \(\square \)
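A direct implementation of (16), including the unbounded cases covered by (17), could look as follows (a sketch; the function name and argument layout are ours):

```python
def Y_value(y, Fz, F_lo, F_hi):
    """Evaluate Y(y,z) via (16); Fz = F(z), F_lo/F_hi are the bounds of the box F."""
    total = 0.0
    for yk, fk, lo, hi in zip(y, Fz, F_lo, F_hi):
        if yk == 0.0:
            continue                                   # the k-th term vanishes
        total += yk * ((lo if yk > 0 else hi) - fk)    # may become -inf, cf. (17)
    return total
```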

2.3 Properties of the certificate for quadratic F

In this subsection we consider the special case of f with quadratic F, i.e.

$$\begin{aligned} F_k(x) = c_k^Tx+x^TC_kx \end{aligned}$$
(18)

with \(c_k\in {\mathbb {R}}^n\) and \(C_k\in {\mathbb {R}}^{n\times n}\) for \(k=1,\dots ,m\). Since

$$\begin{aligned} F_k(x)-F_k(z) = \big (c_k^T+x^TC_k+z^TC_k^T\big )(x-z) \end{aligned}$$

for all \(x,z\in {\mathbb {R}}^n\), the slope expansion from (11) holds with

$$\begin{aligned} F_k[z,x] = c_k^T+x^TC_k+z^TC_k^T ~. \end{aligned}$$
(19)

Proposition 3

Let \(F_k\) be quadratic and set

$$\begin{aligned} \begin{aligned} C(y) := \sum _{k=1}^mC_ky_k, c(y,z)&:= \sum _{k=1}^mc_ky_k+\big (C(y)+C(y)^T\big )z,\\ A(y,R,S)&:= C(y)+R^TR+S^T-S ~. \end{aligned} \end{aligned}$$
(20)

Then (8) is satisfied for

$$\begin{aligned} s_2(y,z,R,S,\varvec{x}) := \big (c(y,z)^T+(\varvec{x}-z)^TA(y,R,S)\big )(\varvec{x}-z) ~. \end{aligned}$$
(21)

Proof

Since \( \sum _{k=1}^m{y_kF_k[z,x]} = c(y,z)^T+(x-z)^TC(y) \) due to (19) and (20), we obtain \( \sum _{k=1}^m{y_kF_k[z,x]}+(x-z)^T(R^TR+S^T-S) = c(y,z)^T+(x-z)^TA(y,R,S) \) due to (20), which implies that \(s_2\) from (21) has the same structure as \(s_1\) from (9), and consequently we obtain that (8) holds for \(s_2\), too. \(\square \)
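For quadratic F the bound Z from (7) with \(s_2\) from (21) can be computed by a straightforward interval evaluation. The sketch below (Python with NumPy; all function names are ours, and no outward rounding is performed) represents intervals as (lo, hi) pairs:

```python
import numpy as np

def iv_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def iv_scale(a, c):
    """Multiply the interval a by the real scalar c."""
    lo, hi = c * a[0], c * a[1]
    return (min(lo, hi), max(lo, hi))

def iv_mul(a, b):
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def Z_quadratic(y, z, R, S, x_lo, x_hi, c_list, C_list):
    """sup of s_2 from (21); c_list, C_list contain the data c_k, C_k of (18)."""
    C = sum(yk * Ck for yk, Ck in zip(y, C_list))            # C(y), cf. (20)
    c = sum(yk * ck for yk, ck in zip(y, c_list)) + (C + C.T) @ z
    A = C + R.T @ R + S.T - S                                # A(y,R,S), cf. (20)
    d = [(lo - zi, hi - zi) for lo, hi, zi in zip(x_lo, x_hi, z)]   # x - z
    s = (0.0, 0.0)
    for j in range(len(z)):
        r = (c[j], c[j])                      # j-th entry of c(y,z)^T + (x-z)^T A
        for i in range(len(z)):
            r = iv_add(r, iv_scale(d[i], A[i, j]))
        s = iv_add(s, iv_mul(r, d[j]))        # accumulate the interval value of s_2
    return s[1]                               # its supremum is the bound Z
```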

Proposition 4

Let \(F_k\) be quadratic, then we have for all \(p\in {\mathbb {R}}^m\) and \(\alpha \in {\mathbb {R}}\)

$$\begin{aligned} C(y+\alpha p) = C(y)+\alpha C(p) ~, \quad c(y+\alpha p,z) = c(y,z)+\alpha c(p,z) \end{aligned}$$
(22)

and furthermore we have for all \(\kappa \ge 0\)

$$\begin{aligned} A(\kappa ^2y,\kappa R,\kappa ^2S) = \kappa ^2A(y,R,S) ~. \end{aligned}$$
(23)

Proof

(22) holds due to (20). (23) holds due to (20) and (22). \(\square \)

Proposition 5

Let \(F_k\) be quadratic. If the positive function \(T:{\mathbb {R}}^m\times {\mathbb {R}}^n\times {\mathbb {R}}^{n_1}\times {\mathbb {R}}^{n_0}\times {\mathbb {R}}^n\times {\mathbb {R}}^n\longrightarrow {\mathbb {R}}_{>0}\) satisfies the (partial) homogeneity condition

$$\begin{aligned} T(\kappa ^2y,z,\kappa R,\kappa ^2S,{\underline{x}},{\overline{x}})=\kappa ^2T(y,z,R,S,{\underline{x}},{\overline{x}}) \end{aligned}$$
(24)

for all \(\kappa >0\), \(y\in {\mathbb {R}}^m\), \(z\in [{\underline{x}},{\overline{x}}]\in {{\mathbb {I}}}{{\mathbb {R}}}^n\), \(R\in {\mathbb {R}}_{\mathrm {triu}}^{n\times n}\) and \(S\in {\mathcal {R}}_0(n)\), then the certificate f from (15) is (partially) homogeneous

$$\begin{aligned} f(\kappa ^2y,z,\kappa R,\kappa ^2S,{\underline{x}},{\overline{x}})=f(y,z,R,S,{\underline{x}},{\overline{x}}) ~. \end{aligned}$$
(25)

Proof

Since \(F_k\) is quadratic by assumption, the statements of Proposition 3 hold. By using (22) and (23) we calculate

$$\begin{aligned} c(\kappa ^2y,z)^T+(\varvec{x}-z)^TA(\kappa ^2y,\kappa R,\kappa ^2S) = \kappa ^2\big (c(y,z)^T+(\varvec{x}-z)^TA(y,R,S)\big ) \end{aligned}$$
(26)

and therefore \(Z(\kappa ^2y,z,\kappa R,\kappa ^2S,{\underline{x}},{\overline{x}})=\kappa ^2Z(y,z,R,S,{\underline{x}},{\overline{x}})\) follows due to (7), (21), and (26). Furthermore, we have \(Y(\kappa ^2y,z)=\kappa ^2Y(y,z)\) due to (14) and, hence, we obtain \(\max {\big (0,Y(\kappa ^2y,z)\big )}=\kappa ^2\max {\big (0,Y(y,z)\big )}\). Consequently, (15) and (24) imply (25). \(\square \)

Remark 2

The intention of (25) is to reduce the scale dependence of the unbounded variables y, R and S of f. If we go through the proof of Proposition 5 again, we notice that we use the scaling property (23) of A for showing (26). From the proof of (23) we notice that this proof only holds if y, R and S are treated as variables and none of them is treated as a constant (since factoring \(\kappa ^2\) out of a constant yields an additional factor \(\kappa ^{-2}\) in front of the constant). Nevertheless, if one of the variables y, R resp. S is treated as a constant and we set the corresponding value to \(y=0\), \(R=0\) resp. \(S=0\), then the proof still holds.

Example 2

Consider the variables R and S as constants and set \(R=S=0\). Then \( T_1(y,z,R,S,{\underline{x}},{\overline{x}}):=1 \) does not satisfy (24), while \( T_2(y,z,R,S,{\underline{x}},{\overline{x}}):=|y|_{_2} \) does (cf. Remark 2). Note that \(T_2\) violates the requirement of positivity demanded in Definition 1 for \(y=0\), and hence in this case f is only defined outside the zero set of T. Nevertheless, since the zero set of \(T_2\) has measure zero, it is numerically very unlikely to end up at a point of this zero set, and therefore we will also consider this choice of T because of the important reduction of the scale dependence of f mentioned in Remark 2 [also cf. Sect. 2.4 (directly after optimization problem (27)) and Example 5].

Example 3

Choose \(n=2\), \(m=2\), \(c_1=\left( {\begin{matrix} 1 \\ -3 \end{matrix}}\right) \), \(c_2=\left( {\begin{matrix} 4 \\ 2 \end{matrix}}\right) \), \(C_1=\left( {\begin{matrix} 2 &{} 0\\ 3 &{} 4\end{matrix}}\right) \), \(C_2=\left( {\begin{matrix}-1 &{} 0\\ -2 &{} 7\end{matrix}}\right) \), which yields \(F_1(x)=2x_1^2+x_1+3x_1x_2-3x_2+4x_2^2\) and \(F_2(x)=-x_1^2+4x_1-2x_1x_2+2x_2+7x_2^2\) due to (18), as well as \(\varvec{x}=[-3,3]\times [-4,4]\) and \(\varvec{F}=[-1,7]\times [-2,0]\). Then the certificate f from (15) with the different choices of T from Example 2 is illustrated in Figs. 1–8.

Fig. 1 \(f(y_1, y_2)\) for \(T_1\)

Fig. 2 \(f(y_1, y_2)\) for \(T_2\)

Fig. 3 \(f(y_1, z_2)\) for \(T_1\)

Fig. 4 \(f(y_1,z_2)\) for \(T_2\)

Fig. 5 \(f(y_2,R_{13})\) for \(T_1\)

Fig. 6 \(f(y_2,R_{13})\) for \(T_2\)

Fig. 7 \(f(y_2,S_{12})\) for \(T_1\)

2.4 Exclusion boxes for constraint satisfaction problems

Now we explain in detail how to use Theorem 1 for finding exclusion boxes for the CSP (1). If we apply a solver for linearly constrained nonsmooth optimization (note that the certificate f from (15) is not everywhere differentiable due to Remark 1) to

(27)

with a fixed \(r\in [0,{\overline{x}}-{\underline{x}}]\), and if a function value smaller than zero occurs during the optimization process, then there is no feasible point in \([u,v]\) according to Theorem 1, and consequently we can reduce the box \(\varvec{x}\) to the set \(\varvec{x}\setminus [u,v]\) in the CSP (1). (Note that the convergence theory of many solvers, cf., e.g., the second order bundle algorithm by Fendl and Schichl [8, p. 7, 3.1 Theoretical basics], requires that all occurring functions are defined on the whole \({\mathbb {R}}^{N}\), which might be violated for certain choices of T, cf. Example 2.)

If \([u,v]=\varvec{x}\) (i.e. u and v are fixed and therefore not variables), then we can reduce the box \(\varvec{x}\) to the empty set; the reduction of a box to the empty set is equivalent to removing the box.

The constant r determines the size of the box \([u,v]\) which should be excluded: The closer r is to \(0\), the smaller the box \([u,v]\) can become (if \(r=0\), then \([u,v]\) can become thin, which we want to prevent, since we want to remove a box \([u,v]\) that is as large as possible from \(\varvec{x}\), as then the remaining set \(\varvec{x}\setminus [u,v]\) is as small as possible).

If during the optimization a point \(z\in \varvec{x}\) is found with \(F(z)\in \varvec{F}\), then we have found a feasible point and therefore we can stop the optimization, since then we cannot verify infeasibility for the box \(\varvec{x}\).

Remark 3

If \(\varvec{y}\subseteq \varvec{x}\) and we remove \(\varvec{y}\) from \(\varvec{x}\), then the remaining set \(\varvec{x}\setminus \varvec{y}\) is not closed. Nevertheless, if we just remove \(\varvec{y}^{\circ }\subset \varvec{y}\), then the remaining set \(\varvec{x}\setminus \varvec{y}^{\circ }\supset \varvec{x}\setminus \varvec{y}\) is closed (i.e. we remove a smaller box and therefore the remaining set is a bit larger, since it contains the boundary of \(\varvec{y}\)). Furthermore, the set \(\varvec{x}\setminus \varvec{y}^{\circ }\) can be represented as a union of at most 2n n-dimensional boxes, i.e. in particular the number of boxes obtained by this splitting process is linear in n.
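The splitting described in Remark 3 can be implemented in a few lines; the following sketch (function name ours) returns a cover of \(\varvec{x}\setminus \varvec{y}^{\circ }\) for \(\varvec{y}=[u,v]\) by at most 2n boxes.

```python
import numpy as np

def complement_boxes(x_lo, x_hi, u, v):
    """Cover x \\ int([u, v]) by at most 2n axis-aligned boxes (cf. Remark 3).

    All arguments are length-n arrays with x_lo <= u <= v <= x_hi; the result
    is a list of (lo, hi) pairs, some of which may be thin and can be dropped.
    """
    x_lo, x_hi, u, v = map(np.asarray, (x_lo, x_hi, u, v))
    boxes = []
    for i in range(len(x_lo)):
        # coordinates j < i are restricted to [u_j, v_j],
        # coordinates j > i keep the full range [x_lo_j, x_hi_j]
        lo = np.concatenate([u[:i], x_lo[i:]])
        hi = np.concatenate([v[:i], x_hi[i:]])
        left_lo, left_hi = lo.copy(), hi.copy()
        left_hi[i] = u[i]                      # slice with z_i <= u_i
        boxes.append((left_lo, left_hi))
        right_lo, right_hi = lo.copy(), hi.copy()
        right_lo[i] = v[i]                     # slice with z_i >= v_i
        boxes.append((right_lo, right_hi))
    return boxes
```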

We assume that the certificate of infeasibility from (15) for the box \([{\hat{u}},{\hat{v}}]\) satisfies \( f({\hat{y}},{\hat{z}},{\hat{w}},{\hat{u}},{\hat{v}})=:{\hat{\delta }}<0 \), i.e. \([{\hat{u}},{\hat{v}}]\) is an exclusion box according to Theorem 1. For \(\delta \in [{\hat{\delta }},0)\) and a box \(\varvec{x}\) with \([{\hat{u}},{\hat{v}}]\subseteq \varvec{x}\), we can try to apply a solver for nonlinearly constrained nonsmooth optimization to

(28)

to enlarge the exclusion box \([u,v]\) in \(\varvec{x}\), where \(b:{\mathbb {R}}^{N}\longrightarrow {\mathbb {R}}\) is a measure for the box \([u,v]\) in the following sense: If \(b:{\mathbb {R}}^{N}\longrightarrow {\mathbb {R}}_{\le 0}\), then the following conditions must hold: If \([u,v]\) is small, then \(b(\cdot ,u,v)\) is close to 0, and if \([u,v]\) is large, then \(b(\cdot ,u,v)\) is negative with large absolute value. This means: The larger the box \([u,v]\) is, the more negative \(b(\cdot ,u,v)\) must be. For examples of this type of box measure cf. (30). Alternatively, if \(b:{\mathbb {R}}^{N}\longrightarrow {\mathbb {R}}_{\ge 0}\), then the following condition must hold: If \([u,v]\subseteq \varvec{x}\) is close to \(\varvec{x}\), then \(b(\cdot ,u,v)\) is close to 0. For examples of this type of box measure cf. (31).

Remark 4

In contrast to (27), where the linear constraint \(u+r\le v\) occurs, we use in (28) the bound constraints \(u\le {\hat{u}}\) and \({\hat{v}}\le v\).

Furthermore, we make the following two observations for the global optimization problem

(29)

where \(F_{\mathrm {obj}}:{\mathbb {R}}^n\longrightarrow {\mathbb {R}}\): First of all, the certificate f from (15) can be used for finding exclusion boxes in the global optimization problem (29) with an arbitrary objective function \(F_{\mathrm {obj}}\), since the certificate f only depends on the constraint data F, \(\varvec{F}\) and \(\varvec{x}\) [cf. the CSP (1)] and since a solution of an optimization problem is necessarily feasible. Secondly, we denote the current lowest known function value of the optimization problem (29) by \(F_{\mathrm {obj}}^{\mathrm {cur}}\) and assume that the objective is included as the first constraint component, i.e. \(F_1=F_{\mathrm {obj}}\). Now, if we can find a box \([u,v]\subseteq \varvec{x}\) for which the certificate f from (15) with \(\varvec{F}_1:=[-\infty ,F_{\mathrm {obj}}^{\mathrm {cur}}]\) has a negative value, then Theorem 1 implies that for all \(x\in [u,v]\) there exists \(k\in \lbrace 1,\dots ,m\rbrace \) with \(F_k(x)\not \in \varvec{F}_k\), which is equivalent to saying that for all \(x\in [u,v]\) we have \(F_{\mathrm {obj}}^{\mathrm {cur}}<F_1(x)\) or there exists \(i\in \lbrace 2,\dots ,m\rbrace \) with \(F_i(x)\not \in \varvec{F}_i\), i.e. any point in the box \([u,v]\) either has an objective function value which is higher than the current lowest known function value \(F_{\mathrm {obj}}^{\mathrm {cur}}\) or is infeasible. Consequently, the box \([u,v]\) cannot contain a feasible point with function value lower than or equal to \(F_{\mathrm {obj}}^{\mathrm {cur}}\), and hence the box \([u,v]\) cannot contain a global minimizer of the global optimization problem (29). Therefore we can exclude the box \([u,v]\) from further consideration.

Example 4

For measuring the box \([u,v]\) with \(b:{\mathbb {R}}^{N}\longrightarrow {\mathbb {R}}_{\le 0}\), we can use the negative of any p-norm with \(p\in [1,\infty ]\) as well as variants thereof

$$\begin{aligned} b_-^1(y,z,w,u,v) := -|v-u|_{_1} ~, ~ b_-^2(y,z,w,u,v) := -\tfrac{1}{2}|v-u|_{_2}^2 ~, \nonumber \\ b_-^{\infty }(y,z,w,u,v) := -|v-u|_{_{\infty }} ~. \end{aligned}$$
(30)

For measuring the box \([u,v]\) with \(b:{\mathbb {R}}^{N}\longrightarrow {\mathbb {R}}_{\ge 0}\), we can use any p-norm with \(p\in [1,\infty ]\) as well as variants thereof

$$\begin{aligned} b_+^1(y,z,w,u,v) := \left|\left( {\begin{matrix} u-{\underline{x}}\\ v-{\overline{x}} \end{matrix}} \right) \right|_{_1} ~, ~ b_+^2(y,z,w,u,v) := \tfrac{1}{2} \left|\left( {\begin{matrix} u-{\underline{x}}\\ v-{\overline{x}} \end{matrix}} \right) \right|_{_2}^2 ~, \nonumber \\ b_+^{\infty }(y,z,w,u,v) := \left|\left( {\begin{matrix} u-{\underline{x}}\\ v-{\overline{x}} \end{matrix}} \right) \right|_{_{\infty }} ~. \end{aligned}$$
(31)

The function \(b_-\) is concave, while \(b_+\) is convex; \(b_-\) can be used for unbounded \(\varvec{x}\), which is not possible for \(b_+\). The function \(b^2\) is smooth, while \(b^1\) and \(b^{\infty }\) are not differentiable. The function \(b^1\) has an equal growth rate in all components, the growth rate of \(b^2\) depends on \(\mathrm {sgn}\big (\tfrac{1}{2}|.|_{_2}^2-1\big )\), and \(b^{\infty }\) already grows as soon as the absolute value of its largest component grows.
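A minimal sketch of these measures (dropping the unused arguments y, z, w and using the naming b_minus/b_plus, which is ours; p may be 1, 2, or np.inf):

```python
import numpy as np

def b_minus(u, v, p=1):
    """Box measures from (30): the larger the box [u, v], the more negative the value."""
    w = np.asarray(v, float) - np.asarray(u, float)
    return -0.5 * float(w @ w) if p == 2 else -float(np.linalg.norm(w, p))

def b_plus(u, v, x_lo, x_hi, p=1):
    """Box measures from (31): close to 0 when [u, v] is close to the full box x."""
    d = np.concatenate([np.asarray(u, float) - np.asarray(x_lo, float),
                        np.asarray(v, float) - np.asarray(x_hi, float)])
    return 0.5 * float(d @ d) if p == 2 else float(np.linalg.norm(d, p))
```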

Fig. 8 \(f(y_2,S_{12})\) for \(T_2\)

Example 5

Choose \(n=1\), \(m=1\), \(w=0\), \(c_1=1\), \(C_1=\tfrac{1}{2}\), which yields \(F(x)=\tfrac{1}{2}x^2+x\) due to (18), as well as \(\varvec{x}=[-1,2]\), and consider two CSPs (1) with \(\varvec{F}^1=[-2, 1]\) resp. \(\varvec{F}^2=[-2,-1]\), which yield the following two graphics, from which we can see that the CSP has feasible points for \(\varvec{F}^1\), while it is infeasible for \(\varvec{F}^2\). The corresponding certificates f from (15), where we only consider the variables \(y\in {\mathbb {R}}\) and \(z\in \varvec{x}\) as well as the different choices of T from Example 2, and where we denote the function value of a local minimizer of the optimization problem (27) by \({\hat{f}}\), are illustrated in Figs. 9–14.

Fig. 9 F for \(\varvec{F}^1\)

We see from Figs. 13 and 14 that the certificate f is not defined for \(y=0\) due to the definition of \(T_2\) in Example 2.

3 Starting point

We implemented the suggestions from Sect. 2.4 in GloptLab by Domes [5], which is a configurable MATLAB framework for computing the global solution of a quadratic CSP (1), i.e. with \(F_k\) from (18). In GloptLab the matrices \(C_k\in {\mathbb {R}}^{n\times n}\) are lower triangular

$$\begin{aligned} C_k\in {\mathbb {R}}_{\mathrm {tril}}^{n\times n} := \lbrace A\in {\mathbb {R}}^{n\times n}:A_{ij}=0\text { for }i<j \rbrace ~. \end{aligned}$$
(32)

To run GloptLab with all its features, the MATLAB toolbox INTLAB by Rump [30], lp_solve by Berkelaar et al. [3], SEDUMI by Pólik [29] and Sturm [37], as well as SDPT3 [39] were installed.

The last issue that remains to be discussed is how to quickly find a point \((y,z,w,u,v)\) that is feasible for the linearly constrained optimization problem (27) and satisfies \(f(y,z,w,u,v)<0\). For this we need a good starting point \((y^0,z^0,w^0,u^0,v^0)\), and therefore we must take the following observations into account: \(y^0\) and \(z^0\) should be chosen such that \(Y(y^0,z^0)\) is positive, and \((y^0,z^0,w^0,u^0,v^0)\) should be chosen such that the term \(Z(y^0,z^0,w^0,u^0,v^0)\), which is non-negative due to (13), is near zero. These facts lead to the following suggestions for choosing a starting point \((y^0,z^0,w^0,u^0,v^0)\): First of all, if the solver in use can only handle strictly feasible bound/linear constraints (e.g., the second order bundle algorithm by Fendl and Schichl [9] using socp by Lobo et al. [20] for computing the search direction), then the initial choices of \(u^0\) and \(v^0\) must satisfy \(u^0,v^0\in ({\underline{x}},{\overline{x}})\) and \(u^0+r<v^0\), e.g., \(u^0:=(1-t_0){\underline{x}}+t_0({\overline{x}}-r)\) and \(v^0:=(1-t_1){\underline{x}}+t_1({\overline{x}}-r)\) for some fixed \(t_0,t_1\in (0,1)\) with \(t_0<t_1\). Otherwise (e.g., for SolvOpt by Kappel and Kuntsevich [16] or the second order bundle algorithm using MOSEK for computing the search direction) we take the endpoints of \(\varvec{x}\) for \(u^0\) and \(v^0\). Secondly, the natural choice for the starting value of \(z\in [u,v]\subseteq \varvec{x}\) is the midpoint \( z^0:=\tfrac{1}{2}(u^0+v^0) \) of the box \([u^0,v^0]\). Thirdly, to make the term \(\max {\big (0,Y(y,z)\big )}\) in the certificate f from (15) as large as possible, we make the following choices: For the case \({\underline{F}}_k=-\infty \), the case \({\overline{F}}_k=\infty \), and the case that both \({\underline{F}}_k\) and \({\overline{F}}_k\) are finite, we choose

(33)

respectively due to (16) and (17). Finally, for the choices of R and S we refer to Proposition 6.

Fig. 10 F for \(\varvec{F}^2\)

Fig. 11 \(f\) for \((\varvec{F}^1, T_1)\) \(\Longrightarrow \) \({\hat{f}}=0\)

Remark 5

If we choose \(T=T_2\) (cf. Example 2), then \(T(y,z,w,u,v)=0~\Longleftrightarrow ~y=0\).

Therefore, if \(y^0=0\) occurs as a starting point, then we have found a feasible point \(F(z)\in \varvec{F}\) due to (33). Furthermore, we expect that no solver will have difficulties with this choice of T because the zero set of T is small due to Example 2.

In the following we will make use of the MATLAB operators \(\text {diag}\), \(\text {tril}\) and \(\text {triu}\).

Fig. 12 \(f\) for \((\varvec{F}^2,T_1)\) \(\Longrightarrow \) \({\hat{f}}=-1\)

Fig. 13 \(f\) for \((\varvec{F}^1,T_2)\) \(\Longrightarrow \) \({\hat{f}}=0\)

Fig. 14 \(f\) for \((\varvec{F}^2,T_2)\) \(\Longrightarrow \) \({\hat{f}}=-\tfrac{1}{2}\)

Proposition 6

Let \(F_k\) be quadratic and let (32) be satisfied. Choose any \(y\in {\mathbb {R}}^m\) and consider the modified Cholesky factorization

$$\begin{aligned} {\hat{A}}={\hat{R}}^T{\hat{R}}-D \end{aligned}$$
(34)

of \({\hat{A}}\) (with \({\hat{R}}\in {\mathbb {R}}_{\mathrm {triu}}^{n\times n}\) and the non-negative diagonal matrix \(D\in {\mathbb {R}}^{n\times n}\)), where

$$\begin{aligned} {\hat{A}} := C(y)+S^T-S \in {\mathbb {R}}_{\mathrm {sym}}^{n\times n} ~, \quad S := -\tfrac{1}{2}\text {triu}\big (C(y)^T,1\big )\in {\mathbb {R}}_{\mathrm {triu}}^{n\times n} \end{aligned}$$
(35)

and \({\mathbb {R}}_{\mathrm {sym}}^{n\times n}\) denotes the space of all symmetric \(n\times n\)-matrices. Then

$$\begin{aligned} {\hat{A}} = C(y)-\tfrac{1}{2}\text {tril}\big (C(y),-1\big )+\tfrac{1}{2}\text {triu}\big (C(y)^T,1\big ) ~, \quad \text {diag}({\hat{A}}) = \text {diag}\big (C(y)\big ) ~. \end{aligned}$$
(36)

Furthermore, if we set

$$\begin{aligned} R:=D^{\frac{1}{2}} ~, \end{aligned}$$
(37)

then \( A(y,R,S)={\hat{R}}^T{\hat{R}}\in {\mathbb {R}}_{\mathrm {sym}}^{n\times n} \).

Proof

Since \(F_k\) is quadratic by assumption, the statements of Proposition 3 hold. Since C(y) is lower triangular due to (32) and (20), we obtain \(S\in {\mathcal {R}}_0(n)\) and

$$\begin{aligned} {\hat{A}} = C(y)-\tfrac{1}{2}\Big (\text {triu}\big (C(y)^T,1\big )\Big )^T+\tfrac{1}{2}\text {triu}\big (C(y)^T,1\big ) \end{aligned}$$
(38)

due to (35). Now, (38) implies (36). We calculate

$$\begin{aligned} C(y)^T-\tfrac{1}{2}\text {triu}\big (C(y)^T,1\big ) = \text {diag}\big (C(y)\big ) +\tfrac{1}{2}\text {triu}\big (C(y)^T,1\big ) ~. \end{aligned}$$
(39)

Because of

$$\begin{aligned} \tfrac{1}{2} \big ( \text {triu}\big (C(y)^T,1\big ) \big )^T = \text {tril}\big (C(y),-1\big ) -\tfrac{1}{2} \big ( \text {triu}\big (C(y)^T,1\big ) \big )^T \end{aligned}$$

we have

$$\begin{aligned} \text {diag}\big (C(y)\big ) +\tfrac{1}{2} \big ( \text {triu}\big (C(y)^T,1\big ) \big )^T = C(y) -\tfrac{1}{2} \big ( \text {triu}\big (C(y)^T,1\big ) \big )^T . \end{aligned}$$

Therefore, combining (38) and (39) yields \( {\hat{A}}^T = {\hat{A}} \), i.e. \({\hat{A}}\in {\mathbb {R}}_{\mathrm {sym}}^{n\times n}\). Consequently there exists a modified Cholesky factorization of \({\hat{A}}\) of the form (34). Hence, we can choose R according to (37) and evaluating A at (yRS) with R from (37) and S from (35) yields \(A(y,R,S)={\hat{R}}^T{\hat{R}}\) due to (20), (37) and (34). \(\square \)

Remark 6

If \({\hat{A}}\) is positive semidefinite, then \(D=0\) due to (34). If C(y) is a diagonal matrix, then \(S=0\) due to (35). Due to (36), we can construct \({\hat{A}}\) by setting \({\hat{A}}\) equal to C(y), then multiplying the strictly lower triangular part of \({\hat{A}}\) by \(\tfrac{1}{2}\), and finally copying the transpose of the resulting strictly lower triangular part of \({\hat{A}}\) to the strictly upper triangular part of \({\hat{A}}\).
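The construction of Proposition 6 and Remark 6 translates directly into NumPy; in the sketch below (function name ours) the modified Cholesky factorization (34) is realized by a simple eigenvalue shift, since the paper does not prescribe a particular variant (e.g. Gill–Murray).

```python
import numpy as np

def choose_R_S(Cy):
    """Construct S, A_hat, and R as in Proposition 6 for the lower triangular C(y)."""
    S = -0.5 * np.triu(Cy.T, 1)                 # strictly upper triangular, cf. (35)
    A_hat = Cy + S.T - S                        # symmetric matrix A_hat, cf. (35)/(36)
    lam_min = float(np.linalg.eigvalsh(A_hat).min())
    d = 0.0 if lam_min > 0.0 else -lam_min + 1e-10   # shift so A_hat + D is definite
    D = d * np.eye(Cy.shape[0])
    R_hat = np.linalg.cholesky(A_hat + D).T     # A_hat = R_hat^T R_hat - D, cf. (34)
    R = np.sqrt(D)                              # R := D^(1/2), cf. (37)
    return R, S, R_hat                          # then A(y,R,S) = R_hat^T R_hat
```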

Now we combine the facts presented in this section into Algorithm 1, which we will use for creating a starting point for the optimization problem (27) with quadratic F.

Algorithm 1

(given as a figure; the algorithm constructs the starting point \((y^0,z^0,w^0,u^0,v^0)\) as described above, distinguishing the case in which the solver can only handle strictly feasible bound/linear constraints)

Remark 7

Infeasible constrained solvers (e.g., SolvOpt by Kappel and Kuntsevich [16]) can be applied directly to the nonlinearly constrained optimization problem (28). In this case the starting point created by Algorithm 1 can be used at once without solving optimization problem (27) first, as is necessary for the second order bundle algorithm by Fendl and Schichl [9]. Therefore, the bound constraints \(u\le \hat{u}\), \(\hat{v}\le v\) of optimization problem (28) do not occur in this situation. Nevertheless, it is useful in this case to add the linear constraint \(u+r\le v\) (with a fixed \(r>0\)) from optimization problem (27) to the constrained problem to prevent the box \([u,v]\) from becoming too small.

4 Numerical results

In this section we compare the numerical results of the second order bundle algorithm by Fendl and Schichl [9], MPBNGC by Mäkelä [23], and SolvOpt by Kappel and Kuntsevich [16] for some examples that arise in the context of finding exclusion boxes for a quadratic CSP in GloptLab by Domes [5].

4.1 Introduction

We will make tests for

  • (the reduced version of) the second order bundle algorithm for nonsmooth, nonconvex optimization problems with inequality constraints by Fendl and Schichl [9] (with optimality tolerance \(\varepsilon :=10^{-5}\) and with MOSEK by Andersen et al. [2] as QCQP-solver for determining the search direction), where we refer to the linearly constrained version as “BNLC” and to the nonlinearly constrained version as “BNLCn”. It is an extension of the bundle–Newton method for nonsmooth, nonconvex unconstrained minimization by Lukšan and Vlček [21, 22] to nonlinearly constrained problems.

  • MPBNGC by Mäkelä [23] (with standard termination criteria; since MPBNGC turned out to be very fast with respect to pure solving time for the low dimensional examples in the case of successful termination with a stationary point, the limits on the number of iterations and function evaluations were chosen such that in the other case the solving times of the different algorithms have approximately at least the same magnitude).

  • SolvOpt by Kappel and Kuntsevich [16] (with the standard termination criteria, which are described in Kuntsevich and Kappel [19])

(we choose MPBNGC and SolvOpt for our comparisons, since both are written in a compiled programming language, both are publicly available, and both support nonconvex constraints) on the following examples:

  • We give results for the linearly constrained optimization problem (27) with a fixed box (i.e. without optimizing u and v) for dimensions between 4 and 11 in Sect. 4.3.

  • We give results for the linearly constrained optimization problem (27) with a variable box (i.e. with optimizing u and v) for dimensions between 8 and 21 in Sect. 4.4.

  • We give results for the nonlinearly constrained optimization problem (28) for dimension 8 in Sect. 4.5, where we use \( b_+^1(y,z,R,S,u,v) := \left|\left( {\begin{matrix} u-{\underline{x}}\\ v-{\overline{x}} \end{matrix}} \right) \right|_{_1} \) as the objective function.

The underlying data for these nonsmooth optimization problems was extracted from real CSPs that occur in GloptLab by Domes [5]. Apart from u and v, we will concentrate on the optimization of the variables y and z, because of the large number of tested examples (cf. Sect. 4.2) and because additional empirical observations showed that also optimizing R and S did not have much impact on the quality of the results; a detailed analysis of these observations goes beyond the scope of this paper. Furthermore, we will make our tests for the two different choices of the function T from Example 2, which occurs in the denominator of the certificate f from (15); for the latter choice f is only defined outside of the zero set of T, which has measure zero.

The complete tables corresponding to the results, which we will discuss in this section, can be found in Fendl and Schichl [9] and Fendl et al. [7, Appendix A].

All test examples will be sorted with respect to the problem dimension (beginning with the smallest). Furthermore, we use analytic derivative information for all occurring functions (note that implementing analytic derivative information for the certificate from (15) effectively is a nontrivial task), and we performed all tests on an Intel Pentium IV with 3 GHz and 1 GB RAM running Microsoft Windows XP.

We introduce the following notation for the record of the solution process of an algorithm. In the following let f, g, G be the function value, (sub)gradient, and (sub)Hessian of the objective function, and \(F,{{\hat{g}}},{{\hat{G}}}\) the function value, (sub)gradient, and (sub)Hessian of the constraint.

Notation 3

We denote the number of performed iterations by Nit, the final numbers of evaluations of function dependent data by Na (for the second order bundle algorithm), Nb (for MPBNGC) and Nc, Ng, \(\text {N}{\hat{g}}\) (for SolvOpt), and the duration of the solution process by

$$\begin{aligned} T_{1} := \text {``Time in milliseconds''} ~. \end{aligned}$$

For comparing the cost of evaluating function dependent data (like, e.g., function values, subgradients, ...) in a way that is as fair as possible (especially for solvers that use different function dependent data), we will make use of the following realistic “credit point system” that an optimal implementation of algorithmic differentiation in backward mode suggests (cf. Griewank and Corliss [12] and Schichl [33–35]).

Definition 2

Let \(f_A\), \(g_A\) and \(G_A\) resp. \(F_A\), \({\hat{g}}_A\) and \({\hat{G}}_A\) be the number of function values, subgradients and (substitutes of) Hessians of the objective function resp. the constraint that an algorithm A used for solving a nonsmooth optimization problem which may have linear constraints and at most one single nonsmooth nonlinear constraint. Then we define the cost of these evaluations by

$$\begin{aligned} c(A):=f_A+3g_A+3n\cdot G_A+\text {nlc}\cdot (F_A+3{\hat{g}}_A+3n\cdot {\hat{G}}_A) ~, \end{aligned}$$
(40)

where \(\text {nlc}=1\) if the optimization problem has a nonsmooth nonlinear constraint, and \(\text {nlc}=0\) otherwise.
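A direct transcription of (40) (the function name and argument order are ours):

```python
def evaluation_cost(n, f_A, g_A, G_A, F_A=0, g_hat_A=0, G_hat_A=0, nlc=0):
    """Credit-point cost c(A) of the evaluations used by algorithm A, cf. (40)."""
    return (f_A + 3 * g_A + 3 * n * G_A
            + nlc * (F_A + 3 * g_hat_A + 3 * n * G_hat_A))
```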

Since the second order bundle algorithm evaluates f, g, G and F, \({\hat{g}}\), \({\hat{G}}\) at every call that computes function dependent data (cf. Fendl and Schichl [9]), we obtain

$$\begin{aligned} c(\text {Red Alg}) = (1+\text {nlc})\cdot \text {Na}\cdot (1+3+3N) ~. \end{aligned}$$

Since MPBNGC evaluates f, g and F, \({\hat{g}}\) at every call that computes function dependent data (cf. [23]), the only difference to the second order bundle algorithm with respect to c from (40) is that MPBNGC uses no Hessian information, and hence we obtain

$$\begin{aligned} c(\text {MPBNGC}) = (1+\text {nlc})\cdot \text {Nb}\cdot (1+3) ~. \end{aligned}$$

Since SolvOpt evaluates f and F at every call that computes function dependent data and only sometimes g or \({\hat{g}}\) (cf. [19]), we obtain

$$\begin{aligned} c(\text {SolvOpt}) = (1+\text {nlc})\cdot \text {Nc}+3(\text {Ng}+\text {nlc}\cdot \text {N}{\hat{g}}) ~. \end{aligned}$$

We will visualize the performance of two algorithms A and B in this section by the following record-plot: In this plot the abscissa is labeled by the name of the test example and the value of the ordinate is given by \( \text {rp}(c):=c(B)-c(A) \) (i.e. if \(\text {rp}(c)>0\), then \(\text {rp}(c)\) tells us how much better algorithm A is than algorithm B with respect to c for the considered example in absolute numbers; if \(\text {rp}(c)<0\), then \(\text {rp}(c)\) quantifies the advantage of algorithm B in comparison to algorithm A; if \(\text {rp}(c)=0\), then both algorithms are equally good with respect to c). The scaling of the plots is chosen in a way that plots that contain the same test examples are comparable (although the plots may have been generated by results from different algorithms).

4.2 Overview of the results

We compare the total time \(T_1\) of the solution process in Table 1.

Table 1 Comparison of solution times for BNLC, MPBNGC, and SolvOpt

For the linearly constrained problems MPBNGC was the fastest of the tested algorithms, followed by BNLC and SolvOpt. If we consider only those nonlinearly constrained examples for which MPBNGC was able to terminate successfully, MPBNGC was the fastest algorithm again. Considering the competitors, for the nonlinearly constrained problems with \(T=1\) BNLCn is 13.3 seconds resp. 11.3 seconds faster than SolvOpt, while for the nonlinearly constrained problems with \(T=|y|_{_2}\) SolvOpt is 7.1 seconds resp. 5.4 seconds faster than BNLCn.

Examining the performance of the second order bundle algorithm more closely yields the observation that at least 85% of the time is consumed by solving the QP (in the linearly constrained case) resp. at least 80% of the time is consumed by solving the QCQP (in the nonlinearly constrained case), which implies that in particular the difference in the percentage between the QP and the QCQP is small (an investigation of the behavior of the solving time for higher dimensional problems can be found in Fendl and Schichl [9]).

Therefore, in Sects. 4.3, 4.4 and 4.5 we will concentrate on the comparison of qualitative aspects between the second order bundle algorithm, MPBNGC, and SolvOpt (like, e.g., the cost c of the evaluations). Before making these detailed comparisons, we give a short overview of them for clarity of presentation: In both cases \(T=1\) (solid line) and \(T=|y|_{_2}\) (dashed line), where we use the two different line types for a better distinction in the following, we tested 128 linearly constrained examples with a fixed box, 117 linearly constrained examples with a variable box and 201 nonlinearly constrained examples, which yields the following two summary tables consisting of the number of examples for which the second order bundle algorithm (BNLC resp. BNLCn) is better than MPBNGC resp. SolvOpt (and vice versa) with respect to the cost c of the evaluations, see Tables 2 and 3, which are visualized in Figs. 15, 16, and 17 and which let us draw the following conclusions:

Table 2 Comparison of BNLC and MPBNGC with respect to evaluation cost
Table 3 Comparison of BNLC and SolvOpt with respect to evaluation cost

The performance differences between BNLC and MPBNGC can be neglected for the largest part of the linearly constrained examples (with small advantages for MPBNGC in about ten percent of these examples). For the nonlinearly constrained examples BNLCn is superior to MPBNGC in one quarter of the examples; for forty percent of the examples one of these two solvers has small advantages over the other (in most cases MPBNGC is the slightly more successful one); the performance differences between the two algorithms can be completely neglected for fifteen percent of the examples; and for a further fifteen percent of the examples MPBNGC beats BNLCn clearly.

Fig. 15 Linearly constrained with fixed box (summary)

Fig. 16 Linearly constrained with variable box (summary)

Fig. 17 Nonlinearly constrained (summary)

For the linearly constrained examples BNLC is superior to SolvOpt in one third of the examples; for one quarter of the examples one of these two solvers has small advantages over the other (in nearly all cases BNLC is the slightly more successful one); the performance differences between the two algorithms can be completely neglected for forty percent of the examples; and in only one percent of the examples SolvOpt beats BNLC clearly. For the nonlinearly constrained examples BNLCn is superior to SolvOpt in one third of the examples; for 45 percent of the examples one of these two solvers has small advantages over the other (BNLCn is often the slightly more successful one); the performance differences between the two algorithms can be completely neglected for ten percent of the examples; and in the remaining ten percent of the examples SolvOpt beats BNLCn clearly.

While in the linearly constrained case all three solvers terminated successfully for all examples, in the nonlinearly constrained case only BNLCn and SolvOpt were able to attain this goal.

4.3 Linearly constrained case (fixed box)

We took 310 examples from real CSPs that occur in GloptLab. We observe that for 79 examples the starting point is feasible for the CSP and for 103 examples the evaluation of the certificate at the starting point identifies the box as infeasible and hence there remain 128 test problems.

BNLC versus MPBNGC In the case \(T=1\) we conclude from Fendl et al. [7, Figure 18] that BNLC is significantly better in 1 example and a bit better in 2 examples in comparison with MPBNGC, while MPBNGC is significantly better in 2 examples, better in 5 examples and a bit better in 12 examples in comparison with BNLC. In the 106 remaining examples the costs of BNLC and MPBNGC are practically the same.

In the case \(T=|y|_{_2}\) it follows from Fendl et al. [7, Figure 19] that MPBNGC is significantly better in 2 examples, better in 5 examples and a bit better in 30 examples in comparison with BNLC. In the 91 remaining examples the costs of BNLC and MPBNGC are practically the same.

BNLC vs. SolvOpt In the case \(T=1\) we conclude from Fendl et al. [7, Figure 20] that BNLC is significantly better in 25 examples, better in 13 examples and a bit better in 25 examples in comparison with SolvOpt, while SolvOpt is significantly better in 1 example and better in 3 examples in comparison with BNLC. In the 61 remaining examples the costs of BNLC and SolvOpt are practically the same.

In the case \(T=|y|_{_2}\) it follows from Fendl et al. [7, Figure 21] that BNLC is significantly better in 9 examples, better in 49 examples and a bit better in 34 examples in comparison with SolvOpt, while SolvOpt is significantly better in 1 example, better in 2 examples and a bit better in 1 example in comparison with BNLC. In the 32 remaining examples the costs of BNLC and SolvOpt are practically the same.

4.4 Linearly constrained case (variable box)

We observe that for 80 examples the starting point is feasible for the CSP and for 113 examples the evaluation of the certificate at the starting point identifies the boxes as infeasible and hence there remain 117 test problems of the 310 original examples from GloptLab.

BNLC vs. MPBNGC In the case \(T=1\) we conclude from Fendl et al. [7, Figure 22] that MPBNGC is a bit better in 1 example in comparison with BNLC. In the 116 remaining examples the costs of BNLC and MPBNGC are practically the same.

In the case \(T=|y|_{_2}\) it follows from Fendl et al. [7, Figure 23] that MPBNGC is a bit better in 5 examples in comparison with BNLC. In the 112 remaining examples the costs of BNLC and MPBNGC are practically the same.

BNLC vs. SolvOpt In the case \(T=1\) we conclude from Fendl et al. [7, Figure 24] that BNLC is significantly better in 8 examples, better in 24 examples and a bit better in 37 examples in comparison with SolvOpt. In the 48 remaining examples the costs of BNLC and SolvOpt are practically the same.

In the case \(T=|y|_{_2}\) it follows from Fendl et al. [7, Figure 25] that BNLC is significantly better in 20 examples, better in 19 examples and a bit better in 32 examples in comparison with SolvOpt, while SolvOpt is a bit better in 5 examples (21, 101, 102, 128, 189) in comparison with BNLC. In the 41 remaining examples the costs of BNLC and SolvOpt are practically the same.

4.5 Nonlinearly constrained case

Since we were not able to find a starting point, i.e. an infeasible sub-box, for 109 examples, we exclude them from the following tests for which there remain 201 examples of the 310 original examples from GloptLab.

BNLCn vs. MPBNGC In the case \(T=1\) MPBNGC does not satisfy any of its termination criteria for 32 examples within the given number of iterations and function evaluations. For the remaining 169 examples we conclude from Fendl et al. [7, Figure 26] that BNLCn is significantly better in 3 examples, better in 2 examples and a bit better in 10 examples in comparison with MPBNGC, while MPBNGC is significantly better in 6 examples, better in 28 examples and a bit better in 89 examples in comparison with BNLCn, and in 31 examples the costs of BNLCn and MPBNGC are practically the same.

In the case \(T=|y|_{_2}\) MPBNGC does not satisfy any of its termination criteria for 43 examples within the given number of iterations and function evaluations. For the remaining 158 examples it follows from Fendl et al. [7, Figure 27] that BNLCn is significantly better in 8 examples, better in 14 examples and a bit better in 15 examples in comparison with MPBNGC, while MPBNGC is significantly better in 4 examples, better in 28 examples and a bit better in 59 examples in comparison with BNLCn, and in 30 examples the costs of BNLCn and MPBNGC are practically the same.

BNLCn vs. SolvOpt In the case \(T=1\) we conclude from Fendl et al. [7, Figure 28] that BNLCn is significantly better in 50 examples, better in 20 examples and a bit better in 76 examples in comparison with SolvOpt, while SolvOpt is better in 14 examples and a bit better in 20 examples in comparison with BNLCn. In the 21 remaining examples the costs of BNLCn and SolvOpt are practically the same.

In the case \(T=|y|_{_2}\) it follows from Fendl et al. [7, Figure 29] that BNLCn is significantly better in 12 examples, better in 45 examples and a bit better in 61 examples in comparison with SolvOpt, while SolvOpt is significantly better in 2 examples, better in 24 examples and a bit better in 26 examples in comparison with BNLCn. In the 31 remaining examples the costs of BNLCn and SolvOpt are practically the same.

5 Conclusion

In this paper we presented a nonsmooth function that can be used as a certificate of infeasibility and allows the identification of exclusion boxes during the solution process of a CSP by techniques from nonsmooth optimization: While we can find an exclusion box by solving a linearly constrained nonsmooth optimization problem, the enlargement of an exclusion box can be achieved by solving a nonlinearly constrained nonsmooth optimization problem. Furthermore, we discussed important properties of the certificate, such as the reduction of its scale dependence, and we suggested a method to obtain a good starting point for the nonsmooth optimization problems.

During testing we showed that the certificates of infeasibility can be found by standard nonsmooth solvers like MPBNGC or SolvOpt, as well as by the new solver BNLCn. In an additional optimization step, the size of the corresponding exclusion boxes can be maximized. For the linearly constrained problems MPBNGC is the best choice. However, for the nonlinearly constrained problems BNLCn is superior to the other solvers, and detailed investigations in Fendl and Schichl [8] have shown that the QCQP overhead decreases for higher dimensional problems, while the stability increases.

Altogether, the new exclusion boxes based on nonsmooth optimization provide valuable information for deterministic solvers of CSPs and global optimization problems, and they can be computed efficiently.