Derivative-free mixed binary necklace optimization for cyclic-symmetry optimal design problems

  • Research Article
  • Published:
Optimization and Engineering

Abstract

This paper presents an adapted trust-region method for computationally expensive black-box optimization problems with mixed binary variables that involve a cyclic symmetry property. Mixed binary problems occur in several practical optimal design problems, e.g., aircraft engine turbines, mooring lines of offshore wind turbines, electric engine stators and rotors. The motivating application for this study is the optimal design of helicopter bladed disk turbomachines. The necklace concept is introduced to deal with the cyclic symmetry property, and to avoid costly black-box objective-function evaluations at equivalent solutions. An adapted distance is proposed for the discrete-space exploration step of the optimization method. A convergence analysis is presented for the trust-region derivative-free algorithm, DFOb-\(d_H\), extended to the mixed-binary case and based on the Hamming distance. The convergence proof is extended to the new algorithm, DFOb-\(d_{neck}\), which is based on the necklace distance. Computational comparison with state-of-the-art black-box optimization methods is performed on a set of analytical problems and on a simplified industrial application.
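
For orientation, the necklace distance \(d_{neck}\) is defined precisely in the body of the paper, which is not reproduced in this excerpt. A common formulation, and the one assumed in the short sketch below, is the minimum Hamming distance over all cyclic rotations of one of the two binary vectors; the example data are illustrative only.

    def hamming(a, b):
        # number of positions where the two binary vectors differ
        return sum(u != v for u, v in zip(a, b))

    def necklace_distance(a, b):
        # assumed definition: minimum Hamming distance over all cyclic rotations of b
        return min(hamming(a, b[k:] + b[:k]) for k in range(len(b)))

    a = [1, 0, 0, 1, 0, 0]
    b = [0, 1, 0, 0, 1, 0]
    print(hamming(a, b), necklace_distance(a, b))  # 4 0: b is a cyclic rotation of a

Two configurations that differ as plain binary vectors can thus be equivalent as necklaces, which is exactly why evaluating the expensive black-box at both would be wasted effort.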

Notes

  1. The number of positive integers between 1 and n that are relatively prime to n (Euler's totient function \(\varphi (n)\)).
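
For illustration, the totient and the standard Burnside-type count of binary necklaces of length n can be computed as follows (the counting formula is standard but is not quoted in this excerpt):

    from math import gcd

    def phi(n):
        # Euler's totient: number of integers 1 <= k <= n with gcd(k, n) == 1
        return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

    def num_binary_necklaces(n):
        # standard Burnside count: (1/n) * sum over divisors d of n of phi(d) * 2^(n/d)
        return sum(phi(d) * 2 ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

    print(phi(12))                  # 4: the integers 1, 5, 7, 11 are coprime to 12
    print(num_binary_necklaces(6))  # 14 distinct binary necklaces of length 6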

Acknowledgements

We thank the anonymous referees for their valuable remarks, and Safran Tech and IFP Energies Nouvelles for funding the Ph.D. position of the first author.

Author information

Corresponding author

Correspondence to Delphine Sinoquet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Proof of Lemma 2

Lemma 2

Let \((x_0,y_0)\) be the initial iterate. Under Assumptions 1 and 2, the model \({\widetilde{m}}(\cdot ,y_0)\), which is constructed from \({\widetilde{m}}(x,y)\) by fixing \(y = y_0\), is fully linear in \(B_{y_0}(x_0,\varDelta _x)\). In other words, there exist \(\kappa _f^*, \kappa _g^* > 0\) such that, for all \(x \in B_{y_0}(x_0,\varDelta _x)\):

$$\begin{aligned} |f(x,y_0) - {\widetilde{m}}(x,y_0)| \le \kappa _f^* \varDelta _x^2, \end{aligned}$$
(36)

and

$$\begin{aligned} \Vert \triangledown _x f(x,y_0) -\triangledown _x {\widetilde{m}}(x,y_0)\Vert _2 \le \kappa _g^* \varDelta _x. \end{aligned}$$
(37)

Proof

The model constructed in mixed space is given as:

$$\begin{aligned} {\widetilde{m}}(z) = c + g^T z + \dfrac{1}{2} z^T H z, \end{aligned}$$

where \(z = (x,y), \quad g = (g_x,g_y), \quad H = \begin{pmatrix} H_{xx} &{} H_{xy}\\ H_{yx}&{}H_{yy} \end{pmatrix}\), \(H_{xy} = H_{yx}^T\), and where \(H_{xx}, H_{yy}\) are symmetric matrices.

Thus, the model with y fixed to \(y_0\) is defined as follows:

$$\begin{aligned} {\widetilde{m}}(x,y_0)&= \bar{c}_x + \bar{g}_x^T x + \dfrac{1}{2} x^T \bar{H}_{x} x, \end{aligned}$$
(38)

with \(\bar{c}_x =\Big ( c + g_y^T y_0 + \dfrac{1}{2} y_{0}^T H_{yy} y_0 \Big )\), \(\bar{g}_x = \Big ( g_x+H_{xy} y_0 \Big )\) and \(\bar{H}_{x} = H_{xx}\).

The gradient of \({\widetilde{m}}(x,y_0)\) with respect to x is therefore:

$$\begin{aligned} \triangledown _x {\widetilde{m}}(x,y_0) = \bar{g}_x + \bar{H}_{x} x. \end{aligned}$$
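
As a quick numerical sanity check of this reduction (not part of the original proof), the sketch below builds a random symmetric quadratic model in the mixed space, forms \(\bar{c}_x\), \(\bar{g}_x\) and \(\bar{H}_x\) as in Eq. (38), and verifies that the reduced model and its gradient agree with the full model restricted to \(y = y_0\); dimensions and data are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 3, 2                          # continuous and binary dimensions (illustrative)
    c = rng.normal()
    g = rng.normal(size=m + n)           # g = (g_x, g_y)
    A = rng.normal(size=(m + n, m + n))
    H = (A + A.T) / 2                    # symmetric model Hessian with blocks H_xx, H_xy, H_yx, H_yy

    def model(z):
        return c + g @ z + 0.5 * z @ H @ z

    y0 = rng.integers(0, 2, size=n).astype(float)
    g_x, g_y = g[:m], g[m:]
    H_xx, H_xy, H_yy = H[:m, :m], H[:m, m:], H[m:, m:]

    # reduced coefficients of m(., y_0), following Eq. (38)
    c_bar = c + g_y @ y0 + 0.5 * y0 @ H_yy @ y0
    g_bar = g_x + H_xy @ y0
    H_bar = H_xx

    x = rng.normal(size=m)
    full = model(np.concatenate([x, y0]))
    reduced = c_bar + g_bar @ x + 0.5 * x @ H_bar @ x
    grad_fd = [(model(np.concatenate([x + 1e-6 * e, y0]))
                - model(np.concatenate([x - 1e-6 * e, y0]))) / 2e-6 for e in np.eye(m)]
    print(np.isclose(full, reduced))                           # True
    print(np.allclose(g_bar + H_bar @ x, grad_fd, atol=1e-4))  # True: matches the gradient formula above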

For convenience, let us introduce the following notation: \(f_0(x)=f(x,y_0)\), \({\widetilde{m}}_0(x)= {\widetilde{m}}(x,y_0)\), \(\triangledown f_0(x) = \triangledown _x f(x,y_0)\), \(\triangledown {\widetilde{m}}_0(x) =\triangledown _x{\widetilde{m}}(x,y_0)\) and \(B_0(\varDelta _x)=B_{y_0}(x_0, \varDelta _x)\).

We define

$$\begin{aligned} err_0^f(x)= & {} f_0(x) - {\widetilde{m}}_0(x),\\ err_0^g(x)= & {} \triangledown f_0(x) -\triangledown {\widetilde{m}}_0(x). \end{aligned}$$

For all \(x^i \in B_0(\varDelta _x)\), we develop

$$\begin{aligned} (x^i - x)^T err_0^g (x)= & {} (x^i - x)^T (\bar{H}_{x} x + \bar{g}_x - \triangledown f_0(x)) \nonumber \\= & {} (x^i - x)^T \bar{H}_{x} x + (x^i - x)^T \bar{g}_x - f_0(x^i) + f_0(x) \nonumber \\&+\, [f_0(x^i) - f_0(x) - (x^i - x)^T \triangledown f_0(x) ] \nonumber \\= & {} {\widetilde{m}}_0(x^i) - {\widetilde{m}}_0(x) - \dfrac{1}{2}(x^i-x)^T \bar{H}_x (x^i - x) - f_0(x^i) + f_0(x) \nonumber \\&+\, [f_0(x^i) - f_0(x) - (x^i - x)^T \triangledown f_0(x) ]\nonumber \\= & {} err_0^f(x^i) - err_0^f(x) - \dfrac{1}{2}(x^i-x)^T \bar{H}_x (x^i - x) \nonumber \\&+\, [f_0(x^i) - f_0(x) - (x^i - x)^T \triangledown f_0(x) ]. \end{aligned}$$
(39)

Since \(f_0\) is continuously differentiable, we have:

$$\begin{aligned}{}[f_0(x^i) - f_0(x) - (x^i - x)^T \triangledown f_0(x) ] = \int _0^1 (x^i - x)^T ( \triangledown f_0(x + t(x^i-x)) -\triangledown f_0(x))dt, \end{aligned}$$

which implies that

$$\begin{aligned} (x^i - x)^T err_0^g (x)= & {} \int _0^1 (x^i - x)^T ( \triangledown f_0(x + t(x^i-x)) -\triangledown f_0(x))dt \nonumber \\&+\, err^f_0(x^i) - err^f_0(x) - \dfrac{1}{2}(x^i-x)^T \bar{H}_x (x^i - x). \end{aligned}$$
(40)

Using (40), once as written and once with \(x^i\) replaced by \(x_0\), we obtain:

$$\begin{aligned} (x^i - x_0)^T err_0^g (x)= & {} (x^i - x)^T err_0^g (x) - (x_0 - x)^T err_0^g (x) \nonumber \\= & {} \int _0^1 (x^i - x)^T ( \triangledown f_0(x + t(x^i - x)) -\triangledown f_0(x))dt + err^f_0(x^i) \nonumber \\&-\, \dfrac{1}{2}(x^i - x)^T \bar{H}_x (x^i - x) \nonumber \\&-\, \int _0^1 (x_0 - x)^T ( \triangledown f_0(x + t(x_0 - x)) -\triangledown f_0(x))dt - err^f_0(x_0) \nonumber \\&+\, \dfrac{1}{2} (x_0 - x)^T \bar{H}_x (x_0 - x). \end{aligned}$$
(41)

First, note that \(err_0^f(x_0) = 0\). Then, for each term of (41), we obtain the following upper bounds:

  • From the Lipschitz continuity of \(\triangledown f_0\) (Assumption 1), one has:

    $$\begin{aligned} \big |\int _0^1 (x^i - x)^T ( \triangledown f_0(x + t(x^i - x)) -\triangledown f_0(x))dt \big |\le & {} \dfrac{1}{2}\nu \Vert x^i-x\Vert ^2\nonumber \\\le & {} \frac{1}{2} \nu (2\varDelta _x)^2\nonumber \\\le & {} 2 \nu \varDelta _x^2. \end{aligned}$$
    (42)
  • In the same way, we have:

    $$\begin{aligned} \big |\int _0^1 (x_0 - x)^T ( \triangledown f_0(x + t(x_0 - x)) -\triangledown f_0(x))dt\big |&\le \dfrac{1}{2}\nu \Vert x_0-x\Vert ^2 \le \dfrac{1}{2} \nu \varDelta _x^2. \end{aligned}$$
    (43)
  • In the following two inequalities, note that \(\Vert \bar{H}_x\Vert _F\) is bounded from Assumption 2:

    $$\begin{aligned} |\dfrac{1}{2}(x^i - x)^T \bar{H}_x (x^i - x)|\le & {} \dfrac{1}{2} \Vert \bar{H}_x\Vert _F \Vert x^i-x\Vert ^2 \nonumber \\\le & {} \dfrac{1}{2} \Vert \bar{H}_x\Vert _F(2 \varDelta _x)^2 \le 2 \Vert \bar{H}_x\Vert _F \varDelta _x^2. \end{aligned}$$
    (44)
    $$\begin{aligned} |\dfrac{1}{2}(x_0 - x)^T \bar{H}_x (x_0 - x)|\le & {} \dfrac{1}{2} \Vert \bar{H}_x\Vert _F \Vert x_0-x\Vert ^2 \le \dfrac{1}{2} \Vert \bar{H}_x\Vert _F \varDelta _x^2. \end{aligned}$$
    (45)
  • There exists \(\epsilon ' > 0\) such that

    $$\begin{aligned} |err_0^f(x^i)| \le \epsilon ' \varDelta _x^2, \end{aligned}$$
    (46)

    which can be shown by contradiction. Indeed, suppose that we have

    $$\begin{aligned} |err_0^f(x^i)|> \epsilon ' \varDelta _x^2 \quad \forall \epsilon ' > 0. \end{aligned}$$
    (47)

    By definition of \(err_0^f\) and from the continuity assumption on \(f_0\) and \({\widetilde{m}}\) on \(B_0(\varDelta _x)\), there exist \(\epsilon _1, \epsilon _2 >0\) such that:

    $$\begin{aligned} |err_0^f(x^i)|= & {} |f_0(x^i) - {\widetilde{m}}_0(x^i)| \nonumber \\= & {} |f_0(x^i) - f_0(x_0)+{\widetilde{m}}_0(x_0)- {\widetilde{m}}_0(x^i)| \nonumber \\\le & {} |f_0(x^i) - f_0(x_0)| + |{\widetilde{m}}_0(x_0)- {\widetilde{m}}_0(x^i)| \nonumber \\\le & {} (\epsilon _1 + \epsilon _2)\varDelta _x. \end{aligned}$$
    (48)

    Thus, setting \(\epsilon ' = \dfrac{\epsilon _1 + \epsilon _2}{\varDelta _{x,min}} \ge \dfrac{\epsilon _1 + \epsilon _2}{\varDelta _x}\) (since \(\varDelta _x \ge \varDelta _{x,min}\)) in (47) contradicts (48).

Thus, we find from (41) and the inequalities [(42)–(46)]:

$$\begin{aligned} |(x^i - x_0)^T err_0^g (x)| \le \dfrac{5}{2} \varDelta _x^2 \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ) , \end{aligned}$$
(49)

with \(\epsilon = \dfrac{2}{5} \epsilon '\).

Using now the Cauchy–Schwarz inequality, we obtain:

$$\begin{aligned} \Vert err_0^g (x)\Vert _2 \, \, \le \, \dfrac{5}{2} \varDelta _x \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ) . \end{aligned}$$
(50)

Consider now the matrix \(X = \dfrac{1}{\varDelta _x} \big [ x^1 - x_0, x^2 - x_0, \ldots , x^p -x_0 \big ]\).

We recall that the interpolation set Z is defined as

$$\begin{aligned} Z = \begin{pmatrix} x_0&{}y_0\\ x^1&{}y^1\\ \vdots &{}\vdots \\ x^p&{}y^p\\ \end{pmatrix}. \end{aligned}$$
(51)

Since Z is poised, Z has full rank, i.e., \({{\,\mathrm{rank}\,}}(Z) = \min (p,m+n) = m+n\), based on the fact that \(p>m+n\) (see Sect. 2.1), and the column vectors \(x^1-x_0,\ldots ,x^p-x_0\) of \(\varDelta _x X\) span \({\mathbb {R}}^m\). Therefore, \(X^T\) has full column rank and admits a left inverse, denoted \(X^{-T}\) below.
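
As an aside (not in the paper), the role of poisedness here can be checked numerically: for generic sample points the scaled matrix X has full row rank, so \(X^T\) admits a left inverse, which is what \(X^{-T}\) refers to below. A minimal sketch with illustrative sizes, using a box-shaped neighbourhood for simplicity:

    import numpy as np

    rng = np.random.default_rng(1)
    m, p, delta = 3, 6, 0.5                             # illustrative sizes with p > m
    x0 = rng.normal(size=m)
    pts = x0 + delta * rng.uniform(-1, 1, size=(p, m))  # sample points x^1, ..., x^p around x_0

    X = (pts - x0).T / delta                            # columns (x^i - x_0)/delta, shape (m, p)
    print(np.linalg.matrix_rank(X))                     # m for generic (poised) points
    X_inv_T = np.linalg.pinv(X.T)                       # left inverse of X^T, playing the role of X^{-T}
    print(np.allclose(X_inv_T @ X.T, np.eye(m)))        # True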

We have

$$\begin{aligned} X^T err_0^g(x) = \dfrac{1}{\varDelta _x} \begin{bmatrix} (x^1 - x_0)^T\\ \vdots \\ (x^p - x_0)^T \end{bmatrix} err_0^g(x). \end{aligned}$$
(52)

Then, we obtain from inequality (49):

$$\begin{aligned} \Vert X^T err_0^g(x) \Vert _{\infty }= & {} \dfrac{1}{\varDelta _x}\max _{i=1,\ldots ,p} |(x^i - x_0)^T err_0^g(x)|\nonumber \\\le & {} \dfrac{1}{\varDelta _x} \Bigg [\dfrac{5}{2} \varDelta _x^2 \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ) \Bigg ]\nonumber \\= & {} \varDelta _x \Bigg [\dfrac{5}{2} \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ) \Bigg ]. \end{aligned}$$
(53)

Moreover, we have:

$$\begin{aligned} \Vert err_0^g(x)\Vert _2= & {} \Vert X^{-T} X^T err_0^g(x) \Vert _2\nonumber \\\le & {} \Vert X^{-T}\Vert _2 \, \, \Vert X^T err_0^g(x) \Vert _2. \end{aligned}$$
(54)

Thus, we obtain:

$$\begin{aligned} \Vert err_0^g(x)\Vert _2\le & {} \sqrt{m} \Vert X^{-T} \Vert _2 \, \, \Vert X^T err_0^g(x) \Vert _{\infty } \nonumber \\\le & {} \sqrt{m} \Vert X^{-T}\Vert _2 \, \, \varDelta _x \Bigg [\dfrac{5}{2} \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ) \Bigg ]. \end{aligned}$$
(55)

Recovering \(err_0^f(x)\) from Eq. (40), we have:

$$\begin{aligned} |err_0^f(x)|\le & {} \Vert err_0^g(x)\Vert \varDelta _x + 2\nu \varDelta _x^2 + 2\Vert \bar{H}_x\Vert _F \varDelta _x^2 + |err_0^f(x^i)|\nonumber \\\le & {} \Bigg [ \sqrt{m} \Vert X^{-T}\Vert _2 \dfrac{5}{2} \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ) + 2(\nu +\Vert \bar{H}_x \Vert _F) + \epsilon ' \Bigg ] \varDelta _x^2 . \end{aligned}$$
(56)

We complete the proof by defining the two required constants:

$$\begin{aligned} \kappa _g^* = \dfrac{5}{2} \sqrt{m} \Vert X^{-T}\Vert _2 \Big ( \nu + \Vert \bar{H}_x \Vert _F + \epsilon \Big ), \end{aligned}$$
(57)

and

$$\begin{aligned} \kappa _f^* = \Big ( \nu + \Vert \bar{H}_x \Vert _F \Big ) \Big (\dfrac{5}{2} \sqrt{m} \Vert X^{-T}\Vert _2 + 2 \Big ) + \dfrac{5}{2}\epsilon \Big ( \sqrt{m} \Vert X^{-T}\Vert _2 + 1 \Big ). \end{aligned}$$
(58)

\(\square\)
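
To make the fully linear property established in Lemma 2 concrete, here is a small numerical illustration that is not taken from the paper: a quadratic model is interpolated on six points in a ball of radius \(\varDelta _x\) around \(x_0\), and the scaled errors \(|f-{\widetilde{m}}|/\varDelta _x^2\) and \(\Vert \triangledown f - \triangledown {\widetilde{m}}\Vert /\varDelta _x\) are observed to remain bounded as \(\varDelta _x\) shrinks. The test function, sample directions and radii are illustrative assumptions.

    import numpy as np

    def fit_quadratic_2d(pts, vals):
        # exact interpolation of m(x) = c + g.x + 0.5 x^T H x on six poised points
        x1, x2 = pts[:, 0], pts[:, 1]
        A = np.column_stack([np.ones(len(pts)), x1, x2, 0.5 * x1 ** 2, 0.5 * x2 ** 2, x1 * x2])
        coef = np.linalg.solve(A, vals)
        c, g = coef[0], coef[1:3]
        H = np.array([[coef[3], coef[5]], [coef[5], coef[4]]])
        return c, g, H

    f = lambda x: np.sin(x[0]) + np.cos(x[1]) + x[0] * x[1]
    grad_f = lambda x: np.array([np.cos(x[0]) + x[1], -np.sin(x[1]) + x[0]])

    x0 = np.array([0.3, -0.2])
    dirs = np.array([[0., 0.], [1, 0], [0, 1], [-1, 0], [0, -1], [0.7, 0.7]])  # poised directions
    rng = np.random.default_rng(0)
    for delta in [0.5, 0.25, 0.125, 0.0625]:
        pts = x0 + delta * dirs
        c, g, H = fit_quadratic_2d(pts, np.array([f(p) for p in pts]))
        test = x0 + delta * rng.uniform(-0.7, 0.7, size=(200, 2))   # points inside the ball
        kf = max(abs(f(x) - (c + g @ x + 0.5 * x @ H @ x)) for x in test) / delta ** 2
        kg = max(np.linalg.norm(grad_f(x) - (g + H @ x)) for x in test) / delta
        print(f"delta={delta:.4f}  |f-m|/delta^2 <= {kf:.3f}  ||grad err||/delta <= {kg:.3f}")

The printed ratios stay of the same order rather than blowing up as the radius shrinks, which is the behaviour the constants \(\kappa _f^*\) and \(\kappa _g^*\) capture.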

Appendix B Proof of Proposition 1

Proposition 1

Let \(\mu >0\) be a given constant, let N and n be positive integers, let \(f:\varOmega \subseteq {\mathbb {R}}^N \rightarrow {\mathbb {R}}\) be a quadratic function, and let \(g_i: \varOmega \rightarrow {\mathbb {R}}\), \(i = 1, 2, \ldots ,n\), be real-valued functions satisfying \(0 \le g_i (z) \le M\) for all \(z\in \varOmega\), for some \(M>0\). Then, the following two optimization problems are equivalent:

$$\begin{aligned}&\left\{ \begin{array}{lll} \displaystyle {\min _{z,t} f(z) + \mu t}\\ \text{ s.t. } t = \min \limits _{ i = 1,2,\ldots ,n} \{g_i (z)\}. \end{array}\right. \qquad \qquad \qquad \qquad \qquad (P_1)\\&\left\{ \begin{array}{lll} \displaystyle {\min _{z,{\tilde{y}},t} f(z) + \mu t }\\ \text{ s.t. } t \ge g_i(z) - M {\tilde{y}}_{i},\quad i = 1,2,\ldots ,n\\ \sum _{i=1}^n {\tilde{y}}_{i} = n -1,\\ {\tilde{y}}_{i} \in \{0,1\}, i = 1,2,\ldots ,n. \end{array}\right. \qquad \qquad \qquad \qquad \qquad (P_2) \end{aligned}$$
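
Before turning to the proof, a brute-force check of the equivalence on a toy instance may help intuition. The sketch below (illustrative data, not from the paper) evaluates, on a one-dimensional grid of z values, the objective of (\(P_1\)) directly, and the best objective of (\(P_2\)) obtained by enumerating the binary vector \({\tilde{y}}\) (exactly one zero entry) and giving t its smallest feasible value, namely \(\max _i \{g_i(z) - M{\tilde{y}}_i\}\); the two coincide.

    import numpy as np

    mu, M = 2.0, 1.0
    f = lambda z: (z - 0.3) ** 2                    # quadratic objective (illustrative)
    gs = [lambda z: 0.5 * (np.sin(3 * z) + 1),      # functions g_i with values in [0, M]
          lambda z: 0.5 * (np.cos(2 * z) + 1),
          lambda z: np.clip(z ** 2, 0, M)]
    n = len(gs)

    for z in np.linspace(-1.0, 1.0, 41):
        p1 = f(z) + mu * min(g(z) for g in gs)      # objective of (P_1)
        # (P_2): choose which entry of y~ is zero, then take the smallest feasible t
        p2 = min(f(z) + mu * max(gs[i](z) - M * (0 if i == j else 1) for i in range(n))
                 for j in range(n))
        assert abs(p1 - p2) < 1e-12
    print("P_1 and P_2 agree on the grid")

In (\(P_2\)) itself, t is a continuous decision variable; since \(\mu > 0\), minimisation pushes t down to that smallest feasible value, which is what the enumeration mimics.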

Proof

We prove the proposition in two steps:

  • firstly, we show that (\(P_2\)) is a relaxation of (\(P_1\)), in the sense that if \((\bar{z},\bar{t})\) is a feasible solution of (\(P_1\)), then there exists \(\bar{y}\) such that \((\bar{z},\bar{y},\bar{t})\) is a feasible solution of (\(P_2\));

  • secondly, we prove that for any optimal solution \((z^*,y^*,t^*)\) of (\(P_2\)), the point \((z^*,t^*)\) is feasible for (\(P_1\)).

Let us consider the first assertion: (\(P_2\)) is a relaxation of (\(P_1\)).

Let \((\bar{z},\bar{t})\) be a feasible solution of (\(P_1\)).

Consider now the point \((\bar{z},\bar{y},\bar{t})\) where, for \(i = 1,2,\ldots ,n\):

$$\begin{aligned} \bar{y}_i := {\left\{ \begin{array}{ll} 0, &{} \hbox {if } i \hbox { is the smallest index such that} \, g_i (\bar{z}) = \bar{t} = \min \limits _{i = 1,2, \ldots ,n} \{g_i (\bar{z})\}, \\ 1,&{} \hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(59)

Let I be the unique index i such that \(\bar{y}_i = 0\).

From the definition of \(\bar{y}\), we note that:

  • \(\bar{t} = g_I(\bar{z})\),

  • \(\bar{y}_I = 0\),

  • \(\bar{y}_i = 1\) for all \(i \ne I\).

Then, for \(i \ne I\), the constraint \(\bar{t} \ge g_i (\bar{z}) - M\) holds since

$$\begin{aligned} \bar{t} = g_I(\bar{z}) = \min \limits _{i = 1,2, \ldots ,n} \{g_i (\bar{z})\} \ge 0 \ge g_i(\bar{z}) - M. \end{aligned}$$

And for \(i = I\), the constraint \(\bar{t} \ge g_I (\bar{z}) - M\bar{y}_I\) also holds, since

$$\begin{aligned} \bar{t} = g_I(\bar{z}) \ge g_I(\bar{z})-M \bar{y}_I = g_I(\bar{z}). \end{aligned}$$

Then, \((\bar{z}, \bar{y}, \bar{t})\) is feasible for (\(P_2\)).

For the second step, let us now show that if \((z^*, y^*, t^*)\) is an optimal solution of (\(P_2\)), then \((z^*, t^*)\) is feasible for (\(P_1\)), i.e., we want to prove that \(t^* = \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\).

By contradiction, we shall suppose that this optimal solution of (\(P_2\)) is such that \(t^* \ne \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\).

Let us consider two cases:

  • either \(t^* < \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\),

  • or \(t^* > \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\).

Let \(I_{y^*}\) denote the unique index i such that \(y^*_i = 0\).

Then, \(y^*_{I_{y^*}} = 0\) and \(y^*_i = 1\), for all \(i \ne I_{y^*}\).

Using the fact that \((z^*, y^*, t^*)\) is a feasible solution for (\(P_2\)), we have

$$\begin{aligned} t^* \ge g_{I_{y^*}}(z^*). \end{aligned}$$
(60)

In the first case, with \(t^* < \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\), we have

$$\begin{aligned} g_{I_{y^*}}(z^*) \ge \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\} > t^*, \end{aligned}$$

which contradicts (60).

Therefore, the second case necessarily holds, i.e., \(t^* > \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\). Consider now a solution \((\bar{z}, \bar{y}, \bar{t})\) defined as follows:

$$\begin{aligned} \bar{z}&= z^*,\nonumber \\ \bar{t}&= \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}, \end{aligned}$$
(61)

and

$$\begin{aligned} \bar{y}_i := {\left\{ \begin{array}{ll} 0, &{} \hbox {if } i \hbox { is the smallest index satisfying } g_i (z^*) = \min \limits _{ i= 1,2, \ldots , n} \{g_i (z^*)\},\\ 1, &{} \hbox {otherwise}, \end{array}\right. } \end{aligned}$$
(62)

where \(I^*\) denotes this smallest index (equivalently, the unique index with \(\bar{y}_{I^*} = 0\)). We have:

  • This new solution \((\bar{z}, \bar{y}, \bar{t})\) is feasible for (\(P_2\)). Indeed, for \(i \ne I^*,\) the \(i\)th constraint, \(t \ge g_i(z) - M {\tilde{y}}_{i}\), is satisfied for \((\bar{z},\bar{y},\bar{t})\), since M is an upper bound for the function \(g_i(z)\) and \(\bar{y}_i = 1\).

    If \(i = I^*\), then on the one hand \(\min \nolimits _{ i= 1,2, \ldots , n} \{g_i (z^*)\} = \bar{t}\), and on the other hand \(g_{I^*}(\bar{z}) = g_{I^*}(z^*) = \min \nolimits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\) by definition of \(I^*\). Therefore, the \(I^{*}\)th constraint of (\(P_2\)) is satisfied for \((\bar{z}, \bar{y}, \bar{t})\).

  • In terms of objective-function values, it is clear that

    $$\begin{aligned} f(z^*) + \mu t^* > f(\bar{z}) + \mu \bar{t}, \end{aligned}$$

    since by hypothesis \(t^* > \min \nolimits _{ i= 1,2, \ldots , n} \{g_i (z^*)\}\), while \(\bar{t} = \min \nolimits _{ i= 1,2, \ldots , n}\{g_i (z^*)\}\). This contradicts the optimality of \((z^*, y^*, t^*)\).

\(\square\)

About this article

Cite this article

Tran, T.T., Sinoquet, D., Da Veiga, S. et al. Derivative-free mixed binary necklace optimization for cyclic-symmetry optimal design problems. Optim Eng 24, 353–394 (2023). https://doi.org/10.1007/s11081-021-09685-1

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11081-021-09685-1
