Abstract
We show that Hölder continuity of the gradient is not only a sufficient condition, but also a necessary condition for the existence of a global upper bound on the error of the first-order Taylor approximation. We also relate this global upper bound to the Hölder constant of the gradient. This relation is expressed as an interval, depending on the Hölder constant, in which the error of the first-order Taylor approximation is guaranteed to lie. We show that, for the Lipschitz continuous case, the interval cannot be reduced. An application to the norms of quadratic forms is proposed, which allows us to derive a novel characterization of Euclidean norms.
Notes
For example, \(x\in \mathbb {R}\mapsto x^3\), or \(x\in \mathbb {R}\mapsto x^2\sin (1/x^2)\) (with continuous extension at 0).
Note that the method requires \(\nu >0\). In fact, finding a descent direction for a non-smooth non-convex function is NP-hard [1], and thus, it is reasonable to ask that \(\nu >0\).
With the convention that \(0^0=1\), hence, \(\left( \frac{1+\nu }{\nu }\right) ^\nu =\left( \frac{1+\nu }{\nu }\right) ^{\nu /2}=1\), when \(\nu =0\).
With the convention that \(0^0=1\), as in Theorem 4.1.
That is, \(\langle Bx,x\rangle \ge 0\) for every \(x\in E\).
The result presented in Theorem 6.1 seems to be a novel (to the best of the authors’ knowledge) characterization of Euclidean norms in the finite-dimensional case. (For a detailed survey of results on equivalent characterizations of Euclidean norms, we refer the reader to the celebrated book by Amir [11].)
Indeed, a norm \(\Vert \cdot \Vert \) is completely determined by its unit ball K, via the identity \(\Vert x\Vert = \inf \,\{\alpha >0:x/\alpha \in K\}\). Since \(\Vert \cdot \Vert '\) is not Euclidean, its unit ball cannot be equal to \(\mathbb {B}^n\) (since \(\mathbb {B}^n\) is the unit ball of the canonical Euclidean norm).
References
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Yashtini, M.: On the global convergence rate of the gradient descent method for functions with Hölder continuous gradients. Optim. Lett. 10(6), 1361–1370 (2016)
Cartis, C., Gould, N.I., Toint, P.L.: Worst-case evaluation complexity of regularization methods for smooth unconstrained optimization using Hölder continuous gradients. Optim. Methods Softw. 32(6), 1273–1298 (2017)
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152(1–2), 381–404 (2015)
Boumal, N., Absil, P.A., Cartis, C.: Global rates of convergence for nonconvex optimization on manifolds. IMA J. Numer. Anal. 39(1), 1–33 (2018)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer, Berlin (2013)
Jordan, P., von Neumann, J.: On inner products in linear, metric spaces. Ann. Math. 36(3), 719–723 (1935)
Friedman, A.: Foundations of Modern Analysis. Courier Corporation, North Chelmsford (1982)
Motzkin, T.S., Straus, E.G.: Maxima for graphs and a new proof of a theorem of Turán. Can. J. Math. 17, 533–540 (1965)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 2. SIAM, Philadelphia (2001)
Amir, D.: Characterizations of Inner Product Spaces. Birkhäuser Verlag, Basel (1986)
John, F.: Extremum problems with inequalities as subsidiary conditions. In: Giorgi, G., Kjeldsen, T. (eds.) Traces and Emergence of Nonlinear Programming. Birkhäuser, Basel (2014)
Ball, K.: Ellipsoids of maximal volume in convex bodies. Geom. Dedicata 41(2), 241–250 (1992)
Acknowledgements
This work was supported by (i) the Fonds de la Recherche Scientifique—FNRS and the Fonds Wetenschappelijk Onderzoek—Vlaanderen under EOS Project No. 30468160, (ii) “Communauté française de Belgique—Actions de Recherche Concertées” (contract ARC 14/19-060). The research of the first author was supported by a FNRS/FRIA grant. The research of the third author was supported by the FNRS, the Walloon Region and the Innoviris Foundation. The research of the fourth author was supported by ERC Advanced Grant 788368.
Appendix: Proof of Theorem 6.1
The proof relies on the fact that, if \(\Vert \cdot \Vert \) is not Euclidean, then the unit ball defined by \(\Vert \cdot \Vert \), i.e., \(\{x\in E:\Vert x\Vert \le 1\}\), is not equal to the ellipsoid with smallest volume containing this ball. Based on this ellipsoid, we will build a self-adjoint operator \(B:E\rightarrow E^*\), such that \(\Vert Q_B\Vert <\Vert B\Vert \). The notions of ellipsoid and (Lebesgue) volume are defined on \(\mathbb {R}^n\) only. The following lemma implies, among other things, that there is no loss of generality in restricting to the case \(E=\mathbb {R}^n\):
Lemma A.1
Let E be a real vector space with norm \(\Vert \cdot \Vert \), and let \(A:E \rightarrow E'\) be a bijective linear map. Then, \(\Vert \cdot \Vert \) is Euclidean if and only if the norm \(\Vert \cdot \Vert '\) on \(E'\), defined by \(\Vert x\Vert ' = \Vert A^{-1}x\Vert \), is Euclidean.
Proof
This follows directly from the definition: \(\Vert \cdot \Vert \) is Euclidean if and only if it is induced by a scalar product, i.e., if and only if there exists a self-adjoint operator \(H:E\rightarrow E^*\) satisfying \(\Vert x\Vert ^2=\langle Hx,x\rangle \) for all \(x\in E\). \(\square \)
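As a numerical companion to Lemma A.1 (an illustration of ours, not part of the paper's proof), one can exploit the Jordan–von Neumann criterion [7]: a norm is Euclidean if and only if it satisfies the parallelogram law, and the law is preserved under the substitution \(x\mapsto A^{-1}x\). The map \(A\) below is an arbitrary choice; a sketch assuming NumPy:

```python
import numpy as np

def parallelogram_defect(norm, x, y):
    """Zero for all x, y exactly when the norm satisfies the parallelogram law."""
    return norm(x + y) ** 2 + norm(x - y) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

A = np.array([[2.0, 1.0], [0.0, 1.0]])   # an arbitrary bijective linear map
Ainv = np.linalg.inv(A)

transported = lambda x: np.linalg.norm(Ainv @ x)          # ||x||' = ||A^{-1}x||_2
linf = lambda x: np.linalg.norm(Ainv @ x, ord=np.inf)     # transported l_inf norm

# Transporting a Euclidean norm yields a Euclidean norm: the defect vanishes.
rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert abs(parallelogram_defect(transported, x, y)) < 1e-9

# Transporting the non-Euclidean l_inf norm cannot repair it: the pair
# (A e_1, A e_2) witnesses a violation of the law.
assert np.isclose(
    parallelogram_defect(linf, A @ np.array([1.0, 0.0]), A @ np.array([0.0, 1.0])),
    -2.0,
)
```

The check on random pairs is only a sample, of course; for the Euclidean case the defect vanishes identically, which is what the lemma's "if and only if" rests on.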
Proof of Theorem 6.1
The “only if” part follows from Proposition 6.1. For the proof of the “if” part, let E be an n-dimensional vector space, and let \(\Vert \cdot \Vert \) be a non-Euclidean norm on E. We will build a self-adjoint operator B on E, such that \(\Vert Q_B\Vert <\Vert B\Vert \).
By Lemma A.1, we may assume that \(E=\mathbb {R}^n\) and that \(\Vert \cdot \Vert \) is a non-Euclidean norm on \(\mathbb {R}^n\). We use superscripts to denote the components of vectors in \(\mathbb {R}^n\): \(x=(x^{(1)},\ldots ,x^{(n)})^\top \).
Let \(K=\{ x\in \mathbb {R}^n : \Vert x\Vert \le 1 \}\). Because K is compact, convex, with non-empty interior, and symmetric with respect to the origin, the Löwner–John ellipsoid theorem [12, 13] asserts that there exists a unique ellipsoid \(\mathcal {E}\) of minimal volume such that \(K\subseteq \mathcal {E}\). Moreover, \(\mathcal {E}\) is centered at the origin, and the boundary of \(\mathcal {E}\) contains n linearly independent vectors of K.
Let \(L:\mathbb {R}^n\rightarrow \mathbb {R}^n\) be a linear isomorphism, such that \(L\mathcal {E}\) is the Euclidean ball \(\mathbb {B}^n=\{x\in \mathbb {R}^n : \Vert x\Vert _2\le 1\}\), where \(\Vert x\Vert _2=\sqrt{x^\top x}\) is the canonical Euclidean norm on \(\mathbb {R}^n\). Let \(\Vert x\Vert '= \Vert L^{-1}x\Vert \), and let \(K'=\{ x\in \mathbb {R}^n : \Vert x\Vert ' \le 1 \}\). By Lemma A.1, \(\Vert \cdot \Vert '\) is not Euclidean. Since \(K'=LK\), it is clear that \(K'\) is compact, convex, with non-empty interior, and symmetric with respect to the origin. Moreover, \(K'\) is included in \(\mathbb {B}^n\), and it has n linearly independent vectors on the boundary \(\mathbb {S}^{n-1}\) of \(\mathbb {B}^n\).
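As a concrete illustration (our running example, not taken from the paper), this construction can be carried out explicitly for the \(\ell _\infty \)-norm on \(\mathbb {R}^2\): the minimal-volume ellipsoid containing \(K=[-1,1]^2\) is the disk of radius \(\sqrt{2}\), so one may take \(L=I/\sqrt{2}\), and \(K'=LK\) is then a square inscribed in \(\mathbb {B}^2\) with its four corners on \(\mathbb {S}^1\). A short numerical sketch assuming NumPy:

```python
import numpy as np

# Illustrative example (our choice): ||x|| = ||x||_inf on R^2, whose unit ball
# K = [-1,1]^2 has the disk of radius sqrt(2) as its Löwner-John ellipsoid.
L = np.eye(2) / np.sqrt(2)      # L maps the ellipsoid E onto the unit disk B^2
Linv = np.linalg.inv(L)

def norm_prime(x):
    """||x||' = ||L^{-1} x||_inf, the transported norm of Lemma A.1."""
    return np.linalg.norm(Linv @ x, ord=np.inf)

# The corners of K' = LK lie on the unit circle S^1 ...
corners = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]]) / np.sqrt(2)
assert np.allclose([np.linalg.norm(c) for c in corners], 1.0)
assert np.allclose([norm_prime(c) for c in corners], 1.0)

# ... and K' is contained in B^2: every sampled x with ||x||' <= 1 has ||x||_2 <= 1.
rng = np.random.default_rng(0)
for x in rng.uniform(-1, 1, size=(1000, 2)):
    if norm_prime(x) <= 1:
        assert np.linalg.norm(x) <= 1 + 1e-12
```

Here the two (in fact four) linearly independent contact vectors on \(\mathbb {S}^{1}\) are visible as the corners of the inscribed square.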
We will need the following lemma to conclude the proof of Theorem 6.1:
Lemma A.2
There exist \(u,v\in \mathbb {S}^{n-1}\cap K'\), not collinear, such that \(\frac{u+v}{\Vert u+v\Vert _2}\notin K'\).
We proceed with the proof of Theorem 6.1 (a proof of Lemma A.2 is provided at the end of this Appendix). Let u, v be as in Lemma A.2, and define \(e_1 = \frac{u+v}{\Vert u+v\Vert _2}\) and \(e_2=\frac{u-v}{\Vert u-v\Vert _2}\). Note that these vectors are orthonormal (w.r.t. the inner product \(x^\top y\)).
Let \(\kappa = \max \, \{ |e_1^\top x|: x\in K' \}\). Since \(|e_1^\top x|<1\) for every \(x\in \mathbb {B}^n\setminus \{\pm e_1\}\), and \(\pm e_1\notin K'\), we have that \(\kappa <1\). Moreover, \(\kappa >0\), since \(\mathrm {int}(K')\ne \varnothing \). Let \(\tilde{B}\) be the self-adjoint operator on \(\mathbb {R}^n\), defined by
\[ \langle \tilde{B}x,y\rangle = \frac{(e_1^\top x)(e_1^\top y)}{\kappa ^2} - (e_2^\top x)(e_2^\top y), \]
for every \(x,y\in \mathbb {R}^n\). Let \(x\in K'\). Then,
\[ |\langle \tilde{B}x,x\rangle | = \left| \frac{(e_1^\top x)^2}{\kappa ^2} - (e_2^\top x)^2 \right| \le \max \left\{ \frac{(e_1^\top x)^2}{\kappa ^2},\, (e_2^\top x)^2 \right\} \le 1, \]
since \(|e_1^\top x|\le \kappa \) and \(|e_2^\top x|\le \Vert x\Vert _2\le 1\).
It follows that, for every \(x\in \mathbb {R}^n\) with \(x\ne 0\), \(|\langle \tilde{B}x,x\rangle |= \Vert x\Vert '^2\,|\langle \tilde{B}\frac{x}{\Vert x\Vert '},\frac{x}{\Vert x\Vert '}\rangle |\le \Vert x\Vert '^2\). Hence, \(\Vert Q_{\tilde{B}}\Vert \le 1\). Now, we will show that \(|\langle \tilde{B}u,v\rangle |>\Vert u\Vert '\Vert v\Vert '\) (where u, v are as above). To this end, let \(\alpha =\Vert u+v\Vert _2\) and \(\beta =\Vert u-v\Vert _2\). Observe that \(u=\frac{\alpha e_1+\beta e_2}{2}\) and \(v=\frac{\alpha e_1-\beta e_2}{2}\). Thus,
\[ \langle \tilde{B}u,v\rangle = \frac{1}{\kappa ^2}\,\frac{\alpha }{2}\,\frac{\alpha }{2} - \frac{\beta }{2}\left( -\frac{\beta }{2}\right) = \frac{\alpha ^2}{4\kappa ^2} + \frac{\beta ^2}{4}. \]
This shows that \(\langle \tilde{B}u,v\rangle >1\), since (by the parallelogram identity)
\[ \alpha ^2+\beta ^2 = 2\Vert u\Vert _2^2 + 2\Vert v\Vert _2^2 = 4, \]
\(0<\kappa <1\), and \(\alpha >0\). Since \(u,v\in K'\) (i.e., \(\Vert u\Vert ',\Vert v\Vert '\le 1\)), we have that \(\Vert u\Vert '\Vert v\Vert '\le 1<|\langle \tilde{B}u,v\rangle |\). Thus, \(\Vert \tilde{B}\Vert >1\).
Finally, define the self-adjoint operator B on E by \(\langle Bx,y\rangle = \langle \tilde{B}Lx,Ly\rangle \). It is clear, from the definition of \(\Vert \cdot \Vert '\), that \(|\langle Bx,x\rangle |\le \Vert x\Vert ^2\) for every \(x\in E\) and \(|\langle Bx,y\rangle |>\Vert x\Vert \Vert y\Vert \) for \(x=L^{-1}u\) and \(y=L^{-1}v\) (where u, v are as above). Hence, one gets \(\Vert Q_B\Vert \le 1<\Vert B\Vert \). This concludes the proof of Theorem 6.1. \(\square \)
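To make the construction tangible, here is a hedged numerical check on a concrete instance of ours (the \(\ell _\infty \)-norm on \(\mathbb {R}^2\), so \(K'=[-1/\sqrt{2},1/\sqrt{2}]^2\) and \(\kappa =1/\sqrt{2}\)), taking \(\tilde B\) to be the rank-two form \(\langle \tilde Bx,y\rangle = (e_1^\top x)(e_1^\top y)/\kappa ^2 - (e_2^\top x)(e_2^\top y)\), our reading of the construction above. A sketch assuming NumPy:

```python
import numpy as np

# Our l_inf example on R^2: Lemma A.2 holds with the two corners u, v below,
# since (u+v)/||u+v||_2 = (1,0) lies outside K' = [-1/sqrt(2), 1/sqrt(2)]^2.
u = np.array([1.0, 1.0]) / np.sqrt(2)
v = np.array([1.0, -1.0]) / np.sqrt(2)
e1 = (u + v) / np.linalg.norm(u + v)      # = (1, 0)
e2 = (u - v) / np.linalg.norm(u - v)      # = (0, 1)

kappa = 1 / np.sqrt(2)                    # max of |e1^T x| over x in K'

def B_tilde(x, y):
    """<B~x, y> = (e1^T x)(e1^T y)/kappa^2 - (e2^T x)(e2^T y)."""
    return (e1 @ x) * (e1 @ y) / kappa**2 - (e2 @ x) * (e2 @ y)

def norm_prime(x):
    """||x||' = ||sqrt(2) x||_inf for this example."""
    return np.sqrt(2) * np.linalg.norm(x, ord=np.inf)

# |<B~x, x>| <= 1 on K', so ||Q_B|| <= 1 ...
rng = np.random.default_rng(0)
for x in rng.uniform(-kappa, kappa, size=(1000, 2)):   # samples cover K'
    assert abs(B_tilde(x, x)) <= 1 + 1e-12

# ... while <B~u, v> = alpha^2/(4 kappa^2) + beta^2/4 = 3/2 > 1 = ||u||'||v||',
# so the bilinear norm ||B~|| strictly exceeds the quadratic norm ||Q_B~||.
assert np.isclose(B_tilde(u, v), 1.5)
assert np.isclose(norm_prime(u) * norm_prime(v), 1.0)
```

On this instance the gap of Theorem 6.1 is explicit: \(\Vert Q_{\tilde B}\Vert \le 1 < 3/2 \le \Vert \tilde B\Vert \).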
It remains to prove Lemma A.2. The following proposition, known as the Fritz John necessary conditions for optimality, will be useful in the proof of Lemma A.2:
Proposition A.1
(Fritz John necessary conditions [12]) Let S be a compact metric space. Let F(x) be a real-valued function on \(\mathbb {R}^n\), and let G(x, y) be a real-valued function defined for all \((x,y)\in \mathbb {R}^n\times S\). Assume that F(x) and G(x, y) are both differentiable with respect to x and that F(x), G(x, y), \(\frac{\partial F}{\partial x}(x)\), and \(\frac{\partial G}{\partial x}(x,y)\) are continuous on \(\mathbb {R}^n\times S\). Let \(R=\{x\in \mathbb {R}^n:G(x,y)\le 0,\,\forall y\in S\}\), and suppose that R is non-empty.
Let \(x^*\in R\) be such that \(F(x^*)=\max _{x\in R} F(x)\). Then, there is \(m\in \{0,\ldots ,n\}\), and points \(y_1,\ldots ,y_m\in S\), and nonnegative multipliers \(\lambda _0,\lambda _1,\ldots ,\lambda _m\ge 0\), such that (i) \(G(x^*,y_i)=0\) for every \(1\le i\le m\), (ii) \(\sum _{i=0}^m \lambda _i >0\), and (iii)
\[ \lambda _0\,\frac{\partial F}{\partial x}(x^*) = \sum _{i=1}^m \lambda _i\,\frac{\partial G}{\partial x}(x^*,y_i). \]
We refer the reader to [12] for a proof of Proposition A.1.
Proof of Lemma A.2
Consider the following optimization problem:
\[ \max \; F(x) = \Vert x\Vert _2^2 \quad \text {s.t.} \quad x^\top y \le 1, \;\; \forall \, y\in \mathbb {S}^{n-1}\cap K', \tag {11} \]
with variable \(x\in \mathbb {R}^n\).
First, we show that the feasible set of (11) is bounded. Suppose the contrary, and, for every \(k\ge 1\), let \(x_k\) be a feasible solution with \(\Vert x_k\Vert _2\ge k\). Let \(\hat{x}_k=x_k/\Vert x_k\Vert _2\). Taking a subsequence if necessary, we may assume that \(\hat{x}_k\) converges to some \(\hat{x}_*\), with \(\Vert \hat{x}_*\Vert _2=1\). Since \(\hat{x}_k^\top y\le 1/\Vert x_k\Vert _2\) for every \(y\in \mathbb {S}^{n-1}\cap K'\), we have that \(\hat{x}_*^\top y \le 0\) for every \(y\in \mathbb {S}^{n-1}\cap K'\). By symmetry of \(\mathbb {S}^{n-1}\cap K'\), it follows that \(\hat{x}_*^\top y =0\) for every \(y\in \mathbb {S}^{n-1}\cap K'\), contradicting the fact that \(\mathbb {S}^{n-1}\cap K'\) contains n linearly independent vectors. Hence, the feasible set of (11) is bounded, and closed (as an intersection of closed sets), so that (11) has an optimal solution, say \(\bar{x}\).
We will show that \(\Vert \bar{x}\Vert _2>1\). To this end, we use the fact that \(K'\ne \mathbb {B}^n\) (see footnote 7). Fix some \(z\in \mathbb {S}^{n-1}\setminus K'\), and let \(\eta =\max \,\{ z^\top y:y\in K' \}\). Since \(z^\top y<1\) for every \(y\in \mathbb {B}^n\setminus \{z\}\), and \(z\notin K'\), we have that \(\eta <1\). Let \(x=z/\eta \). From the definition of \(\eta \), it is clear that x is a feasible solution of (11). Moreover, \(\Vert x\Vert _2=\eta ^{-1}>1\), so that \(\Vert \bar{x}\Vert _2\ge \Vert x\Vert _2>1\).
The gradient of F at \(\bar{x}\) is equal to \(2\bar{x}\). Then, Proposition A.1 asserts that there exist vectors \(y_1,\ldots ,y_m\in \mathbb {S}^{n-1}\cap K'\), and nonnegative multipliers \(\lambda _0,\lambda _1,\ldots ,\lambda _m\ge 0\), such that \(\bar{x}^\top y_i=1\) for every \(1\le i\le m\), \(\sum _{i=0}^m \lambda _i>0\), and \(\lambda _0\bar{x}=\sum _{i=1}^m \lambda _iy_i\) (after rescaling \(\lambda _0\)). If \(\lambda _0=0\), then \(0=\sum _{i=1}^m \lambda _i\bar{x}^\top y_i = \sum _{i=1}^m \lambda _i >0\), a contradiction. Hence, \(\lambda _0>0\). Suppose that \(y_1,\ldots ,y_m\) are collinear. Then all \(y_i\)'s must be parallel to \(\bar{x}\) (because \(\lambda _0\bar{x}=\sum _{i=1}^m \lambda _iy_i\) and \(\lambda _0\bar{x}\ne 0\)), and since they are in \(\mathbb {S}^{n-1}\), we have that \(y_i=\pm \bar{x}/\Vert \bar{x}\Vert _2\), so that \(\bar{x}^\top y_i=\Vert \bar{x}\Vert _2>1\) or \(\bar{x}^\top y_i=-\Vert \bar{x}\Vert _2<0\); either way, this contradicts \(\bar{x}^\top y_i=1\). Thus, there exist at least two non-collinear vectors \(u,v\in \mathbb {S}^{n-1}\cap K'\) satisfying \(\bar{x}^\top u=1\) and \(\bar{x}^\top v=1\).
Let \(e_1 = \frac{u+v}{\Vert u+v\Vert _2}\). Since u and v are not collinear, \(\Vert u+v\Vert _2<2\), and thus, \(\bar{x}^\top e_1=2/\Vert u+v\Vert _2>1\). This shows that \(e_1\notin \mathbb {S}^{n-1}\cap K'\). By definition, \(e_1\) is in \(\mathbb {S}^{n-1}\), so that \(e_1\notin K'\), concluding the proof of the lemma. \(\square \)
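The optimization problem of Lemma A.2 can also be solved numerically on a concrete instance (our \(\ell _\infty \) example on \(\mathbb {R}^2\), not taken from the paper): there \(\mathbb {S}^{1}\cap K'\) consists of the four corners of \(K'=[-1/\sqrt{2},1/\sqrt{2}]^2\), and a crude grid search stands in for an exact solver. A sketch assuming NumPy:

```python
import numpy as np

# Maximize F(x) = ||x||_2^2 subject to x^T y <= 1 for all y in S^1 ∩ K',
# where S^1 ∩ K' is the set of corners of K' = [-1/sqrt(2), 1/sqrt(2)]^2.
corners = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]]) / np.sqrt(2)

def feasible(x):
    return bool(np.all(corners @ np.asarray(x) <= 1 + 1e-9))

grid = np.linspace(-2.0, 2.0, 401)
best = max((np.array([a, b]) for a in grid for b in grid if feasible((a, b))),
           key=lambda x: x @ x)

# The maximizer has ||x_bar||_2 = sqrt(2) > 1 (a vertex of the feasible diamond),
assert np.isclose(np.linalg.norm(best), np.sqrt(2), atol=1e-2)

# with two active, non-collinear corners u, v (x_bar^T u = x_bar^T v = 1) ...
active = [y for y in corners if np.isclose(best @ y, 1.0, atol=1e-2)]
u, v = active[0], active[1]

# ... whose normalized sum e_1 leaves K', exactly as Lemma A.2 asserts:
e1 = (u + v) / np.linalg.norm(u + v)
assert np.sqrt(2) * np.linalg.norm(e1, ord=np.inf) > 1   # ||e1||' > 1, so e1 not in K'
```

The grid search is only an illustration of the mechanism (optimal value above 1, two active non-collinear contact vectors); the proof itself needs no numerics.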
Cite this article
Berger, G.O., Absil, P.-A., Jungers, R.M. et al. On the Quality of First-Order Approximation of Functions with Hölder Continuous Gradient. J Optim Theory Appl 185, 17–33 (2020). https://doi.org/10.1007/s10957-020-01632-x
Keywords
- Hölder continuous gradient
- First-order Taylor approximation
- Lipschitz continuous gradient
- Lipschitz constant
- Euclidean norms