Abstract
A variational formulation for accelerated optimization on normed vector spaces was recently introduced in Wibisono et al. (PNAS 113:E7351–E7358, 2016), and later generalized to the Riemannian manifold setting in Duruisseaux and Leok (SJMDS, 2022a). This variational framework was exploited on normed vector spaces in Duruisseaux et al. (SJSC 43:A2949–A2980, 2021) using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization, and it was observed that geometric discretizations which respect the time-rescaling invariance and symplecticity of the Lagrangian and Hamiltonian flows were substantially less prone to stability issues, and were therefore more robust, reliable, and computationally efficient. As such, it is natural to develop time-adaptive Hamiltonian variational integrators for accelerated optimization on Riemannian manifolds. In this paper, we consider the case of Riemannian manifolds embedded in a Euclidean space that can be characterized as the level set of a submersion. We will explore how holonomic constraints can be incorporated in discrete variational integrators to constrain the numerical discretization of the Riemannian Hamiltonian system to the Riemannian manifold, and we will test the performance of the resulting algorithms by solving eigenvalue and Procrustes problems formulated as optimization problems on the unit sphere and Stiefel manifold.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Many data analysis algorithms are designed around the minimization of a loss function or the maximization of a likelihood function. Due to the ever-growing scale of the data sets and size of the problems, there has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. In 1983, Nesterov’s accelerated gradient method (Nesterov 1983) was shown to converge in \(\mathcal {O}(1/k^2)\) to the minimum of the convex objective function f, improving on the \(\mathcal {O}(1/k)\) convergence rate exhibited by standard gradient descent methods. This \(\mathcal {O}(1/k^2)\) convergence rate was shown in Nesterov (2004) to be optimal among first-order methods using only information about \(\nabla f\) at consecutive iterates. This phenomenon in which an algorithm displays this improved rate of convergence is referred to as acceleration, and other accelerated algorithms have been derived since Nesterov’s algorithm, which was shown in Su et al. (2016) to limit to a second-order ordinary differential equation (ODE), as the timestep goes to 0, and that f(x(t)) converges to its optimal value at a rate of \(\mathcal {O}(1/t^2)\) along any trajectory x(t) of this ODE. It was then shown in Wibisono et al. (2016) that in continuous time, an arbitrary convergence rate \(\mathcal {O}(1/t^p)\) can be achieved in normed spaces, by considering flow maps generated by a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This variational framework and the time-rescaling property of this family were then exploited in Duruisseaux et al. (2021) by using time-adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity could result in a significant gain in computational efficiency. More generally, when applied to Hamiltonian systems, symplectic integrators yield discrete approximations of the flow that preserve the symplectic 2-form (Hairer et al. 2006). The preservation of symplecticity results in the preservation of many qualitative aspects of the underlying dynamical system. In particular, when applied to conservative Hamiltonian systems, symplectic integrators exhibit excellent long-time near-energy preservation (Benettin and Giorgilli 1994; Reich 1999). Variational integrators provide a systematic method for constructing symplectic integrators of arbitrarily high-order based on the numerical discretization of Hamilton’s principle (Marsden and West 2001; Hall and Leok 2015), or equivalently, by the approximation of Jacobi’s solution of the Hamilton–Jacobi equation, which is a generating function for the exact symplectic flow map.
In the past few years, there has been some effort to derive accelerated optimization algorithms in the Riemannian manifold setting (Duruisseaux and Leok 2022a; Alimisis et al. 2020a, b, 2021; Zhang and Sra 2016, 2018; Ahn and Sra 2020; Liu et al. 2017). In Duruisseaux and Leok (2022a), it was shown that in continuous time, the convergence rate of f(x(t)) to its optimal value can be accelerated to an arbitrary convergence rate \(\mathcal {O}(1/t^p)\) on Riemannian manifolds. This was achieved by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems on Riemannian manifolds which is closed under time-rescaling, thereby generalizing the variational framework for accelerated optimization of Wibisono et al. (2016) to Riemannian manifolds. The time-adaptivity based approach relying on a Poincaré transformation from Duruisseaux et al. (2021) was also extended to the Riemannian manifold setting in Duruisseaux and Leok (2022a). Now, the Whitney embedding theorems (Whitney 1944a, b) state that any smooth manifold of dimension \(n \ge 2\) can be embedded in \(\mathbb {R}^{2n}\) and immersed in \(\mathbb {R}^{2n-1}\), and is thus diffeomorphic to a submanifold of \(\mathbb {R}^{2n}\). Furthermore, the Nash embedding theorems (Nash 1956) state that any Riemannian manifold can be globally isometrically embedded into some Euclidean space. As a consequence of these embedding theorems, the study of Riemannian manifolds can in principle be reduced to the study of submanifolds of Euclidean spaces. Altogether, this motivates the introduction of time-adaptive variational integrators on Riemannian manifolds which exploit the structure of the embedding Euclidean space, and in this paper, we will study how holonomic constraints can be incorporated into different types of variational integrators to constrain the numerical solutions of the Riemannian dynamical system to the Riemannian manifold. Incorporating holonomic constraints in geometric integrators has been studied extensively in the past (see Marsden and West (2001); Hairer et al. (2006); Marsden and Ratiu (1999); Holm et al. (2009) for instance), and some work has been done from the variational perspective for the Type I Lagrangian formulation in Marsden and West (2001) via augmented Lagrangians.
1.1 Outline of the Paper
Section 2 shows the equivalence between constrained variational principles and constrained Euler–Lagrange equations, both in continuous and discrete time, before deriving analogous results for both the Type II and Type III Hamiltonian formulations of mechanics in Sect. 3. In Sect. 4, we will exploit error analysis theorems for unconstrained mechanics from Marsden and West (2001); Schmitt and Leok (2017) to obtain variational error analysis results for the maps defined implicitly by the discrete constrained Euler–Lagrange and Hamilton’s equations. Finally, in Sect. 5, we will exploit these constrained variational integrators and the variational formulation of accelerated optimization on Riemannian manifolds from Duruisseaux and Leok (2022a) to solve numerically generalized eigenvalue problems and Procrustes problems on the unit sphere and Stiefel manifold.
2 Constrained Variational Lagrangian Mechanics
Traditionally, variational integrators have been designed based on the Type I generating function known as the discrete Lagrangian, \(L_d:\mathcal {Q} \times \mathcal {Q} \rightarrow \mathbb {R}\). The exact discrete Lagrangian is the exact generating function for the time-h flow map of Hamilton’s equations, and it can be represented in boundary value form by
where \(q(0)=q_0,\) \(q(h)=q_h,\) and q satisfies the Euler–Lagrange equations over the time interval [0, h]. This is closely related to Jacobi’s solution of the Hamilton–Jacobi equation. A variational integrator is defined by constructing an approximation \(L_d:\mathcal {Q} \times \mathcal {Q} \rightarrow \mathbb {R}\) to the exact discrete Lagrangian \(L_d^E\), and then applying the implicit discrete Euler–Lagrange equations,
which implicitly define a numerical integrator, referred to as the discrete Hamiltonian map \(\tilde{F}_{L_d}:(q_0,p_0)\mapsto (q_1,p_1)\), where \(D_i\) denotes a partial derivative with respect to the i-th argument. These equations define the discrete Legendre transforms, \(\mathbb {F}^{\pm }L_{d}: \mathcal {Q}\times \mathcal {Q} \rightarrow T^{*}\mathcal {Q}\):
and the discrete Hamiltonian map can be expressed as \(\tilde{F}_{L_d}\equiv (\mathbb {F}^{+}L_{d})\circ (\mathbb {F}^{-}L_{d})^{-1}\). Such numerical methods are called variational integrators as they can be derived from a discrete Hamilton’s principle, which involves extremizing a discrete action sum \(S_d\left( \{q_k\}_{k=0}^N \right) \equiv \sum _{k=0}^{N-1} L_d(q_k,q_{k+1})\), subject to fixed boundary conditions on \(q_0\), \(q_N\).
Now, suppose we are given a configuration manifold \(\mathcal {M}\), and a holonomic constraint function \(\mathcal {C} : \mathcal {M} \rightarrow \mathbb {R}^d\). Assuming that \(0\in \mathbb {R}^d\) is a regular point of \(\mathcal {C}\), we can constrain the dynamics to the constraint submanifold \(\mathcal {Q} = \mathcal {C}^{-1}(0),\) which is truly a submanifold of \(\mathcal {M}\) (see Marsden and West (2001); Abraham et al. (1988)). We will now consider variational Lagrangian mechanics with holonomic constraints \(\mathcal {C}(q)\) using Lagrange multipliers \(\lambda : [0,T] \rightarrow \Lambda \).
2.1 Continuous Constrained Variational Lagrangian Mechanics
We begin by presenting an equivalence between the continuous constrained variational principle and the continuous constrained Euler–Lagrange equations:
Theorem 2.1
Consider the constrained action functional \(\mathfrak {S} : C^2([0,T], \mathcal {Q} \times \Lambda ) \rightarrow \mathbb {R}\) given by
The condition that \(\mathfrak {S} (q(\cdot ), \lambda (\cdot )) \) is stationary with respect to the boundary conditions \(\delta q(0) = 0\) and \(\delta q(T) =0\) is equivalent to \(\left( q(\cdot ),\lambda (\cdot ) \right) \) satisfying the constrained Euler–Lagrange equations
Proof
See Appendix A.1. \(\square \)
Remark 2.1
These constrained Euler–Lagrange equations can be thought of as the Euler–Lagrange equations coming from the augmented Lagrangian \(\bar{L} \left( q,\lambda , \dot{q}, \dot{\lambda } \right) = L(q,\dot{q} )- \langle \lambda , \mathcal {C} (q) \rangle \).
Consider the function \(\mathcal {S}(q_0,q_T)\) given by the extremal value of the constrained action functional \(\mathfrak {S}\) over the family of curves \((q(\cdot ) , \lambda (\cdot )) \) satisfying the boundary conditions \(q(0)=q_0\) and \(q(T)=q_T\):
The following theorem shows that \(\mathcal {S}(q_0,q_T)\) is a generating function for the flow of the continuous constrained Euler–Lagrange equations:
Theorem 2.2
The exact time-T flow map of Hamilton’s equations \((q_0,p_0) \mapsto (q_T,p_T)\) is implicitly given by the following relations:
In particular, \( \mathcal {S} (q_0,q_T)\) is a Type I generating function that generates the exact flow of the constrained Euler–Lagrange equations (2.6).
Proof
See Appendix A.4.\(\square \)
2.2 Discrete Constrained Variational Lagrangian Mechanics
We now introduce a discrete variational formulation of Lagrangian mechanics which includes holonomic constraints. Suppose we are given a partition \(0 = t_0< t_1< \ldots < t_N = T\) of the interval [0, T], and a discrete curve in \(\mathcal {Q} \times \Lambda \) denoted by \(\{ (q_k, \lambda _k) \}_{k=0}^{N}\) such that \(q_k \approx q(t_k)\) and \(\lambda _k \approx \lambda (t_k)\). We will formulate discrete constrained variational Lagrangian mechanics in terms of the following discrete analogues of the constrained action functional \(\mathfrak {S}\) given by Eq. (2.5):
where
We can now derive discrete analogues to Theorem 2.1 relating discrete Type I variational principles to discrete Euler–Lagrange equations:
Theorem 2.3
The Type I discrete Hamilton’s variational principles
are equivalent to the discrete constrained Euler–Lagrange equations
where \(L_d(q_{k},q_{k+1}) \) is defined via Eq. (2.11).
Proof
See Appendix A.7. \(\square \)
Remark 2.2
These discrete constrained Euler–Lagrange equations can be thought of as the discrete Euler–Lagrange equations coming from the augmented discrete Lagrangians
3 Constrained Variational Hamiltonian Mechanics
The boundary value formulation of the exact Type II generating function of the time-h flow of Hamilton’s equations is given by the exact discrete right Hamiltonian,
where (q, p) satisfies Hamilton’s equations with boundary conditions \(q(0)=q_0\) and \(p(h)=p_h\). A Type II Hamiltonian variational integrator is constructed by using an approximate discrete right Hamiltonian \(H_d^+\), and applying the discrete right Hamilton’s equations,
which implicitly define the integrator, \(\tilde{F}_{H_d^+}:(q_0,p_0) \mapsto (q_1,p_1)\).
Similarly, the boundary value formulation of the exact Type III generating function of the time-h flow of Hamilton’s equations is given by the exact discrete left Hamiltonian,
where (q, p) satisfies Hamilton’s equations with boundary conditions \(q(h)=q_h\) and \(p(0)=p_0\). A Type III Hamiltonian variational integrator is constructed by using an approximate discrete left Hamiltonian \(H_d^-\), and applying the discrete left Hamilton’s equations,
which implicitly define the integrator, \(\tilde{F}_{H_d^-}:(q_0,p_0) \mapsto (q_1,p_1)\).
We now derive analogous results to those of Sect. 2 from the Hamiltonian perspective. As in the Lagrangian case, we will assume we have a configuration manifold \(\mathcal {M}\), a holonomic constraint function \(\mathcal {C} : \mathcal {M} \rightarrow \mathbb {R}^d\), and that the dynamics are constrained to the submanifold \(\mathcal {Q} = \mathcal {C}^{-1}(0)\).
3.1 Continuous Constrained Variational Hamiltonian Mechanics
The following theorem presents the equivalence between a continuous constrained variational principle and continuous constrained Hamilton’s equations in the Type II case, generalizing Lemma 2.1 from Leok and Zhang (2011) to include holonomic constraints:
Theorem 3.1
Consider the Type II constrained action functional \(\mathfrak {S} : C^2([0,T],T^*\mathcal {Q} \times \Lambda ) \rightarrow \mathbb {R}\)
The condition that \(\mathfrak {S} (q(\cdot ),p(\cdot ), \lambda (\cdot )) \) is stationary with respect to the boundary conditions \(\delta q(0) = 0\) and \(\delta p(T) =0\) is equivalent to \((q(\cdot ),p(\cdot ),\lambda (\cdot ))\) satisfying Hamilton’s constrained equations
Proof
See Appendix A.2. \(\square \)
As in the Type II case, we can derive a theorem relating a continuous constrained variational principle and continuous constrained Hamilton’s equations in the Type III case:
Theorem 3.2
Consider the Type III constrained action functional \(\mathfrak {S} : C^2([0,T],T^*\mathcal {Q} \times \Lambda ) \rightarrow \mathbb {R}\)
The condition that \(\mathfrak {S} (q(\cdot ),p(\cdot ), \lambda (\cdot )) \) is stationary with respect to the boundary conditions \(\delta q(T) = 0\) and \(\delta p(0) =0\) is equivalent to \((q(\cdot ),p(\cdot ),\lambda (\cdot ))\) satisfying Hamilton’s constrained equations
Proof
See Appendix A.3. \(\square \)
Remark 3.1
Hamilton’s constrained equations are the same in the Type II and Type III formulations of Hamiltonian mechanics, and they can be thought of as the Hamilton’s equations generated by the augmented Hamiltonian
where is the conjugate momentum for the variable \(\lambda \). Furthermore, they are equivalent to the constrained Euler–Lagrange Eq. (2.6), provided that the Lagrangian L is hyperregular.
Remark 3.2
It is sometimes beneficial to augment the continuous equations with the equation \( \langle \frac{\partial H}{\partial p} (q,p) , \nabla \mathcal {C}(q) \rangle =0\) (and analogously for the discrete case) to ensure that the momentum p lies in the cotangent space to the manifold, as explained and illustrated in (Hairer et al. 2006, Chapter VII).
We will now generalize Theorem 2.2 and its Type III analogue from Leok and Zhang (2011) to include holonomic constraints \(\mathcal {C}(q)\) using Lagrange multipliers \(\lambda : [0,T] \rightarrow \Lambda \).
In the Type II case, consider the function \(\mathcal {S}(q_0,p_T)\) given by the extremal value of the constrained action functional \(\mathfrak {S}\) over the family of curves \((q(\cdot ),p(\cdot ) , \lambda (\cdot )) \) satisfying the boundary conditions \(q(0)=q_0\) and \(p(T)=p_T\):
The following theorem shows that \(\mathcal {S}(q_0,p_T)\) is a generating function for the flow of the continuous constrained Hamilton’s equations:
Theorem 3.3
The exact time-T flow map of Hamilton’s equations \((q_0,p_0) \mapsto (q_T,p_T)\) is implicitly given by the following relations:
In particular, \( \mathcal {S} (q_0,p_T)\) is a Type II generating function that generates the exact flow of the constrained Hamilton’s Eq. (3.6).
Proof
See Appendix A.5. \(\square \)
In the Type III case, consider the function \(\mathcal {S}(q_T,p_0)\) given by the extremal value of the constrained action functional \(\mathfrak {S}\) over the family of curves \((q(\cdot ),p(\cdot ) , \lambda (\cdot )) \) satisfying the boundary conditions \(q(T)=q_T\) and \(p(0)=p_0\):
The following theorem shows that \(\mathcal {S}(q_T,p_0)\) is a generating function for the flow of the continuous constrained Hamilton’s equations:
Theorem 3.4
The exact time-T flow map of Hamilton’s equations \((q_0,p_0) \mapsto (q_T,p_T)\) is implicitly given by the following relations:
In particular, \( \mathcal {S} (q_T,p_0)\) is a Type III generating function that generates the exact flow of the constrained Hamilton’s Eq. (3.8).
Proof
See Appendix A.6. \(\square \)
3.2 Discrete Constrained Variational Hamiltonian Mechanics
Let us now extend the results of Sect. 3 from Leok and Zhang (2011) to introduce a discrete formulation of variational Hamiltonian mechanics which includes holonomic constraints. Suppose we are given a partition \(0 = t_0< t_1< \ldots < t_N = T\) of the interval [0, T], and a discrete curve in \(T^* \mathcal {Q} \times \Lambda \), denoted by \(\{ (q_k,p_k, \lambda _k) \}_{k=0}^{N}\), such that \(q_k \approx q(t_k)\), \(p_k \approx p(t_k)\) and \(\lambda _k \approx \lambda (t_k)\).
We formulate discrete constrained variational Hamiltonian mechanics in terms of the following discrete analogues of the constrained action functional \(\mathfrak {S}\) given by Eq. (3.5):
where
We can now derive discrete analogues of Theorems 3.1 and 3.2 relating discrete variational principles to discrete constrained Hamilton’s equations, generalizing Lemma 3.1 from Leok and Zhang (2011):
Theorem 3.5
The Type II discrete Hamilton’s phase space variational principle
is equivalent to the discrete constrained right Hamilton’s equations
where \(H_d^+ (q_k,p_{k+1}) \) is defined via Eq. (3.16).
Proof
See Appendix A.8. \(\square \)
Theorem 3.6
The Type III discrete Hamilton’s phase space variational principle
is equivalent to the discrete constrained left Hamilton’s equations
where \(H_d^- (q_{k+1},p_{k}) \) is defined via Eq. (3.17).
Proof
See Appendix A.9. \(\square \)
Remark 3.3
These discrete constrained Hamilton’s equations can be thought of as the discrete Hamilton’s equations generated by the augmented discrete Hamiltonians
This augmented Hamiltonian perspective together with the augmented Lagrangian perspective from Remark 2.2 imply that the constrained \(\bar{H}_d^+\) variational integrator is equivalent to the constrained \(\bar{L}_d^+\) variational integrator whenever the \(H_d^+\) variational integrator is equivalent to the \(L_d^+\) variational integrator (and similarly for the integrators generated by \(\bar{H}_d^-\) and \(\bar{L}_d^-\)). Examples where this happens are presented in Schmitt et al. (2018) for Taylor variational integrators provided the Lagrangian is hyperregular, and in Leok and Zhang (2011) for generalized Galerkin variational integrators constructed using the same choices of basis functions and numerical quadrature formula provided the Hamiltonian is hyperregular.
4 Error Analysis for Variational Integrators
4.1 Unconstrained Error Analysis
Theorem 2.3.1 of Marsden and West (2001) states that if a discrete Lagrangian, \(L_d:\mathcal {Q}\times \mathcal {Q}\rightarrow \mathbb {R}\), approximates the exact discrete Lagrangian \(L_d^E:\mathcal {Q}\times \mathcal {Q}\rightarrow \mathbb {R}\) to order r, i.e.,
then the discrete Hamiltonian map \(\tilde{F}_{L_d}:(q_k,p_k)\mapsto (q_{k+1},p_{k+1})\), viewed as a one-step method defined implicitly from the discrete Euler–Lagrange equations
or equivalently in terms of the implicit discrete Euler–Lagrange equations, which involve the corresponding discrete momenta via the discrete Legendre transforms,
has order of accuracy r.
Theorem 2.3.1 of Marsden and West (2001) has an analogue for Hamiltonian variational integrators. Theorem 2.2 in Schmitt and Leok (2017) states that if a discrete right Hamiltonian \(H^+_d\) approximates the exact discrete right Hamiltonian \(H_d^{+,E}\) to order r, i.e.,
then the discrete right Hamiltonian map \(\tilde{F}_{H^+_d}:(q_k,p_k)\mapsto (q_{k+1},p_{k+1})\), viewed as a one-step method defined implicitly by the discrete right Hamilton’s equations
is order r accurate. As mentioned in Schmitt and Leok (2017), the proof of Theorem 2.2 in Schmitt and Leok (2017) can easily be adjusted to prove an equivalent theorem for the discrete left Hamiltonian case, which states that if a discrete left Hamiltonian \(H^-_d\) approximates the exact discrete left Hamiltonian \(H_d^{-,E}\) to order r, i.e.,
then the discrete left Hamiltonian map \(\tilde{F}_{H^-_d}:(q_k,p_k)\mapsto (q_{k+1},p_{k+1})\), viewed as a one-step method defined implicitly by the discrete left Hamilton’s equations
is order r accurate. Many other properties of the integrator, such as momentum conservation properties of the method, can be determined by analyzing the associated discrete Lagrangian or Hamiltonian, as opposed to analyzing the integrator directly. We will exploit these error analysis results to derive analogous results for the constrained versions discussed in Sects. 2 and 3.
4.2 Constrained Error Analysis
For the Lagrangian case, we can think of the Lagrange multipliers \(\lambda \) as extra position coordinates and define an augmented Lagrangian \(\bar{L}\) via
A corresponding augmented discrete Lagrangian is given by
and the discrete Euler–Lagrange Eq. (4.2)
yield the discrete constrained Euler–Lagrange equations
derived in Sect. 2.2. As a consequence, we can apply Theorem 2.3.1 of Marsden and West (2001) to the augmented Lagrangian (4.8) and obtain the following result:
Theorem 4.1
Suppose that for an exact discrete Lagrangian \(L_d^E\) and a discrete Lagrangian \(L_d\),
Then, the discrete map \((q_k,p_k,\lambda _k)\mapsto (q_{k+1},p_{k+1},\lambda _{k+1})\), viewed as a one-step method defined implicitly by the discrete constrained Euler–Lagrange equations, has order of accuracy r.
For the Hamiltonian case, we can think of the Lagrange multipliers \(\lambda \) as extra position coordinates and define conjugate momenta , which are constants of motion since the time-derivative of \(\lambda \) does not appear anywhere, and are constrained to be zero. The augmented Hamiltonian \(\bar{H}\), given by
yields the following augmented left and right discrete Hamiltonians
and the discrete left and right Hamilton’s equations
yield the discrete constrained left Hamilton’s equations
and the discrete constrained right Hamilton’s equations
derived in Sect. 3. As a consequence, we can apply Theorem 2.2 in Schmitt and Leok (2017) and its Type III analogue to the augmented Hamiltonians and obtain the following results
Theorem 4.2
Suppose that given an exact discrete right Hamiltonian \(H_d^{+,E}\) and a discrete right Hamiltonian \(H_d^+\), we have
Then, the discrete map \((q_k,p_k,\lambda _k)\mapsto (q_{k+1},p_{k+1},\lambda _{k+1})\), viewed as a one-step method defined implicitly by the discrete constrained right Hamilton’s equations, has order of accuracy r.
Theorem 4.3
Suppose that given an exact discrete left Hamiltonian \(H_d^{-,E}\) and a discrete left Hamiltonian \(H_d^-\), we have
Then, the discrete map \((q_k,p_k,\lambda _k)\mapsto (q_{k+1},p_{k+1},\lambda _{k+1})\), viewed as a one-step method defined implicitly by the discrete constrained left Hamilton’s equations, has order of accuracy r.
5 Variational Riemannian Accelerated Optimization
5.1 Riemannian Geometry
We first introduce the main notions from Riemannian geometry that will be used throughout this section (see Absil et al. (2008); Boumal (2020); Duruisseaux and Leok (2022a); Jost (2017); Lee (2018); Lang (1999) for more details).
Definition 5.1
Suppose we have a Riemannian manifold \(\mathcal {Q}\) with Riemannian metric \(g(\cdot ,\cdot ) = \langle \cdot , \cdot \rangle \), represented by the positive-definite symmetric matrix \((g_{ij}) \) in local coordinates. Then, we define the musical isomorphism \(g^{\flat }:T\mathcal {Q} \rightarrow T^*\mathcal {Q}\) via
and its inverse musical isomorphism \(g^{\sharp }:T^*\mathcal {Q} \rightarrow T\mathcal {Q}\). The Riemannian metric \(g(\cdot ,\cdot ) = \langle \cdot , \cdot \rangle \) induces a fiber metric \(g^*(\cdot ,\cdot ) = \langle \langle \cdot , \cdot \rangle \rangle \) on \(T^* \mathcal {Q}\) via
represented by the positive-definite symmetric matrix \((g^{ij})\) in local coordinates, which is the inverse of the Riemannian metric matrix \((g_{ij}) \).
Definition 5.2
Denoting the differential of f by df, the Riemannian gradient \(\mathrm{grad}f(q) \in T_q \mathcal {Q}\) at a point \(q\in \mathcal {Q}\) of a smooth function \(f:\mathcal {Q} \rightarrow \mathbb {R}\) is the tangent vector at q such that
This can also be expressed in terms of the inverse musical isomorphism, \(\mathrm{grad}f(q)=g^\sharp (df(q))\).
Definition 5.3
A geodesic in a Riemannian manifold \(\mathcal {Q}\) is a parametrized curve \(\gamma : [0,1] \rightarrow \mathcal {Q}\) which is of minimal local length, and is a generalization of the notion of straight line from Euclidean spaces to Riemannian manifolds. The other generalization of straight lines involves curves having zero “acceleration" or constant “speed," which requires the introduction of an affine connection. These two generalizations are equivalent if the Riemannian manifold is endowed with the Levi–Civita connection. Given two points \(q,\tilde{q} \in \mathcal {Q}\), a vector in \(T_q \mathcal {Q}\) can be transported to \(T_{\tilde{q}}\mathcal {Q}\) along a geodesic \(\gamma \) by an operation \(\Gamma _q^{\tilde{q}}:T_q \mathcal {Q}\rightarrow T_{\tilde{q}}\mathcal {Q}\) called the parallel transport along \(\gamma \).
Definition 5.4
The Riemannian Exponential map \(Exp _q:T_q \mathcal {Q} \rightarrow \mathcal {Q}\) at \(q\in \mathcal {Q}\) is defined via
where \(\gamma _v\) is the unique geodesic in \(\mathcal {Q}\) such that \(\gamma _v(0) = q\) and \(\gamma _v'(0) = v\), for any \(v\in T_q \mathcal {Q} \). \(Exp _q\) is a diffeomorphism in some neighborhood \( U \subset T_q\mathcal {Q}\) containing 0, so we can define its inverse map, the Riemannian Logarithm map \(Log _p : Exp _q(U) \rightarrow T_q \mathcal {Q}\).
Definition 5.5
A retraction on a manifold \(\mathcal {Q}\) is a smooth mapping \(\mathcal {R}: T\mathcal {Q} \rightarrow \mathcal {Q}\) such that for any \(q \in \mathcal {Q}\), the restriction \(\mathcal {R}_q : T_q\mathcal {Q} \rightarrow \mathcal {Q} \) of \(\mathcal {R}\) to \(T_q\mathcal {Q} \) satisfies
-
\(\mathcal {R}_q(0_q) = q\), where \(0_q\) denotes the zero element of \(T_q\mathcal {Q} \),
-
\(T_{0_q}\mathcal {R}_q = \mathbb {I}_{T_q\mathcal {Q} }\) with the canonical identification \(T_{0_q}T_{q}\mathcal {Q} \simeq T_{q}\mathcal {Q}\), where \(T_{0_q}\mathcal {R}_q\) is the tangent map of \(\mathcal {R}\) at \(0_q \in T_{q}\mathcal {Q}\) and \(\mathbb {I}_{T_q\mathcal {Q} }\) is the identity map on \(T_{q}\mathcal {Q}\).
The Riemannian Exponential map is a natural example of a retraction on a Riemannian manifold.
Definition 5.6
A subset A of a Riemannian manifold \(\mathcal {Q}\) is called geodesically uniquely convex if every two points of A are connected by a unique geodesic in A. A function \(f:\mathcal {Q} \rightarrow \mathbb {R}\) is called geodesically convex if for any two points \(q,\tilde{q} \in \mathcal {Q}\) and a geodesic \(\gamma \) connecting them,
Note that if f is a smooth geodesically convex function on a geodesically uniquely convex subset A,
A function \(f:A\rightarrow \mathbb {R}\) is called geodesically \(\alpha \)- weakly quasi-convex (\(\alpha \)-WQC) with respect to \(q \in \mathcal {Q}\) for some \(\alpha \in (0,1]\) if
Note that a local minimum of a geodesically convex or \(\alpha \)-WQC function is also a global minimum.
Definition 5.7
Given a Riemannian manifold \(\mathcal {Q}\) with sectional curvature bounded below by \(K_{\min }\), and an upper bound D for the diameter of the domain of consideration, define
Note that \(\zeta \ge 1\) since \(x \coth {x} \ge 1\) for all real values of x.
5.2 Hamiltonian Approach
Our approach consists in integrating the Riemannian Bregman Hamiltonian systems derived in Duruisseaux and Leok (2022a) which evolve on the Riemannian manifold \(\mathcal {Q}\), via discrete constrained variational Hamiltonian integrators which enforce the numerical solution to lie on the Riemannian manifold \(\mathcal {Q}\). With \(\zeta \) given by Eq. (5.1), we know from Duruisseaux and Leok (2022a) that if we let \(\lambda = \zeta \) in the geodesically convex case, and \(\lambda = \zeta / \alpha \) in the geodesically \(\alpha \)-weakly quasi-convex case, we obtain the Direct approach Riemannian p-Bregman Hamiltonian
and the Adaptive approach Riemannian \(p\rightarrow \mathring{p}\) Bregman Hamiltonian
It is proven in Duruisseaux and Leok (2022a) that along the trajectories of the Riemannian p-Bregman dynamics, f(Q(t)) converges to its optimal value at a rate of \(\mathcal {O}(1/t^p)\), under suitable assumptions on \(\mathcal {Q}\).
Remark 5.1
In the vector space setting, these Riemannian Bregman Hamiltonians reduce to the direct and adaptive approach Bregman Hamiltonians derived in Duruisseaux et al. (2021) for convex functions:
5.3 Some Optimization Problems on Riemannian Manifolds
5.3.1 Rayleigh Quotient Minimization on the Unit Sphere
An eigenvector v corresponding to the largest eigenvalue of a symmetric \(n \times n\) matrix A maximizes the Rayleigh quotient \(\frac{v^\top Av}{ v^\top v}\) over \(\mathbb {R}^n\). Thus, a unit eigenvector corresponding to the largest eigenvalue of the matrix A is a minimizer of the function \(f(v) = - v^\top Av\) over the unit sphere \(\mathcal {Q} = \mathbb {S}^{n-1}\), which can be thought of as a Riemannian submanifold with constant positive curvature \(K=1\) of \(\mathbb {R}^{n}\) endowed with the Riemannian metric inherited from the Euclidean inner product \(g_v(u,w) = u^\top w\). Solving the Rayleigh quotient optimization problem efficiently is challenging when the given symmetric matrix A is ill-conditioned and high-dimensional. Note that an efficient algorithm that solves the above minimization problem can also be used to find eigenvectors corresponding to the smallest eigenvalue of A by using the fact that the eigenvalues of A are the negative of the eigenvalues of \(-A\).
5.3.2 Eigenvalue and Procrustes Problems on the Stiefel Manifold
When endowed with the Riemannian metric \(g_X(A,B) = \text {Trace}(A^\top B)\), the Stiefel manifold
is a Riemannian submanifold of \(\mathbb {R}^{n\times m}\). The tangent space at any \(X \in \text {St}(m,n)\) is given by \(T_X \text {St}(m,n) = \{ Z\in \mathbb {R}^{n\times m} | X^\top Z+Z^\top X=0 \},\) and the orthogonal projection \(P_X\) onto \(T_X \text {St}(m,n)\) is given by \(P_X Z = Z - \frac{1}{2} X(X^\top Z + Z^\top X).\) A retraction on \(\text {St}(m,n) \) is given by \(\mathcal {R}_X(\xi ) = \text {qf} (X+\xi ) ,\) where \(\text {qf}(A)\) denotes the Q factor of the QR factorization of the matrix \(A\in \mathbb {R}^{n\times m}\) as \(A=QR\) where \(Q\in \text {St}(m,n)\) and R is an upper triangular \(n\times m\) matrix with strictly positive diagonal elements (Absil et al. 2008).
A generalized eigenvector problem consists of finding the m smallest eigenvalues of a \(n\times n\) symmetric matrix A and corresponding eigenvectors. This problem can be formulated as a Riemannian optimization problem on the Stiefel manifold \(\text {St}(m,n)\) via the Brockett cost function
where \(N = \text {diag}(\mu _1 , \ldots , \mu _m)\) for arbitrary \(0 \le \mu _1 \le \ldots \le \mu _m \). The columns of a global minimizer of f are eigenvectors corresponding to the m smallest eigenvalues of A (see Absil et al. (2008)). If we define \(\bar{f} : \mathbb {R}^{n\times m} \rightarrow \mathbb {R}\) via \(X\mapsto \bar{f}(X) = \text {Trace}(X^\top AXN),\) then f is the restriction of \(\bar{f}\) to \(\text {St}(m,n) \) so
The unbalanced orthogonal Procrustes problem consists of minimizing the function
on the Stiefel manifold \(\text {St}(m,n) \), for given matrices \(A \in \mathbb {R}^{l\times n }\) and \(B \in \mathbb {R}^{l\times m }\) with \(l \ge n\) and \(l>m\), where \(\Vert \cdot \Vert _F\) is the Frobenius norm \(\Vert X \Vert _F^2 = \text {Trace}(X^\top X)= \sum _{ij}{X_{ij}^2}\). If we define \(\bar{f} : \mathbb {R}^{n\times m} \rightarrow \mathbb {R}\) via \(X\mapsto \bar{f}(X) = \Vert AX-B \Vert _F^2,\) then f is the restriction of \(\bar{f}\) to \(\text {St}(m,n) \) so
Note that the special case where \(n=m\) is the balanced orthogonal Procrustes problem. In this case, \(\text {St}(m,n) = O(n)\) so \(\Vert AX \Vert _F^2 = \Vert A \Vert _F^2\) and minimizing the function \(f(X) = \Vert AX-B \Vert _F^2\) is replaced by the problem of maximizing \(\text {Trace}(X^\top A^\top B)\) over \(X\in O(n)\). A solution is then given by \(X^* = UV^\top \) where \(B^\top A = U \Sigma V^\top \) is the Singular Value Decomposition of \(B^\top A\) with square orthogonal matrices U and V, and the solution is unique provided \(B^\top A\) is nonsingular (see Eldén and Park (1999); Golub and Van Loan (2013)).
5.4 Numerical Methods
5.4.1 Hamiltonian Taylor Variational Integrators (HTVIs)
HTVIs were first introduced in Schmitt et al. (2018). A discrete approximate Hamiltonian is constructed by approximating the flow map and the trajectory associated with the boundary values using a Taylor method, and approximating the integral by a quadrature rule. The Hamiltonian Taylor variational integrator is then generated by the discrete Hamilton’s equations. More explicitly, Type II HTVIs are constructed as follows:
-
(i)
Construct the r-order and \((r+1)\)-order Taylor methods \(\Psi _h^{(r)}\) and \(\Psi _h^{(r+1)}\) approximating the exact time-h flow map \(\Phi _h : T^*Q \rightarrow T^*Q\).
-
(ii)
Approximate \(p(0)=p_0\) by the solution \(\tilde{p}_0\) of \( p_1 = \pi _{T^*Q} \circ \Psi _h^{(r)}(q_0,\tilde{p}_0) , \) where \(\pi _{T^*Q}:(q,p)\mapsto p\).
-
(iii)
Choose a quadrature rule of order s with weights and nodes given by \((b_i,c_i)\) for \(i=1,...,m\) and generate approximations \((q_{c_i},p_{c_i}) \approx (q(c_i h),p(c_i h))\) via \( (q_{c_i},p_{c_i}) = \Psi _{c_i h}^{(r)}(q_0,\tilde{p}_0).\)
-
(iv)
Approximate \(q_1\) via \( \tilde{q}_1 = \pi _{Q} \circ \Psi _h^{(r+1)}(q_0,\tilde{p}_0),\) where \(\pi _{Q}:(q,p)\mapsto q\).
-
(v)
Use the continuous Legendre transform to obtain \(\dot{q}_{c_i} = \frac{\partial H}{\partial p_{c_i}}\).
-
(vi)
Apply the quadrature rule to obtain the associated discrete right Hamiltonian \( H_d^+(q_0,p_1) = p_1 \tilde{q}_1 - h \sum _{i=1}^{m}{b_i \left[ p_{c_i} \dot{q}_{c_i} - H(q_{c_i},p_{c_i}) \right] }.\)
-
(vii)
The variational integrator is then defined by the discrete right Hamilton’s equations.
Note that the following error analysis result concerning the order of accuracy of HTVIs was derived in Duruisseaux et al. (2021) (it can be extended to the constrained case via the strategy and results of Sect. 4.2):
Theorem 5.1
If the Hamiltonian H and its partial derivative \(\frac{\partial H}{\partial p}\) are Lipschitz continuous in both variables, then \(H_d^+(q_0,p_1)\) approximates \(H_d^{+,E}(q_0,p_1)\) with at least order of accuracy \(\min {(r+1,s)}\).
By Theorem 2.2 in Schmitt and Leok (2017), the associated discrete Hamiltonian map has the same order of accuracy.
In this paper, we will use the Direct approach and Adaptive approach \(r=0\) Type II HTVIs constructed in Duruisseaux et al. (2021) based on the Direct and Adaptive discrete right Hamiltonians (respectively)
5.4.2 Euler–Lagrange Simple Discretization
In Duruisseaux and Leok (2022a), the p-Bregman Euler–Lagrange equations were rewritten as a first-order system of differential equations, for which a Riemannian version of a semi-implicit Euler scheme was applied to obtain the following algorithm:
Version I of Algorithm 2 corresponds to the usual update for the semi-implicit Euler scheme, while Version II is inspired by the reformulation of Nesterov’s method from Sutskever et al. (2013) that uses a corrected gradient \(\nabla f(X_k +h b_kV_k)\) instead of the traditional gradient \(\nabla f(X_k)\).
5.4.3 Riemannian Gradient Descent (RGD)
This is a generalization of Gradient Descent to the setting of Riemannian manifolds which involves the Riemannian gradient and a retraction.
5.5 Numerical Results
It is noted in Duruisseaux and Leok (2022a) that although higher values of p in Algorithm 2 result in provably faster rates of convergence, they also appear to be more prone to stability issues under numerical discretization, which can cause the numerical optimization algorithm to diverge. Numerical experiments in Duruisseaux et al. (2021) showed that in the normed vector space setting, geometric discretizations which respect the time-rescaling invariance and symplecticity of the Bregman Lagrangian and Hamiltonian flows were substantially less prone to these stability issues, and were therefore more robust, reliable, and computationally efficient. This was one of the motivations to develop time-adaptive Hamiltonian variational integrators for the Bregman Hamiltonians. Numerical experiments were conducted for the Rayleigh quotient minimization problem on \(\mathbb {S}^{n-1}\), and for the generalized eigenvalue and Procrustes problems on the Stiefel manifold \(\text {St}(m,n)\).
The results from Fig. 1 show how the Hamiltonian Taylor variational integrators compare to the Euler–Lagrange discretizations from Duruisseaux and Leok (2022a) and the standard Riemannian gradient descent. Note that for certain instances of the Procrustes problem with certain initial values, all the algorithms converged to a local minimizer, and not the global minimizer, of the objective function. We can observe from Fig. 1 that for the same value of the timestep h, the Adaptive Hamiltonian variational integrator clearly outperforms its Direct counterpart, Riemannian gradient descent and the Euler–Lagrange discretizations in terms of number of iterations required. Furthermore, unlike the Euler–Lagrange discretizations (Algorithm 2) and the Riemannian gradient descent (Algorithm 3), the HTVI methods (Algorithm 1) do not require the use of retractions or parallel transports. Note that the Rayleigh minimization results indicate that the Euler–Lagrange discretizations suffer from stability issues leading to a loss of convergence, as the polynomially growing unbounded coefficient \(C p^2 (kh)^{p-2} \) is multiplied with \(\text {grad} f\), so for this product to be bounded, the gradient has to decay to zero, but due to finite numerical precision, the gradient remains bounded away from zero, thereby causing the product to grow without bound. This issue can be resolved by adding a suitable upper bound to the coefficient \(C p^2 (kh)^{p-2} \) in the updates, as can be seen both for the Euler–Lagrange discretizations and Hamiltonian variational integrators for the problems on \(\text {St}(m,n)\).
However, the algorithms generated by these constrained Hamiltonian variational integrators are implicit, which can significantly increase the cost per iteration as the dimension of the problem becomes very large. In this case, it might be beneficial to consider other options using the unconstrained explicit Hamiltonian Taylor variational integrator, such as incorporating the constraints within the objective function as a penalty, although this might not constrain the solution trajectory to lie exactly on the manifold, or using projections if they can be computed efficiently and accurately for the Riemannian manifold of interest (Duruisseaux and Leok 2021). Further, note that the implementation of the Hamiltonian variational integrators needs a very careful tuning of the various parameters at play, which may be challenging and thus also motivates the development of different methods.
6 Conclusion
Motivated by variational formulations of optimization problems on Riemannian manifolds, we first studied the relationship between the constrained Type I/II/III variational principles and the corresponding constrained Hamilton’s or Euler–Lagrange equations both in continuous and discrete time, and derived variational error analysis results for the maps defined implicitly by the resulting discrete constrained equations. We then exploited these discrete constrained variational integrators and the variational formulation of accelerated optimization on Riemannian manifolds from Duruisseaux and Leok (2022a) to numerically solve the generalized eigenvalue and Procrustes problems on \(\mathbb {S}^{n-1}\) and \(\text {St}(m,n)\).
The numerical experiments conducted in this paper corroborated the observation made for the vector space setting in Duruisseaux et al. (2021) that the adaptive Hamiltonian variational integrator is significantly more efficient than the direct Hamiltonian variational integrator, and that it can significantly outperform the Euler–Lagrange discretizations and Riemannian gradient descent, when its parameters are tuned carefully. Furthermore, it was noted that unlike the Euler–Lagrange discretizations and Riemannian gradient descent, the Hamiltonian algorithms did not require the use of retractions or parallel transports, which could be important when the problem considered lies on a Riemannian manifold for which it might not be possible to compute or approximate these objects efficiently.
We noted however that tuning the parameters of these discrete constrained variational integrators can be challenging, and also that the resulting algorithms are implicit, which may significantly increase the cost per iteration as the dimension of the problem becomes very large, in which case it might be beneficial to consider using the unconstrained explicit HTVIs with projections (Duruisseaux and Leok 2021) or by incorporating the constraints within the objective function as a penalty. Moreover, although the Whitney and Nash embedding theorems (Whitney 1944a, b; Nash 1956) imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, there are limitations to the constrained integration strategy based on embeddings presented in this paper, and an approach intrinsically defined on Riemannian manifolds would be desirable. Indeed, the embedding approach usually leads to higher-dimensional computations, and requires an effective way of constructing the embedding or a natural way of writing down equations that constrain the problem and the numerical solutions to the Riemannian manifold. Furthermore, most results in Riemannian geometry or results concerning specific Riemannian manifolds are proven from an intrinsic perspective because the embedding approach tends to flood intrinsic geometric properties of the manifold with superfluous information coming from the additional dimensions of the Euclidean space. This motivates the development of intrinsic methods that would exploit the symmetries and geometric properties of the manifold and of the problem at hand.
Developing an intrinsic extension of Hamiltonian variational integrators to manifolds will require some additional work, since the current approach involves Type II/III generating functions \(H_d^+(q_k, p_{k+1})\), \(H_d^-(p_k, q_{k+1})\), which depend on the position at one boundary point, and the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space, and one should ideally consider a Hamiltonian variational integrator construction based on discrete Dirac mechanics (Leok and Ohsawa 2011), which would yield a generating function \(E_d^+(q_k, q_{k+1}, p_{k+1})\), \(E_d^-(q_k, p_k, q_{k+1})\), that depends on the position at both boundary points and the momentum at one of the boundary points. This approach can be viewed as a discretization of the generalized energy \(E(q,v,p)=\langle p,v\rangle - L(q,v)\), in contrast to the Hamiltonian \(H(q,p)=\mathop {\mathrm {ext}}\limits _{v}\langle p,v\rangle - L(q,v)=\left. \langle p,v\rangle - L(q,v)\right| _{p=\frac{\partial L}{\partial v}}\). On the other hand, the formulation of Lagrangian variational integrators presented in the introduction of Sect. 2 makes sense intrinsically on manifolds, and a framework for variable time-stepping in Lagrangian variational integration was introduced in our subsequent work (Duruisseaux and Leok 2022b) to design intrinsic time-adaptive Lagrangian variational integrators for accelerated optimization on Riemannian manifolds.
It would also be interesting to extend the proposed approach to the problem of optimization with nonintegrable constraints, which naturally leads to the question of whether vakonomic or nonholonomic mechanics is the appropriate description (Cortés et al. 2002). In the context of optimization with nonintegrable constraints, the relevant extension will likely involve vakonomic variational integrators (Benito and Martín de Diego 2005; Jiménez and Martín de Diego 2012). However, it would be interesting to relate the methods introduced in this paper to the existing work on variational integrators applied to optimal control problems (Junge et al. 2005; de León et al. 2007), and the discrete optimal control of nonholonomic dynamical systems would likely require a combination of the methods described here and nonholonomic integrators (Cortés and Martínez 2001; de León et al. 2004; Fedorov and Zenkov 2005; McLachlan and Perlmutter 2006).
Data Availability Statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Abraham, R., Marsden, J.E., Ratiu, T.: Manifolds, tensor analysis, and applications, vol. 75. In: Applied Mathematical Sciences, 2nd edn. Springer, New York (1988)
Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ (2008)
Ahn, K., Sra, S.: From Nesterov’s estimate sequence to Riemannian acceleration. In Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research (pp. 84–118). PMLR, 09–12 July (2020)
Alimisis, F., Orvieto, A., Bécigneul, G., Lucchi, A.: Practical accelerated optimization on Riemannian manifolds (2020a)
Alimisis, F., Orvieto, A., Bécigneul, G., Lucchi, A.: A continuous-time perspective for modeling acceleration in Riemannian optimization. In Proceedings of the 23rd International AISTATS Conference, volume 108 of PMLR (pp. 1297–1307) (2020b)
Alimisis, F., Orvieto, A., écigneul, G.B., Lucchi, A.: Momentum improves optimization on Riemannian manifolds. In AISTATS (2021)
Arnol’d, V.I.: Mathematical Methods of Classical Mechanics, vol. 60 . In: Graduate Texts in Mathematics, 2nd edn. Springer-Verlag, New York (1989) Translated from the Russian by K. Vogtmann and A, Weinstein (1989)
Benettin, G., Giorgilli, A.: On the Hamiltonian interpolation of near-to-the identity symplectic mappings with application to symplectic integration algorithms. J. Stat. Phys. 74, 1117–1143 (1994). https://doi.org/10.1007/BF02188219
Benito, R., Martín de Diego, D.: Discrete vakonomic mechanics. J. Math. Phys. 46(8), 083521 (2005). https://doi.org/10.1063/1.2008214
Boumal, N.: An introduction to optimization on smooth manifolds (2020) URL http://www.nicolasboumal.net/book
Cortés, J., Martínez, S.: Non-holonomic integrators. Nonlinearity 14, 1365–1392 (2001)
Cortés, J., de León, M., Martín de Diego, D., Martínez, S.:. Geometric description of vakonomic and nonholonomic dynamics. comparison of solutions. SIAM J. Control Optim., 41 (5):1389–1412, 2002. https://doi.org/10.1137/S036301290036817X
de León, M., Martín de Diego, D., Santamaría-Merino, A.: Geometric numerical integration of nonholonomic systems and optimal control problems. Eur. J. Control. 10(5), 515–521 (2004)
de León, M., Martín de Diego, D., Santamaría-Merino, A.: Discrete variational integrators and optimal control theory. Adv. Comput. Math. 26, 251–268 (2007). https://doi.org/10.1007/s10444-004-4093-5
Duruisseaux, V., Leok, M.: Accelerated optimization on Riemannian manifolds via projected variational integrators. (2021)
Duruisseaux, V., Leok, M.: A variational formulation of accelerated optimization on Riemannian manifolds. SIAM J. Math. Data Sci. (2022a)
Duruisseaux, V., Leok, M.: Time-adaptive Lagrangian variational inegrators for accelerated optimization on manifolds (2022b). URL https://arxiv.org/abs/2201.03774
Duruisseaux, V., Schmitt, J., Leok, M.: Adaptive Hamiltonian variational integrators and applications to symplectic accelerated optimization. SIAM J. Sci. Comput. 43(4), A2949–A2980 (2021). https://doi.org/10.1137/20M1383835
Eldén, L., Park, H.: A Procrustes problem on the Stiefel manifold. Numer. Math. 82(4), 599–619 (1999). https://doi.org/10.1007/s002110050432
Fedorov, Y.N., Zenkov, D.V.: Discrete nonholonomic LL systems on Lie groups. Nonlinearity 18, 2211–2241 (2005)
Golub, G.H., Van Loan, C.F.: Matrix computations. In: Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore (2013)
Hairer, E., Lubich, C., Wanner, G.: Geometric numerical integration. In: Springer Series in Computational Mathematics, vol. 31, 2nd edn. Springer-Verlag, Berlin (2006)
Hall, J., Leok, M.: Spectral variational integrators. Numer. Math. 130(4), 681–740 (2015)
Holm, D., Schmah, T., Stoica, C.: Geometric mechanics and symmetry: From finite to infinite dimensions. In: Oxford Texts in Applied and Engineering Mathematics. Oxford University Press, Oxford (2009)
Jiménez, F., Martín de Diego, D.: Continuous and discrete approaches to vakonomic mechanics. Revista De La Real Academia De Ciencias Exactas Fisicas Y Naturales Serie A-matematicas 106, 03 (2012). https://doi.org/10.1007/s13398-011-0028-4
Jost, J.: Riemannian Geometry and Geometric Analysis, 7th edn. Springer, Berlin (2017)
Junge, O., Marsden, J.E., Ober-Blöbaum, S.: Discrete mechanics and optimal control. IFAC Proc. Vol. 38(1), 538–543 (2005)
Lang, S.: Fundamentals of Differential Geometry. In: Graduate Texts in Mathematics, vol. 191. Springer -Verlag, New York (1999)
Lee, J.M.: Introduction to Riemannian Manifolds. In: Graduate Texts in Mathematics, vol. 176, 2nd edn. Springer, Cham (2018)
Leok, M., Ohsawa, T.: Variational and geometric structures of discrete Dirac mechanics. Found. Comput. Math. 11(5), 529–562 (2011)
Leok, M., Zhang, J.: Discrete Hamiltonian variational integrators. IMA J. Numer. Anal. 31(4), 1497–1532 (2011)
Liu, Y., Shang, F., Cheng, J., Cheng, H., Jiao, L.: Accelerated first-order methods for geodesically convex optimization on Riemannian manifolds. NeurIPS 30, 4868–4877 (2017)
Marsden, J.E., Ratiu, T.S.: Introduction to mechanics and symmetry. In: Texts in Applied Mathematics, vol. 17, 2nd edn. Springer-Verlag, New York (1999)
Marsden, J.E., West, M.: Discrete mechanics and variational integrators. Acta Numer. 10, 357–514 (2001)
McLachlan, R.I., Perlmutter, M.: Integrators for nonholonomic mechanical systems. J. Nonlinear Sci. 16, 283–328 (2006)
Nash, J.: The imbedding problem for Riemannian manifolds. Ann. Math. 63(1), 20–63 (1956)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(\cal{O}(1/k^2)\). Sov. Math. Doklady 27(2), 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston, MA (2004)
Reich, S.: Backward error analysis for numerical integrators. SIAM J. Numer. Anal. 36, 1549–1570 (1999)
Schmitt, J.M., Leok, M.: Properties of Hamiltonian variational integrators. IMA J. Numer. Anal. 38(1), 377–398 (2017)
Schmitt, J.M., Shingel, T., Leok, M.: Lagrangian and Hamiltonian Taylor variational integrators. BIT Numer. Math. 58, 457–488 (2018). https://doi.org/10.1007/s10543-017-0690-9
Su, W., Boyd, S., Candes, E.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In ICML’13 Proceedings of the 30th International Conference on International Conference on Machine Learning Vol. 28 (pp. 1139–1147), Atlanta, GA, USA (2013)
Whitney, H.: The singularities of a smooth \(n\)-manifold in \((2n--1)\)-space. Ann. Math. 45(2), 247–293 (1944a)
Whitney, H.: The self-intersections of a smooth \(n\)-manifold in \(2n\)-space. Ann. Math. 45(2), 220–246 (1944b)
Wibisono, A., Wilson, A., Jordan, M.: A variational perspective on accelerated methods in optimization. Proc. Natl. Acad. Sci. 113(47), E7351–E7358 (2016)
Zhang, H., Sra, S.: First-order methods for geodesically convex optimization. In 29th Annual Conference on Learning Theory pp. 1617–1638 (2016)
Zhang, H., Sra, S.: An estimate sequence for geodesically convex optimization. In Proceedings of the 31st Conference On Learning Theory, vol. 75 of Proceedings of Machine Learning Research pp. 1703–1723 (2018)
Funding
The authors were supported in part by NSF under Grants DMS-1411792, DMS-1345013, DMS-1813635, CCF-2112665, by AFOSR under Grant FA9550-18-1-0288, and by the DoD under grant 13106725 (Newton Award for Transformative Ideas during the COVID-19 Pandemic).
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Arash Yavari.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A. Proofs of Theorems for Constrained Variational Mechanics
Appendix A. Proofs of Theorems for Constrained Variational Mechanics
1.1 A.1. Proof of Theorem 2.1
Theorem A.1
Consider the constrained action functional \(\mathfrak {S} : C^2([0,T],\mathcal {Q} \times \Lambda ) \rightarrow \mathbb {R}\) given by
The condition that \(\mathfrak {S} (q(\cdot ), \lambda (\cdot )) \) is stationary with respect to the boundary conditions \(\delta q(0) = 0\) and \(\delta q(T) =0\) is equivalent to \((q(\cdot ),\lambda (\cdot ))\) satisfying the constrained Euler–Lagrange equations
Proof
Computing the variation of \(\mathfrak {S}\) yields
Using integration by parts and the boundary conditions \(\delta q(0) = 0\) and \(\delta q (T) = 0,\) we get
Now, if \(\delta \mathfrak {S} = 0\), then the fundamental theorem of the calculus of variations (Arnol’d 1989) yields the constrained Euler–Lagrange Eq. (A.2). Conversely, if \((q,\lambda )\) satisfies the constrained Euler–Lagrange Eq. (A.2), then the integrand vanishes and \(\delta \mathfrak {S} = 0\). \(\square \)
1.2 A.2. Proof of Theorem 3.1
Theorem A.2
Consider the constrained action functional \(\mathfrak {S} : C^2([0,T],T^*\mathcal {Q} \times \Lambda ) \rightarrow \mathbb {R}\) given by
The condition that \(\mathfrak {S} (q(\cdot ),p(\cdot ), \lambda (\cdot )) \) is stationary with respect to the boundary conditions \(\delta q(0) = 0\) and \(\delta p(T) =0\) is equivalent to \((q(\cdot ),p(\cdot ),\lambda (\cdot ))\) satisfying Hamilton’s canonical constrained equations
Proof
Computing the variation of \(\mathfrak {S}\) yields
Using integration by parts and the boundary conditions \(\delta q(0) = 0\) and \(\delta p (T) = 0,\) we get
Now, if \(\delta \mathfrak {S} = 0\), then the fundamental theorem of the calculus of variations (Arnol’d 1989) yields Hamilton’s constrained Eq. (A.4). Conversely, if \((q,p,\lambda )\) satisfies Hamilton’s constrained Eq. (A.4), then the integrand vanishes and \(\delta \mathfrak {S} = 0\). \(\square \)
1.3 A.3. Proof of Theorem 3.2
Theorem A.3
Consider the constrained action functional \(\mathfrak {S} : C^2([0,T],T^*\mathcal {Q} \times \Lambda ) \rightarrow \mathbb {R}\) given by
The condition that \(\mathfrak {S} (q(\cdot ),p(\cdot ), \lambda (\cdot )) \) is stationary with respect to the boundary conditions \(\delta q(T) = 0\) and \(\delta p(0) =0\) is equivalent to \((q(\cdot ),p(\cdot ),\lambda (\cdot ))\) satisfying Hamilton’s canonical constrained equations
Proof
The proof is almost identical to that of Theorem 3.1. We compute the variation of \(\mathfrak {S}\) as before and get Eq. (A.5) except that the term \(\left( q(T) \delta p(T) + p(T) \delta q (T) \right) \) is replaced by \(\left( - q(0) \delta p(0) - p(0) \delta q (0)\right) \). As before, integration by parts and the boundary conditions \(\delta q(T) = 0\) and \(\delta p(0) =0\) yield
Then, if \(\delta \mathfrak {S} = 0\), then the fundamental theorem of the calculus of variations (Arnol’d 1989) yields Hamilton’s constrained Eq. (A.7). Conversely, if \((q,p,\lambda )\) satisfies Hamilton’s constrained Eq. (A.7), then the integrand vanishes and \(\delta \mathfrak {S} = 0\). \(\square \)
1.4 A.4. Proof of Theorem 2.2
Theorem A.4
The exact time-T flow map of Hamilton’s equations \((q_0,p_0) \mapsto (q_T,p_T)\) is implicitly given by the following relations:
Thus, \( \mathcal {S} (q_0,q_T)\) is a Type I generating function that generates the exact flow of the constrained Euler–Lagrange Eq. (2.6).
Proof
Using integration by parts and simplifying gives
By Theorem 2.1, the extremum of the action functional \(\mathfrak {S}\) is achieved when \((q,\lambda )\) satisfies the constrained Euler–Lagrange Eq. (2.6), so we get \( D_1 \mathcal {S} (q_0,q_T) = - \frac{\partial L}{ \partial \dot{q}}(q_0,\dot{q}(0)) \) and \( D_2 \mathcal {S} (q_0,q_T) = \frac{\partial L}{ \partial \dot{q}}(q_T,\dot{q}(T)) \). \(\square \)
1.5 A.5. Proof of Theorem 3.3
Theorem A.5
The exact time-T flow map of Hamilton’s equations \((q_0,p_0) \mapsto (q_T,p_T)\) is implicitly given by the following relations:
In particular, \( \mathcal {S} (q_0,p_T)\) is a Type II generating function that generates the exact flow of Hamilton’s constrained Eq. (3.6).
Proof
Using integration by parts and simplifying gives
By Theorem 3.1, the extremum of the action functional \(\mathfrak {S}\) is achieved when the curve \((q,p,\lambda )\) satisfies Hamilton’s constrained Eq. (3.6), so the integrands vanish, and thus \(p_0 = \frac{\partial \mathcal {S} }{ \partial q_0} (q_0,p_T) = D_1 \mathcal {S} (q_0,p_T)\) and \(q_T = \frac{\partial \mathcal {S} }{ \partial p_T} (q_0,p_T) = D_2 \mathcal {S} (q_0,p_T)\). \(\square \)
1.6 A.6. Proof of Theorem 3.4
Theorem A.6
The exact time-T flow map of Hamilton’s equations \((q_0,p_0) \mapsto (q_T,p_T)\) is implicitly given by the following relations:
In particular, \( \mathcal {S} (q_T,p_0)\) is a Type III generating function that generates the exact flow of Hamilton’s constrained Eq. (3.8).
Proof
Integrating by parts and simplifying yields
By Theorem 3.2, the extremum of the action \(\mathfrak {S}\) is achieved when the curve \((q,p,\lambda )\) satisfies Hamilton’s constrained Eq. (3.8), so the integrands vanish, and thus \(p_T = - \frac{\partial \mathcal {S} }{ \partial q_T} (q_T,p_0) = - D_1 \mathcal {S} (q_T,p_0)\) and \(q_0 = - \frac{\partial \mathcal {S} }{ \partial p_0} (q_T,p_0) = - D_2 \mathcal {S} (q_T,p_0)\). \(\square \)
1.7 A.7. Proof of Theorem 2.3
Theorem A.7
The Type I discrete Hamilton’s variational principles
are equivalent to the discrete constrained Euler–Lagrange equations
where \(L_d(q_{k},q_{k+1}) \) is defined via Eq. (2.11).
Proof
Using the fact that \(\delta q_0 =0\) and \(\delta q_N =0\), we have
If the discrete constrained Euler–Lagrange Eq. (A.12) are satisfied, then each term vanishes and \(\delta \mathfrak {S}_d^{\pm } = 0\). Conversely, if \(\delta \mathfrak {S}_d^{\pm } = 0\), then a discrete fundamental theorem of the calculus of variations yields the discrete constrained Euler–Lagrange Eq. (A.12). \(\square \)
1.8 A.8. Proof of Theorem 3.5
Theorem A.8
The Type II discrete Hamilton’s phase space variational principle
is equivalent to the discrete constrained right Hamilton’s equations
where \(H_d^+ (q_k,p_{k+1}) \) is defined via Eq. (3.16).
Proof
Using the fact that \(\delta q_0 =0\) and \(\delta p_N =0\) since \((q_0,p_N)\) is fixed, we obtain the following expression for the variations of \( \mathfrak {S}_d^{+} \):
If the discrete constrained right Hamilton’s Eq. (A.14) are satisfied, then each term vanishes and \(\delta \mathfrak {S}_d^{+}= 0\). Conversely, if \(\delta \mathfrak {S}_d^{+} = 0\), then a discrete fundamental theorem of the calculus of variations yields the discrete constrained right Hamilton’s Eq. (A.14). \(\square \)
1.9 A.9. Proof of Theorem 3.6
Theorem A.9
The Type III discrete Hamilton’s phase space variational principle
is equivalent to the discrete constrained left Hamilton’s equations
where \(H_d^- (q_{k+1},p_{k}) \) is defined via Eq. (3.17).
Proof
Using the fact that \(\delta q_N =0\) and \(\delta p_0 =0\) since \((q_N,p_0)\) is fixed, we obtain the following expression for the variations of \( \mathfrak {S}_d^{-} \):
If the discrete constrained left Hamilton’s Eq. (A.16) are satisfied, then each term vanishes and \(\delta \mathfrak {S}_d^{-} = 0\). Conversely, if \(\delta \mathfrak {S}_d^{-} = 0\), then a discrete fundamental theorem of the calculus of variations yields the discrete constrained left Hamilton’s Eq. (A.16). \(\square \)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Duruisseaux, V., Leok, M. Accelerated Optimization on Riemannian Manifolds via Discrete Constrained Variational Integrators. J Nonlinear Sci 32, 42 (2022). https://doi.org/10.1007/s00332-022-09795-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00332-022-09795-9
Keywords
- Riemannian optimization
- Accelerated optimization
- Symplectic optimization
- Constrained variational integrators