1 Introduction

Let \(\gamma :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\) be a sufficiently smooth embedding of the circle \({\mathbb {T}}\) into Euclidean space. Its Möbius energy [27, 61] is defined as

$$\begin{aligned} {\mathcal {E}}(\gamma ) \;{:}{=}\; \int _{\mathbb {T}}\int _{\mathbb {T}}\left( \frac{1}{|{\gamma (x)-\gamma (y)}|^2} - \frac{1}{\varrho _\gamma ^2(x,y)} \right) \, |{\gamma '(x)}|\, |{\gamma '(y)}| \, {{\text {d}}}x \, {{\text {d}}}y, \end{aligned}$$
(1)

where \(\varrho _\gamma (x,y)\) denotes the length of the shortest arc of \(\gamma \) connecting \(\gamma (x)\) and \(\gamma (y)\).
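To make definition (1) concrete, the following is a minimal sketch of a vertex-based quadrature for a closed polygon. It is not the discretization used in this paper: the function names, the choice of vertex weights, and the use of the polygonal arc length in place of \(\varrho _\gamma \) are assumptions made for illustration only.

```python
import math

def discrete_moebius_energy(points):
    """Vertex-based quadrature of the energy (1) for an embedded closed polygon.

    points: list of vertex coordinates (tuples). Hypothetical helper, for illustration.
    """
    n = len(points)
    # Edge i joins vertex i to vertex i + 1 (indices modulo n).
    edge = [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]
    total = sum(edge)
    # Quadrature weight of a vertex: half the length of its two incident edges.
    weight = [0.5 * (edge[i - 1] + edge[i]) for i in range(n)]
    # Arc-length position of each vertex along the polygon.
    pos = [0.0] * n
    for i in range(1, n):
        pos[i] = pos[i - 1] + edge[i - 1]
    energy = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the diagonal is skipped in this sketch
            chord = math.dist(points[i], points[j])
            arc = abs(pos[i] - pos[j])
            arc = min(arc, total - arc)  # shortest polygonal arc, stand-in for rho_gamma
            energy += (1.0 / chord**2 - 1.0 / arc**2) * weight[i] * weight[j]
    return energy

def regular_polygon(n, radius=1.0):
    """Vertices of a regular n-gon inscribed in a planar circle."""
    return [(radius * math.cos(2.0 * math.pi * i / n),
             radius * math.sin(2.0 * math.pi * i / n)) for i in range(n)]
```

For a finely sampled round circle the returned value should be close to 4, the value of (1) on the round circle, and it does not change under rescaling of the polygon. The double loop also makes the quadratic cost in the number of vertices visible.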

The original motivation [28] was to define an energy that measures complexity or “entangledness” of a given curve. One may expect that minimization will unravel the initial configuration to a state of less complexity. Ideally, this should also preserve topological properties, in particular the isotopy class. By definition, an isotopy class is a path component in the space of embedded curves. The Möbius energy was designed to erect infinite energy barriers that separate isotopy classes within the space of curves. The term \(|{\gamma (x)-\gamma (y)}|^{-2}\) blows up whenever a self-contact emerges, lending itself as contact barrier for modeling impermeability of curves and rods. Moreover, this term promotes the spreading of the geometry, which indeed leads to the desired unfurling. Subtracting the second term \(\varrho _\gamma ^{-2}(x,y)\) guarantees that the energy is finite for sufficiently smooth embeddings. This way, any time-continuous descent method like, e.g., a gradient flow, will necessarily preserve the isotopy class. Another pleasant feature of the Möbius energy is that its critical points enjoy higher smoothness.

Fig. 1

Discrete Sobolev gradient descent subject to edge length constraint and barycenter constraint. The isotopy class is maintained along the iteration, which is the crucial feature of a knot energy. As initial condition, we use a “difficult” configuration proposed in [27] (1648 edges; numbers in parentheses indicate the iteration steps). The global minimizer (the round circle) is reached after about 200 iterations. The curves have perceived constant thickness in the plots while a coordinate cross serves as a reference for the respective scaling factor. See also Fig. 3 for a comparison to further optimization methods; the present one is “\(W^{3/2,2}\) projected gradient, explicit”.

In this paper we propose a new concept of numerical optimization techniques for the large family of self-repulsive energies by discussing the prototypical case of the Möbius energy. Due to the nonlocal point-point interactions (which manifest themselves in the occurrence of a double integral), any evaluation of the energy or its gradient is rather expensive; this renders the numerical optimization a challenging task. The key idea of our approach is to introduce a special geometric variant of the metric of the Sobolev space \(W^{{3/2},2}\) that discourages movement of an embedded curve in regions of near self-contact. Contrary to black-box approaches, our method allows us to minimize the Möbius energy of even quite complicated starting configurations within only a few hundred iterations (see Figs. 1 and 6). As illustrated in Fig. 2, computing gradients with respect to this metric allows for choosing significantly larger step sizes compared to the \(L^{2}\)- or even the \(W^{{3/2},2}\)-metric. This is in agreement with the interpretation of \(W^{{3/2},2}\)-gradient descent as a coarse discretization of an ordinary differential equation. In contrast to full discretization (i.e., in space and time) of a general (transient) partial differential equation, an ordinary differential equation does not require any mesh-dependent bound on the time step size for stability. Consequently, our gradient descent scheme requires only a few iteration steps, even for fine spatial resolution. Besides being robust, this makes it particularly efficient, as demonstrated by the performance comparison in Fig. 3.

Potential applications for self-repulsive energies are manifold as they can be employed as barriers for shape optimization problems and physical simulation with self-contact: They arise, for instance, in mechanics [21, 29, 47, 80, 81, 93] and in molecular biology [22, 23, 34, 35, 52, 53]. The Möbius energy can also be considered as differentiable relaxation of curve thickness. For example, as reported in [82], the speed of migration of knotted DNA molecules undergoing gel electrophoresis seems to be proportional to the average crossing number of the corresponding maximizers of curve thickness. Software tools for the maximization of thickness or equivalently, for the minimization of ropelength, have been developed in [65] (SONO) and [2] (ridgerunner). Further potential fields of applications for repulsive energies include computer graphics [10, 79], packing problems [30, 31], the modeling of coiling and kinking of submarine communications cables [24, 100], and even solar coronal structures [70].

Fig. 2

We visualize different gradients as vector fields along a given curve. The \(L^2\)-gradient is pathologically concentrated on regions of near self-contact. Consequently, one has to pick tiny step sizes to prevent self-collision. The pure \(W^{{3/2},2}\)-gradient behaves much better in the sense that it is more uniformly distributed along the curve. However, this can still be improved considerably by adding a lower order term to the inner product that discourages movement in regions of near self-contact, cf. Theorem 4.1

1.1 Previous work

Since its invention by O’Hara [61,62,63] and the very influential paper by Freedman, He, and Wang [27], the Möbius energy has been studied by many authors. Detailed investigations on its derivatives have been performed in [17, 37, 42]. Existence of minimizers in prime knot classes has been established in [27]. Invariance of the energy under conformal transformations of \({{\mathbb {R}}^{m}}\) has been studied in [3, 27, 49, 56]. Smoothness of minimizers has been established in [27, 37], while smoothness and even analyticity of all critical points has finally been shown in [17] and [18]. Except for the global minimizer [27] and first results on critical points in nontrivial prime knot classes [14, 44], almost nothing is known on the geometry of the energy space. In light of the Smale conjecture (proven by Hatcher [36]), it would be of great interest to know whether some gradient flow of the Möbius energy actually defines a retract of the unknots to the round circles. The \(L^2\)-gradient flow of the Möbius energy has been studied in [12, 13, 37].

Fig. 3

Exemplary performance comparison between several feasible (top) and infeasible (bottom) optimization methods and with respect to various Sobolev metrics, applied to the initial configuration from Fig. 1 (1648 edges). “Feasible” means that the constraints were respected in each iteration step (up to a certain tolerance, of course). “Infeasible” means that a penalty formulation was used in place of hard constraints. Each dataset corresponds to a combination of an optimization method (encoded by line dashing) and a Sobolev-metric (encoded by color; e.g., green corresponds to our Sobolev metric) that have been employed to compute gradients. We see that apart from the implicit projected gradient descent (which generally does not work well in this context), all optimization methods perform best in conjunction with our \(W^{{3/2},2}\)-metric. All experiments were implemented in Mathematica® and ran single-threaded for 60 min on an Intel® Xeon® E5-2690 v3. Further details will be provided in Sect. 6.4.

Various numerical methods have been devised for discretizing and minimizing the Möbius energy [44, 48, 49, 78], partially with error analysis [67, 68, 73]. A recently proposed scheme also preserves conformal invariance [5, 15].

The Möbius energy has also inspired the development of similar so-called knot energies [19, 33, 83, 86, 87] and higher-dimensional generalizations [43, 46, 49, 64, 84, 85, 88].

Fig. 4

As a matter of fact, the statistics in Fig. 3 depend strongly on the hardware. To provide a more independent comparison, we here plot the number of iteration steps versus the values of the Möbius energy attained after these steps. Of course, as the effort involved for performing a single iteration step differs among the methods discussed here, it is debatable whether this is a meaningful unit after all.

Both theoretical and numerical results have been obtained on linear combinations of the bending energy and the Möbius energy [51, 61, 95]. More generally, in order to find minimizers of an elastic energy within an isotopy class, each knot energy can be employed in two ways: either as regularizer as it was done, e.g., in [4,5,6, 29, 32, 40, 94, 97], or by using it to encode a hard bound into the domain, which was done with the knot thickness in [39, 76, 96].

The applicability of self-avoiding energies is heavily limited by their immense cost: Typical discretizations replace the double integrals by double sums which leads to a computational complexity of at least \(\varOmega ((N \cdot m)^2)\) for evaluating the discrete Möbius energy and its derivative, where \(N \cdot m\) is the number of degrees of freedom of the discretized geometry (e.g., the number of vertices of a polygonal line times the dimension of the ambient space). This issue can be mended by sophisticated kernel compression techniques, see [99]. In this article, however, we focus on another issue that is more related to mathematical optimization, namely the fact that, for \(N \rightarrow \infty \), the discretized optimization problems become increasingly ill-conditioned. It is well-known that the convergence rate of many gradient-based optimization methods (method of steepest descent, nonlinear conjugate gradient method, and also more sophisticated quasi-Newton methods like L-BFGS) is very sensitive to the condition number of the Hessian of the energy (at a minimum), measured with respect to the inner product that is used to compute the gradients. The Hessian of the Möbius energy is deeply related to the fractional Laplacian \((-\varDelta )^{3/2}\) which is a differential operator of order three, cf. [37]. Thus the condition number of the discrete problem grows like \(O(h^{-3})\) where h denotes the typical length of an edge in the discretization. In practice, this results in a rapid increase of the number of optimization iterations to “reach the minimizer” when the discretization is refined (i.e., for \(h \rightarrow 0\)). Combined with the immense cost of evaluating \({\mathcal {E}}\) and \(D{\mathcal {E}}\), this leads to a prohibitively high cost of minimizing \({\mathcal {E}}\) with black-box optimization routines (see Figs. 3 and 4).
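The growth rate \(O(h^{-3})\) can be illustrated on a model problem. The sketch below does not use the Hessian of the Möbius energy; as a stand-in it takes the power 3/2 of the standard periodic finite-difference Laplacian on a uniform grid with N points and spacing \(h = 1/N\), restricted to mean-zero grid functions. The function name is hypothetical.

```python
import math

def model_condition_number(n):
    """Condition number of the 3/2 power of the periodic finite-difference
    Laplacian on n grid points (mean-zero functions only). Model problem only."""
    h = 1.0 / n
    # Eigenvalues of the periodic second-difference operator for frequencies 1..n-1.
    eigs = [((2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)) / h**2) ** 1.5
            for k in range(1, n)]
    return max(eigs) / min(eigs)
```

Halving h (doubling n) should multiply the returned value by a factor close to \(2^3 = 8\).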

In particular, this issue applies to the explicit Euler time discretization scheme for the \(L^2\)-gradient flow of the Möbius energy. Denoting the discretized energy by \({\mathcal {E}}_h\), the next time iterate \(\gamma _{(t + \varDelta t)}\) is computed from the current iterate \(\gamma _t\) by solving

$$\begin{aligned} \big \langle \tfrac{\gamma _{(t + \varDelta t)} - \gamma _t}{\varDelta t }, \varphi \big \rangle _{L^2_{\gamma _t}} + D {\mathcal {E}}_h(\gamma _{t}) \, \varphi = 0 \quad \hbox { for all discrete vector fields}\ \varphi :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}, \end{aligned}$$

where \(\left\langle u,v \right\rangle _{L^2_{\gamma _t}}=\int _{{\mathbb {T}}}\left\langle u(x),v(x) \right\rangle |{\gamma '(x)}|{{\text {d}}}x\). This can also be reinterpreted as method of steepest descent with respect to the (discretized) \(L^2\)-gradient and with step size \(\varDelta t>0\). Here the ill-conditioning manifests itself in the Courant–Friedrichs–Lewy condition: As the \(L^2\)-gradient flow is a system of third order parabolic partial differential equations, the step size has to be truncated to \(\varDelta t = {{\,\mathrm{O}\,}}(h^3)\) in order to make this scheme stable. This is also why a line search that enforces the Armijo condition (also referred to as first Wolfe condition), cf. [59], Chapter 3, will typically lead to tiny step sizes, rendering the method impractical for optimization (see Fig. 3). It is well-known that the Courant–Friedrichs–Lewy condition can be circumvented by implicit time integration schemes. For example, in the implicit Euler or backward Euler scheme, one determines the next iterate \(\gamma _{(t + \varDelta t)}\) by solving the equation

$$\begin{aligned} \big \langle \tfrac{\gamma _{(t + \varDelta t)} - \gamma _t}{\varDelta t}, \varphi \big \rangle _{L^2_{\gamma _t}} + D {\mathcal {E}}_h(\gamma _{(t + \varDelta t)}) \, \varphi = 0 \quad \text {for all discrete vector fields }\varphi :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}. \end{aligned}$$

Standard techniques for solving this nonlinear equation, e.g., Newton’s method, require solving multiple linearizations of the above equation and thus involve the Hessian \(DD{\mathcal {E}}_h\) in each time iteration. Moreover, the linearization has to be recomputed whenever the step size \(\varDelta t\) changes, which makes it nontrivial to set up an adaptive time stepping scheme. This explains why implicit time integrators turn out to be rather inefficient optimization schemes (see Fig. 3). If one allows oneself to employ second derivatives of \({\mathcal {E}}_h\), applying Newton’s method (and its damped or regularized variants) for solving \(D{\mathcal {E}}_h(\gamma _{(t+\varDelta t)}) = 0\) in the first place would lend itself as a more efficient optimization algorithm. However, it is well-known that Newton’s method does not necessarily perform well when applied far away from critical points.
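The step size restriction of the explicit scheme and its absence in the implicit scheme can be checked on a linear model problem. The sketch assumes the flow \(\partial _t u = -(-\varDelta )^{3/2} u\) on \({\mathbb {R}}/{\mathbb {Z}}\), resolved with Fourier modes up to wave number N/2 (so \(h = 1/N\)); this is a simplification of the nonlinear \(L^2\)-gradient flow, and all function names are hypothetical.

```python
import math

def symbols(n):
    """Fourier symbols |2 pi k|^3 of (-Laplace)^{3/2} for k = 1, ..., n/2."""
    return [(2.0 * math.pi * k) ** 3 for k in range(1, n // 2 + 1)]

def explicit_euler_factor(n, dt):
    """Largest modulus of the explicit Euler amplification factors 1 - dt * lambda."""
    return max(abs(1.0 - dt * lam) for lam in symbols(n))

def implicit_euler_factor(n, dt):
    """Largest amplification factor 1 / (1 + dt * lambda) of the implicit Euler scheme."""
    return max(1.0 / (1.0 + dt * lam) for lam in symbols(n))

def largest_stable_step(n):
    """Explicit Euler is stable iff dt <= 2 / lambda_max, with lambda_max = (pi n)^3."""
    return 2.0 / (math.pi * n) ** 3
```

In this model the largest stable explicit step equals \(2h^3/\uppi ^3\), so it shrinks by a factor of 8 whenever h is halved, while the implicit factor stays below one for every positive step size.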

1.2 Sobolev gradients

These problems can be overcome by optimization methods based on Sobolev gradients which are defined in terms of a Sobolev metric G that is “natural” for the Möbius energy. Blatt [11] characterized the energy space of the Möbius energy \({\mathcal {E}}\) as \(W^{1,\infty }({\mathbb {T}};{\mathbb {R}}^m)\cap W^{{3/2},2}({\mathbb {T}};{\mathbb {R}}^m)\), cf. Theorem 2.1. Here and in the following, \(W^{s,p}\) denotes the Sobolev–Slobodeckiĭ space of functions with “s fractional derivatives in \(L^{p}\)” if \(s \not \in {\mathbb {Z}}\) and a conventional Sobolev space for \(s \in {\mathbb {Z}}\). This result points to the fact that \(D {\mathcal {E}}\) is a nonlinear differential operator of order \(2 \cdot \frac{3}{2} = 3\), which has already been observed by He [37]. So morally, a suitable inner product G should be of the form

$$\begin{aligned} G(u,w) \textstyle \;{:}{=}\; \int _{\mathbb {T}}\langle (- \varDelta )^{3/4} \, u(x), (- \varDelta )^{3/4} \, w(x) \rangle \, {{\text {d}}}x . \end{aligned}$$

Then the G-gradient \({{\,\mathrm{grad}\,}}({\mathcal {E}}) |_\gamma \) at \(\gamma \) can be defined by the following weak formulation:

$$\begin{aligned} G({{\,\mathrm{grad}\,}}({\mathcal {E}}) |_\gamma , w) \;{:}{=}\; D{\mathcal {E}}(\gamma )\,w \quad \hbox { for all}\ w \in C^{\infty }({\mathbb {T}};{\mathbb {R}}^m). \end{aligned}$$

Thus, at least formally, the G-gradient satisfies the equation

$$\begin{aligned} {{\,\mathrm{grad}\,}}({\mathcal {E}}) |_\gamma = (- \varDelta )^{-3/2} \, D{\mathcal {E}}(\gamma ). \end{aligned}$$
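For the flat metric on \({\mathbb {T}}\cong {\mathbb {R}}/{\mathbb {Z}}\), the operator \((-\varDelta )^{-3/2}\) acts diagonally on Fourier modes, dividing the mode with wave number k by \(|2\uppi k|^3\). The following sketch applies it to samples on a uniform grid. It assumes that \(D{\mathcal {E}}(\gamma )\) is represented by a scalar density f with zero mean; the constant mode, on which the symbol vanishes, is simply dropped. A plain discrete Fourier transform is used to keep the example dependency-free; the function name is hypothetical.

```python
import cmath
import math

def inverse_fractional_laplacian(f, order=3.0):
    """Apply (-Laplace)^{-order/2} on R/Z to grid samples f; the mean of f is discarded."""
    n = len(f)
    # Forward discrete Fourier transform (O(n^2), for illustration only).
    f_hat = [sum(f[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
             for k in range(n)]
    g_hat = [0j] * n
    for k in range(1, n):
        wave = k if k <= n // 2 else k - n  # signed integer wave number
        g_hat[k] = f_hat[k] / abs(2.0 * math.pi * wave) ** order
    # Inverse transform; the result is real for real input.
    return [sum(g_hat[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)).real
            for j in range(n)]
```

Applied to samples of \((2\uppi )^3 \sin (2\uppi x)\), the function should return samples of \(\sin (2\uppi x)\) up to rounding errors.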

By a somewhat naive counting of fractional derivatives, the right hand side is a nonlinear differential operator of order zero. Hence there is a chance that \({{\,\mathrm{grad}\,}}({\mathcal {E}}) |_\gamma \) resides in the same Banach space as \(\gamma \) so that \({{\,\mathrm{grad}\,}}({\mathcal {E}})\) would be a vector field. Then the evolution equation

$$\begin{aligned} \partial _t \gamma _t = - {{\,\mathrm{grad}\,}}({\mathcal {E}}) |_{\gamma _{t}} \end{aligned}$$
(2)

would actually be an ordinary differential equation. Indeed, this turns out to be true and is part of our main result (see Theorem 1.2). This seems to imply that no Courant–Friedrichs–Lewy condition applies to the discretized problem, so that the number of gradient descent iterations “to reach the minimum” is quite insensitive to the mesh resolution. At least, this is what we observed in our experiments.
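The expected insensitivity to the mesh resolution can be seen in a quadratic model problem. The sketch assumes the energy \(\tfrac{1}{2} \sum _k \lambda _k u_k^2\) with \(\lambda _k = |2\uppi k|^3\) in Fourier coordinates, which only mimics the order of the Möbius energy. It counts the iterations that steepest descent needs to reduce all coefficients below a tolerance, once with the \(L^2\)-gradient and once with the gradient with respect to the inner product induced by \((-\varDelta )^{3/2}\). Names and step size choices are hypothetical.

```python
import math

def descent_iterations(n, preconditioned, tol=1e-3, max_iter=10**6):
    """Count steepest descent steps for the model energy 0.5 * sum(lam * u**2)."""
    lam = [(2.0 * math.pi * k) ** 3 for k in range(1, n // 2 + 1)]
    u = [1.0] * len(lam)  # Fourier coefficients of the initial iterate
    # Step size: 1/lambda_max for the L2-gradient, a fixed 0.5 for the Sobolev gradient.
    tau = 0.5 if preconditioned else 1.0 / max(lam)
    for it in range(1, max_iter + 1):
        for i, l in enumerate(lam):
            grad = l * u[i]      # L2-gradient of the model energy
            if preconditioned:
                grad /= l        # apply the inverse Riesz operator
            u[i] -= tau * grad
        if max(abs(v) for v in u) <= tol:
            return it
    return max_iter
```

In this model the preconditioned count is the same for every n, whereas the \(L^2\) count grows roughly by a factor of 8 each time n is doubled.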

Since the inner product G involves a choice of a Riemannian metric on the parametrization domain (line element and Laplacian), it is even more natural to define a \(\gamma \)-dependent family \(\gamma \mapsto G_\gamma \) of inner products. With the Riesz operator \(u \mapsto G_\gamma (u,\cdot )\) associated with \(G_\gamma \), the G-gradient can then be characterized by

$$\begin{aligned} G_\gamma \big ({{\,\mathrm{grad}\,}}({\mathcal {E}}) |_\gamma , w\big ) = D{\mathcal {E}}(\gamma )\,w \quad \hbox { for all}\ w \in C^{\infty }({\mathbb {T}};{\mathbb {R}}^m). \end{aligned}$$
(3)

There are plenty of possible choices for this Riesz operator. Most important is that it is an elliptic pseudo-differential operator of order three. All compact perturbations of it that are positive-definite will lead to an operator with the same qualitative properties. In particular, we are not limited to the exact fractional Laplacian; this gives us the freedom to pick an operator that is computationally more amenable. Up to lower order terms, we design G such that it resembles the \(W^{{3/2},2}\)-Gagliardo inner product, replacing intrinsic distances by (the easier computable) secant distances (see Theorem 4.1). For a curve parametrized by arc length (i.e., \(|{\gamma '}| = 1\)) and up to lower order terms, it reads as

$$\begin{aligned} G_{\gamma }(u,w)&= \textstyle \int _{\mathbb {T}}\int _{\mathbb {T}}\Big \langle \frac{u'(x) - u'(y)}{ |{\gamma (x)-\gamma (y)}|^{1/2}} , \frac{w'(x) - w'(y)}{ |{\gamma (x)-\gamma (y)}|^{1/2}} \Big \rangle \, \frac{{{\text {d}}}x\, {{\text {d}}}y}{|{\gamma (x) - \gamma (y)}|} + {\text {l.o.t.}}, \end{aligned}$$
(4)

where in case of a curve \(\gamma \) parameterized by arc length the lower-order terms are given by

$$\begin{aligned} \textstyle {\text {l.o.t.}}&= \int _{{\mathbb {T}}} \int _{{\mathbb {T}}} \Big \langle \frac{u(x)-u(y)}{\left|\gamma (x)-\gamma (y)\right|^{1/2}} , \frac{w(x)-w(y)}{\left|\gamma (x)-\gamma (y)\right|^{1/2}} \Big \rangle \, \Big ( \frac{1}{\left|\gamma (x)-\gamma (y)\right|^{2}} - \frac{1}{\varrho _{\gamma }(x,y)^{2}} \Big ) \, \frac{{{\text {d}}}x \, {{\text {d}}}y}{\left|\gamma (x)-\gamma (y)\right|} \\&\quad \textstyle + \Big \langle \int _{\mathbb {T}}u(x) \,{{\text {d}}}x, \int _{\mathbb {T}}w(y) \, {{\text {d}}}y \Big \rangle \end{aligned}$$

and \(\varrho _{\gamma }\) denotes the geodesic distance introduced in (6) below. Here the first summand is essentially the \(W^{1/2,2}\)-Gagliardo inner product with the energy density as additional weight.
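As an illustration of the leading term in (4), the sketch below evaluates an edge-based quadrature of it for scalar functions given at the vertices of a closed polygon. This discretization is an assumption made for this example and not necessarily the one used in our experiments: derivatives are taken per edge with respect to arc length, the secant distance is measured between edge midpoints, and the three factors \(|{\gamma (x)-\gamma (y)}|^{-1/2}\), \(|{\gamma (x)-\gamma (y)}|^{-1/2}\), \(|{\gamma (x)-\gamma (y)}|^{-1}\) are combined into \(|{\gamma (x)-\gamma (y)}|^{-2}\).

```python
import math

def high_order_form(points, u, w):
    """Edge-based quadrature of the leading term of (4) for scalar vertex data u, w.

    points: vertices of an embedded closed polygon. Hypothetical helper.
    """
    n = len(points)
    nxt = [(i + 1) % n for i in range(n)]
    ell = [math.dist(points[i], points[nxt[i]]) for i in range(n)]  # edge lengths
    mid = [tuple(0.5 * (a + b) for a, b in zip(points[i], points[nxt[i]]))
           for i in range(n)]                                       # edge midpoints
    du = [(u[nxt[i]] - u[i]) / ell[i] for i in range(n)]  # arc-length derivative per edge
    dw = [(w[nxt[i]] - w[i]) / ell[i] for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = math.dist(mid[i], mid[j])  # secant distance between midpoints
                total += (du[i] - du[j]) * (dw[i] - dw[j]) * ell[i] * ell[j] / dist**2
    return total
```

The resulting bilinear form is symmetric and nonnegative, but it vanishes on constant functions. This matches the role of the last summand of the lower-order terms, which controls the mean value.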

Indeed, even if \(\gamma \) is not parametrized by arc length, a more detailed analysis reveals that the Riesz operator of \(G_\gamma \) has (up to a constant) the same principal symbol as \((-\varDelta _\gamma )^{3/2}\) where \(\varDelta _\gamma \) is the Laplace-Beltrami operator with respect to the Riemannian metric on \({\mathbb {T}}\) induced by the embedding \(\gamma \) (see the proof of Theorem 4.1).

Fig. 5

Discrete Sobolev gradient descent as in Fig. 1 starting at another difficult configuration (1940 edges)

1.3 As Riemannian as you can get

The overarching idea behind all this is to consider \((\mathcal {C},G)\) as a Riemannian manifold and \({\mathcal {E}}:\mathcal {C}\rightarrow {\mathbb {R}}\) as a smooth function. Here \(\mathcal {C}\) denotes a Banach manifold of immersed embedded curves which will be defined in (5) below. If \({{\,\mathrm{grad}\,}}({\mathcal {E}})\) is a well-behaved vector field on \(\mathcal {C}\), various optimization techniques that work on Riemannian manifolds can be utilized to minimize \({\mathcal {E}}\). This is actually a long-standing dream of differential geometers: to apply Riemannian geometry to an infinite-dimensional space of shapes. Such Sobolev inner products and their geodesics have been studied from a geometrical point of view, e.g., in [7, 8, 55]. It has been observed that \(W^{1,2}\)-inner products work well in the numerical treatment of full dimensional elasticity and of membrane energies such as the area functional for surfaces or the length functional for curves [58, 66, 75]. Moreover, it is known that \(W^{2,2}\)-inner products provide good preconditioning for bending energies such as Bernoulli’s elastic energy of curves, Kirchhoff’s thin shell energy, the Willmore energy and Helfrich-type energies [26, 38, 74, 75]. Various standard optimization schemes (e.g., nonlinear conjugate gradient, Nesterov’s accelerated gradient, L-BFGS, trust region) can be sped up significantly by using the “right” notion of gradient (see Figs. 3 and 4). This is because these methods exploit that the gradient field is (locally) Lipschitz continuous with respect to the employed metric.

Alas, the story here is not that simple, because there is no Morrey embedding from the energy space \(W^{3/2,2}({\mathbb {T}};{\mathbb {R}}^m)\) to \(W^{1,\infty }({\mathbb {T}};{\mathbb {R}}^m)\) and any open \(W^{{3/2},2}\)-neighborhood of an embedded arc-length parametrized \(W^{{3/2},2}\)-curve may contain non-embedded curves or curves with vanishing or infinite derivative. Therefore, Fréchet differentiability of the Möbius energy could only be established with respect to the somewhat artificial \(W^{3/2,2}\cap W^{1,\infty }\)-topology [17]. This problem can be resolved by working in the slightly smaller Banach space \(W^{s+\nu ,p}({\mathbb {T}};{\mathbb {R}}^m)\) with suitable \({\nu }>0\) and \(p \ge 2 \). Then \(W^{s+\nu ,p}({\mathbb {T}};{\mathbb {R}}^m)\) embeds into \(C^{1}\) and the configuration space

$$\begin{aligned} \mathcal {C}\;{:}{=}\; \{\gamma \in W^{s+\nu ,p} ({\mathbb {T}};{\mathbb {R}}^m) \mid {\gamma }\,\text {is an immersed embedding}\} \end{aligned}$$
(5)

is an open subset of \(C^{1}\).

Fig. 6

Discrete Sobolev gradient descent as in Fig. 1 within the nontrivial knot class \(7_2\), using the method “\(W^{3/2,2}\) projected gradient, explicit”. The initial configuration has 3000 edges and was randomly generated with KnotPlot [71]

We construct the Riesz isomorphism as an elliptic pseudo-differential operator of order three, and we show in Theorem 4.1 that it gives rise to a generalized Riesz isomorphism where , with the Hölder conjugate \(q \;{:}{=}\; (1-1/p)^{-1}\) of p. Notice that does no longer identify with its dual space as , thus . So one of our major tasks (see Theorem 3.1) will be to establish that whenever \(\gamma \in \mathcal {C}\). Moreover, we show that \(D{\mathcal {E}}\) is locally Lipschitz continuous as a mapping , leading to our first main result.

Theorem 1.1

The gradient \({{\,\mathrm{grad}\,}}({\mathcal {E}})\) of \({\mathcal {E}}\) defined by (3) is a well-defined, locally Lipschitz continuous vector field on the configuration space \(\mathcal {C}\) (with respect to the norm on \(W^{s+\nu ,p}({\mathbb {T}};{\mathbb {R}}^m)\)). Moreover, it satisfies \( D{\mathcal {E}}(\gamma ) \, {{\,\mathrm{grad}\,}}({\mathcal {E}}) |_\gamma \ge 0 \) with equality if and only if \(D{\mathcal {E}}(\gamma ) = 0\).

Combined with the Picard–Lindelöf theorem, this statement guarantees the short-time existence of the gradient flow, both for the downward and the upward direction.

In Sect. 5, we deal also with equality constraints, i.e., with Banach submanifolds of the form \({\mathcal {M}}\;{:}{=}\; \{\gamma \in \mathcal {C}| \varPhi (\gamma ) = 0\}\), where \(\varPhi :\mathcal {C}\rightarrow \mathcal {N}\) is a suitable submersion, namely the constraint of constant speed and vanishing barycenter, cf. (33), into a further Banach space \(\mathcal {N}{}=W^{\sigma +\nu , p}({\mathbb {T}};{\mathbb {R}})\oplus {\mathbb {R}^m}\). We formulate a linear saddle point system for determining the projected gradient \({{\,\mathrm{grad}\,}}_{{\mathcal {M}}} ({\mathcal {E}}|_{\mathcal {M}})|_\gamma \) and analyze when the system is solvable. We perform the analysis for a concrete set of constraints (fixed barycenter and parametrization by arc length), but we also try to outline which steps have to be taken for more general constraints. Finally, Theorem 5.1 will establish our second main result.

Theorem 1.2

(Projected gradient) The projected gradient \({{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})\) of \({\mathcal {E}}|_{\mathcal {M}}\) defined by

$$\begin{aligned} G_\gamma \big ( {{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})|_\gamma , w\big )\! \;{:}{=}\; D({\mathcal {E}}|_{\mathcal {M}})(\gamma ) \, w \quad \hbox {for}\ w \in C^{\infty }({\mathbb {T}};{\mathbb {R}}^m) \; \text {with} \; D\varPhi (\gamma ) \, w = 0 \end{aligned}$$

is a well-defined, locally Lipschitz continuous vector field on \({\mathcal {M}}\). The gradient satisfies \( D({\mathcal {E}}|_{\mathcal {M}})(\gamma ) \; {{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}}) |_\gamma \ge 0 \) with equality if and only if \(D({\mathcal {E}}|_{\mathcal {M}})(\gamma ) = 0\).

Invoking the Picard–Lindelöf theorem again, we conclude that both the downward and the upward gradient flows of \({\mathcal {E}}|_{\mathcal {M}}\) exist for short times.
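In finite dimensions, the linear saddle point system for the projected gradient can be sketched as follows. Here the matrix G stands for a discretized metric \(G_\gamma \), the vector dE for \(D{\mathcal {E}}(\gamma )\), and the matrix C for the linearized constraint \(D\varPhi (\gamma )\); these names, the dense storage, and the small elimination routine are assumptions for illustration. The system reads \(G g + C^{\top } \lambda = dE\), \(C g = 0\), with a Lagrange multiplier \(\lambda \).

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def projected_gradient(G, dE, C):
    """Solve [[G, C^T], [C, 0]] [g, lam] = [dE, 0] and return (g, lam)."""
    n, k = len(G), len(C)
    K = [G[i][:] + [C[a][i] for a in range(k)] for i in range(n)]  # rows (G | C^T)
    K += [C[a][:] + [0.0] * k for a in range(k)]                   # rows (C | 0)
    sol = solve(K, list(dE) + [0.0] * k)
    return sol[:n], sol[n:]
```

If G is symmetric positive definite and C has full rank, the solution g is tangent to the constraint and satisfies \(dE \cdot g = g^{\top } G g \ge 0\), the discrete counterpart of the inequality in Theorem 1.2.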

The question of long-time existence is much more involved. Following the way paved by Knappmann et al. [45] for a subfamily of integral Menger curvature functionals, one may derive this property in the case of subcritical Hilbert spaces. These correspond to the functionals obtained by replacing the squares in (1) by powers \(\alpha \in (2,3)\). Since the general case \(p\ne 2\) seems to be “degenerate”, analogously to the p-Laplacian, it is unclear whether long-time existence can also be established for the setting discussed in this article.

1.4 Future Directions

The present study demonstrates the design of a minimization scheme that is both robust and efficient. It is based on a metric that is tailored to the structure of a geometric nonlocal functional modeling self-avoidance.

The general strategy outlined in this paper applies to a large range of functionals on curves and surfaces of arbitrary dimension and codimension. We stress the fact that the arguments given below mainly rely on analytical features of a functional defined on fractional Sobolev spaces rather than on geometric peculiarities, except for the metric itself which has to be chosen carefully depending on the respective problem.

Although the definition of the Möbius energy has been motivated by the electrostatic energy [62], it is admittedly not a physical quantity in the first place. However, it seems to be an appropriate candidate to demonstrate the general approach while avoiding too many technicalities as, from an analyst’s perspective, it is the most elementary smooth knot energy.

Even more importantly, one may find minimizers of physical functionals such as, e.g., the bending energy or the Helfrich energy within prescribed isotopy classes by a regularization approach, cf. [29]. In this context one may choose the regularizer to be a smooth repulsive functional which approximates the (reciprocal) thickness such as the tangent-point potential which has been employed e.g. in [4]. In combination with the technique described in the present paper, one may greatly improve not only the performance but also the complexity of the objects (i.e., isotopy types) that can be dealt with.

The higher-dimensional case as well as the adaptation of this technique to other functionals is work in progress [98].

2 Preliminaries

2.1 General notation

Throughout, we let \({\mathbb {T}}\;{:}{=}\; \{x \in {\mathbb {R}}^2 | |{x}| = (2\,{\uppi })^{-1}\}\) be the round circle with a fixed orientation and normalized to have total length \(|{{\mathbb {T}}}| = 1\). We will make use of the identification \({\mathbb {T}}\cong {\mathbb {R}}/{\mathbb {Z}}\) whenever convenient. Moreover, we write \({{\mathbb {T}}^2}= {\mathbb {T}}\times {\mathbb {T}}\) for the Cartesian product of the circle with itself and denote by \(\pi _1:{{\mathbb {T}}^2}\rightarrow {\mathbb {T}}\) and \(\pi _2:{{\mathbb {T}}^2}\rightarrow {\mathbb {T}}\) the Cartesian projections onto the first and second factor, respectively. We denote the canonical intrinsic distance function on \({\mathbb {T}}\) by

$$\begin{aligned} d_{\mathbb {T}}(x,y) \;{:}{=}\; (2 \, \uppi )^{-1}\,\left|\measuredangle (x,y)\right| = (2 \, \uppi )^{-1}\arccos \big ((2 \, \uppi )^{2}\!\left\langle x,y \right\rangle \big ) \, {\in }\, {\big [0,\tfrac{1}{2}\big ]} \qquad \hbox { for}\ x, y\, {\in }\, {\mathbb {T}}\end{aligned}$$

and the canonical line measure by \({{\text {d}}}x\) or \({{\text {d}}}y\). Each sufficiently smooth immersed embedding \(\gamma :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\) induces a line element \({\omega _{\gamma }}{(x)}\) and a unit tangent field \(\tau _\gamma \) via

$$\begin{aligned} {\omega _{\gamma }}(x)\;{:}{=}\; |{\gamma '(x)}| \, {{\text {d}}}x \quad \text {and} \quad \tau _\gamma (x)\;{:}{=}\; \tfrac{\gamma '(x)}{|{\gamma '(x)}|}. \end{aligned}$$
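The two descriptions of the intrinsic distance \(d_{\mathbb {T}}\) can be compared numerically: the formula with the arccosine for points of the embedded circle of radius \((2\uppi )^{-1}\), and the distance in the quotient \({\mathbb {R}}/{\mathbb {Z}}\). The helper names below are hypothetical.

```python
import math

RADIUS = 1.0 / (2.0 * math.pi)  # radius of the circle with total length 1

def point_on_circle(s):
    """Point of the embedded circle that corresponds to the parameter s in R/Z."""
    angle = 2.0 * math.pi * s
    return (RADIUS * math.cos(angle), RADIUS * math.sin(angle))

def d_T_embedded(x, y):
    """Intrinsic distance via the angle between x and y."""
    inner = x[0] * y[0] + x[1] * y[1]
    c = max(-1.0, min(1.0, (2.0 * math.pi) ** 2 * inner))  # guard against rounding
    return math.acos(c) / (2.0 * math.pi)

def d_T_quotient(s, t):
    """Intrinsic distance of the classes of s and t in R/Z."""
    r = abs(s - t) % 1.0
    return min(r, 1.0 - r)
```

Both functions should agree for corresponding arguments and take values in \([0,\tfrac{1}{2}]\).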

Moreover \(\gamma \) induces two further distance functions that we have to distinguish: The secant distance \(|{\triangle \gamma }|(x,y) \;{:}{=}\; |{\gamma (x) - \gamma (y)}|\) and the geodesic distance \(\varrho _\gamma \); more precisely,

$$\begin{aligned} \textstyle \varrho _\gamma (x,y) \;{:}{=}\; \int _{I_\gamma (x,y)}{\omega _{\gamma }} , \quad I_\gamma (x,y) \;{:}{=}\; {{\,\mathrm{arg\,min}\,}}\{ \int _{J} {\omega _{\gamma }} | \hbox { }\ J \subset {\mathbb {T}}\text { conn., }\partial J = \{x,y\} \}, \end{aligned}$$
(6)

where \(I_\gamma (x,y)\) denotes the shortest arc that connects x and y. Since \(\gamma \) is immersed, \(d_{\mathbb {T}}\) and \(\varrho _\gamma \) are equivalent. We point out that this equivalence extends to \(|{\triangle \gamma }|\) if the embedding \(\gamma \) is sufficiently smooth, e.g., of class \(C^{1,\alpha }\) with \(\alpha \in \left]0,1\right[\) or \(W^{1+\sigma ,r}\) with \(\sigma - 1/r \ge 0\), cf. [11, Lemma 2.1]. In this case \(\gamma \) is bi-Lipschitz continuous and the measures \({{\text {d}}}x\) and \(\omega _{\gamma }\) are equivalent as well, i.e., there are \(c_1\), \(c_2 >0\) such that \(c_1 \, {{\text {d}}}x \le \omega _{\gamma }(x) \le c_2 \, {{\text {d}}}x\) holds for all \(x \in {\mathbb {T}}\). This implies that also the Lebesgue norms

$$\begin{aligned} \Vert {u}\Vert _{L^{p}} \;{:}{=}\; \Big (\textstyle \int _{\mathbb {T}}|{u(x)}|^p \, {{\text {d}}}x\Big )^{1/p} \quad \text {and} \quad \Vert {u}\Vert _{L_{\gamma }^p} \;{:}{=}\; \Big (\textstyle \int _{\mathbb {T}}|{u(x)}|^p \, \omega _{\gamma } (x)\Big )^{1/p} \end{aligned}$$

for \(1\le p < \infty \) and any measurable function \(u :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\) are equivalent. We also employ this notation for bivariate measurable functions \(U:{\mathbb {T}}^{2}\rightarrow {{\mathbb {R}}^{m}}\), letting

$$\begin{aligned}&\Vert {U}\Vert _{L^{p}} \;{:}{=}\; \Big (\textstyle \int _{{\mathbb {T}}^{2}} |{U(x,y)}|^p \, {{\text {d}}}x\, {{\text {d}}}y\Big )^{1/p}\\&\quad \text {and} \quad \Vert {U}\Vert _{L_{\gamma }^{p}} \;{:}{=}\; \Big (\textstyle \int _{{\mathbb {T}}^{2}} |{U(x,y)}|^p \, \omega _{\gamma } (x)\, \omega _{\gamma } (y)\Big )^{1/p}. \end{aligned}$$

Likewise, for \( 0< \sigma <1\) and \(1\le p < \infty \), the Sobolev–Slobodeckiĭ seminorms

$$\begin{aligned}{}[{u}]_{W^{\sigma ,p}}&\;{:}{=}\; \Big (\textstyle \int _{{\mathbb {T}}^{2}} \Big |{\frac{u(x) - u(y)}{d_{\mathbb {T}}(x,y)^\sigma }}\Big |^p \frac{{{\text {d}}}x \, {{\text {d}}}y}{d_{\mathbb {T}}(x,y)} \Big )^{1/p} \;\; \quad \text { and }\\ [{u}]_{W^{\sigma ,p}_{\gamma }}&\;{:}{=}\; \Big (\textstyle \int _{{\mathbb {T}}^{2}} \big |{\frac{u(x) - u(y)}{|{\triangle \gamma (x,y)}|^\sigma }}\big |^p \, \frac{\omega _\gamma (x)\,\omega _\gamma (y)}{|{\triangle \gamma (x,y)}|} \Big )^{1/p} \end{aligned}$$

and the induced norms \(\Vert {u}\Vert _{W^{\sigma ,p}} \;{:}{=}\; [{u}]_{W^{\sigma ,p}} + \Vert {u}\Vert _{L^{p}}\) and \(\Vert {u}\Vert _{W^{\sigma ,p}_{\gamma }} \;{:}{=}\; [{u}]_{W^{\sigma ,p}_{\gamma }} + \Vert {u}\Vert _{L_{\gamma }^{p}}\) are equivalent, respectively. In all that follows, we will frequently make use of the following \(\gamma \)-dependent measures and operators:

$$\begin{aligned} \varOmega _\gamma (x,y)&\;{:}{=}\; \omega _{\gamma }(x)\,\omega _{\gamma }(y),&\mu _\gamma&\;{:}{=}\; \tfrac{\varOmega _\gamma }{|{\triangle \gamma }|}, \end{aligned}$$
(7)
$$\begin{aligned} \triangle u(x,y)&\;{:}{=}\; u(x) - u(y),&\delta ^{\sigma }_{\gamma } u&\;{:}{=}\; \tfrac{\triangle u}{|{\triangle \gamma }|^\sigma }. \end{aligned}$$
(8)

For example, the \(\gamma \)-dependent Sobolev–Slobodeckiĭ seminorm can be written much more economically as \( \left[ u\right] _{W^{\sigma ,p}_{\gamma }} =\Vert { \delta ^{\sigma +1/p}_{\gamma } u}\Vert _{L^{p}_{\gamma }} = \Vert { \delta ^{\sigma }_{\gamma } u}\Vert _{L^{p}_{\mu _\gamma }}, \) where \(L^{p}_{\mu _\gamma }({\mathbb {T}}^2;{\mathbb {R}}^m)\) denotes the Lebesgue space with respect to \(\mu _\gamma \) and \(\Vert {\cdot }\Vert _{L^{p}_{\mu _\gamma }}\) its associated norm. We define \(W^{s,p}\)-seminorms for \(1<s<2\) by concatenating the \(W^{s-1,p}_{\gamma }\)-seminorms with suitable differential operators of first order:

$$\begin{aligned}{}[{u}]_{W^{s,p}} \;{:}{=}\; [{u'}]_{W^{s-1,p}} \quad \text {and} \quad [{u}]_{W^{s,p}_{\gamma }} \;{:}{=}\; [{{\mathcal {D}}_\gamma u}]_{W^{s-1,p}_{\gamma }}, \quad \text {where} \quad {\mathcal {D}}_\gamma u \;{:}{=}\; \tfrac{u'}{|{\gamma '}|}. \end{aligned}$$

Here, the differential operator \({\mathcal {D}}_\gamma \) can be interpreted as the derivative with respect to arc length. Provided that \(\gamma \) is a sufficiently smooth immersed embedding, the norms \(\Vert {u}\Vert _{W^{s,p}} \;{:}{=}\; [{u}]_{W^{s,p}} + \Vert {u}\Vert _{L^p}\) and \(\Vert {u}\Vert _{W_{\gamma }^{s,p}} \;{:}{=}\; [{u}]_{W_{\gamma }^{s,p}} + \Vert {u}\Vert _{L_{\gamma }^{p}}\) are equivalent and both topologize the Sobolev–Slobodeckiĭ space

$$\begin{aligned} W^{s,p}({\mathbb {T}};{\mathbb {R}}^m) \;{:}{=}\; \{ u \in W^{1,p}({\mathbb {T}};{\mathbb {R}}^m)\mid [{u}]_{ W^{s,p}}< \infty \}. \end{aligned}$$

More precisely, the norm \(\Vert {\cdot }\Vert _{W_{\gamma }^{s,p}}\) is well-defined and equivalent to \(\Vert {\cdot }\Vert _{W^{s,p}}\) if \(\gamma \) is an immersed embedding of class \(W^{S,P}({\mathbb {T}};{\mathbb {R}}^m)\), provided that one of the conditions for the “product rule” Theorem A.4 is met for \(\sigma _1 = S-1\), \(p_1 = P\), \(\sigma _2 = s-1\), \(p_2 = p\).
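To make the \(\gamma \)-dependent seminorm \([{u}]_{W^{\sigma ,p}_{\gamma }}\) concrete, the following is a minimal numerical sketch for a closed polygon. It is only an illustration under stated assumptions: the function name `slobodeckij_seminorm`, the vertex weights (average of the two adjacent edge lengths as a stand-in for \(\omega _\gamma \)), and the plain Riemann sum that skips the diagonal are choices made here, not a discretization taken from the text.

```python
import numpy as np

def slobodeckij_seminorm(points, values, sigma, p):
    """Riemann-sum approximation of [u]_{W^{sigma,p}_gamma} on a closed polygon.

    Assumption: omega_gamma at a vertex is approximated by the average
    length of the two adjacent edges (hypothetical discretization).
    """
    P = np.asarray(points, dtype=float)                 # vertices gamma(x_i), shape (n, m)
    n = len(P)
    U = np.asarray(values, dtype=float).reshape(n, -1)  # samples u(x_i), shape (n, k)
    lengths = np.linalg.norm(np.roll(P, -1, axis=0) - P, axis=1)
    weights = 0.5 * (lengths + np.roll(lengths, 1))     # discrete omega_gamma
    chord = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)  # |triangle gamma|
    diff = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2)   # |triangle u|
    off = ~np.eye(n, dtype=bool)                        # skip the diagonal x = y
    integrand = np.zeros((n, n))
    # |triangle u / |triangle gamma|^sigma|^p / |triangle gamma|
    integrand[off] = diff[off] ** p / chord[off] ** (sigma * p + 1.0)
    total = np.sum(integrand * weights[:, None] * weights[None, :])
    return float(total ** (1.0 / p))

# Example: u(x) = sin(3x) sampled on a unit circle.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
u = np.sin(3.0 * t)
value = slobodeckij_seminorm(circle, u, sigma=0.5, p=2.0)
```

As expected of a seminorm, the sketch returns zero for constant functions and scales linearly in \(u\).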

2.2 Spaces

Our initial motivation to consider \(W^{{3/2},2}\)-inner products for optimization is the following characterization of the energy space of the Möbius energy, i.e., of the smallest space that contains all finite-energy configurations:

Theorem 2.1

(Blatt [11]) Let \(\gamma \in W^{1, \infty }({\mathbb {T}};{\mathbb {R}}^m)\) be an embedded immersed curve parametrized by arc length, i.e., \(|{\gamma '(x)}| =1\) for a.e. x. Then one has \( {\mathcal {E}}(\gamma ) < \infty \) if and only if \(\gamma \in W^{3/2,2}({\mathbb {T}};{\mathbb {R}}^m)\).

Moreover, provided that \(\gamma \) has a certain minimal regularity, the differential of \({\mathcal {E}}\) has been characterized as a nonlinear, nonlocal “differential operator” of order 3 in the sense that \(D{\mathcal {E}}(\gamma )\) is a distribution with three derivatives less than \(\gamma \) (see [37]). We will see this also in Theorem 3.1 below. As indicated in the introduction, instead of working with the energy space \(W^{3/2,2}({\mathbb {T}};{\mathbb {R}}^m) \cap W^{1,\infty }({\mathbb {T}};{\mathbb {R}}^m)\), we prefer spaces of curves with slightly higher regularity. In the first place, we avoid some technicalities caused by the critical scaling of \(W^{1/2,2}({\mathbb {T}};{\mathbb {R}}^m)\) (see [54]) related to discontinuous tangents, in particular with respect to product rules. Here and in the following, we fix parameters \({s}\), \(\nu \), and \(p\) satisfying

(9)

In fact, we will soon focus on the case \({s}= \frac{3}{2}\) only. Moreover, we think of \({\nu }\) being close to 0 and of \(p\) being close to 2. By the Morrey embedding theorem [25, Theorem 6.5], the space \(W^{s+\nu ,p}({\mathbb {T}};{\mathbb {R}}^m)\) embeds continuously into \(C^{1,\alpha }({\mathbb {T}};{\mathbb {R}}^m)\) where \(\alpha \;{:}{=}\; s+\nu - 1 - 1/p\in \left]0,1\right[\). Thus, the configuration space \(\mathcal {C}\) defined in (5) is well-defined and an open subset of \(W^{s+\nu ,p}({\mathbb {T}};{\mathbb {R}}^m)\). We consider the Banach spaces

$$\begin{aligned} {\mathcal {X}}\;{:}{=}\; W^{s+\nu ,p}({\mathbb {T}};{\mathbb {R}}^m), \quad {\mathcal {H}}\;{:}{=}\; W^{s,2}({\mathbb {T}};{\mathbb {R}}^m), \quad \text {and} \quad {\mathcal {Y}}\;{:}{=}\; W^{s-\nu ,q}({\mathbb {T}};{\mathbb {R}}^m), \end{aligned}$$

where \(q \;{:}{=}\; (1- 1/p)^{-1}\) denotes the Hölder conjugate of p. For \(\gamma \in \mathcal {C}\), we will equip these spaces with the norms

$$\begin{aligned} \Vert {\cdot }\Vert _{{\mathcal {X}},\gamma } \;{:}{=}\; \Vert {\cdot }\Vert _{W^{s+\nu ,p}_{\gamma }}, \quad \Vert {\cdot }\Vert _{{\mathcal {H}},\gamma } \;{:}{=}\; \Vert {\cdot }\Vert _{W^{{s},2}_{\gamma }}, \quad \text {and}\quad \Vert {\cdot }\Vert _{{\mathcal {Y}},\gamma } \;{:}{=}\; \Vert {\cdot }\Vert _{W^{s-\nu ,q}_{\gamma }}. \end{aligned}$$
(10)

Their continuous dual spaces will be denoted by \({\mathcal {X}}'{^{\!}}\), \({\mathcal {H}}'{^{\!}}\), and \({\mathcal {Y}}'{^{\!}}\). Since \(\mathcal {C}\subset {\mathcal {X}}\) is an open set, its tangent space \(T_\gamma \mathcal {C}\) is identical to \({\mathcal {X}}\) for each \(\gamma \in \mathcal {C}\). By the Sobolev embedding theorem, the canonical embeddings

$$\begin{aligned} i_{\mathcal {C}} :{\mathcal {X}}\hookrightarrow {\mathcal {H}}\quad \text {and} \quad j_{\mathcal {C}} :{\mathcal {H}}\hookrightarrow {\mathcal {Y}}\end{aligned}$$
(11)

are well-defined and continuous with dense images. We point out that \({\mathcal {H}}\) is a Hilbert space; suitable scalar products on this space will play a pivotal role in defining the Sobolev gradients of the Möbius energy (see Sect. 4).

There are several reasons for picking the parameters \({\nu }\) and \(p\) as in (9): So far, it is only clear that \(p\ge 2\) and \({\nu }\ge 0\) are necessary for the existence of the continuous embeddings \(i_{\mathcal {C}}\) and \(j_{\mathcal {C}}\) while \({{s}+{\nu }}- 1/p>1\) is necessary for the Morrey embedding \(\mathcal {C}\hookrightarrow W^{1,\infty }(\mathbb {T};{\mathbb {R}}^m)\). In addition to that, we require \({\nu }> 0\) in order to be able to use certain product rules for bilinear maps of the form \(B :W^{{{s}+{\nu }},p} \times W^{{{s}-{\nu }},q} \rightarrow W^{{{s}-{\nu }},q}\) and \(B :W^{{{s}+{\nu }},p} \times W^{{s},2} \rightarrow W^{{s},2}\) as discussed in Theorem A.4. Indeed, the requirements \({{s}+{\nu }}- 1/p>1\) and \({\nu }>0\) allow us to treat all occurring nonlinearities in a satisfactory way. The condition \(p< \infty \) guarantees that all involved Banach spaces are reflexive and separable.
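The requirements discussed in this paragraph can be collected in a small helper. This is a hedged sketch: the function name `check_parameters` and the dictionary keys are hypothetical, and the code tests only the conditions spelled out in the surrounding text (\(p\ge 2\), \({\nu }\ge 0\), \({{s}+{\nu }}- 1/p>1\), \({\nu }>0\)), which may not be the full content of (9).

```python
from fractions import Fraction

def check_parameters(s, nu, p):
    """Evaluate the parameter conditions named in the text (hypothetical helper).

    Assumes 1 < p < infinity so that the Hölder conjugate q is finite.
    """
    s, nu, p = Fraction(s), Fraction(nu), Fraction(p)
    q = 1 / (1 - 1 / p)             # Hölder conjugate of p
    alpha = s + nu - 1 - 1 / p      # Hölder exponent of the Morrey embedding
    return {
        "q": q,
        "alpha": alpha,
        "embeddings_i_and_j": p >= 2 and nu >= 0,   # X -> H -> Y
        "morrey_embedding": alpha > 0,              # s + nu - 1/p > 1
        "product_rules": nu > 0,                    # needed for Theorem A.4
    }

# Example: s = 3/2, nu = 1/10, p = 2.
report = check_parameters(Fraction(3, 2), Fraction(1, 10), 2)
# Borderline case nu = 0, p = 2: the Morrey condition fails.
borderline = check_parameters(Fraction(3, 2), 0, 2)
```

Exact rational arithmetic is used so that borderline cases such as \({\nu }= 0\), \(p = 2\) are not blurred by rounding.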

3 Energy

From now on, if not stated otherwise, we fix \({s}= \frac{3}{2}\) and suppose that \({\nu }>0\) and \(p\ge 2\). Our principal aim in this section is to investigate the Möbius energy

$$\begin{aligned} {\mathcal {E}}:\mathcal {C}\rightarrow {\mathbb {R}}, \qquad {\mathcal {E}}(\gamma ) \;{:}{=}\; \textstyle \int _{{\mathbb {T}}^2}E(\gamma )\, \varOmega _\gamma \quad \text {where} \quad E(\gamma ) \;{:}{=}\; \frac{1}{|{\triangle \gamma }|^2} - \frac{1}{\varrho _\gamma ^2} \end{aligned}$$
(12)

along with its first two derivatives. The first two variations of the Möbius energy have been discussed under various regularity assumptions before, cf. [17, 37, 42]. The first variation is typically given in terms of principal-value integrals. Here, by keeping everything in weak (or variational) formulation, we can work with very low regularity assumptions and avoid principal-value integrals altogether.
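For orientation, the energy (12) can be approximated on a closed polygon by a vertex-based Riemann sum. The sketch below is an assumption-laden illustration and not the discretization used for the figures: the name `discrete_moebius_energy`, the vertex weights (average adjacent edge length for \(\omega _\gamma \)), and the use of polygonal arc length for \(\varrho _\gamma \) are choices made here. For the unit circle the continuous energy can be computed by hand from (12) and equals 4, which serves as a sanity check.

```python
import numpy as np

def discrete_moebius_energy(points):
    """Vertex-based Riemann sum for the Möbius energy of a closed polygon.

    Assumptions (hypothetical discretization): omega_gamma at a vertex is the
    average of the adjacent edge lengths; rho_gamma is the shorter polygonal
    arc length between two vertices; the diagonal x = y is skipped.
    """
    P = np.asarray(points, dtype=float)
    n = len(P)
    lengths = np.linalg.norm(np.roll(P, -1, axis=0) - P, axis=1)  # edge i -> i+1
    total = lengths.sum()
    weights = 0.5 * (lengths + np.roll(lengths, 1))
    # Arc-length position of each vertex, starting at vertex 0.
    position = np.concatenate(([0.0], np.cumsum(lengths)[:-1]))
    d = np.abs(position[:, None] - position[None, :])
    rho = np.minimum(d, total - d)                                 # geodesic distance
    chord = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)  # |triangle gamma|
    off = ~np.eye(n, dtype=bool)
    density = np.zeros((n, n))
    density[off] = 1.0 / chord[off] ** 2 - 1.0 / rho[off] ** 2     # E(gamma)
    return float(np.sum(density * weights[:, None] * weights[None, :]))

# Sanity check on a regular 400-gon inscribed in the unit circle.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
energy = discrete_moebius_energy(circle)
```

The discrete value is invariant under rescaling of the polygon, mirroring the scale invariance of (12), and approaches 4 for the circle as the number of vertices grows.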

Theorem 3.1

The following statements hold true:

  1. 1.

    The Möbius energy \({\mathcal {E}}:\mathcal {C}\rightarrow {\mathbb {R}}\) is Fréchet differentiable.

  2. 2.

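    The linear functional \(D{\mathcal {E}}(\gamma ) \in {\mathcal {X}}'{^{\!}}\) can be continuously extended to a functional in \({\mathcal {Y}}'{^{\!}}\). In particular, this shows that \(\gamma \mapsto D{\mathcal {E}}(\gamma )\) is a (nonlinear) differential operator of order at most \(({{s}+{\nu }}) + ({{s}-{\nu }}) = 3\).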

  3. 3.

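    The mapping \(\gamma \mapsto D{\mathcal {E}}(\gamma )\), regarded as a map from \(\mathcal {C}\) into \({\mathcal {Y}}'{^{\!}}\) via this extension, is locally Lipschitz continuous.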

Proof

We are going to show that the energy density \(E :\mathcal {C}\rightarrow L^{1}_{\gamma }{(\mathbb {T}^2;\mathbb {R}})\) is Fréchet differentiable. This will also imply that \({\mathcal {E}}\) is Fréchet differentiable with derivative identical to the linear form defined by

(13)

We do so by following a “shoot first ask questions later” approach. To this end, we first investigate pointwise derivatives of \(E(\gamma )\). For \(k \in \mathbb {N}_0\) and \(u_1, \dotsc ,u_k \in {\mathcal {X}}\), we abbreviate

$$\begin{aligned} F_k(\gamma ;u_1,\dotsc ,u_k)(x,y)&\;{:}{=}\; D^k \big (\gamma \mapsto E(\gamma )(x,y)\big )(\gamma ) \, (u_1,\dotsc ,u_k) \quad \text {and} \quad \\ G_k(\gamma ;u_1,\dotsc ,u_k)(x,y)&\;{:}{=}\; \textstyle \int _{I_\gamma (x,y)} D^k \big (\gamma \mapsto \omega _\gamma \big )(\gamma ) \, (u_1,\dotsc ,u_k). \end{aligned}$$

Recall that the \(W^{s+{\nu },p}\)-norm dominates the \(C^{1}\)-norm. Thus, due to the definition of the geodesic distance in (6), for each point \((x,y)\) in the open set

$$\begin{aligned} \varSigma \;{:}{=}\; \{(x,y) \in {{\mathbb {T}}^2}\mid x \ne y \; \text {and} \; \varrho _\gamma (x,y) < \ell \} \quad \text {where} \quad \textstyle \ell \;{:}{=}\; \frac{1}{2} \int _{\mathbb {T}}\omega _\gamma , \end{aligned}$$

there is an open neighborhood \({\mathcal {U}}(x,y) \subset \mathcal {C}\) of \(\gamma \) such that \(\gamma \mapsto I_\gamma (x,y)\) is constant on \({\mathcal {U}}(x,y)\). Consequently, sufficiently small perturbations of \(\gamma \) do not affect the integration domain of \(G_k(\gamma ;\cdots )\). Utilizing the formulas

$$\begin{aligned} D(\gamma \mapsto \omega _\gamma )(\gamma ) \, u = \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u \rangle \omega _\gamma , \; D(\gamma \mapsto {\mathcal {D}}_\gamma v)(\gamma ) \, u = - \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u \rangle {\mathcal {D}}_\gamma v, \end{aligned}$$
(14)

we obtain

$$\begin{aligned} G_1(\gamma ;u_1)&= \textstyle \int _{I_\gamma } \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u_1 \rangle \, \omega _\gamma , \quad \text {and} \quad \\ G_2(\gamma ;u_1,u_2)&= \textstyle \int _{I_\gamma } \big ( \langle {\mathcal {D}}_\gamma u_1,{\mathcal {D}}_\gamma u_2 \rangle - \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u_1 \rangle \, \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u_2 \rangle \big ) \, \omega _\gamma . \end{aligned}$$
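The two pointwise formulas in (14), which produce the integrands of \(G_1\) and \(G_2\), can be checked against central finite differences. The following is a small sketch under stated assumptions: it works at a single parameter value, so `dgamma`, `du`, and `dv` stand for the vectors \(\gamma '(x)\), \(u'(x)\), and \(v'(x)\); all function names and the sample vectors are hypothetical.

```python
import numpy as np

def omega(dgamma):
    """Line element density |gamma'| at a single parameter value."""
    return np.linalg.norm(dgamma)

def d_omega(dgamma, du):
    """First formula in (14): <D_gamma gamma, D_gamma u> omega_gamma."""
    w = omega(dgamma)
    return np.dot(dgamma / w, du / w) * w

def arc_derivative(dgamma, dv):
    """D_gamma v = v' / |gamma'| at a single parameter value."""
    return dv / omega(dgamma)

def d_arc_derivative(dgamma, du, dv):
    """Second formula in (14): -<D_gamma gamma, D_gamma u> D_gamma v."""
    w = omega(dgamma)
    return -np.dot(dgamma / w, du / w) * arc_derivative(dgamma, dv)

# Arbitrary sample vectors (assumptions for illustration only).
dgamma = np.array([1.0, 2.0, 2.0])
du = np.array([0.3, -0.5, 0.7])
dv = np.array([-1.0, 0.4, 0.2])
eps = 1e-6

# Central differences in the direction u.
fd_omega = (omega(dgamma + eps * du) - omega(dgamma - eps * du)) / (2.0 * eps)
fd_arc = (arc_derivative(dgamma + eps * du, dv)
          - arc_derivative(dgamma - eps * du, dv)) / (2.0 * eps)

err_omega = abs(fd_omega - d_omega(dgamma, du))
err_arc = np.max(np.abs(fd_arc - d_arc_derivative(dgamma, du, dv)))
```

Both errors should be at the level of the finite-difference truncation and rounding error, far below the size of the derivatives themselves.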

By pointwise differentiation at \((x,y) \in \varSigma \) and by observing that \(\varSigma \) has full measure, we are led to the following identities which hold almost everywhere on \({{\mathbb {T}}^2}\):

$$\begin{aligned}&F_1(\gamma ;u_1) = 2\, \Big ( \tfrac{1}{\varrho _\gamma ^4} \, {\varrho _\gamma \, G_1(\gamma ;u_1)} - \tfrac{1}{|{\triangle \gamma }|^4} \, \langle \triangle \gamma , \triangle u_1 \rangle \Big ) \quad \text {and} \quad \\&\quad F_2(\gamma ;u_1,u_2)\\&\quad = 8\, \left[ \tfrac{1}{|{\triangle \gamma }|^6} \, \langle \triangle \gamma , \triangle u_1 \rangle \, \langle \triangle \gamma , \triangle u_2 \rangle - \tfrac{1}{\varrho _\gamma ^6} \, {\varrho _\gamma \, G_1(\gamma ;u_1) \cdot \varrho _\gamma \, G_1(\gamma ;u_2)} \right] \\&\qquad - 2\, \left[ \tfrac{1}{|{\triangle \gamma }|^4} \, \langle \triangle u_1 , \triangle u_2 \rangle - \tfrac{1}{\varrho _\gamma ^4} \, \Big (G_1(\gamma ;u_1) \, G_1(\gamma ;u_2) + \varrho _\gamma \, G_2(\gamma ;u_1,u_2)\Big ) \right] . \end{aligned}$$

Claim 1 below will imply that \(F_1(\gamma ;u_1)\) is indeed a candidate for \(DE(\gamma ) \, u_1\). Moreover, it guarantees that the right-hand side of (13) makes sense even if one replaces \(u \in {\mathcal {X}}\) by \(w \in {\mathcal {Y}}\) so that \(D{\mathcal {E}}(\gamma )\) has a unique continuous extension to an element of \({\mathcal {Y}}'{^{\!}}\).

Claim 1

There exists a \(\gamma \)-dependent \(C\ge 0\) such that \(\Vert {F_1(\gamma ;u_1)}\Vert _{L^{1}_{\gamma }}\le C \, \Vert {u_1}\Vert _{W^{s-{\nu },q}_{\gamma }}\) holds for all \(u_1 \in {\mathcal {X}}\).

We split \(F_1\) as follows:

$$\begin{aligned} F_1(\gamma ;u_1) = 2 \, \tfrac{1}{\varrho _\gamma ^4} \, \Big ( (\varrho _\gamma \, G_1(\gamma ;u_1)) - \langle \triangle \gamma , \triangle u_1 \rangle \Big ) - 2 \, \Big ( \tfrac{1}{|{\triangle \gamma }|^4} - \tfrac{1}{\varrho _\gamma ^4} \Big ) \, \langle \triangle \gamma , \triangle u_1 \rangle . \end{aligned}$$

The desired bound for the first summand is derived in Lemma 3.4. The second summand can be treated with Lemma 3.3 because it has the form \(2\, {\mathcal {B}}^{\alpha ,\beta }_\gamma (\gamma ,u)\) with \(\alpha = 0\) and \(\beta = 2\).

We would like to use the \(L^{1}\)-norm of \(F_2\) to bound remainder terms of Taylor expansions. This will make use of the following claim.

Claim 2

There exists a number \(\varXi (\gamma ) {}>{} 0\), continuous in \(\gamma \), such that for all \(u_1\), \(u_2 \in {\mathcal {X}}\), we have \(\Vert {F_2(\gamma ;u_1,u_2)}\Vert _{L^{1}_{\gamma }} \le \varXi (\gamma ) \, \Vert {u_1}\Vert _{{\mathcal {X}},\gamma } \, \Vert {u_2}\Vert _{{\mathcal {Y}},\gamma }\).

We may split \(F_2(\gamma ;u_1,u_2)\) into the following four summands:

$$\begin{aligned}&8 \, \Big ( \tfrac{1}{|{\triangle \gamma }|^6} - \tfrac{1}{\varrho _\gamma ^6} \Big ) \, \langle \triangle \gamma , \triangle u_1 \rangle \, \langle \triangle \gamma , \triangle u_2 \rangle \end{aligned}$$
(15)
$$\begin{aligned}&\qquad - 2 \, \Big ( \tfrac{1}{|{\triangle \gamma }|^4} - \tfrac{1}{\varrho _\gamma ^4} \Big ) \, \langle \triangle u_1 , \triangle u_2 \rangle \end{aligned}$$
(16)
$$\begin{aligned}&\qquad + 8 \, \tfrac{1}{\varrho _\gamma ^6} \, \Big ( \langle \triangle \gamma , \triangle u_1 \rangle \, \langle \triangle \gamma , \triangle u_2 \rangle - \varrho _\gamma \, G_1(\gamma ;u_1) \, \varrho _\gamma \, G_1(\gamma ;u_2)\Big ) \end{aligned}$$
(17)
$$\begin{aligned}&\qquad + 2 \, \tfrac{1}{\varrho _\gamma ^4} \, \Big ( {G_1(\gamma ;u_1) \, G_1(\gamma ;u_2) + \varrho _\gamma \, G_2(\gamma ;u_1,u_2)} - \langle \triangle u_1 , \triangle u_2 \rangle \Big ) . \end{aligned}$$
(18)

Here, (15) and (16) are again of the type discussed in Lemma 3.3, namely with \(\alpha = 0\), \(\beta = 4\) and \(\alpha = 0\), \(\beta = 2\), respectively. We may factorize (17) as

$$\begin{aligned} 8\Big ( \Big \langle \tfrac{\triangle \gamma }{\varrho _\gamma }, \tfrac{\triangle u_1}{\varrho _\gamma } \Big \rangle + \tfrac{G_1(\gamma ;u_1)}{\varrho _\gamma } \Big ) \cdot \Big ( \tfrac{1}{\varrho _\gamma ^4} \, \big ( \langle \triangle \gamma , \triangle u_2 \rangle - \varrho _\gamma \, G_1(\gamma ;u_2) \big ) \Big ) \end{aligned}$$

to discover that we have discussed its second factor already. Up to a \(\gamma \)-dependent constant, its first factor is bounded by \(2 \, \Vert {{\mathcal {D}}_\gamma u_1}\Vert _{L^{\infty }}\), thus dominated by \(\Vert {u_1}\Vert _{{\mathcal {X}},\gamma }\). With \(H \;{:}{=}\; \int _{I_\gamma }\!\langle {\mathcal {D}}_\gamma u_1 , {\mathcal {D}}_\gamma u_2 \rangle \, \omega _{\gamma }\), we can split (18) into the following two summands:

$$\begin{aligned} \tfrac{2}{\varrho _\gamma ^4} \, \big [ \varrho _\gamma \, H {-} \langle \triangle u_1 , \triangle u_2 \rangle \big ] + \tfrac{2}{\varrho _\gamma ^4} \, \big [ G_1(\gamma ;u_1) \, G_1(\gamma ;u_2) {-} \varrho _\gamma \, (H - G_2(\gamma ;u_1,u_2)) \big ]. \end{aligned}$$
(19)

The first summand of (19) can be treated with Lemma 3.4. With \(\varphi _i \;{:}{=}\; \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u_i \rangle \) and the identities \(\varrho _\gamma = \int _{I_\gamma } \omega _{\gamma }(s)\) and \(H-G_2(\gamma ;u_1,u_2) = \int _{I_\gamma } \varphi _1(t) \, \varphi _2(t)\, \omega _{\gamma }(t)\), the second summand of (19) simplifies to

$$\begin{aligned}&\textstyle \tfrac{2}{\varrho _\gamma ^4} \, \Big ( {\int _{I_\gamma } \varphi _1(s) \, \omega _{\gamma }(s)} \, {\int _{I_\gamma } \varphi _2(t) \, \omega _{\gamma }(t)} - {\int _{I_\gamma } \omega _{\gamma }(s)} \, {\int _{I_\gamma } \varphi _1(t) \, \varphi _2(t) \, \omega _{\gamma }(t)} \Big ) \\&\quad = \textstyle \tfrac{1}{\varrho _\gamma ^4} \, \int _{I_\gamma ^2} (\varphi _1(s)- \varphi _1(t)) \, \varphi _2(t) \,\varOmega _\gamma (s,t) + \tfrac{1}{\varrho _\gamma ^4} \, \int _{I_\gamma ^2} (\varphi _1(t)- \varphi _1(s)) \, \varphi _2(s) \,\varOmega _\gamma (t,s) \\&\quad = \textstyle - \tfrac{1}{\varrho _\gamma ^4} \, \int _{I_\gamma ^2} (\triangle \varphi _1) \, (\triangle \varphi _2) \,\varOmega _\gamma . \end{aligned}$$
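The symmetrization step above has an exact discrete analogue, which can be verified numerically. In the sketch below, the integrals over \(I_\gamma \) are replaced by weighted sums; the weights `w` stand in for \(\omega _\gamma \), and the random samples `phi1`, `phi2` are placeholders for \(\varphi _1\), \(\varphi _2\). All names and data are assumptions for illustration, and the common prefactor \(\varrho _\gamma ^{-4}\) is omitted on both sides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
w = rng.uniform(0.5, 1.5, size=n)   # positive weights, stand-in for omega_gamma
phi1 = rng.normal(size=n)           # samples of phi_1 on I_gamma
phi2 = rng.normal(size=n)           # samples of phi_2 on I_gamma

# 2 * ( int phi1 * int phi2 - int 1 * int phi1 phi2 )
lhs = 2.0 * (np.sum(phi1 * w) * np.sum(phi2 * w)
             - np.sum(w) * np.sum(phi1 * phi2 * w))

# - int int (triangle phi1)(triangle phi2) Omega
d1 = phi1[:, None] - phi1[None, :]
d2 = phi2[:, None] - phi2[None, :]
rhs = -np.sum(d1 * d2 * w[:, None] * w[None, :])
```

Both quantities agree up to rounding, which reflects that the identity is purely algebraic and does not depend on the regularity of \(\varphi _1\) and \(\varphi _2\).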

Now the same techniques as in Lemma 3.4 and the product rule Theorem A.4 imply

$$\begin{aligned} \textstyle \int _{{{\mathbb {T}}^2}} \frac{1}{\varrho _\gamma (x,y)^{4}} \, \big |{\textstyle \int _{I_\gamma ^2(x,y)} \triangle \varphi _1 \, \triangle \varphi _2 \, \varOmega _\gamma }\big | \, \varOmega _\gamma (x,y) \le C \, [{u_1}]_{W^{s+{\nu },p}_{\gamma }} \, [{u_2}]_{W^{s-{\nu },q}_{\gamma }}, \end{aligned}$$

which proves the claim.

Claim 3

E is Fréchet differentiable with \(DE(\gamma ) \,u = F_1(\gamma ;u)\).

It suffices to show that there is a \(C \ge 0\) such that \( \Vert { E(\gamma + u)-E(\gamma )-F_1(\gamma ;u) }\Vert _{L^{1}_{\gamma }} \le C \, \Vert {u}\Vert _{{\mathcal {X}},\gamma }^2 \) holds for all sufficiently small \(u \in {\mathcal {X}}\). Because \(\mathcal {C}\subset {\mathcal {X}}\) is open and \(\varXi \) is continuous, we may find an \(\varepsilon >0\) such that for all \(\Vert {u}\Vert _{{\mathcal {X}},\gamma } < \varepsilon \) we have \(\eta \;{:}{=}\; \gamma +u \in \mathcal {C}\) and \(\varXi (\eta ) \le 2 \, \varXi (\gamma )\). By shrinking \(\varepsilon \) if necessary, we may achieve that all the densities \(\varOmega _{\eta }\) and the norms \(\Vert {\cdot }\Vert _{{\mathcal {X}},\eta }\) are equivalent for all such u, i.e., there are \(c>0\) and \(\varLambda >1\) such that

$$\begin{aligned}&(1 - c\, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}) \, \omega _{\gamma } \le \omega _{\eta } \le (1 + c\, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}) \, \omega _{\gamma } \end{aligned}$$
(20)
$$\begin{aligned}&\varLambda ^{-1} \, \varOmega _{\gamma } \le \varOmega _{\eta } \le \varLambda \, \varOmega _{\gamma }, \quad \text {and} \quad \varLambda ^{-1} \, \Vert {\cdot }\Vert _{{\mathcal {X}},\eta } \le \Vert {\cdot }\Vert _{{\mathcal {X}},\gamma } \le \varLambda \, \Vert {\cdot }\Vert _{{\mathcal {X}},\eta } . \end{aligned}$$
(21)

For the remainder of the proof, we let \(u \in {\mathcal {X}}\) be of length \(\Vert {u}\Vert _{{\mathcal {X}},\gamma } < \varepsilon \) and abbreviate \(\eta \;{:}{=}\; \gamma + u\) and \(\gamma _t \;{:}{=}\; \gamma + t\, u\).

Fig. 7

Illustration of why handling \(\gamma \mapsto \varrho _\gamma \) correctly is so difficult. a Two points \(\gamma (x)\) and \(\gamma (y)\) on a circular curve and the geodesic arc \(\gamma (I_\gamma (x,y))\) connecting them. b A displacement vector field u along \(\gamma \). c The geodesic arc \(\eta (I_\eta (x,y))\) connecting \(\eta (x)\) and \(\eta (y)\) on the displaced curve \(\eta = \gamma + u\). Observe that \(I_\gamma (x,y) \ne I_\eta (x,y)\), so \((x,y)\) belongs to the “bad” set V(u). d The points \((x,y)\), \((y,x)\) and the “bad” set V(u), plotted over the relief of \(\varrho _\gamma \) in the parameterization domain \({{\mathbb {T}}^2}\)

Now we split the integration domain \({{\mathbb {T}}^2}= U(u) \cup V(u)\cup \{(x,x) \mid x \in {\mathbb {T}}\} \) into a “good” part U(u), a “bad” part V(u), and an “ugly” part, the diagonal of \({{\mathbb {T}}^2}\) (cf. Fig. 7). Since the latter is a null set, it may be neglected. The other two parts are defined as follows:

On the “good” part, we may apply Taylor’s theorem along with Claim 2 and (21) to obtain

$$\begin{aligned} \textstyle \int _{U(u)} |{ E(\eta ) - E(\gamma ) - F_1(\gamma ; u) }| \, \varOmega _{\gamma }&\le \textstyle \int _{U(u)} \int _0^1 |{F_2(\gamma _t ; u,u)}| \, {{\text {d}}}t \, \varOmega _{\gamma } \\&\le \textstyle \varLambda ^2 \int _0^1 \int _{U(u)} |{F_2(\gamma _t ; u,u)}| \, \varOmega _{\gamma _t} \, {{\text {d}}}t \\&\le \textstyle \varLambda ^2 \int _0^1 \varXi (\gamma _t) \, \Vert {u}\Vert _{{\mathcal {X}},\gamma _t}^2 \, {{\text {d}}}t\\&\le 2 \, \varXi (\gamma ) \, \varLambda ^4 \, \Vert {u}\Vert _{{\mathcal {X}},\gamma }^2. \end{aligned}$$

Here the first estimate is only admissible on the “good” set U(u) because the intrinsic distance (6) is differentiable there. However, we cannot argue this way on the “bad” set V(u). Instead, we observe that V(u) has positive distance from the diagonal. Thus, by Claim 4 below, \(\gamma \mapsto E(\gamma )|_{V(u)}\) is Lipschitz continuous as a map into \(L^{\infty }(V(u);{\mathbb {R}})\). Together with Claim 5 below, which states that V(u) is a small set, the triangle inequality \( |{E(\eta ) - E(\gamma ) - F_1(\gamma ; u)}| \le |{E(\eta ) - E(\gamma )}| + |{F_1(\gamma ; u)}| \) leads us to

$$\begin{aligned} \textstyle \int _{V(u)} |{ E(\eta ) - E(\gamma ) - F_1(\gamma ; u) }| \, \varOmega _\gamma \le C \, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}^2, \end{aligned}$$

which proves the claim.

Claim 4

\(\gamma \mapsto \varrho _\gamma \) is locally Lipschitz continuous as a mapping into \(L^{\infty }\).

As \({{\mathbb {T}}^2}\setminus \varSigma \) is a set of measure zero, we may restrict our attention to \((x,y) \in \varSigma \). For \(\eta \;{:}{=}\; \gamma + u\) and \((x,y) \in \varSigma \) there are two cases: The first case is \(I_{\eta }(x,y) = I_{\gamma }(x,y)\). Then the bound \(|{\varrho _{\eta }(x,y)-\varrho _\gamma (x,y)}| \le C \,\Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}\) follows from the differentiability of \(\omega _{\gamma }\). The second case \(I_{\eta }(x,y) \ne I_{\gamma }(x,y)\) is a bit more elaborate. We abbreviate \(I \;{:}{=}\; I_\gamma (x,y)\) and denote its complement by \(J \;{:}{=}\; {\mathbb {T}}\setminus I\). With (20), we obtain

$$\begin{aligned} \textstyle \int _{J} \omega _{\gamma } - c \, \Vert {{\mathcal {D}}_\gamma u}\Vert _ {L^{\infty }} \int _{J} \omega _{\gamma } \le \int _{J} \omega _{\eta } \le \int _{I} \omega _{\eta } \le \int _{I} \omega _{\gamma } + c \, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }} \int _{I} \omega _{\gamma }. \end{aligned}$$

Here we used (20) for the first and the third inequality. This shows \( |{ \int _{J} \omega _{\gamma } \! - \! \int _{I} \omega _{\gamma } }| = \int _{J} \omega _{\gamma } - \int _{I} \omega _{\gamma } \le 2 \, c \, \ell \, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }} \). By (20), we obtain \( |{ \int _{J} \omega _{\eta } \! - \! \int _{J} \omega _{\gamma } }| \le 2 \, c \, \ell \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }} \). Combining these latter two inequalities proves Claim 4:

$$\begin{aligned} \textstyle |{ \varrho _{\eta }(x,y) \!-\!\varrho _{\gamma }(x,y)}|&= \textstyle |{\int _{J} \omega _\eta \!-\!\int _{I} \omega _\gamma }|\\&\le \textstyle |{\int _{J} \omega _{\eta } \!-\!\int _{J} \omega _{\gamma }}| + |{\int _{J} \omega _{\gamma } \!-\!\int _{I} \omega _{\gamma }}| \le C \,\Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}. \end{aligned}$$

Claim 5

\(\int _{V(u)} \, \varOmega _\gamma \le C \, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}\) for \(\Vert {u}\Vert _{{\mathcal {X}},\gamma } < \varepsilon \).

A Taylor expansion of the integrand leads to

$$\begin{aligned} |{ \textstyle \int _{J} \omega _{\gamma + t \, u} - \int _{J} \omega _{\gamma } - \int _{J} \langle {\mathcal {D}}_\gamma \gamma ,{\mathcal {D}}_\gamma (t\, u) \rangle \, \omega _{\gamma } }| \le C \, \Vert {{\mathcal {D}}_\gamma (t\, u)}\Vert _{L^{\infty }}^2. \end{aligned}$$

Now let \((x,y) \in V(u)\). Then there is a \(t \in \left[ 0,1\right] \) such that \(I_{\gamma + t \, u}(x,y) = J\) and we have

Together with Claim 4 this implies that V(u) is contained in a narrow band around the set \(\{(x,y) \mid \varrho _\gamma (x,y) = \ell \}\) whose area is proportional to \(\Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}\) (cf. Fig. 7d).

Claim 6

There is a \(C \ge 0\) such that holds for all \(w \in {\mathcal {Y}}\) and all \(u \in {\mathcal {X}}\) with \(\Vert {u}\Vert _{{\mathcal {X}},\gamma }< \varepsilon \).

With the operator \(\varPsi _\gamma w \;{:}{=}\; \big ( \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma w \rangle \circ \pi _1+ \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma w \rangle \circ \pi _2\big )\), we may write . By the triangle inequality and the fundamental theorem of calculus, we may bound from above by

$$\begin{aligned}&\left| \textstyle \int _0^1 \frac{{{\text {d}}}}{{{\text {d}}}t} \, \Big [ \int _{U(u)} \Big ( F_1(\gamma _t;w) {+} E(\gamma _t) \, (\varPsi _{\gamma _t} w) \Big ) \, \varOmega _{\gamma _t} \Big ] \, {{\text {d}}}t \right| \end{aligned}$$
(22)
$$\begin{aligned}&\qquad \textstyle {+} \int _{V(u)} |{ F_1(\gamma ;w) + E(\gamma ) \, (\varPsi _\gamma w) }| \, \varOmega _\gamma + \int _{V(u)} |{ F_1(\eta ;w)+ E(\eta ) \, (\varPsi _{\eta } w)}| \, \varOmega _{\eta } . \end{aligned}$$
(23)

Recall that on the “good” set U(u) we may interchange differentiation and integration. Hence, we have

$$\begin{aligned}&\textstyle \frac{{{\text {d}}}}{{{\text {d}}}t} \int _{U(u)} \Big ( F_1(\gamma _t;w)\,+\,E(\gamma _t) \, (\varPsi _{\gamma _t} w)\Big ) \, \varOmega _{\gamma _t}\\&\quad = \textstyle \int _{U(u)} \Big ( F_2(\gamma _t; u, w) + F_1(\gamma _t; u) \, (\varPsi _{\gamma _t} w) + E(\gamma _t) \, ( \tfrac{{{\text {d}}}}{{{\text {d}}}t} \varPsi _{\gamma _t} w) \Big ) \, \varOmega _{\gamma _t} \\&\qquad \textstyle + \int _{U(u)} \Big ( F_1(\gamma _t;w) + E(\gamma _t) \, (\varPsi _{\gamma _t} w) \Big ) \, (\varPsi _{\gamma _t} u) \, \varOmega _{\gamma _t} . \end{aligned}$$

Now it follows from the other claims above that (22) is bounded by a multiple of \(\Vert {u}\Vert _{{\mathcal {X}},\gamma } \, \Vert {w}\Vert _{{\mathcal {Y}},\gamma }\). For (23), we exploit that V(u) has finite, positive distance to the diagonal of \({{\mathbb {T}}^2}\): This implies that the quantities \(|{\triangle \gamma (x,y)}|\) and \(\varrho _\gamma (x,y)\) are uniformly bounded away from zero and that this remains true for sufficiently small perturbations \(\eta = \gamma + u\) of \(\gamma \). This is why we can express the term \(\big [\big ( F_1(\gamma ;w) + E(\gamma ) \, (\varPsi _\gamma w)\big ) \, \varOmega _\gamma \big ](x,y)\) by

$$\begin{aligned}&\big \langle Z_1\big (\gamma '(x),\gamma '(y),|{\triangle \gamma (x,y)}|,\varrho _\gamma (x,y)\big ), w'(x) \big \rangle \, \varOmega _\gamma (x,y) \\&\qquad + \big \langle Z_2\big (\gamma '(x),\gamma '(y),|{\triangle \gamma (x,y)}|,\varrho _\gamma (x,y)\big ), w'(y) \big \rangle \, \varOmega _\gamma (x,y) \end{aligned}$$

for \((x,y) \in V(u)\) with Lipschitz continuous functions \(Z_1\) and \(Z_2\). So the same applies to \(\eta \), and Claim 5 shows that (23) is bounded by

$$\begin{aligned} C \, \Vert {1_{V(u)}}\Vert _{L^p_\gamma } \, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }} \, \Vert {w}\Vert _{W^{1,q}_{\gamma }} \le C \, \Vert {{\mathcal {D}}_\gamma u}\Vert _{L^{\infty }}^{1+1/p} \, \Vert {w}\Vert _{W^{1,q}_{\gamma }}, \end{aligned}$$

which finally proves the claim. \(\square \)

Remark 3.2

In fact, a bit more is true: The mapping is even continuously Fréchet differentiable and what we have shown in Claim 6 above is that is a continuous bilinear form. Now we may conclude that its second derivative must satisfy for \(u_1\), \(u_2 \in {\mathcal {X}}\) where \(i_{\mathcal {C}}\) and \(j_{\mathcal {C}}\) denote the canonical embeddings defined in (11). Although this might be relevant for optimization methods based on Newton’s method and also for the implicit integration of the \(L^2\)-gradient flow, we do not dive into details here.

3.1 Details

Here we state and prove the lemmas used in the proof of Theorem 3.1 above. The following is our main tool for dealing with the lower order terms that occur in .

Lemma 3.3

Suppose \(s = \frac{3}{2}\) and (9) with \(\frac{1}{p}+\frac{1}{q}=1\). Fix \(k \in \mathbb {N}\cup \{0\}\), \(\alpha \), \(\beta \in {\mathbb {R}}\). Let \(\gamma \in \mathcal {C}\) be an embedded curve and let \(b :(\prod _{i=1}^k {\mathbb {R}}^{m_i}) \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) be a \((k+1)\)-multilinear form. For operators \(L_i\), \(K \in \{ \triangle / \varrho _\gamma , \varphi \mapsto {\mathcal {D}}_\gamma \varphi \circ \pi _1, \varphi \mapsto {\mathcal {D}}_\gamma \varphi \circ \pi _2\}\), \(i \in \{1,\dotsc ,k\}\), consider the following multilinear form \( {\mathcal {B}}^{\alpha ,\beta }_\gamma :(\prod _{i=1}^k {\mathcal {X}}'{^{\!}}) \times {\mathcal {Y}}'{^{\!}}\rightarrow {\mathbb {R}}\):

$$\begin{aligned} \textstyle {\mathcal {B}}^{\alpha ,\beta }_\gamma (\varphi _1,\dotsc ,\varphi _k,\psi ) \;{:}{=}\; \int _{{\mathbb {T}}^2}b( L_1 \varphi _1,\dotsc ,L_k \varphi _k, K \psi ) \, \frac{\varrho _\gamma ^{\alpha +\beta }}{|{\triangle \gamma }|^\alpha } \, \Big ( \frac{1}{|{\triangle \gamma }|^{2+\beta }} - \frac{1}{\varrho _\gamma ^{2+\beta }} \Big ) \, \varOmega _\gamma . \end{aligned}$$

Then \({\mathcal {B}}^{\alpha ,\beta }_\gamma \) is well-defined and there is a continuous function \(\varXi \) on \(\mathcal {C}\) such that

$$\begin{aligned} \textstyle |{{\mathcal {B}}^{\alpha ,\beta }_\gamma (\varphi _1,\dotsc ,\varphi _k,\psi )}| \le \Vert {b}\Vert \, \varXi (\gamma ) \, \big (\prod _{i=1}^k \Vert {{\mathcal {D}}_\gamma \varphi _i}\Vert _{L^{\infty }} \big )\, \Vert {{\mathcal {D}}_\gamma \psi }\Vert _{W^{s-1-{\nu },q}_{\gamma }} . \end{aligned}$$

Here the expression \(\Vert {b}\Vert \) denotes the operator norm of the multilinear form b. If \(k=0\), we use the convention \((\prod _{i=1}^0 {\mathbb {R}}^{m_i}) \times {\mathbb {R}}^d={\mathbb {R}}^{d}\) and \((\prod _{i=1}^0 {\mathcal {X}}'{^{\!}}) \times {\mathcal {Y}}'{^{\!}}= {\mathcal {Y}}'{^{\!}}\). In this case, \({\mathcal {B}}^{\alpha ,\beta }_\gamma \) only depends on \(\psi \).

Proof

We heavily rely on the techniques developed in the proof of Theorem 1.1 in [11]. With the function \(\zeta :\left]0,\infty \right[ \rightarrow {\mathbb {R}}\), \( \zeta (r) \;{:}{=}\; r^{2+\alpha } \, \frac{r^{2+\beta } - 1}{r^{2} - 1} \), we write

$$\begin{aligned} \textstyle \frac{\varrho _\gamma ^{\alpha +\beta }}{|{\triangle \gamma }|^\alpha } \Big ( \frac{1}{|{\triangle \gamma }|^{2+\beta }} - \frac{1}{\varrho _\gamma ^{2+\beta }} \Big ) = \tfrac{1}{2} \zeta \Big (\tfrac{\varrho _\gamma }{|{\triangle \gamma }|}\Big ) \cdot \tfrac{1}{\varrho _\gamma ^4} \, \Big ( 2 \, \varrho _\gamma ^2 - 2 \, \langle \triangle \gamma , \triangle \gamma \rangle \Big ). \end{aligned}$$
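The algebraic identity above, which isolates the bounded factor \(\zeta (\varrho _\gamma /|{\triangle \gamma }|)\), can be confirmed numerically for sample values. The sketch below is only an illustration: the function names are hypothetical, `chord` and `rho` stand for \(|{\triangle \gamma }|\) and \(\varrho _\gamma \) at one point pair, and \(\langle \triangle \gamma , \triangle \gamma \rangle \) is written as `chord ** 2`.

```python
def zeta(r, alpha, beta):
    """zeta(r) = r^(2+alpha) * (r^(2+beta) - 1) / (r^2 - 1), for r != 1."""
    return r ** (2.0 + alpha) * (r ** (2.0 + beta) - 1.0) / (r ** 2 - 1.0)

def weight_lhs(chord, rho, alpha, beta):
    """Left-hand side: the kernel appearing in the definition of B."""
    return (rho ** (alpha + beta) / chord ** alpha
            * (chord ** -(2.0 + beta) - rho ** -(2.0 + beta)))

def weight_rhs(chord, rho, alpha, beta):
    """Right-hand side: (1/2) zeta(rho/chord) * (2 rho^2 - 2 chord^2) / rho^4."""
    return (0.5 * zeta(rho / chord, alpha, beta)
            * (2.0 * rho ** 2 - 2.0 * chord ** 2) / rho ** 4)

# Sample values with chord < rho, as for two distinct points on a curve.
samples = [(0.8, 1.0, 0.0, 2.0), (0.8, 1.0, 0.0, 4.0), (1.3, 2.1, 1.5, -0.5)]
gaps = [abs(weight_lhs(*s) - weight_rhs(*s)) for s in samples]
```

The first two samples use the exponent pairs \((\alpha ,\beta ) = (0,2)\) and \((0,4)\) that appear in the proof of Theorem 3.1.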

Denoting the shorter arc between x, \(y \in {\mathbb {T}}\) by \(I_\gamma \), we observe

$$\begin{aligned} 2 \, \varrho _\gamma ^2 - 2 \, \langle \triangle \gamma , \triangle \gamma \rangle&= \textstyle \int _{I_\gamma ^2} \big ( |{\tau _\gamma (s)}|^2 + |{\tau _\gamma (t)}|^2 \big ) \, \varOmega _\gamma (s,t)- 2 \, \int _{I_\gamma ^2} \langle \tau _\gamma (s), \tau _\gamma (t) \rangle \, \varOmega _\gamma (s,t)\\&= \textstyle \int _{I_\gamma ^2} |{\tau _\gamma (s) - \tau _\gamma (t)}|^2 \, \varOmega _\gamma (s,t). \end{aligned}$$

Since \(\zeta \) is continuous and \(\gamma \) is bi-Lipschitz, the factor \(\zeta (\tfrac{\varrho _\gamma }{|{\triangle \gamma }|})\) is bounded. Moreover, the functions \(L_i \varphi _i\) are uniformly bounded. So it suffices to bound the \(L^{1}_\gamma \)-norm of

$$\begin{aligned} \textstyle (K \psi ) \cdot \tfrac{1}{\varrho _\gamma ^4} \, \int _{I_\gamma ^2} |{\tau _\gamma (s) - \tau _\gamma (t)}|^2 \, \varOmega _\gamma (s,t) . \end{aligned}$$
(24)

We abbreviate half the length of \(\gamma \) by \(\ell \) and denote by \(\eta _x \;{:}{=}\; \exp _x^\gamma :\left]-\ell ,\ell \right[ \rightarrow {\mathbb {T}}\) the Riemannian exponential map induced by \(\varrho _\gamma \). With X such that \(y = \eta _x(X)\), we may write \(\varrho _\gamma (x,y) = |{X}|\) and \(K \,\psi (x,y) = \int _0^1 {\mathcal {D}}_\gamma \psi (\xi _{x}(\theta _1 , X))\, {{\text {d}}}\theta _1\), where \(\xi _{x}(\theta _1 , X)\) is either \(\eta _{x}(\theta _1 \, X)\), x, or \(\eta _{x}(X) = y\). Thus, we may rewrite (24) as follows:

$$\begin{aligned}&\textstyle \int \limits _{\left[ 0,1\right] ^{3}} {\mathcal {D}}_\gamma \psi (\xi _{x}(\theta _1 , X)) \, |{ \tau _\gamma (\eta _{x}(\theta _2 X))-\tau _\gamma (\eta _{x}(\theta _3 X)) }|^2 \tfrac{1}{|{X}|^2} \, {{\text {d}}}\theta _3 \, {{\text {d}}}\theta _2 \, {{\text {d}}}\theta _1. \end{aligned}$$

By Fubini’s theorem, the \(L^{1}_\gamma \)-norm of (24) is bounded by

$$\begin{aligned}&\textstyle \int \limits _{\left[ 0,1\right] ^{3}} \! \int \limits _{-\ell }^{\ell } \int \limits _{\mathbb {T}}|{{\mathcal {D}}_\gamma \psi (\xi _{x}(\theta _1 , X))}| \, |{\tau _\gamma (\eta _{x}(\theta _2 X))-\tau _\gamma (\eta _{x}(\theta _3 X))}|^{2} \omega _{\gamma }(x) \, \tfrac{{{\text {d}}}X}{|{X}|^2}{{\text {d}}}\theta _3 \, {{\text {d}}}\theta _2 \, {{\text {d}}}\theta _1. \end{aligned}$$

We employ the Hölder inequality to obtain an upper bound for this integral: For \(s=\frac{3}{2}\) we define \(\tilde{q}\) by \({\tilde{q}} \;{:}{=}\; 2\, p \, (p -2 + 2 \, p \, {\nu })^{-1} \ge 1\). Then we have a Sobolev embedding \(W^{{s}-1 - {\nu },q} \hookrightarrow L^{{\tilde{q}}}\) due to (9), see Theorem A.3, thus \(\Vert {{\mathcal {D}}_\gamma \psi }\Vert _{L_{\gamma }^{\tilde{q}}} \le C(\gamma ) \, \Vert {\psi }\Vert _{W_{\gamma }^{s-{\nu },q}}\) with \(C(\gamma )\) depending continuously on \(\gamma \). The Hölder conjugate of \({\tilde{q}}\) is \({\tilde{p}} \;{:}{=}\; 2 \, p \, (p + 2 - 2 \, p \, {\nu })^{-1}\). Thus, we obtain the following upper bound:

$$\begin{aligned} \textstyle \Vert {{\mathcal {D}}_\gamma \psi }\Vert _{L^{{\tilde{q}}}_{\gamma }({\mathbb {T}})} \int _{\left[ 0,1\right] ^{2}} \! \int _{-\ell }^{\ell } \big \Vert { \tau _\gamma (\eta _{(\cdot )}(\theta _2 \, X))-\tau _\gamma (\eta _{(\cdot )}(\theta _3 \, X)) }\big \Vert _{L^{2{\tilde{p}}}_{\gamma }({\mathbb {T}})}^2 \, \frac{{{\text {d}}}X}{|{X}|^{2}} \, {{\text {d}}}\theta _3 \, {{\text {d}}}\theta _2 . \end{aligned}$$
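The exponent bookkeeping in this Hölder step can be verified with exact rational arithmetic. The sketch below is only an illustration: the function name `exponents` and the sample values \(p = 3\), \({\nu }= 1/10\) are assumptions, and it is not checked here whether they satisfy condition (9).

```python
from fractions import Fraction

def exponents(p, nu):
    """Return (1/q, 1/q_tilde, 1/p_tilde) for 1/p + 1/q = 1, as defined in the text."""
    p, nu = Fraction(p), Fraction(nu)
    inv_q = 1 - 1 / p
    inv_q_tilde = (p - 2 + 2 * p * nu) / (2 * p)  # q_tilde = 2p / (p - 2 + 2 p nu)
    inv_p_tilde = (p + 2 - 2 * p * nu) / (2 * p)  # p_tilde = 2p / (p + 2 - 2 p nu)
    return inv_q, inv_q_tilde, inv_p_tilde

sigma = Fraction(1, 2)                 # sigma = s - 1 with s = 3/2
p, nu = Fraction(3), Fraction(1, 10)   # hypothetical sample values
inv_q, inv_q_tilde, inv_p_tilde = exponents(p, nu)

# Hoelder conjugacy of q_tilde and p_tilde.
conjugate = inv_q_tilde + inv_p_tilde == 1
# One-dimensional Sobolev exponent of W^{sigma - nu, q}: 1/q_tilde = 1/q - (sigma - nu).
sobolev = inv_q_tilde == inv_q - (sigma - nu)
print(conjugate, sobolev)
```

Both identities hold for every admissible \(p\) and \({\nu }\), since \(\frac{1}{q} - (\sigma - {\nu }) = \frac{1}{2} - \frac{1}{p} + {\nu }= \frac{p - 2 + 2\,p\,{\nu }}{2\,p}\).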

Exploiting that \(\eta _{(\cdot )}(\theta _i \, X) :{\mathbb {T}}\rightarrow {\mathbb {T}}\) are isometries with respect to \(\varrho _\gamma \), and utilizing the substitution \(Y \;{:}{=}\; (\theta _2-\theta _3) \, X\), we may compute as follows:

(25)

Here \(\Vert {\cdot }\Vert _{B,\gamma }\) is a natural, \(\gamma \)-dependent norm on the Besov space \(B^{s-1}_{2 {\tilde{p}},2}({\mathbb {T}};{{\mathbb {R}}^{m}})\). This shows that \(\varXi (\gamma ) = \frac{1}{2} \, \Vert {\zeta (\varrho _\gamma /|{\triangle \gamma }|)}\Vert _{L^{\infty }} \, C(\gamma ) \, \Vert {\tau _\gamma }\Vert _{B,\gamma }^2\). Because of \(s - 1 +{\nu }- 1/p > s - 1 - 1/ (2 {\tilde{p}})\) with \(s=\tfrac{3}{2}\) and (9), we have a continuous Sobolev embedding \(W^{s - 1 +{\nu },p} ({\mathbb {T}};{\mathbb {R}}^m)\hookrightarrow B^{s-1}_{2 {\tilde{p}},2}({\mathbb {T}};{{\mathbb {R}}^{m}})\) (see [92, Theorem 3.3.1] or [72, Theorem 2.4.4/1]), showing that \(\varXi \) is continuous. \(\square \)

The next statement allows us to handle the principal order terms of .

Lemma 3.4

Suppose \(s = \frac{3}{2}\) and (9) with \(\frac{1}{p}+\frac{1}{q}=1\). Let \(I_\gamma (x,y) \subset {\mathbb {T}}\) denote a shortest arc with respect to \(\varrho _\gamma \) that connects x and y. Then the bilinear form \({\mathcal {B}}_\gamma :{\mathcal {X}}\times {\mathcal {Y}}\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} {\mathcal {B}}_\gamma (u,w) \;{:}{=}\; \textstyle \int _{{\mathbb {T}}^2}\Big ( \textstyle \tfrac{2 }{\varrho _\gamma (x,y)} \int _{I_\gamma (x,y)} \langle {\mathcal {D}}_\gamma u, {\mathcal {D}}_\gamma w \rangle \, \omega _{\gamma } - 2\, \big \langle \tfrac{\triangle u}{\varrho _\gamma } , \tfrac{\triangle w}{\varrho _\gamma } \big \rangle (x,y) \Big ) \, \tfrac{\varOmega _\gamma (x,y)}{\varrho _\gamma (x,y)^2} \end{aligned}$$

is well-defined and bounded. More precisely, we have \( |{{\mathcal {B}}_\gamma (u,w)}| \le [{u}]_{{\mathcal {X}},\gamma }\,[{w}]_{{\mathcal {Y}},\gamma } \).

Proof

Using the following two identities

$$\begin{aligned}&\textstyle \tfrac{2}{\varrho _\gamma }\int _{I_\gamma } \langle {\mathcal {D}}_\gamma u, {\mathcal {D}}_\gamma w \rangle \, \omega _{\gamma } = \textstyle \tfrac{1}{\varrho _\gamma ^2} \int _{I_\gamma ^2}\Big ( \langle {\mathcal {D}}_\gamma u(s), {\mathcal {D}}_\gamma w(s) \rangle + \langle {\mathcal {D}}_\gamma u(t), {\mathcal {D}}_\gamma w(t) \rangle \Big )\, \varOmega _\gamma (s,t), \\&\quad \textstyle 2 \, \Big \langle \tfrac{\triangle u}{\varrho _\gamma } , \tfrac{\triangle w}{\varrho _\gamma } \Big \rangle = \textstyle \tfrac{1}{\varrho _\gamma ^2} \int _{I_\gamma ^2} \Big ( \langle {\mathcal {D}}_\gamma u(s), {\mathcal {D}}_\gamma w(t) \rangle + \langle {\mathcal {D}}_\gamma u(t), {\mathcal {D}}_\gamma w(s) \rangle \Big )\, \varOmega _\gamma (s,t), \end{aligned}$$

we obtain

$$\begin{aligned} \textstyle \tfrac{2}{\varrho _\gamma } \int _{I_\gamma } \langle {\mathcal {D}}_\gamma u, {\mathcal {D}}_\gamma w \rangle \, \omega _{\gamma } - 2\,\Big \langle \tfrac{\triangle u}{\varrho _\gamma } , \tfrac{\triangle w}{\varrho _\gamma } \Big \rangle&= \textstyle \int _{I_\gamma ^2} \Big \langle \tfrac{\triangle {\mathcal {D}}_\gamma u(s,t)}{\varrho _\gamma }, \tfrac{\triangle {\mathcal {D}}_\gamma w(s,t)}{\varrho _\gamma } \Big \rangle \, \varOmega _\gamma (s,t). \end{aligned}$$
(26)
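Identity (26) is a continuous version of an elementary "variance" identity. The following minimal sketch checks its discrete analogue, assuming the arc \(I_\gamma \) is replaced by \(n\) equally weighted sample points, so that averages over \(I_\gamma \) become arithmetic means. The helper names and the random sample data are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def both_sides(du, dw):
    """du[i], dw[i] play the roles of D_gamma u and D_gamma w at the i-th sample point."""
    n, m = len(du), len(du[0])
    mean_u = [sum(v[k] for v in du) / n for k in range(m)]  # discrete (triangle u) / rho
    mean_w = [sum(v[k] for v in dw) / n for k in range(m)]  # discrete (triangle w) / rho
    # Left-hand side of (26): 2 * average of <Du, Dw> minus 2 * <mean Du, mean Dw>.
    left = 2 * sum(dot(a, b) for a, b in zip(du, dw)) / n - 2 * dot(mean_u, mean_w)
    # Right-hand side of (26): double average of <Du(s) - Du(t), Dw(s) - Dw(t)>.
    right = sum(
        dot([x - y for x, y in zip(du[s], du[t])],
            [x - y for x, y in zip(dw[s], dw[t])])
        for s in range(n) for t in range(n)
    ) / n ** 2
    return left, right

rng = random.Random(0)
du = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
dw = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
lhs, rhs = both_sides(du, dw)
print(lhs, rhs)
```

Both values should agree up to rounding, mirroring the cancellation of the mixed terms in the two identities preceding (26).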

Now we apply the same technique as in (25): Utilizing the notation of Theorem 3.3 and the substitutions \(y = \eta _x(X)\), \(s = \eta _x(\theta _1 \, X)\), \(t = \eta _x(\theta _2 \, X)\), and \(Y = (\theta _1 -\;\theta _2) \, X\), we arrive at:

$$\begin{aligned}&{\mathcal {B}}_\gamma (u,w) = \textstyle \int _0^1 \!\! \int _0^1 \!\! \int _{{\mathbb {T}}} \! \int _{-|{\theta _1-\theta _2}|\ell }^{|{\theta _1-\theta _2}| \ell }\Big \langle \tfrac{\triangle {\mathcal {D}}_\gamma u(x,\eta _x(Y))}{|{Y}|}, \tfrac{\triangle {\mathcal {D}}_\gamma w(x,\eta _x(Y))}{|{Y}|} \Big \rangle \, |{\theta _1-\theta _2}| \, {{\text {d}}}Y \, \omega _{\gamma }(x) \, {{\text {d}}}\theta _1 \, {{\text {d}}}\theta _2. \end{aligned}$$

Thus, we can bound \(|{{\mathcal {B}}_\gamma (u,w)}|\) from above by

$$\begin{aligned}&\textstyle \int _{{\mathbb {T}}} \! \int _{-\ell }^{\ell } \big |{ \tfrac{\triangle {\mathcal {D}}_\gamma u(x,\eta _x(Y))}{|{Y}|} }\big | \big |{ \tfrac{\triangle {\mathcal {D}}_\gamma w(x,\eta _x(Y))}{|{Y}|} }\big | \, {{\text {d}}}Y \, \omega _{\gamma }(x)\\&\quad \le [{u}]_{W^{{{s}+{\nu }},p}_{\gamma }}\, [{w}]_{W^{s-\nu ,q }_{\gamma }} = [{u}]_{{\mathcal {X}},\gamma }\,[{w}]_{{\mathcal {Y}},\gamma }. \end{aligned}$$

\(\square \)

4 Metrics and Riesz isomorphisms

Next to the differential of the energy, the second ingredient that one requires for defining a gradient is a Riemannian metric on the configuration space \(\mathcal {C}\) that has been introduced in (5). Below, we pick a suitable inner product on \({\mathcal {H}}\) which is essentially a geometric version of the \(W^{{s},2}\)-Gagliardo inner product G from the introduction (see (4)). Throughout, we represent this inner product at the point \(\gamma \in \mathcal {C}\) only by its Riesz isomorphism .

If we identified the tangent space \(T_\gamma \mathcal {C}\) of \(\mathcal {C}\) at \(\gamma \) with the cotangent space, i.e., its dual \(T_\gamma '{^{\!}}\mathcal {C}\), we could define the gradient of \({\mathcal {E}}\) by

Alas, \(T_\gamma \mathcal {C}= {\mathcal {X}}\) is not a Hilbert space for its Hilbert space completion (with respect to ) is \({\mathcal {H}}\ne {\mathcal {X}}\), so that there cannot be any linear isomorphism \(T_\gamma \mathcal {C}\rightarrow T_\gamma '{^{\!}}\mathcal {C}= {\mathcal {X}}'{^{\!}}\) induced by a positive-definite bilinear form. However, we have already seen in Theorem 3.1 that \(D {\mathcal {E}}(\gamma )\) can be interpreted as an element in the smaller space \({\mathcal {Y}}'{^{\!}}\subset {\mathcal {X}}'{^{\!}}\), and that is locally Lipschitz continuous. Thus, Theorem 1.1 is proven as soon as we show that induces an isomorphism which depends locally Lipschitz continuously on \(\gamma \). This is our goal in this section.

Depending on context, we use \(\langle \cdot , \cdot \rangle \) for the dual pairing of a Banach space with its dual, e.g., \(\langle \cdot , \cdot \rangle :{\mathcal {H}}'{^{\!}}\times {\mathcal {H}}\rightarrow {\mathbb {R}}\) and \(\langle \cdot , \cdot \rangle :{\mathcal {Y}}'{^{\!}}\times {\mathcal {Y}}\rightarrow {\mathbb {R}}\), as well as the Euclidean inner product on \({{\mathbb {R}}^{m}}\). Note that the canonical embeddings \(i_{\mathcal {C}} :{\mathcal {X}}\hookrightarrow {\mathcal {H}}\) and \(j_{\mathcal {C}} :{\mathcal {H}}\hookrightarrow {\mathcal {Y}}\) introduced in (11) give rise to dual maps \(i_{\mathcal {C}}'{^{\!}}:{\mathcal {H}}'{^{\!}}\hookrightarrow {\mathcal {X}}'{^{\!}}\) and \(j_{\mathcal {C}}'{^{\!}}:{\mathcal {Y}}'{^{\!}}\hookrightarrow {\mathcal {H}}'{^{\!}}\).

Proposition 4.1

For each \(\gamma \in \mathcal {C}\), with \(\sigma \;{:}{=}\; s - 1=\tfrac{1}{2}\) and the notation from (7), (8), and (9), we define and as follows:

for \(v_1\), \(v_2 \in {\mathcal {H}}\), \(u \in {\mathcal {X}}\), and \(w \in {\mathcal {Y}}\). These operators are well-defined, continuously invertible, and they make the following diagram commutative:

figure a

Moreover, the mappings , , and , , are of class \(C^1\) and hence locally Lipschitz continuous.

Here and in what follows, a doubly headed arrow in the diagram (27) above indicates a linear operator with dense image.

Proof

Well-definedness: It follows from the bi-Lipschitz continuity of \(\gamma \) and from Hölder’s inequality that the integrals

$$\begin{aligned} \textstyle \int _{{\mathbb {T}}^2}\langle \delta ^{\sigma }_{\gamma } {\mathcal {D}}_\gamma v_1 , \delta ^{\sigma }_{\gamma } {\mathcal {D}}_\gamma v_2 \rangle \, \mu _\gamma \quad \text {and} \quad \int _{{\mathbb {T}}^2}\langle \delta ^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u , \delta ^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \rangle \, \mu _\gamma \end{aligned}$$

are well-defined and finite. Indeed, we have by Hölder’s inequality that

$$\begin{aligned}&|{ \textstyle \int _{{\mathbb {T}}^2}\langle \delta ^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u , \delta ^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \rangle \, \mu _\gamma }|\\&\quad \le \Vert {\langle \delta ^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u , \delta ^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \rangle }\Vert _{L_{\mu _\gamma }^{1}} \\&\quad \le \Vert {\delta ^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u}\Vert _{L_{\mu _\gamma }^{p}} \Vert {\delta ^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w}\Vert _{L_{\mu _\gamma }^{q}} = \left[ u\right] _{{\mathcal {X}},\gamma } \left[ w\right] _{{\mathcal {Y}},\gamma } \end{aligned}$$

and, analogously, \( |{ \textstyle \int _{{\mathbb {T}}^2}\langle \delta ^{\sigma }_{\gamma } {\mathcal {D}}_\gamma v_1 , \delta ^{\sigma }_{\gamma } {\mathcal {D}}_\gamma v_2 \rangle \, \mu _\gamma }| \le \left[ v_1\right] _{{\mathcal {H}},\gamma } \left[ v_2\right] _{{\mathcal {H}},\gamma }\). Moreover, the existence of the integrals

$$\begin{aligned} \textstyle \int _{{\mathbb {T}}^2}\langle \delta ^{\sigma }_{\gamma } v_1 , \delta ^{\sigma }_{\gamma } v_2 \rangle \, E(\gamma ) \, \mu _\gamma \quad \text {and} \quad \int _{{\mathbb {T}}^2}\langle \delta ^{{{\sigma }+{\nu }}}_{\gamma } u , \delta ^{{{\sigma }-{\nu }}}_{\gamma } w \rangle \, E(\gamma ) \, \mu _\gamma \end{aligned}$$

follows from Theorem 3.3 with \(k=1\), \(\alpha = 2\), \(\beta =0\), and \(L_1 = K = \triangle /\varrho _\gamma \). (Here we use \(\sigma =\tfrac{1}{2}\).) The commutativity of (27) follows from the pointwise identity

$$\begin{aligned} \langle \delta ^{{{\sigma }+{\nu }}}_{\gamma } \varphi , \delta ^{{{\sigma }-{\nu }}}_{\gamma } \psi \rangle (x,y) = \tfrac{\langle \varphi (x) - \varphi (y),\psi (x) - \psi (y) \rangle }{|{\gamma (x)-\gamma (y)}|^{2 \sigma }} = \langle \delta ^{\sigma }_{\gamma } \varphi , \delta ^{\sigma }_{\gamma } \psi \rangle (x,y) \end{aligned}$$

which holds for arbitrary functions \(\varphi \), \(\psi :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\).

Invertibility: We show this only for , as the argument for is analogous. First we observe that is injective. Indeed, let . Then

implies that \({\mathcal {D}}_{\gamma } u\) must be constant and that the mean value of u vanishes. But the first condition can only hold if u is already constant and the second one forces this constant to be zero. So it suffices to show that is a Fredholm operator of index 0. To this end, we define the operator \(A_\gamma :{\mathcal {X}}\rightarrow {\mathcal {Y}}'{^{\!}}\) by

$$\begin{aligned} \textstyle \langle A_\gamma \, u, w \rangle \;{:}{=}\; \int _{{\mathbb {T}}^2}\big \langle D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u, D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \big \rangle \, {{\text {d}}}\mu _\gamma , \quad \text {where} \quad D^{\alpha }_{\gamma } \varphi \;{:}{=}\; \frac{\triangle \varphi }{\varrho _\gamma ^\alpha }. \end{aligned}$$
(28)

Now we observe that \(\langle A_\gamma \, u, w \rangle = \int _{{\mathbb {T}}^2}\big \langle D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u, D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \big \rangle \, {{\text {d}}}\mu _\gamma \) and that

From Theorem 3.3 we may conclude that \(A_\gamma \) is a compact perturbation of . So it suffices to show that \(A_\gamma \) is a Fredholm operator of index 0. As this is a bit more involved, we defer this to Lemma 4.2 below.

Fréchet differentiability: This can be shown by essentially the same technique as in the proof of the Fréchet differentiability of the energy: a first order Taylor expansion of the integrand around the point \(\gamma \), followed by a bound on the integral of the second order remainder term (see Claim 3 in Theorem 3.1). In fact, the analysis here is a bit easier, since the geodesic distance \(\varrho _\gamma \) does not appear in the definitions of and . For the sake of brevity, we omit the details. \(\square \)

4.1 Details

The remainder of this section is devoted to proving the following lemma. The employed techniques are fairly standard in the area of pseudo-differential operators; only the fact that the differential operators involved here are nonlocal introduces a couple of further technicalities. Throughout, we will suppose that \(\gamma \in \mathcal {C}\). Moreover, we will denote by \(K :{\mathcal {X}}\rightarrow {\mathcal {Z}}\) a generic compact operator that—pretty much like the ever-expanding “constant” C—may change from line to line.

Lemma 4.2

The operator \(A_\gamma :{\mathcal {X}}\rightarrow {\mathcal {Y}}'{^{\!}}\) defined in (28) is a Fredholm operator of index 0.

Proof

As the Fredholm property and the index are invariant under compact perturbations, it suffices to show that \(A_\gamma + M_\gamma \) is continuously invertible, where we define the compact operator \(M_\gamma :{\mathcal {X}}\rightarrow {\mathcal {Y}}'{^{\!}}\) by The operator \(A_\gamma + M_\gamma \) is injective because of . Since the operator \(j_{\mathcal {C}} \, i_{\mathcal {C}} :{\mathcal {X}}\hookrightarrow {\mathcal {Y}}\) is injective and has dense image, this also implies that \(A_\gamma + M_\gamma \) has dense image. By virtue of the Schauder lemma (see Theorem A.5), it suffices to establish an elliptic estimate of the form \( \Vert {u}\Vert _{{\mathcal {X}}} \le C\, \big ( \Vert {(A_\gamma + M_\gamma )\, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K \, u}\Vert _{{\mathcal {Z}}} \big )\) for all \(u \in {\mathcal {X}}\) with suitable norms \(\Vert {\cdot }\Vert _{{\mathcal {X}}}\) and \(\Vert {\cdot }\Vert _{{\mathcal {Y}}}\) that will be defined soon. Indeed, we will show in Lemma 4.6 that

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}} \le C\, \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K \, u}\Vert _{{\mathcal {Z}}} \big ) \quad \text {for all } u \in {\mathcal {X}}. \end{aligned}$$
(29)

Since \(M_\gamma \) is compact, this implies the previous inequality (for different C, K and \({\mathcal {Z}}\)). \(\square \)

In order to prove (29), it is convenient to introduce some additional notation. The operator \(A_\gamma \) is defined in terms of \(D^{{{\sigma }+{\nu }}}_{\gamma } = \frac{\triangle }{\varrho _\gamma ^{{\sigma }+{\nu }}}\) and \(D^{{{\sigma }-{\nu }}}_{\gamma } = \frac{\triangle }{\varrho _\gamma ^{{\sigma }-{\nu }}}\), see (28), so it is natural to consider the following local semi-norms: For an open set \(U \subset {\mathbb {T}}\), \(0<\alpha <1\), and \(1 \le r < \infty \) and a measurable \(v :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\), we define

For convenience, we fix \(\gamma \) and define

We point out that \(\gamma \in \mathcal {C}\) always has finite energy, thus \(\gamma \) is bi-Lipschitz continuous (see [62, Thm. 2.3]); hence the norms \(\Vert {\cdot }\Vert _{{\mathcal {X}}}\), \(\Vert {\cdot }\Vert _{{\mathcal {Y}}}\) are equivalent to \(\Vert {\cdot }\Vert _{{\mathcal {X}},\gamma }\), \(\Vert {\cdot }\Vert _{{\mathcal {Y}},\gamma }\), respectively, cf. (10). For a measurable function \({\tilde{v}} :{\mathbb {R}}\rightarrow {{\mathbb {R}}^{m}}\) and an open set \({\tilde{U}} \subset {\mathbb {R}}\), we define

$$\begin{aligned} D^{\alpha }_{} {\tilde{v}}(x,y) \;{:}{=}\; \frac{{\tilde{v}}(x) - {\tilde{v}}(y)}{|{x-y}|^\alpha } \quad \text {and} \quad {\mathcal {D}}{\tilde{v}} (x) = {\tilde{v}}'(x), \end{aligned}$$

as well as the following semi-norms:

Moreover, we use the following abbreviations for the global Sobolev spaces and their norms:

$$\begin{aligned} {\tilde{{\mathcal {X}}}}&\,{:}{=}\, W^{{{s}+{\nu }},p}{({\mathbb {R}};{\mathbb {R}}^m)}, \;\;\; \Vert {\cdot }\Vert _{{\tilde{{\mathcal {X}}}}} \;{:}{=}\; \Vert {\cdot }\Vert _{W^{{{s}+{\nu }},p}{({\mathbb {R}})}}, \\ {\tilde{{\mathcal {Y}}}}&\,{:}{=}\, W^{{s-{\nu },q}}{({\mathbb {R}};{\mathbb {R}}^m)}, \;\quad \Vert {\cdot }\Vert _{{\tilde{{\mathcal {Y}}}}} \;{:}{=}\; \Vert {\cdot }\Vert _{W^{s-{\nu },q}{({\mathbb {R}})}}. \end{aligned}$$

Via localization techniques, we will compare \(A_\gamma \) to \(L :{\tilde{{\mathcal {X}}}} \rightarrow {\tilde{{\mathcal {Y}}}}'{^{\!}}\) given by

$$\begin{aligned} \langle L \, {\tilde{u}}, {\tilde{w}} \rangle \;{:}{=}\; \textstyle \int _{{\mathbb {R}}\times {\mathbb {R}}} \langle D^{{{\sigma }+{\nu }}}_{} {\mathcal {D}}{\tilde{u}}, D^{{{\sigma }-{\nu }}}_{} {\mathcal {D}}{\tilde{w}} \rangle \, {{\text {d}}}\mu \quad \text {with} \quad {{\text {d}}}\mu (x,y) \;{:}{=}\; \frac{{{\text {d}}}x \, {{\text {d}}}y}{|{x-y}|}. \end{aligned}$$
(30)

Compare this to the weak formulation of the fractional Laplacian

$$\begin{aligned} (-\varDelta )^{\sigma } :W^{{{\sigma }+{\nu }},p}({\mathbb {R}};{\mathbb {R}}^m)\rightarrow W^{-{{\sigma }+{\nu }},p}({\mathbb {R}};{\mathbb {R}}^m) \cong (W^{{{\sigma }-{\nu }},q}({\mathbb {R}};{\mathbb {R}}^m))'{^{\!}}\end{aligned}$$

which is given by

$$\begin{aligned} \textstyle \langle (-\varDelta )^{\sigma } \varphi , \psi \rangle&= C_\sigma \int _{{\mathbb {R}}\times {\mathbb {R}}} \langle D^{{{\sigma }+{\nu }}}_{} \varphi , D^{{{\sigma }-{\nu }}}_{} \psi \rangle \, {{\text {d}}}\mu \\&= C_{\sigma } \int _{{\mathbb {R}}}\int _{{\mathbb {R}}} \Big \langle \frac{\varphi (x) - \varphi (y)}{|{x-y}|^{\sigma }}, \frac{\psi (x) - \psi (y)}{|{x-y}|^{\sigma }} \Big \rangle \, \frac{{{\text {d}}}x\, {{\text {d}}}y}{|{x-y}|} \end{aligned}$$

for some \(C_{\sigma } > 0\) (see e.g., [50, Theorem 1.1]). So up to a constant, we have \(L = {\mathcal {D}}'{^{\!}}(-\varDelta )^\sigma {\mathcal {D}}\), hence L is a pseudo-differential operator of order \(2 \, \sigma +2=2s=3\) and its principal symbol \(P(L) :T'{^{\!}}\, {\mathbb {R}}\cong {\mathbb {R}}\times {\mathbb {R}}\rightarrow {{\,\mathrm{End}\,}}({{\mathbb {R}}^{m}})\) is (up to a constant) given by \(P(L)(x,\xi ) = |{\xi }|^{2s}\, {{\,\mathrm{id}\,}}_{{\mathbb {R}}^{m}} = P(({{\,\mathrm{id}\,}}-\varDelta )^{s})(x,\xi )\). By [92, 2.3.8], the operator L is continuously invertible.Footnote 3

The following two lemmas will help us in localizing Sobolev norms.

Lemma 4.3

(Norm localization) Let \(0<\alpha <1\), \(1 \le r < \infty \) and let \(U \subset \subset V \subset \subset {\mathbb {T}}\) be relatively compact, open sets. Then there is a \(C = C(\gamma ,\alpha ,r,U,V)>0\) such that

$$\begin{aligned} \Vert {v}\Vert _{W_{\varrho ,\gamma }^{1+\alpha ,r}(V)}&\le \Vert {v}\Vert _{ W_{\varrho ,\gamma }^{1+\alpha ,r}({\mathbb {T}})}\\&\le C \, \Vert {v}\Vert _{W_{\varrho ,\gamma }^{1+\alpha ,r}(V)} \;\; \text {for all }v \in W^{1+\alpha ,r}({\mathbb {T};{\mathbb {R}}^m}) \text { with } {{\,\mathrm{supp}\,}}(v) \subset U. \end{aligned}$$

Lemma 4.4

(Norm localization) Let \(0<\alpha <1\), \(1 \le r < \infty \) and let \({\tilde{V}} \subset \subset {\tilde{U}} \subset \subset {\mathbb {R}}\) be relatively compact, open sets. Then there is a \(C = C(\alpha ,r,{\tilde{U}}, {\tilde{V}})>0\) such that

$$\begin{aligned} \Vert {{\tilde{v}}}\Vert _{ W^{1+\alpha ,r}({\tilde{U}})}&\le \Vert {{\tilde{v}}}\Vert _{ W^{1+\alpha ,r}{(\mathbb {R})}}\\&\le C \, \Vert {{\tilde{v}}}\Vert _{ W^{1+\alpha ,r}({\tilde{U}})} \;\; \text {for all }{\tilde{v}} \in W^{1+\alpha ,r}{(\mathbb {R}};{\mathbb {R})}\text { with }{{\,\mathrm{supp}\,}}({\tilde{v}}) \subset {\tilde{V}}. \end{aligned}$$

Their proofs are quite similar, so we show only the proof of the latter for it contains an additional difficulty.

Proof of Lemma 4.4

Since \({{\,\mathrm{supp}\,}}({\tilde{v}}) \subset {\tilde{V}} \subset {\tilde{U}}\), we obviously have

The following identity is caused by the nonlocality of \(D^{\alpha }_{}\) and it is easily overlooked:

Notice that we used here that \({{\,\mathrm{supp}\,}}({\mathcal {D}}\, {\tilde{v}}) \subset {\tilde{V}}\). We assumed that \({\tilde{V}} \subset \subset {\tilde{U}}\) is relatively compact, so there is an \(R>0\) such that \(|{x-y}| \ge R\) for all \(x \in {\mathbb {R}}\setminus {\tilde{U}}\) and \(y \in {\tilde{V}}\). Because of \(1+\alpha \, r>1\), we have \( \textstyle \sup _{y \in {\tilde{V}}} \int _{{\mathbb {R}}\setminus {\tilde{U}}} \, \frac{{{\text {d}}}x}{|{x-y}|^{1+\alpha \, r}} \le 2 \int _R^\infty \frac{{{\text {d}}}t}{t^{1+\alpha \, r}} < \infty \), leading us to

which concludes the proof. \(\square \)
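The tail estimate used in this proof can be made explicit for intervals. The sketch below assumes, purely for illustration, that \({\tilde{U}} = \left]-2,2\right[\) and \({\tilde{V}} = \left]-1,1\right[\), so that \(R = 1\), and it writes `a` for the product \(\alpha \, r\). The function names and sample values are hypothetical. Since \({\mathbb {R}}\setminus {\tilde{U}}\) has two unbounded components, the integral over the complement is controlled by twice the one-sided tail \(\int _R^\infty t^{-1-a} \, {{\text {d}}}t = R^{-a}/a\), and in general not by a single tail.

```python
def outside_integral(y, half_u, a):
    """Exact integral of |x - y|**-(1 + a) over the complement of
    (-half_u, half_u), valid for |y| < half_u and a > 0."""
    return ((half_u - y) ** -a + (half_u + y) ** -a) / a

def one_sided_tail(R, a):
    """Exact integral of t**-(1 + a) over (R, infinity)."""
    return R ** -a / a

half_u, half_v, a = 2.0, 1.0, 0.5      # hypothetical sample values
R = half_u - half_v                     # distance from V~ to the complement of U~
ys = [-0.99 + 0.01 * k for k in range(199)]   # sample points y in V~
worst = max(outside_integral(y, half_u, a) for y in ys)
print(worst, one_sided_tail(R, a), 2 * one_sided_tail(R, a))
```

For these sample values the supremum over the sampled \(y\) lies strictly between one and two one-sided tails, so the factor 2 cannot be dropped here.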

Now we can start to show (29), at least for functions u with small support.

Lemma 4.5

(Local elliptic estimate) For each point \(a \in {\mathbb {T}}\), there are open neighborhoods \(W \subset \subset U \subset {\mathbb {T}}\) and a compact operator \(K :{\mathcal {X}}\rightarrow {\mathcal {Z}}\) into some Banach space \({\mathcal {Z}}\) such that

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}} \le C \, \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K \, u}\Vert _{{\mathcal {Z}}} \big ) \quad \text {holds for each } u \in {\mathcal {X}}\text { with } {{\,\mathrm{supp}\,}}(u) \subset W. \end{aligned}$$
(31)

Proof

We start by picking an isometric coordinate system around the point \(a\in {\mathbb {T}}\). With \(L \;{:}{=}\; \int _{\mathbb {T}}\omega _{\gamma }\) denoting the curve length, we choose \(U \;{:}{=}\; B_{L/4}(a)\), \(V \;{:}{=}\; B_{L/8}(a)\), and \(W \;{:}{=}\; B_{L/16}(a)\), where these balls are meant with respect to the intrinsic distance \(\varrho _\gamma \). We point out that U is geodesically convex, i.e., each shortest arc between two points in U is contained in U. Likewise we define the following open balls \({\tilde{U}} \;{:}{=}\; B_{L/4}(0)\), \({\tilde{V}} \;{:}{=}\; B_{L/8}(0)\), and \({\tilde{W}} \;{:}{=}\; B_{L/16}(0)\) in \({\mathbb {R}}\), this time with respect to the standard distance on \({\mathbb {R}}\). Now we define the isometric chart \( f:U \rightarrow {\tilde{U}} \) by \(f(x) = \pm \varrho _\gamma (a,x)\), where the sign depends on whether the shortest curve from a to x is oriented positively (\(+\)) or negatively (−) with respect to the standard orientation of \({\mathbb {T}}\). Let \(u \in {\mathcal {X}}\) be a function with \({{\,\mathrm{supp}\,}}(u) \subset W\). Then we can find a unique function \({\tilde{u}} \in {\tilde{{\mathcal {X}}}}\) with \({{\,\mathrm{supp}\,}}({\tilde{u}}) \subset {\tilde{W}}\) and \(u = {\tilde{u}} \circ f\).Footnote 4 Because f is an isometry, we have \(\Vert {u}\Vert _{{W^{s+\nu ,p}_{\varrho ,\gamma }}{(U)}} = \Vert {{\tilde{u}}}\Vert _{W^{{{s}+{\nu }},p}{({\tilde{U}})}}\). From Lemma 4.3 and from the continuous invertibility of \(L :{\tilde{{\mathcal {X}}}} \rightarrow {\tilde{{\mathcal {Y}}}}'{^{\!}}\), we deduce that there is a \(C \ge 0\) (depending only on \(\gamma \), \({{s}+{\nu }}\), \(p\), U, and W) such that

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}}&= \Vert {u}\Vert _{W^{{{s}+{\nu }},p}_{\varrho ,\gamma }({\mathbb {T}})} \le C \, \Vert {u}\Vert _{W^{{{s}+{\nu }},p}_{\varrho ,\gamma }(U)}\nonumber \\&\quad = C \, \Vert {{\tilde{u}}}\Vert _{W^{{{s}+{\nu }},p}({\tilde{U}})} \le C \,\Vert {{\tilde{u}}}\Vert _{{\tilde{{\mathcal {X}}}}} \le C \, \Vert {L^{-1}}\Vert \, \Vert {L \, {\tilde{u}}}\Vert _{{\tilde{{\mathcal {Y}}}}'{^{\!}}}. \end{aligned}$$
(32)
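The isometric chart \(f\) used at the beginning of this proof can be illustrated numerically. The sketch below assumes, for simplicity, that \(\gamma \) is parametrized by arc length, so that points of \({\mathbb {T}}\) are arc-length parameters in \([0,L)\) and \(\varrho _\gamma \) is the circular distance. The function names and the sample values of `L` and `a` are hypothetical.

```python
import random

def geodesic_distance(x, y, L):
    """Length of the shorter arc between arc-length parameters x and y."""
    d = abs(x - y) % L
    return min(d, L - d)

def chart(x, a, L):
    """Signed arc length from a to x, a representative in [-L/2, L/2)."""
    return (x - a + L / 2) % L - L / 2

L, a = 7.0, 6.5                       # hypothetical curve length and base point
rng = random.Random(1)
# Sample points in U = B_{L/4}(a); the ball wraps around the parameter 0.
points = [(a + rng.uniform(-L / 4, L / 4)) % L for _ in range(200)]

# Defect of the isometry property |f(x) - f(y)| = rho(x, y) on U.
max_defect = max(
    abs(abs(chart(x, a, L) - chart(y, a, L)) - geodesic_distance(x, y, L))
    for x in points for y in points
)
print(max_defect)
```

The isometry property relies on the radius \(L/4\): any two points of \(U\) have chart values differing by less than \(L/2\), so the chart difference realizes the shorter arc.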

Our next goal is to control \(\Vert {L \, {\tilde{u}}}\Vert _{{\tilde{{\mathcal {Y}}}}'{^{\!}}}\) by \(\Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}}\) modulo a compact operator. To this end, we choose a bump function \(\eta \in W^{{{s}+{\nu }},p}({\mathbb {T}};{\mathbb {R}})\) with values in \(\left[ 0,1\right] \) that satisfies \(\eta (x) = 1\) for all \(x \in {\bar{W}}\) and \({{\,\mathrm{supp}\,}}(\eta ) \subset V\). We denote by \({\tilde{\eta }} \in W^{{{s}+{\nu }},p}({\mathbb {R}};{\mathbb {R}})\) the unique function with \({\tilde{\eta }} \circ f= \eta \) and \({{\,\mathrm{supp}\,}}({\tilde{\eta }}) \subset {\tilde{V}}\). These functions induce the following multiplication operators:

$$\begin{aligned} H:{\mathcal {X}}&\rightarrow {\mathcal {X}},&H\, u&\;{:}{=}\; \eta \, u;&&{\tilde{H}} :{\tilde{{\mathcal {X}}}}&\rightarrow {\tilde{{\mathcal {X}}}},&{\tilde{H}} \, \varphi&\;{:}{=}\; {\tilde{\eta }} \, \varphi ; \\ H:{\mathcal {Y}}&\rightarrow {\mathcal {Y}},&H\, w&\;{:}{=}\; \eta \, w;&&{\tilde{H}} :{\tilde{{\mathcal {Y}}}}&\rightarrow {\tilde{{\mathcal {Y}}}},&{\tilde{H}} \, \psi&\;{:}{=}\; {\tilde{\eta }} \, \psi . \end{aligned}$$

Because of \(H\, u = u\) and \({\tilde{H}} \, {\tilde{u}} = {\tilde{u}}\), we may split \(A_\gamma \, u \) and \(L {\tilde{u}}\) into

$$\begin{aligned} A_\gamma \, u = H'{^{\!}}\, A_\gamma \, u + (A_\gamma \, H- H'{^{\!}}\, A_\gamma ) \, u \quad \text {and} \quad L \, {\tilde{u}} = {\tilde{H}}'{^{\!}}\, L \, {\tilde{u}} + (L \, {\tilde{H}} - {\tilde{H}}'{^{\!}}\, L) \, {\tilde{u}}. \end{aligned}$$

Claim

The operators \(Q \;{:}{=}\; A_\gamma \, H- H'{^{\!}}\, A_\gamma \) and \({\tilde{Q}} \;{:}{=}\; L \, {\tilde{H}} - {\tilde{H}}'{^{\!}}\, L\) are compact.

For \(0<\alpha <1\) and \(v :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\), the Leibniz rule implies

$$\begin{aligned} D^{\alpha }_{\gamma } {\mathcal {D}}_{\gamma } (\eta \, v)&= (\eta \circ \pi _1) \, (D^{\alpha }_{\gamma } {\mathcal {D}}_{\gamma } v) + (D^{\alpha }_{\gamma } \eta ) \, ({\mathcal {D}}_{\gamma } v \circ \pi _2) \\&\qquad + (D^{\alpha }_{\gamma } {\mathcal {D}}_{\gamma } \eta ) \, (v \circ \pi _1) + ({\mathcal {D}}_{\gamma } \eta \circ \pi _2) \, (D^{\alpha }_{\gamma } v), \end{aligned}$$

where \(\pi _1\), \(\pi _2:{{\mathbb {T}}^2}\rightarrow {\mathbb {T}}\) are the projections given by \(\pi _1(x,y) = x\) and \(\pi _2(x,y) = y\). This allows us to write

$$\begin{aligned} D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma (\eta \, u)&= (\eta \circ \pi _1) \cdot D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u +K_1 \, u \quad \text {and} \\ D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma (\eta \, w)&= (\eta \circ \pi _1) \cdot D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w +K_2 \, w, \end{aligned}$$

with compact operators \(K_1 :{\mathcal {X}}\rightarrow (L^{q}_{\mu _\gamma }({\mathbb {T}}^2;{\mathbb {R}}^{m}))'{^{\!}}\) and \(K_2 :{\mathcal {Y}}\rightarrow L^{q}_{\mu _\gamma }({\mathbb {T}}^2;{\mathbb {R}}^{m})\). Now we have

$$\begin{aligned}&\langle (A_\gamma \, H- H'{^{\!}}\, A_\gamma ) \, u, w \rangle _{{\mathcal {Y}}'{^{\!}},{\mathcal {Y}}} \\&\quad = \textstyle \int _{{\mathbb {T}}^2} \big ( \langle D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma (\eta \, u), D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \rangle - \langle D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u, D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma (\eta \, w) \rangle \big ) \mu _\gamma \\&\quad = \langle K_1 u, D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w \rangle _{(L^{q}_{\mu _\gamma })'{^{\!}},L^{q}_{\mu _\gamma }} - \langle D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u, K_2 \, w \rangle _{(L^{q}_{\mu _\gamma })'{^{\!}},L^{q}_{\mu _\gamma }} , \end{aligned}$$

thus \(A_\gamma \, H- H'{^{\!}}\, A_\gamma = (D^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma )'{^{\!}}\, K_1 - K_2'{^{\!}}\, D^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma \) is compact. This shows the first statement. The second statement is proven analogously.
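The four-term Leibniz rule used in the proof of this claim can be checked pointwise. The sketch below works in the flat model, i.e., it assumes that \(\varrho _\gamma (x,y)\) is replaced by \(|{x-y}|\) and \({\mathcal {D}}_\gamma \) by the ordinary derivative on \({\mathbb {R}}\). The scalar test functions `eta` and `v` and the sample points are hypothetical choices.

```python
import math

def dq(g, x, y, alpha):
    """Flat difference quotient D^alpha g(x, y) = (g(x) - g(y)) / |x - y|^alpha."""
    return (g(x) - g(y)) / abs(x - y) ** alpha

eta = math.cos
d_eta = lambda x: -math.sin(x)
v = lambda x: math.sin(2 * x)
d_v = lambda x: 2 * math.cos(2 * x)
d_prod = lambda x: d_eta(x) * v(x) + eta(x) * d_v(x)   # derivative of eta * v

def leibniz_defect(x, y, alpha):
    """Difference between D^alpha D (eta v) and the four-term expansion."""
    left = dq(d_prod, x, y, alpha)
    right = (eta(x) * dq(d_v, x, y, alpha)       # (eta o pi_1) * D^alpha D v
             + dq(eta, x, y, alpha) * d_v(y)     # (D^alpha eta) * (D v o pi_2)
             + dq(d_eta, x, y, alpha) * v(x)     # (D^alpha D eta) * (v o pi_1)
             + d_eta(y) * dq(v, x, y, alpha))    # (D eta o pi_2) * (D^alpha v)
    return abs(left - right)

pairs = [(0.3, 1.1), (-2.0, 0.7), (1.5, 1.4), (4.0, -3.2)]
max_defect = max(leibniz_defect(x, y, 0.6) for x, y in pairs)
print(max_defect)
```

Only the first term carries the full difference quotient of \({\mathcal {D}}v\); in the remaining three terms at least one derivative or difference quotient has been moved onto \(\eta \), which is why they are collected into the compact operators \(K_1\) and \(K_2\).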

From this claim it follows that the “leading term” of \(L {\tilde{u}}\) is \({\tilde{H}}'{^{\!}}\, L \, {\tilde{u}}\). Next we are going to bound \(\Vert {{\tilde{H}}'{^{\!}}\, L \, {\tilde{u}}}\Vert _{{\tilde{{\mathcal {Y}}}}'{^{\!}}}\). For this it suffices to test \({\tilde{H}}'{^{\!}}\, L \, {\tilde{u}}\) only against those \({\tilde{w}} \in {\tilde{{\mathcal {Y}}}}\) with \({{\,\mathrm{supp}\,}}({\tilde{w}}) \subset {\tilde{V}}\). More precisely, we have

$$\begin{aligned} \Vert {{\tilde{H}}'{^{\!}}\, L \, {\tilde{u}}}\Vert _{{\tilde{{\mathcal {Y}}}}'{^{\!}}} = \sup _{{\tilde{w}} \in {\tilde{{\mathcal {Y}}}}, \; {{\,\mathrm{supp}\,}}({\tilde{w}}) \subset {\tilde{V}}} \frac{ \langle L \, {\tilde{u}}, {\tilde{\eta }} \, {\tilde{w}} \rangle _{{\tilde{{\mathcal {Y}}}}'{^{\!}},{\tilde{{\mathcal {Y}}}}} }{ \Vert {{\tilde{w}}}\Vert _{{\tilde{{\mathcal {Y}}}}} }. \end{aligned}$$

For every such \({\tilde{w}}\) there is a \(w \in {\mathcal {Y}}\) with \({{\,\mathrm{supp}\,}}(w) \subset V\) such that \(w = {\tilde{w}} \circ f\). Since also \({\tilde{u}}\) and \({\tilde{\eta }}\) are constructed in this way from u and \(\eta \), we have

$$\begin{aligned} \langle L \, {\tilde{u}}, {\tilde{\eta }} \, {\tilde{w}} \rangle _{{\tilde{{\mathcal {Y}}}}'{^{\!}},{\tilde{{\mathcal {Y}}}}}&= \textstyle \int _{{\tilde{U}}} \int _{{\tilde{U}}} \langle D^{{\sigma }+{\nu }}\, {\mathcal {D}}\, {\tilde{u}} , D^{{\sigma }-{\nu }}\, {\mathcal {D}}\, ({\tilde{\eta }} \, {\tilde{w}}) \rangle \, {{\text {d}}}\mu \\&= \textstyle \int _{U} \int _{U} \langle D_\gamma ^{{\sigma }+{\nu }}\, {\mathcal {D}}_\gamma \, u , D_\gamma ^{{\sigma }-{\nu }}\, {\mathcal {D}}_\gamma \, (\eta \, w) \rangle \, {{\text {d}}}\mu _\gamma \\&= \langle A_\gamma \, u, \eta \, w \rangle _{{\mathcal {Y}}'{^{\!}},{\mathcal {Y}}} = \langle H'{^{\!}}\, A_\gamma \, u, w \rangle _{{\mathcal {Y}}'{^{\!}},{\mathcal {Y}}}\\&= \langle A_\gamma \, u, w \rangle _{{\mathcal {Y}}'{^{\!}},{\mathcal {Y}}} - \langle Q \, u, w \rangle _{{\mathcal {Y}}'{^{\!}},{\mathcal {Y}}} \\&\le \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {Q \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} \big ) \, \Vert {w}\Vert _{{\mathcal {Y}}} . \end{aligned}$$

By Lemma 4.3 and Lemma 4.4, there is a \(C \ge 0\) (depending only on \(\gamma \), \({{s}-{\nu }}\), \(q\), U, W, \({\tilde{U}}\), and \({\tilde{W}}\)) such that \( \Vert {w}\Vert _{{\mathcal {Y}}} \le C \, \Vert {w}\Vert _{W^{{{s}-{\nu }},q}_{\varrho ,\gamma }(U)} = C \, \Vert {{\tilde{w}}}\Vert _{W^{{{s}-{\nu }},q}({\tilde{U}})} \le C^2 \, \Vert {{\tilde{w}}}\Vert _{{\tilde{{\mathcal {Y}}}}} . \) Thus we obtain

$$\begin{aligned} \Vert {{\tilde{H}}'{^{\!}}\, L \, {\tilde{u}}}\Vert _{{\tilde{{\mathcal {Y}}}}'{^{\!}}} \le C^2 \, \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {Q \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} \big ). \end{aligned}$$

Combined with (32), this leads to

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}}&\le C \, \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {Q \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {{\tilde{Q}} \, {\tilde{u}}}\Vert _{{\tilde{{\mathcal {Y}}}}'{^{\!}}} \big ) . \end{aligned}$$

Again by Theorem 4.3 and Theorem 4.4, the mapping \(u \mapsto u = H u \mapsto {\tilde{u}}\) is continuous. Since \({\tilde{Q}}\) is compact, the mapping \(u \mapsto {\tilde{Q}} \, {\tilde{u}}\) is compact as well. This concludes the proof. \(\square \)

Finally, we piece together the local estimates from above.

Lemma 4.6

(Global elliptic estimate) Let \(\gamma \in \mathcal {C}\). Then there are \(C >0\) and a compact, linear operator \(K :{\mathcal {X}}\rightarrow {\mathcal {Z}}\) into some Banach space \({\mathcal {Z}}\) such that

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}} \le C\, \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K \, u}\Vert _{{\mathcal {Z}}} \big ) \quad \text {for all }u \in {\mathcal {X}}. \end{aligned}$$

Proof

We start by covering \({\mathbb {T}}\) by finitely many open sets \(W_i \subset \subset U_i \subset {\mathbb {T}}\), \(i = 1, \dotsc ,k\), as in Theorem 4.5 such that there are local elliptic estimates of the form

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}} \le C_i \, \big ( \Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K_i \, u}\Vert _{{\mathcal {Z}}_i} \big ) \quad \hbox { for all}\ u \in {\mathcal {X}}\text { with } {{\,\mathrm{supp}\,}}(u) \subset W_i, \end{aligned}$$

with suitable constants \(C_i \ge 0\) and suitable compact operators \(K_i :{\mathcal {X}}\rightarrow {\mathcal {Z}}_i\) into Banach spaces \({\mathcal {Z}}_i\). Now we pick a smooth partition of unity \(\{\varphi _1,\dotsc ,\varphi _k\} \subset C^{\infty }(\mathbb {T};\left[ 0,1\right] )\) subordinate to \(\{W_1,\dotsc ,W_k\}\). We denote the corresponding multiplication operators by \(\varPhi _i :{\mathcal {X}}\rightarrow {\mathcal {X}}\), \(\varPhi _i \, u \;{:}{=}\; \varphi _i \, u\) and \(\varPhi _i :{\mathcal {Y}}\rightarrow {\mathcal {Y}}\), \(\varPhi _i \, w \;{:}{=}\; \varphi _i \, w\). Now we observe

$$\begin{aligned} \Vert {A_\gamma \, \varPhi _i\, u}\Vert _{{\mathcal {Y}}'{^{\!}}}&\le \Vert {\varPhi _i'{^{\!}}\, A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {(A_\gamma \, \varPhi _i - \varPhi _i'{^{\!}}\, A_\gamma ) \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} \\&\le \Vert {\varPhi _i'{^{\!}}}\Vert \,\Vert {A_\gamma \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {(A_\gamma \, \varPhi _i - \varPhi _i'{^{\!}}\, A_\gamma ) \, u}\Vert _{{\mathcal {Y}}'{^{\!}}}. \end{aligned}$$

With the triangle inequality, we obtain

$$\begin{aligned} \Vert {u}\Vert _{{\mathcal {X}}}&= \textstyle \Vert {\sum _{i=1}^k \varphi _i \, u}\Vert _{{\mathcal {X}}} \le \textstyle \sum _{i=1}^k \Vert {\varphi _i \, u}\Vert _{{\mathcal {X}}}\\&\le \textstyle \sum _{i=1}^k C_i \Big ( \Vert {A_\gamma \, \varPhi _i \, u }\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K_i \, \varPhi _i \, u}\Vert _{{\mathcal {Z}}_i} \Big )\\&\le \textstyle \sum _{i=1}^k C_i \, \Big ( \Vert { \varPhi _i}\Vert \, \Vert {A_\gamma \, u }\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {(A_\gamma \, \varPhi _i - \varPhi _i'{^{\!}}\, A_\gamma ) \, u}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {K_i \, \varPhi _i \, u}\Vert _{{\mathcal {Z}}_i}\Big ). \end{aligned}$$

The separate claim in the proof of Theorem 4.5 shows that \(A_\gamma \, \varPhi _i - \varPhi _i'{^{\!}}\, A_\gamma \) is a compact operator. So setting \(C \;{:}{=}\; \sum _{i=1}^k C_i \, \Vert {\varPhi _i}\Vert + \max (C_1, \dotsc , C_k)\), \({\mathcal {Z}}\;{:}{=}\; ({\mathcal {Y}}'{^{\!}}\oplus {\mathcal {Z}}_1) \oplus \cdots \oplus ({\mathcal {Y}}'{^{\!}}\oplus {\mathcal {Z}}_k)\), \(\Vert {(\eta _1,z_1,\dotsc ,\eta _k,z_k)}\Vert _{{\mathcal {Z}}} \;{:}{=}\; \Vert {\eta _1}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {z_1}\Vert _{{\mathcal {Z}}_1} + \cdots + \Vert {\eta _k}\Vert _{{\mathcal {Y}}'{^{\!}}} + \Vert {z_k}\Vert _{{\mathcal {Z}}_k}\), and

$$\begin{aligned} K \, u \;{:}{=}\; \Big ( (A_\gamma \, \varPhi _1 - \varPhi _1'{^{\!}}\, A_\gamma ) \, u, K_1 \, \varPhi _1 \, u , \dotsc , (A_\gamma \, \varPhi _k - \varPhi _k'{^{\!}}\, A_\gamma ) \, u, K_k \, \varPhi _k \, u \Big ) \end{aligned}$$

concludes the proof. \(\square \)

5 Constraints

Our aim in this section is to set up constraints on the barycenter and on the parametrization of curves and to show Theorem 1.2, i.e., well-definedness of the associated projected gradient and its flow.

Here, as before, we abbreviated \({\sigma }\;{:}{=}\; {s}-1=\tfrac{1}{2}\). By the choice of \({\nu }\) and \(p\) (see (9)), we have \(\mathcal {C}\subset W^{1+{{\sigma }+{\nu }},p}({\mathbb {T}};{\mathbb {R}}^m) \subset W^{1,\infty }({\mathbb {T}};{\mathbb {R}}^m)\). Hence for each \(\gamma \in \mathcal {C}\), the functions \(x \mapsto |{\gamma '(x)}|\) and \(x \mapsto |{\gamma '(x)}|^{-1}\) are both members of \(W^{{{\sigma }+{\nu }},p}({\mathbb {T}};{\mathbb {R}})\hookrightarrow L^{\infty }({\mathbb {T}};{\mathbb {R}})\). By the chain rule Theorem A.1, the following mapping is well-defined:

$$\begin{aligned} \varPhi :\mathcal {C}\rightarrow W^{{{\sigma }+{\nu }},p}({\mathbb {T}};{\mathbb {R}}) \oplus {\mathbb {R}}^m, \quad \varPhi (\gamma ) \;{:}{=}\; \Big ( \log (|{\gamma '}|) - \log (L), \textstyle \int _{\,{\mathbb {T}}} \gamma \, \omega _{\gamma } \Big ) . \end{aligned}$$
(33)

A curve \(\gamma \in \mathcal {C}\) is parametrized by constant speed L and has 0 as barycenter if and only if \(\varPhi (\gamma ) = (0,0)\). Our main task is to prove Theorem 5.1 below; it states that the feasible set

$$\begin{aligned} {\mathcal {M}}\;{:}{=}\; \{\gamma \in \mathcal {C}| \varPhi (\gamma ) = (0,0)\}, \end{aligned}$$

equipped with a generalized Riesz isomorphism inherited from \(\mathcal {C}\), is almost a Riemannian manifold, at least in view of the projected or intrinsic gradients. Theorem 1.2 will follow from this immediately.

To this end (and in analogy to the space triple \({\mathcal {X}}\), \({\mathcal {H}}\), and \({\mathcal {Y}}\)), we introduce the Banach space triple

$$\begin{aligned} {\mathcal {X}}\mathcal {N}&\;{:}{=}\; W^{{{\sigma }+{\nu }},p}({\mathbb {T}};{\mathbb {R}}) \oplus {{\mathbb {R}}^m}, \\ {\mathcal {H}}\mathcal {N}&\;{:}{=}\; W^{\sigma , 2}({\mathbb {T}};{\mathbb {R}}) \oplus {{\mathbb {R}}^m}, \\ {\mathcal {Y}}\mathcal {N}&\;{:}{=}\; W^{{{\sigma }-{\nu }},q}({\mathbb {T}};{\mathbb {R}}) \oplus {{\mathbb {R}}^m} \end{aligned}$$

and the continuous dense injections \( i_{\mathcal {N}} :{\mathcal {X}}\mathcal {N}\hookrightarrow {\mathcal {H}}\mathcal {N}\) and \( j_{\mathcal {N}} :{\mathcal {H}}\mathcal {N}\hookrightarrow {\mathcal {Y}}\mathcal {N}\). A straight-forward computation shows that \(\varPhi \) is differentiable and that its derivative \(D\varPhi (\gamma ) :{\mathcal {X}}\rightarrow {\mathcal {X}}\mathcal {N}\) is given by

$$\begin{aligned} D\varPhi (\gamma ) \, u = \textstyle \big ( \; \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u \rangle \; , \; \int _{\mathbb {T}}( u +\gamma \, \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u \rangle )\, \omega _{\gamma } \; \big ) \quad \text {for }u \in {\mathcal {X}}. \end{aligned}$$
(34)

Theorem A.4 implies that \(u \mapsto \langle {\mathcal {D}}_\gamma \gamma ,{\mathcal {D}}_\gamma u \rangle \) induces well-defined and continuous linear operators \({\mathcal {X}}\rightarrow {\mathcal {X}}\mathcal {N}\), \({\mathcal {H}}\rightarrow {\mathcal {H}}\mathcal {N}\), and \({\mathcal {Y}}\rightarrow {\mathcal {Y}}\mathcal {N}\), provided that \({\nu }> 0\), and \(p\ge 2\). With the usual convention \(\mathcal {X}_{\gamma }{\varPhi } \;{:}{=}\; D\varPhi (\gamma )\) etc. we generate a triple \((\mathcal {X}_{\gamma }{\varPhi },\mathcal {H}_{\gamma }{\varPhi }, \mathcal {Y}_{\gamma }{\varPhi })\) of continuous, linear operators that makes the following diagram commutative:

figure b

By Theorem 5.2, the mapping \(\varPhi \) is a submersion. Thus the implicit function theorem implies that the set \({\mathcal {M}}\) is a Banach submanifold of \(\mathcal {C}\). For \(\gamma \in {\mathcal {M}}\) and \(\mathcal {Z} \in \{\mathcal {X},\mathcal {H},\mathcal {Y}\}\), define \( \mathcal {Z}_{\gamma }{{\mathcal {M}}} \;{:}{=}\; \ker ( \mathcal {Z}_{\gamma }{\varPhi }) \). The set \(\mathcal {Z}{{\mathcal {M}}} \;{:}{=}\; \coprod _{\gamma \in {\mathcal {M}}} \{\gamma \} \times \mathcal {Z}_{\gamma }{{\mathcal {M}}}\) together with the footpoint map \(\pi _{\mathcal {Z}{\mathcal {M}}} :\mathcal {Z}{{\mathcal {M}}} \rightarrow {\mathcal {M}}\) constitutes a smooth Banach vector bundle over \({\mathcal {M}}\) and we have \(T{\mathcal {M}}= \mathcal {X}{{\mathcal {M}}}\). The Banach spaces \(\mathcal {H}_{\gamma }{{\mathcal {M}}}\) and \(\mathcal {Y}_{\gamma }{{\mathcal {M}}}\) are the completions of \(T_\gamma {\mathcal {M}}= \ker ( D \varPhi (\gamma ))\) with respect to the topologies of \({\mathcal {H}}\) and \({\mathcal {Y}}\), respectively. Via Galerkin subspace projection, we may define linear operators \(\mathcal {I}_{{\mathcal {M}}}|_\gamma :\mathcal {H}_{\gamma }{{\mathcal {M}}} \rightarrow \mathcal {H}{'{^{\!}}}_{\gamma }{{\mathcal {M}}}\) and \(\mathcal {J}_{{\mathcal {M}}}|_\gamma :\mathcal {X}_{\gamma }{{\mathcal {M}}} \rightarrow \mathcal {Y}{'{^{\!}}}_{\gamma }{{\mathcal {M}}}\) by

$$\begin{aligned} \langle \mathcal {I}_{{\mathcal {M}}}|_\gamma \, v_1, v_2 \rangle \;{:}{=}\; \langle \mathcal {I}_{\mathcal {C}}|_\gamma \, v_1, v_2 \rangle \quad \text {and} \quad \langle \mathcal {J}_{{\mathcal {M}}}|_\gamma \, u, w \rangle \;{:}{=}\; \langle \mathcal {J}_{\mathcal {C}}|_\gamma \, u, w \rangle \end{aligned}$$

for \(v_1\), \(v_2 \in \mathcal {H}_{\gamma }{{\mathcal {M}}}\), \(u \in \mathcal {X}_{\gamma }{{\mathcal {M}}}\), and \(w \in \mathcal {Y}_{\gamma }{{\mathcal {M}}}\). The mappings \(i_{c}\) and \(j_{c}\) induce continuous injections \({i}_{{\mathcal {M}}}|_\gamma :\mathcal {X}_{\gamma }{{\mathcal {M}}} \hookrightarrow \mathcal {H}_{\gamma }{{\mathcal {M}}}\) and \({ j}_{{\mathcal {M}}}|_\gamma :\mathcal {H}_{\gamma }{{\mathcal {M}}} \hookrightarrow \mathcal {Y}_{\gamma }{{\mathcal {M}}}\). By (27), we have \(j'{^{\!}}_{{\mathcal {M}}} \, \mathcal {J}_{{\mathcal {M}}} = \mathcal {I}_{{\mathcal {M}}} \, i_{{\mathcal {M}}}\).

We define the intrinsic gradient \({{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})|_{\gamma } \in {\mathcal {X}}_{\gamma }{\mathcal {M}}\) by

$$\begin{aligned} \langle \mathcal {J}_{{\mathcal {M}}}|_\gamma {{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})|_\gamma , w \rangle = D({\mathcal {E}}|_{\mathcal {M}})(\gamma ) \,w \quad \hbox { for all}\ w \in \mathcal {Y}_{\gamma }{{\mathcal {M}}} \end{aligned}$$

or simply by \({{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})|_\gamma \;{:}{=}\; (\mathcal {J}_{{\mathcal {M}}}|_\gamma )^{-1} D({\mathcal {E}}|_{\mathcal {M}})(\gamma )\). Its well-definedness is established by the following theorem which states that \({\mathcal {M}}\) has a “nearly Riemannian structure”. Note that \({\mathcal {M}}\) cannot support a Riemannian structure because the tangent space \(T_\gamma {\mathcal {M}}\) is not Hilbertable in the sense that there is no positive-definite bilinear form whose norm topologizes \(T_\gamma {\mathcal {M}}\).

Theorem 5.1

The operators \(\mathcal {I}_{{\mathcal {M}}}|_\gamma \) and \(\mathcal {J}_{{\mathcal {M}}}|_\gamma \) define a family of continuous and continuously invertible operators.

Proof

Denote by \(F:{\mathcal {M}}\hookrightarrow \mathcal {C}\) and by

the canonical injections which give rise to dual maps

Observe that and .

The Galerkin projection of a Hilbert space’s Riesz isomorphism onto a closed subspace equals the Riesz isomorphism of the restricted scalar product. So the invertibility of is straight-forward. The nontrivial part here is to show that is continuously invertible. By the open mapping theorem, it suffices to show that is both injective and surjective. Injectivity can be deduced from the injectivity of , , and as follows: Let and put . Now the following shows that \(u=0\):

In order to establish surjectivity of , we fix an arbitrary \(\eta \in \mathcal {Y}'_{\gamma }{{\mathcal {M}}} \). By the Hahn-Banach theorem, the mapping is surjective. Thus there is an \(\eta _0 \in {\mathcal {Y}}'{^{\!}}\) with . By Theorem 5.4 below, the saddle point problem

(36)

has a unique solution with \(u_0 \in {\mathcal {X}}\) and \(\lambda _0 \in {\mathcal {Y}}'{^{\!}}\mathcal {N}\). In particular, we have , hence we may write with . Thus, we have

where we used the fact that in the last step which follows by \(\varPhi \circ F=0\) due to \({\mathcal {M}}=\varPhi ^{-1}(0)\). \(\square \)

This leads us immediately to the proof of our main result Theorem 1.2:

Proof

The proof of Theorem 5.1 shows that the gradient \(u_0 \;{:}{=}\; {{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})|_\gamma \) can be computed by solving (36) with \(\eta _0 = D{\mathcal {E}}(\gamma )\). Theorem 5.6 below shows that it is also the projected gradient, i.e., \({{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})|_\gamma \) coincides with the \(\mathcal {I}_{{\mathcal {M}}}|_\gamma \)-orthogonal projection of \({{\,\mathrm{grad}\,}}({\mathcal {E}})|_\gamma \) onto \(T_\gamma {\mathcal {M}}= \ker (D \varPhi (\gamma ))\). Because \(D{\mathcal {E}}(\gamma )\) and the saddle point matrix from (36) depend locally Lipschitz continuously on \(\gamma \), the gradient \({{\,\mathrm{grad}\,}}_{\mathcal {M}}({\mathcal {E}}|_{\mathcal {M}})\) is a locally Lipschitz continuous vector field on \({\mathcal {M}}\). \(\square \)

5.1 Details

Now, statements and proofs of the auxiliary results are in order.

Lemma 5.2

(Right inverse) The triple induced by the derivative \(D\varPhi (\gamma )\) allows a triple of continuous right inverses such that the following diagram commutes:

figure c

Moreover depend smoothly on \(\gamma \) and in particular, they are locally Lipschitz continuous.

Proof

Denote by \({{\,\mathrm{pr}\,}}_\gamma ^\perp (x) \;{:}{=}\; {{\,\mathrm{id}\,}}_{{{\mathbb {R}}^{m}}} - \tau _\gamma (x) \otimes \langle \tau _\gamma (x),\cdot \rangle \) the orthogonal projector onto the orthogonal complement of \(\tau _\gamma (x)\). Fix a and a \((\xi ,U) \in {\mathcal {Z}}\mathcal {N}\). Denote the length of \(\gamma \) by . Fix a given point \(y_0 \in {\mathbb {T}}\) and define

with a vector \({\tilde{U}} \in {{\mathbb {R}}^{m}}\) to be determined later. This way, the components of \(D\varPhi (\gamma ) \, u\) stated in (34) amount to

$$\begin{aligned} \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u \rangle = \langle \tau _\gamma , {\mathcal {D}}_\gamma u \rangle = \xi , \quad \text {and} \quad \textstyle \int _{\mathbb {T}}\big ( u +\gamma \, \langle {\mathcal {D}}_\gamma \gamma , {\mathcal {D}}_\gamma u \rangle \big )\, \omega _{\gamma } = U . \end{aligned}$$
(38)

Using the product rule Theorem A.4 (for which \(p\ge 2\) is crucial here), we see that u is a member of \({\mathcal {Z}}\), provided that we can find a \({\tilde{U}} \in {{\mathbb {R}}^{m}}\) such that u becomes continuous at \(y = y_0\). For this, it is necessary and sufficient that Define the vector \(b_\gamma (\xi ) \in {{\mathbb {R}}^{m}}\) and the symmetric matrix \(\varTheta _\gamma \in {{\,\mathrm{End}\,}}({{\mathbb {R}}^{m}})\) by and Assume \(\varTheta _\gamma \) is not invertible. Then there is a unit vector \(V \in {{\mathbb {R}}^{m}}\) in its kernel and we have that \( 0 = \langle V,\varTheta _\gamma \, V \rangle = \textstyle \int _{\mathbb {T}}\big ( |{V}|^2 - \langle \tau _\gamma ,V \rangle ^2\big ) \, \omega _{\gamma } \). But that means that \(\tau _\gamma (x) = \pm V\) has to hold for almost every \(x\in {\mathbb {T}}\). Since \(\tau _\gamma \) is continuous, this implies that \(\gamma \) is a straight line, which is impossible due to \(\gamma \) being closed. This contradicts our assumption and thus \(\varTheta _\gamma \) must be invertible. So we may choose \({\tilde{U}} \;{:}{=}\; - \varTheta _\gamma ^{-1} \, b_\gamma (\xi )\) and put By (38), is indeed a right inverse of . Finally, it is only a matter of some elementary calculus to show that and depend smoothly on \(\gamma \). \(\square \)
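The invertibility argument for \(\varTheta _\gamma \) can be checked numerically on a closed polygon. The following is a minimal sketch, assuming that \(\varTheta _\gamma = \int _{\mathbb {T}} {{\,\mathrm{pr}\,}}_\gamma ^\perp \, \omega _\gamma \) (which is what the quadratic form displayed in the proof suggests) and replacing the integral by a sum over edges; the function name `theta_matrix` is hypothetical.

```python
import numpy as np

def theta_matrix(P):
    """Edge-wise sum of ell_I * (id - tau_I tau_I^T) for a closed polygon P (N x m)."""
    P = np.asarray(P, dtype=float)
    edges = np.roll(P, -1, axis=0) - P            # edge vectors P(I^up) - P(I^down)
    ell = np.linalg.norm(edges, axis=1)           # edge lengths
    tau = edges / ell[:, None]                    # unit tangents, constant on each edge
    m = P.shape[1]
    # total length times identity, minus the length-weighted sum of tau (x) tau
    return ell.sum() * np.eye(m) - np.einsum("i,ij,ik->jk", ell, tau, tau)

# Unit square in the plane: tangents are +-e1 and +-e2, each with length 1.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
theta = theta_matrix(square)
min_eig = np.linalg.eigvalsh(theta).min()         # smallest eigenvalue of the symmetric matrix
```

For the unit square, the sketch yields twice the identity matrix, so the smallest eigenvalue is positive, in line with the argument that a closed curve cannot have a constant tangent.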

In analogy to Theorem 4.1, we may equip the target space with the following Riesz isomorphisms. This will help us to generalize the concept of adjoint operators between Hilbert spaces.

Proposition 5.3

Analogously to Theorem 4.1, we define the \(\gamma \)-dependent, linear operators and as follows:

for \((\eta _1,V_1)\), \((\eta _2,V_2) \in {\mathcal {H}}\mathcal {N}\), \((\xi ,U) \in {\mathcal {X}}\mathcal {N}\), and \((\psi ,W) \in {\mathcal {Y}}\mathcal {N}\). These operators are well-defined, continuous, and continuously invertible, and they satisfy . Moreover, and are of class .

The proof is entirely along the lines of the proof of Theorem 4.1.

Lemma 5.4

(Saddle point matrix) For each \(\gamma \in \mathcal {C}\), the saddle point matrix

is continuously invertible.

Proof

Let B be as in Theorem 5.2 above. As is invertible, the saddle point matrix \(\mathcal {A}|_\gamma \) is invertible if and only if its Schur complement is invertible. In analogy to the adjoint operators and , we introduce the generalized adjoint operators

Observe that we may express the Schur complement as , hence it suffices to show that is continuously invertible. Since is surjective, is invertible. Utilizing the identities and as well as the diagram (35), one verifies that . This shows that is injective. By Theorem 5.5 below, the operator is invertible. This allows us to define the projector . With , , and , we can verify that is surjective:

Finally, the open mapping theorem implies that is continuously invertible. \(\square \)

Lemma 5.5

(Invertibility of \(B^*B\)) For each \(\gamma \in \mathcal {C}\), the linear operator is continuously invertible.

Proof

We have with . Thus it suffices to show that T is invertible. Let \({\bar{\xi }} \;{:}{=}\; (\xi ,U) \in {\mathcal {X}}\mathcal {N}\) and \({\bar{\psi }} \;{:}{=}\; (\psi ,W) \in {\mathcal {Y}}\mathcal {N}\). Put and . By construction, we have

With the notation from the proof of Theorem 5.2, we put \({\tilde{U}} \;{:}{=}\; - \varTheta _\gamma ^{-1} \, b_\gamma (\xi )\) and \({\tilde{W}} \;{:}{=}\; - \varTheta _\gamma ^{-1} \, b_\gamma (\psi )\). Now we observe that

$$\begin{aligned} \delta ^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_\gamma u&= (\tau _\gamma \circ \pi _1)\; (\delta ^{{{\sigma }+{\nu }}}_{\gamma } \xi ) + (\delta ^{{{\sigma }+{\nu }}}_{\gamma } \tau _\gamma ) \; (\xi \circ \pi _2) + \delta ^{{{\sigma }+{\nu }}}_{\gamma } {{\,\mathrm{pr}\,}}_\gamma ^\perp \; {\tilde{U}} \quad \text {and} \quad \\ \delta ^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_\gamma w&= (\tau _\gamma \circ \pi _1)\; (\delta ^{{{\sigma }-{\nu }}}_{\gamma } \psi ) + (\delta ^{{{\sigma }-{\nu }}}_{\gamma } \tau _\gamma ) \; (\psi \circ \pi _2) + \delta ^{{{\sigma }-{\nu }}}_{\gamma } {{\,\mathrm{pr}\,}}_\gamma ^\perp \; {\tilde{W}}. \end{aligned}$$

Writing only the terms of highest order in \(\xi \) and \(\psi \), we obtain

$$\begin{aligned} \textstyle \int _{{\mathbb {T}}^2}\langle \delta ^{{{\sigma }+{\nu }}}_{\gamma } {\mathcal {D}}_{\gamma } u , \delta ^{{{\sigma }-{\nu }}}_{\gamma } {\mathcal {D}}_{\gamma } w \rangle \, \mu _\gamma = \int _{{\mathbb {T}}^2}(\delta ^{{{\sigma }+{\nu }}}_{\gamma } \, \xi ) \, (\delta ^{{{\sigma }-{\nu }}}_{\gamma } \,\psi ) \, \mu _\gamma + {\text {l.o.t.}}\end{aligned}$$

The latter pairing is identical to up to the term \(\int _{{\mathbb {T}}} \xi \, \psi \, \omega _\gamma + \langle U,W \rangle \), which is a combination of lower order and finite rank, thus represents a compact operator \({\mathcal {X}}\mathcal {N}\rightarrow {\mathcal {Y}}'{^{\!}}\mathcal {N}\). This means that T is a compact perturbation of and thus a Fredholm operator of index 0. Hence it suffices to show that T is injective. Let \({\bar{\xi }} \in \ker (T)\) and put . A diagram chase in (37) and (27) yields

Since is injective and since is a scalar product on \({\mathcal {H}}\), this implies . The injectivity of and we see that T is injective. So as an injective Fredholm operator with index zero, T must also be surjective, hence continuously invertible by the open mapping theorem. \(\square \)

The invertibility of the saddle point matrix leads to the following generalizations of (i) the Moore-Penrose pseudoinverse of a surjective operator between Hilbert spaces and (ii) the orthoprojector onto the orthogonal complement of the operator’s null space. Being able to reduce the action of these operators to solving a linear saddle point system will be crucial for applications (see Sect. 6).

Corollary 5.6

The Moore-Penrose pseudoinverse of and the orthoprojector with kernel can be completed to a continuous right inverse of and a continuous projector . For \(\xi \in {\mathcal {X}}\mathcal {N}\) and \({\tilde{u}} \in {\mathcal {X}}\), the operators can be evaluated by solving the following saddle point systems

where \(\lambda \), \(\mu \in {\mathcal {Y}}'{^{\!}}\mathcal {N}\) act as Lagrange multipliers.
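In finite dimensions, the two saddle point systems can be illustrated as follows. This is a minimal sketch under stated assumptions: a symmetric positive definite matrix `G` stands in for the Riesz isomorphism, a surjective matrix `B` stands in for \(D\varPhi (\gamma )\), and the names `solve_saddle`, `pseudoinverse_apply`, and `project_to_kernel` are hypothetical. The block systems below are the standard finite-dimensional forms and are not copied from the displayed equations.

```python
import numpy as np

def solve_saddle(G, B, rhs_top, rhs_bottom):
    """Solve [[G, B^T], [B, 0]] [u; lam] = [rhs_top; rhs_bottom] and return u only."""
    n, k = G.shape[0], B.shape[0]
    A = np.block([[G, B.T], [B, np.zeros((k, k))]])
    sol = np.linalg.solve(A, np.concatenate([rhs_top, rhs_bottom]))
    return sol[:n]

def pseudoinverse_apply(G, B, xi):
    """u = B^dagger xi: the solution of B u = xi with minimal G-norm."""
    return solve_saddle(G, B, np.zeros(G.shape[0]), xi)

def project_to_kernel(G, B, u_tilde):
    """G-orthogonal projection of u_tilde onto ker(B)."""
    return solve_saddle(G, B, G @ u_tilde, np.zeros(B.shape[0]))

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
G = M @ M.T + 6.0 * np.eye(6)          # symmetric positive definite Gram matrix
B = rng.standard_normal((2, 6))        # random 2 x 6 matrix, surjective for generic data
xi = rng.standard_normal(2)
u_tilde = rng.standard_normal(6)

u_dagger = pseudoinverse_apply(G, B, xi)
u_proj = project_to_kernel(G, B, u_tilde)
```

In both cases the second block of the solution plays the role of the Lagrange multiplier and is simply discarded.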

6 Computational treatment

For the ease of use, we discretize curves by polygonal lines and approximate the Möbius energy and the Riesz isomorphisms from Theorem 4.1 by simple quadrature rules. In the language of finite element analysis, we employ a nonconforming Ritz–Galerkin scheme because the discrete ansatz space is not a subset of the smooth configuration space. We try to outline a discrete setting that can be applied also to more general self-avoiding energies; therefore, we do not care about Möbius-invariance of the energy, although Möbius-invariant discretizations have already been proposed (see e.g., [49] and [15, 16]).

6.1 Spatial discretization

Let \({\mathcal {T}}\) denote a partition of \({\mathbb {T}}\) with vertex set \(V({\mathcal {T}}) \subset {\mathbb {T}}\) and edge set \(E({\mathcal {T}}) \subset V({\mathcal {T}}) \times V({\mathcal {T}})\). Denote the number of edges by N. If the partition is sufficiently fine, i.e., \(h({\mathcal {T}}) \;{:}{=}\; \max _{I\in E({\mathcal {T}})} |{I}|\) is sufficiently small, then we may identify each edge with the closed, oriented interval connecting its end vertices. For an edge \(I\in E({\mathcal {T}})\), we denote by \(I^\downarrow \in V({\mathcal {T}})\) and \(I^\uparrow \in V({\mathcal {T}})\) its backward and forward boundary vertex, respectively.

Let \(P:V({\mathcal {T}}) \rightarrow {{\mathbb {R}}^{m}}\) be an embedded polygon in \({{\mathbb {R}}^{m}}\), i.e., there is a piecewise linear embedding \(\gamma :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\) such that \(\gamma |_{V({\mathcal {T}})} = P\) and such that \(\gamma \) maps \(I\) affinely onto the line segment connecting \(P(I^\downarrow )\) to \(P(I^\uparrow )\). We denote by \(\mathcal {C}_{{\mathcal {T}}}\) the set of such embedded polygons which is an open set in the space of all closed polygons with N edges. Since the latter is finite dimensional and isomorphic to \(({{\mathbb {R}}^{m}})^N\), we have \({\mathcal {X}}_{{\mathcal {T}}}= {\mathcal {H}}_{{\mathcal {T}}}= {\mathcal {Y}}_{{\mathcal {T}}}\cong ({{\mathbb {R}}^{m}})^N\). Likewise, we discretize the target spaces by \({\mathcal {X}}\mathcal {N}_{{\mathcal {T}}}= {\mathcal {H}}\mathcal {N}_{{\mathcal {T}}}= {\mathcal {Y}}\mathcal {N}_{{\mathcal {T}}}= \{\lambda :E({\mathcal {T}}) \rightarrow {\mathbb {R}}\} \times {{\mathbb {R}}^{m}}\cong {\mathbb {R}}^N \times {{\mathbb {R}}^{m}}\). By \(\ell _P(I) \;{:}{=}\; |{P(I^\downarrow ) - P(I^\uparrow )}|\), we denote the edge length of edge I.

6.1.1 Discrete energy

There are several possibilities to discretize the Möbius energy \({\mathcal {E}}\). A very general approach employs simple quadrature rules and works for reparametrization-invariant energies \({\mathcal {F}}\) of the form \( \textstyle {\mathcal {F}}(\gamma ) = \int _{{\mathbb {T}}^2}F(\gamma ) \, \varOmega _\gamma \) with some energy density \(F(\gamma ) :{{\mathbb {T}}^2}\rightarrow {\mathbb {R}}\). If, for a sufficiently smooth curve \(\gamma \), the integrand \(F(\gamma )\) is not too singular around the diagonal of the integration domain \({{\mathbb {T}}^2}\), we have

$$\begin{aligned} \textstyle {\mathcal {F}}(\gamma ) \approx \sum _{{\bar{I}} \cap {\bar{J}} = \emptyset } \int _{I} \! \int _{J} F(\gamma ) \, \varOmega _\gamma . \end{aligned}$$
(39)

Typically, the right hand side makes sense also if \(\gamma \) is a polygonal line. Indeed, cutting out the diagonal is somewhat necessary: An elegant scaling argument in [89, Figure 2.2] shows that the Möbius energy of a polygonal line with at least one nontrivial turning angle is infinite.

We may exploit parametrization invariance and pull back \(F(\gamma )\) along the local parameterizations \(\gamma _I:\left[ 0,1\right] \rightarrow {{\mathbb {R}}^{m}}\), \(\gamma _I(s) \;{:}{=}\; P(I^\downarrow ) \, (1 - s) + s\, P(I^\uparrow )\) and \(\gamma _J:\left[ 0,1\right] \rightarrow {{\mathbb {R}}^{m}}\), \(\gamma _J(t) \;{:}{=}\; P(J^\downarrow ) \, (1 - t) + t \, P(J^\uparrow )\) to the unit square. Denoting the pullback by \(F_{IJ}(P) :\left[ 0,1\right] ^2 \rightarrow {\mathbb {R}}\), we have

$$\begin{aligned} \textstyle \int _{I} \! \int _{J} F(\gamma ) \, \varOmega _\gamma = \ell _P(I)\, \ell _P(J) \int _0^1 \! \int _0^1 F_{IJ}(P)(s,t) \, {{\text {d}}}s \, {{\text {d}}}t. \end{aligned}$$

So with a k-point quadrature rule \(t_1, \dotsc , t_k \in \left[ 0,1\right] \), \(\omega _1,\dotsc ,\omega _k \in {\mathbb {R}}\), we may discretize \({\mathcal {F}}\) by \( {\mathcal {F}}_{{\mathcal {T}}}(P) \;{:}{=}\; \textstyle \sum _{{\bar{I}} \cap {\bar{J}} = \emptyset } W_{IJ}(P) \) with the local contributions

$$\begin{aligned} W_{IJ}(P) \;{:}{=}\; \textstyle \ell _P(I)\, \ell _P(J) \sum _{i=1}^k \sum _{j=1}^k F_{IJ}(P)(t_i,t_j)\, \omega _i \, \omega _j. \end{aligned}$$
(40)

Applying this with \(k =1\) to \(F=E\) from (12), one is naturally led to the vertex energy (\(t_1 = 0\), \(\omega _1 = 1\)) and to the edge energy (\(t_1 = 1/2\), \(\omega _1 = 1\)) as proposed by Kusner and Sullivan in [48]. Scholtes proved in [73] that the vertex energy for equilateral polygons \(\varGamma \)-converges towards \({\mathcal {E}}\) under refinement of partitions, i.e., for \(h({\mathcal {T}}) \rightarrow 0\), with respect to the \(W^{k,q}\)-topology, \(k\in \{0,1\}\), \(q\in [1,\infty ]\). Roughly speaking, \(\varGamma \)-convergence implies that cluster points of minimizers of the discrete energies are minimizers of \({\mathcal {E}}\). This result justifies the quite harsh variational crimes that one commits by choosing polygonal lines as discrete configurations. Although it is restricted to equilateral polygons (which was one of the reasons for us to include the edge length constraint), we deem it likely that it can be extended to non-equilateral polygons with a uniform bound on as \(h\rightarrow 0\). At least our experiments indicate that the precise distribution of edge lengths does not matter.
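To make the quadrature scheme concrete, the following is a minimal sketch of an edge energy obtained from (40) with the midpoint rule. Since (12) is not restated here, we assume for illustration that the density is the integrand of (1), i.e., \(|{\gamma (x)-\gamma (y)}|^{-2} - \varrho _\gamma ^{-2}(x,y)\), and we measure \(\varrho _\gamma \) along the polygon between edge midpoints; the function name `edge_energy` is hypothetical.

```python
import numpy as np

def edge_energy(P):
    """Midpoint-rule discretization of (1), summed over edge pairs with disjoint closures."""
    P = np.asarray(P, dtype=float)
    N = len(P)
    Q = np.roll(P, -1, axis=0)                      # forward vertices P(I^up)
    ell = np.linalg.norm(Q - P, axis=1)             # edge lengths
    mid = 0.5 * (P + Q)                             # quadrature points (t_1 = 1/2)
    s = np.cumsum(ell) - 0.5 * ell                  # arc length position of each midpoint
    idx = np.arange(N)
    k = np.abs(idx[:, None] - idx[None, :])
    disjoint = np.minimum(k, N - k) >= 2            # drop identical and adjacent edges
    d = np.abs(s[:, None] - s[None, :])
    rho = np.minimum(d, ell.sum() - d)              # shortest arc along the polygon
    chord = np.linalg.norm(mid[:, None, :] - mid[None, :, :], axis=2)
    rho = np.where(disjoint, rho, 1.0)              # dummy values avoid division by zero
    chord = np.where(disjoint, chord, 1.0)
    W = np.outer(ell, ell) * (chord**-2 - rho**-2)  # local contributions W_IJ
    return W[disjoint].sum()

angles = 2.0 * np.pi * np.arange(200) / 200
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
energy_circle = edge_energy(circle)                 # close to 4, the value for the round circle
```

The sketch uses dense \(N \times N\) arrays for brevity; it is scale-invariant by construction because the factor \(\ell _P(I)\,\ell _P(J)\) compensates the inverse squared distances.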

We require also the derivative of the discrete energy. Similarly to Sect. 3, the explicit dependence of E on the geodesic distance \(\varrho _\gamma \) causes problems: Without taking further measures, this would lead to the very high complexity of \(\varOmega (N^3)\) to assemble the derivative \(D{\mathcal {E}}_{{{\mathcal {T}}}}(P)\) for the vertex energy and edge energy.Footnote 5 This can be circumvented by utilizing the identity \({\mathcal {E}}(\gamma ) = \textstyle 4 \!+\! \int _{{\mathbb {T}}^2}F(\gamma ) \, \varOmega _\gamma \) with the integrand

$$\begin{aligned} F(\gamma ) \;{:}{=}\; \frac{|{\triangle \tau _\gamma }|^2}{2 \, |{\triangle \gamma }|^2} + 2 \frac{ \langle \tau _\gamma \circ \pi _1,\tau _\gamma \circ \pi _2 \rangle }{ |{\triangle \gamma }|^2 } - 2 \frac{ \langle \triangle \gamma ,\tau _\gamma \circ \pi _1 \rangle \,\langle \triangle \gamma ,\tau _\gamma \circ \pi _2 \rangle }{ |{\triangle \gamma }|^4 } , \end{aligned}$$

which was derived by Ishizeki and Nagasawa in [41]. For the sake of efficiency, we discretize with the midpoint rule, i.e., with \(k =1\), \(t_1 = 1/2\), and \(\omega _1 = 1\). For this F, the local contributions \(W_{IJ}(P)\) depend only on the coordinates of the four points \(P(I^\downarrow )\), \(P(I^\uparrow )\), \(P(J^\downarrow )\), and \(P(J^\uparrow )\). So the expression of the first and second derivative of \(W_{IJ}\) with respect to these four points can once be computed symbolically and compiled into runtime-efficient libraries. The first and second derivative of \({\mathcal {F}}_{{\mathcal {T}}}\) can then be assembled from \(DW_{IJ}(P)\) and \(D^2W_{IJ}(P)\) as a vector and a matrix of size \(m\, N\) and \((m\, N) \times (m\, N)\), respectively. Due to the nonlocal nature of the energy, the matrix \(D^2{\mathcal {F}}(P)\) is dense.
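The local contribution \(W_{IJ}(P)\) for this integrand can be sketched as a function of the four vertices. The code below is a minimal illustration, assuming that \(\triangle \) denotes the difference of the values at the two arguments and that \(\tau _\gamma \) is the unit edge direction on each edge; the names `local_contribution` and `discrete_energy` are hypothetical, and the plain double loop is chosen for readability, not speed.

```python
import numpy as np

def local_contribution(a0, a1, b0, b1):
    """W_IJ for edges I = (a0, a1) and J = (b0, b1) with the midpoint rule (k = 1)."""
    eI, eJ = a1 - a0, b1 - b0
    lI, lJ = np.linalg.norm(eI), np.linalg.norm(eJ)
    tI, tJ = eI / lI, eJ / lJ                       # unit tangents of the two edges
    d = 0.5 * (a0 + a1) - 0.5 * (b0 + b1)           # difference of the edge midpoints
    d2 = d @ d
    F = (((tI - tJ) @ (tI - tJ)) / (2.0 * d2)
         + 2.0 * (tI @ tJ) / d2
         - 2.0 * (d @ tI) * (d @ tJ) / d2**2)
    return lI * lJ * F

def discrete_energy(P):
    """4 plus the sum of W_IJ over all ordered edge pairs with disjoint closures."""
    P = np.asarray(P, dtype=float)
    N = len(P)
    total = 4.0
    for i in range(N):
        for j in range(N):
            k = abs(i - j)
            if min(k, N - k) >= 2:                  # skip identical and adjacent edges
                total += local_contribution(P[i], P[(i + 1) % N], P[j], P[(j + 1) % N])
    return total

angles = 2.0 * np.pi * np.arange(40) / 40
polygon = np.stack([np.cos(angles), np.sin(angles)], axis=1)
energy = discrete_energy(polygon)
```

For a regular polygon, the edge midpoints and edge directions are points and tangents of a common circle, on which the integrand vanishes identically; hence the sketch returns the value 4 up to rounding errors.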

6.1.2 Discrete inner product

Next we discretize the inner product from Theorem 4.1. Let \(U :V({\mathcal {T}}) \rightarrow {{\mathbb {R}}^{m}}\) and denote by \(u :{\mathbb {T}}\rightarrow {{\mathbb {R}}^{m}}\) its piecewise linear interpolation. For the computation of the local contribution of the edge pair \((I,J)\) to the Gram matrix, we put

$$\begin{aligned} u_I(s) \;{:}{=}\; U(I^\downarrow ) \, (1 - s) + s \, U(I^\uparrow ), \quad \text {and} \quad u_J(t) \;{:}{=}\; U(J^\downarrow ) \, (1 - t) + t \, U(J^\uparrow ). \end{aligned}$$

The first two terms of can now be discretized as follows:

$$\begin{aligned}&\textstyle \sum \limits _{\begin{array}{c} I\cap J= \emptyset \end{array}} \ell _P(I)\,\ell _P(J) \big |{ \frac{u_I(I^\uparrow )-u_I(I^\downarrow )}{\ell _P(I)} - \frac{u_J(J^\uparrow )-u_J(J^\downarrow )}{\ell _P(J)} }\big |^2 \sum \limits _{i=1}^k \sum \limits _{j=1}^k \frac{ \omega _{i} \, \omega _{j} }{ |{\gamma _I(t_i) - \gamma _J(t_j)}|^2 } \;\;\text {and} \\&\quad \textstyle \sum \limits _{\begin{array}{c} I\cap J= \emptyset \end{array}} \ell _P(I)\,\ell _P(J) \sum \limits _{i=1}^k \sum \limits _{j=1}^k \frac{ \left|u_I(t_i) - u_J(t_j)\right|^2 }{ |{\gamma _I(t_i) - \gamma _J(t_j)}|^2 }E_{IJ}(P)(t_i,t_j) \, \omega _{i} \, \omega _{j} , \end{aligned}$$

where we employ the same quadrature rule as for the discrete Möbius energy. In the presence of a barycenter constraint, we may simply omit the term without losing definiteness of the inner product on \(\ker (D \varPhi _{{\mathcal {T}}}(P))\). By virtue of the polarization formula, this defines the Gram matrix uniquely, leading to discrete bilinear forms . The local matrices are of size \((4 \,m) \times (4 \,m)\) (\(m\) coordinates for each of the four vertices belonging to the edge pair \((I,J)\)). They can be computed in parallel and added into the global matrix afterwards. The resulting global Gram matrix is a dense matrix of size \((m\, N) \times (m\, N)\).Footnote 6
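The first of the two discretized terms can be sketched as a quadratic form in the vertex values. This is a minimal illustration with the midpoint rule (\(k = 1\), \(t_1 = 1/2\), \(\omega _1 = 1\)); the function name `first_term` is hypothetical, and the Gram matrix itself would follow from this quadratic form by polarization.

```python
import numpy as np

def first_term(P, U):
    """Quadratic form of the first discretized term for vertex values U on the polygon P."""
    P = np.asarray(P, dtype=float)
    U = np.asarray(U, dtype=float)
    N = len(P)
    Q = np.roll(P, -1, axis=0)
    ell = np.linalg.norm(Q - P, axis=1)                   # edge lengths ell_P(I)
    mid = 0.5 * (P + Q)                                   # gamma_I(1/2)
    dU = (np.roll(U, -1, axis=0) - U) / ell[:, None]      # (u_I(up) - u_I(down)) / ell_P(I)
    idx = np.arange(N)
    k = np.abs(idx[:, None] - idx[None, :])
    disjoint = np.minimum(k, N - k) >= 2                  # edge pairs with disjoint closures
    num = np.sum((dU[:, None, :] - dU[None, :, :]) ** 2, axis=2)
    den = np.sum((mid[:, None, :] - mid[None, :, :]) ** 2, axis=2)
    den = np.where(disjoint, den, 1.0)                    # dummy values avoid division by zero
    weights = np.outer(ell, ell)
    return np.sum(weights[disjoint] * num[disjoint] / den[disjoint])

angles = 2.0 * np.pi * np.arange(12) / 12
P = np.stack([np.cos(angles), np.sin(angles)], axis=1)
value_constant = first_term(P, np.ones_like(P))           # constant field: no difference quotients
value_identity = first_term(P, P)                         # field u = gamma
```

Constant vector fields have vanishing difference quotients on every edge, so this term alone cannot detect translations; this is consistent with the role of the barycenter constraint described above.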

6.1.3 Discrete constraints

As for the constraints, we discretize \(\varPhi \) by

$$\begin{aligned} \varPhi _{{\mathcal {T}}}(P) \;{:}{=}\; \Big ( \; \big ( \log ( \ell _P(I))- \log (\ell _0(I)) \big )_{I\in E({\mathcal {T}})} \;,\;\textstyle \sum _{I\in E({\mathcal {T}})} \, \frac{\ell _P(I)}{2} \, ( P(I^\uparrow ) + P(I^\downarrow )) \; \Big ), \end{aligned}$$

where \(\ell _0 :E({\mathcal {T}}) \rightarrow \left]0,\infty \right[\) is a prescribed distribution of desired edge lengths, for example \(\ell _0(I) = L \, |{I}|\). Although restoring feasibility for the edge length constraint comes at a certain cost, it prevents edges from collapsing to points and from being overstretched in the course of optimization. The latter is crucial since the discrete energy is not exactly self-avoiding; it becomes singular only if quadrature points approach each other. So overstretched edges make it more likely that the curve tries to form a self-intersection.
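The discrete constraint mapping is straightforward to evaluate. The following minimal sketch implements \(\varPhi _{{\mathcal {T}}}\) as displayed above, together with the directional derivative of the edge length part, which is the discrete counterpart of the first component of (34); the names `phi_T` and `dlog_length` are hypothetical.

```python
import numpy as np

def phi_T(P, ell0):
    """Return (log edge length ratios, length-weighted sum of edge midpoints)."""
    P = np.asarray(P, dtype=float)
    Q = np.roll(P, -1, axis=0)                       # forward vertices P(I^up)
    ell = np.linalg.norm(Q - P, axis=1)
    log_ratio = np.log(ell) - np.log(ell0)
    bary = np.sum(0.5 * ell[:, None] * (P + Q), axis=0)
    return log_ratio, bary

def dlog_length(P, U):
    """Derivative of P -> log(ell_P(I)) in direction U: <tau_I, (U(up) - U(down)) / ell_P(I)>."""
    P = np.asarray(P, dtype=float)
    U = np.asarray(U, dtype=float)
    edges = np.roll(P, -1, axis=0) - P
    dU = np.roll(U, -1, axis=0) - U
    return np.sum(edges * dU, axis=1) / np.sum(edges * edges, axis=1)

angles = 2.0 * np.pi * np.arange(8) / 8
P = np.stack([np.cos(angles), np.sin(angles)], axis=1)
ell0 = np.full(8, 2.0 * np.sin(np.pi / 8))           # edge length of the regular octagon
log_ratio, bary = phi_T(P, ell0)                     # both vanish: P is feasible
```

A centered regular polygon with matching target edge lengths is feasible, which gives a simple sanity check for an implementation.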

Some care should be given to the choice of the target edge lengths \(\ell _0\). A coarse mesh may not be sufficient to preclude self-intersections, whereas a very fine mesh is expensive because the computational effort grows quadratically in the number of nodes. As a rule of thumb, the distance between two neighboring vertices of a polygon should be strictly smaller than the distance between any other pair of vertices. In principle, it is also possible to drop the edge length constraints; instead one could introduce a global length constraint and one could handle short and long edges by adaptive edge split and edge collapse strategies. We refrained from pursuing this route here for the sake of simplicity.

6.2 Projected gradient

Once we have assembled the vector \(D{\mathcal {E}}_{{\mathcal {T}}}(P)\), the Gram matrix, and the constraint Jacobian \(D\varPhi _{{\mathcal {T}}}(P)\), the projected gradient \(u \;{:}{=}\; {{\,\mathrm{grad}\,}}_{{\mathcal {M}}_{{\mathcal {T}}}}({\mathcal {E}}_{{\mathcal {T}}}|_{{\mathcal {M}}_{{\mathcal {T}}}})|_P\) can be obtained by solving the following discrete analogue of the linear saddle point system (36):

(41)

We assemble the saddle point matrix as a dense, symmetric matrix with \((N \,m+ N + m)\) rows and solve the system via a dense LU-factorization. This costs roughly \({{\,\mathrm{O}\,}}(N^2 m^2)\) for the assembly and a further \({{\,\mathrm{O}\,}}(N^3 m^3)\) for the factorization, so it is not surprising that this is the most expensive part of the overall optimization process. We would like to point out that this can be sped up considerably by more sophisticated methods: The assembly of the saddle point matrix can be avoided by assembling \(D\varPhi _{{\mathcal {T}}}(P)\) as a sparse matrix and by compressing the Gram matrix in a hierarchical matrix data structure that is efficient for fast matrix-vector multiplication. Similar techniques can be employed to approximate \({\mathcal {E}}_{\mathcal {T}}(P)\) and \(D{\mathcal {E}}_{\mathcal {T}}(P)\) in subquadratic time, but all this is beyond the scope of the present work.
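The dense approach can be sketched as follows. This is an illustration under stated assumptions: we take the saddle point system to have the standard block form with the Gram matrix `G` in the upper left block, the constraint Jacobian `A` \(= D\varPhi _{{\mathcal {T}}}(P)\) in the off-diagonal blocks, and the differential `dE` on the right-hand side; the names `G`, `A`, `dE`, and `projected_gradient` are hypothetical.

```python
import numpy as np

def projected_gradient(G, A, dE):
    """Solve the (assumed) saddle point system

        [ G   A^T ] [ u   ]   [ dE ]
        [ A   0   ] [ lam ] = [ 0  ]

    G  : (n, n) symmetric positive definite Gram matrix (n = N*m in the text)
    A  : (k, n) constraint Jacobian with full row rank (k = N + m in the text)
    dE : (n,)   differential of the discrete energy
    Returns the projected gradient u and the Lagrange multipliers lam.
    """
    n, k = G.shape[0], A.shape[0]
    K = np.zeros((n + k, n + k))                 # dense saddle point matrix
    K[:n, :n] = G
    K[:n, n:] = A.T
    K[n:, :n] = A
    rhs = np.concatenate([dE, np.zeros(k)])
    sol = np.linalg.solve(K, rhs)                # dense LU with partial pivoting
    return sol[:n], sol[n:]
```

By construction, the returned `u` satisfies `A @ u = 0`, i.e., it is tangent to the constraint manifold, and a descent direction is obtained as `-u`.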

6.3 Restoring feasibility and time step size rules

Suppose that \(\varPhi _{{\mathcal {T}}}(P) = 0\) and that \(u\) is a feasible search direction, i.e., \(D\varPhi _{{\mathcal {T}}}(P) u = 0\). The constraint mapping \(\varPhi _{{\mathcal {T}}}\) is Lipschitz continuously differentiable. Hence, provided that the step size \(\tau > 0\) is sufficiently small, the modified Newton method

$$\begin{aligned} Q_0 = P+ \tau \, u, \qquad Q_{i+1} = Q_i - D\varPhi _{{\mathcal {T}}}(P)^{\dagger \!}\; \varPhi _{{\mathcal {T}}}(Q_i) \quad \hbox { for}\ i \in \mathbb {N} \end{aligned}$$
(42)

converges quickly to a point \(Q_\infty \) that satisfies \(\varPhi _{{\mathcal {T}}}(Q_\infty ) = 0\). Here \(D\varPhi _{{\mathcal {T}}}(P)^{\dagger \!}\) denotes the Moore-Penrose pseudoinverse with respect to the inner product \(G_P\) and we utilize Theorem 5.6 to evaluate it.Footnote 7 For a given descent direction \(u\), we may apply backtracking line search to find a suitable step size \(\tau > 0\): If the residual \(\varPhi _{{\mathcal {T}}}(Q_i)\) is smaller than a prescribed tolerance after a small, prescribed number of iterations, then the point \(Q_i\) may serve as the next iterate of the optimization method. Otherwise we shrink \(\tau \) and restart the modified Newton method. By shrinking \(\tau \) even further, if necessary, we can also ensure that \(Q_i\) satisfies the Armijo condition \({\mathcal {E}}_{{\mathcal {T}}}(Q_i) \le {\mathcal {E}}_{{\mathcal {T}}}(P) + (\tau /2) \, D{\mathcal {E}}_{{\mathcal {T}}}(P) \, u\).

An initial guess for \(\tau \) can be obtained, e.g., by collision detection (see, e.g., [69]): One determines the smallest step size \(\tau _*\) such that \(P+ \tau _* \, u\) has a self-intersection and starts the backtracking procedure with, e.g., \(\tau = \tfrac{2}{3} \tau _*\). By utilizing suitable space partitioning data structures, this collision detection can be performed in subquadratic time. However, we simply cycled over all \(O(N^2)\) edge pairs because the runtime of this naive approach is of the same order as that of evaluating \(D{\mathcal {E}}_{{\mathcal {T}}}(P)\).
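The interplay of the modified Newton iteration (42) with the backtracking rule can be sketched as follows. This is a minimal illustration, not the implementation used here: the pseudoinverse is formed explicitly as \(G^{-1} A^{\mathsf {T}} (A \, G^{-1} A^{\mathsf {T}})^{-1}\), which is the standard formula for a surjective \(A\) and serves only as a stand-in for the evaluation via Theorem 5.6, and all function names, tolerances, and the shrink factor are assumptions.

```python
import numpy as np

def pinv_G(A, G):
    """Pseudoinverse of a surjective A with respect to the inner product G."""
    GinvAt = np.linalg.solve(G, A.T)
    return GinvAt @ np.linalg.inv(A @ GinvAt)

def feasible_step(P, u, phi, dphi, energy, slope, G, tau,
                  tol=1e-10, max_newton=5, shrink=0.5, max_backtracks=50):
    """Backtracking over tau with the modified Newton method (42).

    phi, dphi : constraint map and its Jacobian (callables)
    slope     : the number DE(P) u, negative for a descent direction u
    Returns a feasible point Q satisfying the Armijo condition and the
    accepted step size tau.
    """
    A_pinv = pinv_G(dphi(P), G)                  # Jacobian is frozen at P
    E0 = energy(P)
    for _ in range(max_backtracks):
        Q = P + tau * u
        for _ in range(max_newton):              # at most max_newton corrections
            if np.linalg.norm(phi(Q)) <= tol:
                break
            Q = Q - A_pinv @ phi(Q)
        feasible = np.linalg.norm(phi(Q)) <= tol
        if feasible and energy(Q) <= E0 + 0.5 * tau * slope:
            return Q, tau
        tau *= shrink                            # shrink and restart from P
    raise RuntimeError("no acceptable step size found")
```

Freezing the Jacobian at \(P\) means that only one pseudoinverse has to be set up per optimization step; the price is linear instead of quadratic convergence of the inner iteration, which is why a too large \(\tau \) is rejected after a few iterations instead of being iterated to convergence.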

6.4 Optimization methods employed in Fig. 3

Feasible methods Projected \(L^2\)-, \(W^{1,2}\)-, \(W^{3/2,2}\)-, and \(W^{2,2}\)-flows were simulated both with explicit and implicit time integration schemes. We followed the approach above, only replacing the Gram matrix by the Riesz operator corresponding to the particular choice of metric. Armijo backtracking line search automatically determines a stable step size. For the implicit integration of the \(L^2\)-gradient flow, we employ the backward Euler method. Since it is not unconditionally stable, Armijo backtracking also has to be employed here. Because each backtracking step requires the implicit equations to be solved again, this is particularly expensive.

The employed trust region method is a blend of the method from [20] with the two-dimensional subspace method from [77] (without computing the lowest eigenvalues): The next iterate is found by minimizing a quadratic model in a trust region within a low-dimensional subspace spanned by the current projected gradient, the projection of the previous gradient onto the current tangent space, and, provided the current projected gradient is shorter than a given threshold, the Newton search direction. This means that the optimization is mostly driven by gradient and momentum, and the Hessian is utilized only in the end phase of optimization. Shrinkage and expansion of the trust region are handled as usual, but the radius is of course to be interpreted with respect to the employed inner product.

Infeasible methods In order to compare with unconstrained optimization methods as well, we applied them to an analogous discretization of the penalized energy

$$\begin{aligned} \textstyle {\mathcal {E}}_\alpha (\gamma ) \;{:}{=}\; {\mathcal {E}}(\gamma ) + \alpha \, \Vert {\varPhi (\gamma )}\Vert _{L^2}^2 = {\mathcal {E}}(\gamma ) + \alpha \int _{\mathbb {T}}\, \log ( |{\gamma '(t)}|/L )^2 \, L \, {{\text {d}}}t, \end{aligned}$$

whose penalty can be interpreted as Hencky’s stretch energy. The optimization methods were made aware of this penalty by using the metric to compute gradients, where denotes the Riesz operator of .Footnote 8 As nonlinear conjugate gradient method, we employed the Polak-Ribière method “with automatic reset” (method \(\mathrm {PR}_+\) in [59, Section 5.2]). L-BFGS was implemented with history length 30 and as described in [59, Section 7.2]. The only difference is that we replace the initial guess for the inverse Hessian by the inverse of the current metric (because using a single initial guess turned out to be less efficient).Footnote 9 As for Nesterov’s accelerated gradient method (acc. grad.), we followed [57], but added collision detection to truncate the step sizes (in both steps of the method). Moreover, as suggested in [60], we reset the momentum to 0 whenever an increase of the objective was observed.Footnote 10 All these methods were complemented with a line search that tries to find a weak Wolfe-Powell step size.
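For illustration, one possible discretization of the penalty term in \({\mathcal {E}}_\alpha \) is sketched below. It assumes \(\ell _0(I) = L \, |{I}|\) and approximates \(|{\gamma '}|\) on the edge \(I\) by \(\ell _P(I)/|{I}|\), so that the integral becomes \(\sum _{I} \log (\ell _P(I)/\ell _0(I))^2 \, \ell _0(I)\); whether this coincides with the discretization used in the experiments is an assumption, and the name `hencky_penalty` is hypothetical.

```python
import numpy as np

def hencky_penalty(P, ell0, alpha):
    """Discrete penalty alpha * sum_I log(ell_P(I) / ell0(I))**2 * ell0(I).

    P     : (N, m) array of vertices of a closed polygon
    ell0  : (N,) array of target edge lengths, assumed to equal L * |I|
    alpha : penalty parameter
    """
    P = np.asarray(P, dtype=float)
    ell = np.linalg.norm(np.roll(P, -1, axis=0) - P, axis=1)
    strain = np.log(ell / ell0)                  # logarithmic (Hencky) strain
    return alpha * np.sum(strain**2 * ell0)
```

The penalty vanishes exactly on polygons that satisfy the edge length constraint and grows with the squared logarithmic strain, so it penalizes both collapsing and overstretched edges.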