1 Introduction

Analysis of time-dependent data has become increasingly important for a wide range of applications such as understanding the onset and progression of diseases, physical performance assessment from biomechanical gait data, or facial expression analysis in video sequences. All these examples can be understood as longitudinal data, where individual instances of a common underlying process are observed at multiple time points. While common statistical tools like mean-variance analysis and regression allow to study phenomena across individuals or within a single one, longitudinal data exhibit correlations due to repeated measurements that violate the independence assumptions of such cross-sectional methods. Another issue that warrants attention is missing data that arise, e.g. due to acquisition errors or when subjects drop out of a clinical study. Proper statistical inference for longitudinal data must therefore account for the within-individual correlation of observations as well as for the sparse and non-uniform sampling.

In this light, hierarchical models and in particular mixed-effects models pose an adequate and very flexible framework for longitudinal data analysis [1, 2]. Such approaches deal with the mass of the inherent interrelations by specifying a model in which each subject is assumed to have an own unique functional relation between the dependent variable and time-related predictor(s). Thus, a parametric spatio-temporal model that optimally fits the data for each given individual is estimated. Due to random error variation in the dependent variable at each time point for each individual, the fit is generally not perfect. The coefficients describing these subject-specific models (e.g. intercept and slope in the case of straight lines) are in turn assumed to vary randomly in the population. Corresponding to these random effects, there is a ‘fixed’ effect that is often of primary interest, i.e. a single group ‘fixed’ coefficient that indexes the average spatio-temporal model for the entire group.

While there is abundant literature on the mixed-effects framework for scalar and vector-valued measurements, generalizations to other types of data where the domains are structured (such as shapes or graphs) are still at an early stage of research. However, there is a substantial body of work that demonstrates the advantages of leveraging the structure (or geometry) of the data such as improving the assessment of subject-specific clinical outcomes [3], computer-aided diagnosis [4,5,6], or comparing populations [7,8,9] to name but a few instances. This provides a strong impetus for the development of generalized hierarchical models that benefit from a compact encoding of constraints and exhibit a superior consistency as compared to their Euclidean counterparts.

A common workaround when facing manifold-valued data is to resort to linear statistical tools by employing a vector space representation—either given explicitly by the discrete representation or derived via a dedicated operation. For example, in the field of medical image analysis linear mixed-effects models have been applied to vertex coordinates of meshes in order to study shape data [10, 11]. In general, however, the quality of the obtained model depends on the validity of the linearity assumption, which is a poor choice for data with a large spread or within regions of high curvature in shape space and, thus, is considered a limiting factor for the ability to represent natural biological variability in populations (see, e.g. [12] and the references therein).

To derive coordinate-free, manifold-based formulations, Riemannian geometry provides a suitable generalization of straight lines called geodesics that serve as a building block for inference models. This approach has been explored in multiple works [7, 8, 13, 14] leading to geodesic hierarchical models that encode both subject-specific as well as group trends in terms of geodesics. There are further extensions that employ a multi-geodesic approach to model developments as function of multiple, possibly categorical covariates [15], as well as nonparametric formulations [16] describing group average trends via weighted Fréchet means. In lieu of intrinsic noise distributions that feature tractable computation of the likelihood, model estimation in these approaches is formulated in terms of least-squares criteria instead of maximum likelihood or empirical Bayes estimates. Nonetheless, under certain conditions the least-squares solutions can be shown to agree with maximum likelihood estimates [14].

More recently, probabilistic formulations of Riemannian mixed-effects models [17,18,19,20,21] have been proposed. These approaches are based on a notion of parallelity that constraints the ‘slopes’ to be fixed for the entire population, whereas subjects in the study may follow different patterns of disease progression. This assumption can, hence, be a limiting factor diminishing the flexibility and fidelity of these approaches.

In order to gauge differences in geodesic trends, many approaches employ a product metric that measures the distance between privileged points on the trajectories as well as the directional deviation (change in ‘slope’). The latter relies on a notion of transport between tangent spaces to spatially align trajectories, e.g. parallel transport [15, 17, 22] in Riemannian manifolds or co-adjoint transport [13, 23] in the group of diffeomorphisms. However, the choice of transport can have a significant impact on the analysis: while the parallel transport depends on the path of transport (an effect called holonomy), the co-adjoint transport is not compliant with the metric. From a geometrical point of view, more appropriate distances can be derived by considering the space of trajectories itself as a differentiable manifold and equipping it with a Riemannian metric that is in turn consistent with the metric of the data manifold. State-of-the-art approaches, therefore, identify geodesics with points in the tangent bundle of the data manifold [7]. While the Sasaki metric is a natural metric on the tangent bundle, its geodesic computations require time-discrete approximation schemes involving the Riemannian curvature tensor. This not only incurs high computational costs, but also has a negative effect on numerical stability.

We consider a novel approach that overcomes these shortcomings. To this end, we identify elements of the tangent bundle with vector fields along the geodesic trend. This provides a notion of a canonical metric that is motivated from a functional view of parameterized curves in the shape space [24]. Considering the space of the geodesics as a submanifold in the space of shape trajectories, this allows in particular the use of a naturally induced distance. The corresponding shortest path, log map, and average geodesic can be computed by variational time-discretization. Remarkably, the underlying energy function allows for fast and simple evaluation increasing computational efficiency. In particular, the computation of the distance requires neither the computation of the curvature nor the decomposition into horizontal and vertical components.

In this work, we employ the derived metric within the generative hierarchical approach introduced in [7] that is based on a least-squares theoretic formulation. In the first stage of this hierarchical model, inner-individual changes are modelled as geodesic trends, which in the second level are considered as disturbances of a population-averaged geodesic trend. For the first stage, estimates at the individual level amounts to solving geodesic regression problems [25] for each individual. These problems can be solved efficiently in terms of first-order Riemannian optimization schemes [26]. The involved geodesic computations require geometric quantities such as adjoint Jacobi fields and parallel transport. These important geometric quantities are in general not given as closed form expressions. Efficient approximation schemes have been presented in [26,27,28].

Using the derived metric for geodesic trends, we obtain a notion of mean, covariance, and Mahalanobis distance. This allows us to develop a statistical hypothesis test for comparing the group-wise mean trends. Nonparametric permutation tests are applied to test the significance of the estimated differences in group trends. We perform this using a manifold-valued Hotelling \(T^2\) statistic analogously to [7] by applying it to the tangent bundle. As example application, we demonstrate the methodology on rat skulls growth and the long-term study of incident knee osteoarthritis (OA).

This paper is organized as follows. In the next section, we describe the geodesic mixed effects model and present an alternate approach for the computation of mean trends as well as the corresponding algorithmic realization. Section 3 provides an overview of Kendall’s shape space. Section 4 presents the application of our approach to synthetic spherical data for simulation studies, followed by rat calvaria and femur data from an epidemiological longitudinal study dealing on osteoarthritis in Kendall’ shape space and a discussion of the numerical results.

2 Hierarchical Geodesic Model

In this section, we describe the geodesic hierarchical model following least-squares theoretic derivation introduced in [7]. In particular, we derive a novel, functional-based Riemannian metric on the space of geodesics, thereby inducing notions of mean geodesics, co-variances in populations of geodesics, and least-squares estimates.

Fig. 1
figure 1

Schematic view of the geodesic mixed effects model: Subject-wise trends represented as best-fitting geodesics and subsequent population-average trend as mean of these geodesics in the shape space

Let (Mg) be a smooth complete Riemannian manifold with distance function d, exponential map \(\exp \), its local inverse \(\log \), and injectivity radius \(r_{inj}\). Then, the response \(Y_i\) of the i-th individual is modelled by

$$\begin{aligned} Y_i = \exp _{\gamma _i(T_i)}(\epsilon _i), \end{aligned}$$

where \(\gamma _i\) denotes the geodesic parameterized with respect to the independent variable \(T_i\). The random variation of observations from the geodesic \(\gamma _i\) is modelled by the exponential map and the random variable \(\epsilon _i\) that takes values on the tangent space at corresponding points on \(\gamma _i\). The group-level trend is then determined as the mean \({{\bar{\gamma }}}\) geodesic modelled by the exponential map \(\exp ^{TM}\) of the tangent bundle TM

$$\begin{aligned} \gamma _i = \exp ^{TM}_{{{\bar{\gamma }}}}(\nu _i), \end{aligned}$$

where \(\nu _i \in T_{{{\bar{\gamma }}}}TM\) represents the residual geodesic encoding the difference to the group-level geodesic.

To estimate the parameter of the described longitudinal model, we employ a two-step least-squares procedure requiring an appropriate notion of distance in the space of geodesics.

2.1 Geodesic Regression

The first stage of the employed model poses an instance of geodesic regression, which we summarize below. As such inconsistencies, e.g. due to acquisition noise and reconstruction errors, in the individual’s observations are minimized. For an overview and applications, we refer to [25, 26] and [29].

Consider scalars \(t_1< t_2< \cdots < t_N\) and distinct points \(q_1,\cdots ,q_N\in M\). Geodesic regression aims at finding a geodesic curve that best fits the data \(q_i\) at \(t_i\) in the least-squares sense, i.e. minimizing

$$\begin{aligned} \sum _{i=1}^Nd^2(q_i,\gamma (t_i)) \end{aligned}$$

over \(\gamma \) in the space of geodesics. The minimizer is called best-fitting geodesic for data \((t_i,q_i)_{i=1,\cdots ,N}\). Applying a linear bijection, we may assume that \(0=t_1<t_2<\cdots <t_N=1\).

For a geodesic \(\gamma \) from x to y, \(\varPhi \) denotes the parametrization of \(\gamma \) as a path over the unit interval, viz.

$$\begin{aligned} \varPhi (x,y,t):=\exp _x(t\log _xy),\,t\in I, \end{aligned}$$
(1)

where \(I:=[0,1]\).

Computationally, we employ the above parametrization to determine the corresponding \(\varPhi (x^*,y^*,.)\), where \((x^*,y^*) :={{\,\mathrm{arg\,min}\,}}F\) and

$$\begin{aligned} F(x,y):=\sum _{i=1}^Nd^2(q_i,\varPhi (x,y,t_i)) \end{aligned}$$

over \(M\times M\). The choice \((q_1,q_N)\), serves as a natural initial guess.

We remark that one could identify geodesics with their initial points and velocities instead of using endpoints. We employ the latter, since geodesic computations in terms of the function \(\varPhi \) defined in equation (1), the so-called slerp (spherical linear interpolation), are more efficient. The predictive power of the regression model can be measured by the coefficient of determination, denoted \(R^2\). To compute it, let \(F_{min}:=F(x^*,y^*)\) and denote the minimum of

$$\begin{aligned} G(x):=\sum _{i=1}^N d^2(q_i,x)) \end{aligned}$$

by \(G_{min}\). Then

$$\begin{aligned} R^2=1 - \frac{F_{min}}{G_{min}}. \end{aligned}$$

We recall that \(\frac{1}{N}F_{min}\) and \(\frac{1}{N}G_{min}\) are the unexplained and total variance, respectively. Moreover, \(R^2\in [0,1]\) and a large value indicates better regression performance (goodness of fit).

Generally, due to absence of an explicit analytic solution, the regression task has to be solved numerically. To this end, we employ a Riemannian trust-regions solver [30] with a Hessian approximation based on finite differences. The main challenge in this regard, is the computation of the gradient of F, \(\nabla F\). Due to the fact that

$$\begin{aligned} \nabla \rho _y(x)=-2\log _xy, \end{aligned}$$

where \(\rho _y(x):=d^2(x,y)\), \(\nabla F\) is given by the adjoint of the sum of certain Jacobi fields. For details and application to Kendall’s shape space, we refer to [26].

2.2 Tangent Bundle and Mean Geodesic

Geodesic mixed effects models and particularly mean geodesic (group trend) require a notion of distance for the tangent bundle consistent with the Riemannian metric of the shape space. In the following, we present a brief introduction to a natural choice for such a distance provided by the Sasaki metric employed in [7]. Then, we propose an alternative \(L^2\)-type approach and induced variational time-discrete geodesics.

Let \(\tau \) denote the canonical projection of the tangent bundle TM. Suppose that TM is endowed with a Riemannian metric \({\tilde{g}}\). Identifying a geodesic with its endpoints, the mean of geodesics is determined by \({\tilde{g}}\). A prominent natural choice for \({\tilde{g}}\) is the Sasaki metric. It is uniquely determined by the following properties (cf. [31]): a) \(\tau \) becomes a Riemannian submersion (\(\tau \) has maximal rank and \(\mathrm {d}\tau \) preserves lengths of horizontal vectors). b) The restriction of \({\tilde{g}}\) to any tangent space coincides with the Euclidean metric induced by g. c) Parallel vector fields along arbitrary curves in M are orthogonal to their fibres, i.e. for any curve \(\gamma \) and parallel vector field v along it, \({\dot{v}}\perp T_\gamma M\).

Let \(\eta :=(p,u):I\rightarrow TM\) be a smooth curve. \(\tau \) being a Riemannian submersion, \(T_{\eta }TM\) enjoys an orthogonal decomposition in vertical (viz. kernel of \(d_{\eta }\tau \)) and horizontal subspaces, each of dimension dim(M). Identifying each of them with \(T_{p}M\), the Sasaki metric at \(\eta \) is induced by the quadratic form \(\Vert v\Vert ^2+\Vert w\Vert ^2\), where \(v=p^\prime \) and \(w=u^\prime \) and prime denotes the derivative with respect to the curve parameter s. Denoting the covariant derivative and curvature tensor of g by \(\nabla \) and R, Sasaki geodesics are characterized by

$$\begin{aligned} \nabla _v v&= -R(u,w,v),\\ \nabla _v w&= 0. \end{aligned}$$

Algorithms for the computation of the exponential and log map as well as mean geodesic with respect to Sasaki metric, and also an application to corpus callosum longitudinal data as trajectories in Kendall’s shape space are given in [7]. In the case \(m=2\) (planar shapes), the shape space can be identified with the complex projective space and the Riemannian curvature tensor is explicitly given in terms of the canonical complex structure and the curvature tensor of the pre-shape space. For \(m\ge 3\), computation of R is more delicate.

We recall that the decomposition of the double tangent bundle into horizontal and vertical subspaces can be expressed in terms of Jacobi fields as follows (cf. [32]). The map \(H(s,t):= \exp _{p(s)} (tu(s))\) is a family of geodesics. Therefore, \(J_s(t):=H^\prime (s,t)\) is a family of Jacobi fields. The horizontal and vertical components of \(\eta ^\prime \) read \(v=J(0)\) and \(w={\dot{J}}(0)\).

Next, we propose an alternative approach to employ a metric on the tangent bundle. Fix \(s\in I\) and let \(\gamma _s:I\rightarrow M\) be the geodesic emerging from p(s) with initial velocity \(u(s)=\dot{\gamma _s}(0)\). Let \(\xi _s\) be a vector field along \(\gamma _s\). Then by \(\Vert \xi _s\Vert _{L^2}^2=\int _Ig(\xi _s (t),\xi _s(t))\,dt\) a quadratic form at (p(s), u(s)) is given, which defines a distance, denoted by \(\delta \) for the space of geodesics and in turn, on the tangent bundle TM. Hence, from an extrinsic point of view, \(\delta \) is induced by the standard metric of the Hilbert manifold \(L^2(I,M)\).

Now, let \(H=H(s,t)\) be a family of paths \(I\rightarrow M\) (\(H(s,\cdot )\) not necessarily geodesics) with \(\alpha :=H(0,\cdot )\) and \(\beta :=H(1,\cdot )\) geodesics. The energy of H induced by the above quadratic form reads

$$\begin{aligned} E(H)=\int _0^1\int _0^1g(H^\prime (s,t),H^\prime (s,t))\,dt\,ds \end{aligned}$$

(\(\xi _s=H^\prime (s,.)\)). Let \(H^*\) denote the minimizer of E restricted to paths through geodesics, i.e. H(s, .) geodesic for all \(s\in I\). Fix \(n\in {\mathbb {N}}\). Next, we construct time-discrete paths \((H^*_i)_{i=0,\cdots ,n}\) (sequence of geodesics) to approximate \(H^*\), which in turn, identifying each geodesic with its initial point and velocity, provides a discrete path in the tangent bundle approximating \((H^*(0),\dot{H^*}(0))\). We remark that the energy functional E achieves its minimum in \(L^2(I,M)\), i.e. over all paths connecting \(\alpha \) to \(\beta \), if H(., t) is a geodesic for all t. It follows from a slight modification of [24, Theorem 3.2], which we present in Sect. 6.1 for the reader’s convenience. Now, let \(s_i:=\frac{i}{n}\). We may write

$$\begin{aligned} E(H)=\sum _{i=0}^{n-1}\int _{s_i}^{s_{i+1}}\Vert H^\prime (s,.)\Vert _{L^2}^2\,ds \end{aligned}$$

and discretizing E intrinsically, i.e. replacing \(H^\prime (s,.)\) by the finite difference \(n\log _{H_i} H_{i+1}\) with \(H_i(.):=H(s_i,.)\), we arrive at

$$\begin{aligned} E_n(H)=n\sum _{i=0}^{n-1}\delta ^2(H_i,H_{i+1}). \end{aligned}$$

Identifying each geodesic \(H_i\) with its endpoints \(x_i=H_i(0)\) and \(y_i=H_i(1)\), we have to minimize the explicit expression

$$\begin{aligned}&E_n((x_i,y_i)_{i=0,\cdots ,n})\nonumber \\&\quad =n\sum _{i=0}^{n-1}\int _0^1 d^2(\varPhi (x_i,y_i,t),\varPhi (x_{i+1},y_{i+1},t))\, dt. \end{aligned}$$
(2)

Now, to ensure that any two points determine a unique geodesic, we suppose that \(\alpha \) and \(\beta \) are close enough in the sense that their images lie in a convex normal neighbourhood (for instance a geodesic ball with radius \(\frac{r_{inj}}{2}\)) and let \((x^*,y^*)\) be the minimizer of \(E_n\) over \((x_i,y_i)_i\in M^n\times M^n\) with fixed endpoints \(x_0=\alpha (0),y_0=\alpha (1)\), \(x_{n}=\beta (0),y_{n}=\beta (1)\). A natural choice for the initial values \(x^0\) and \(y^0\) is given by the equidistant partition \(x^0_i=\varPhi (x_0,x_{n},s_i)\), \(y^0_i=\varPhi (y_0,y_{n},s_i)\). Then, the desired discrete shortest path reads

$$\begin{aligned} H^*=\left( \varPhi (x^*_0,y^*_0,.),\cdots ,\varPhi (x^*_n,y^*_n,.)\right) . \end{aligned}$$
(3)

We remark that the discrete path length is given by

$$\begin{aligned}&{\mathcal {L}}_n((x_i,y_i)_i)=\sum _{i=0}^{n-1}\delta (H_i,H_{i+1})\\&\quad =\sum _{i=0}^{n-1}\sqrt{\int _0^1 d^2(\varPhi (x_i,y_i,t),\varPhi (x_{i+1},y_{i+1},t))\,dt}. \end{aligned}$$

Furthermore, for any discrete path H one has the a priori estimation

$$\begin{aligned} \delta ^2(H_0,H_n)\le n\sum _{i=0}^{n-1}\delta ^2(H_i,H_{i+1}), \end{aligned}$$

with equality if and only if all \(H_i\) equidistantly lie along a smooth shortest path from \(H_0\) to \(H_n\).

Note that \(H_0=\alpha \), \(H_n=\beta \), and the right hand side of the above inequality is the discrete energy \(E_n(H)\), explicitly given by (2). Indeed, one verifies that, if M is the Euclidean space, then the minimizer coincides with the initial value and in the above estimation equality holds. Moreover, it coincides with the discrete Sasaki geodesic and converges to the Sasaki geodesic given by

$$\begin{aligned} p(s)&=(1-s)\alpha (0)+s\beta (0),\\ u(s)&=s{\dot{\alpha }}(0)+(1-s){\dot{\beta }}(0). \end{aligned}$$

Next, we aim at constructing the mean of N geodesics \(\gamma _j:I\rightarrow M\). To ensure well-definedness of the approach, we suppose that they are close enough, in the sense that their images lie in a neighbourhood U within \(\frac{1}{2}min \{r_{inj},\frac{\pi }{\sqrt{\varDelta }}\}\), where \(\varDelta \) denotes an upper bound for the sectional curvature in U and \(\frac{\pi }{\sqrt{\varDelta }}\) is interpreted as \(+\infty \) if \(\varDelta \le 0\) (e.g. for Hadamard manifolds). Now, the induced mean of \(\gamma _1,\cdots ,\gamma _N\) is the geodesic \(\gamma \) with initial- and endpoints x and y, minimizing over \(M^2\)

$$\begin{aligned} \begin{aligned}&G_n(x,y)=\sum _{j=1}^N \min _{x_i^j, y_i^j \in M}\,E_n((x^j_i,y^j_i)_{i=0,\cdots ,n}) \\&\quad \text {s.t.}\,x^j_0=x,\,y^j_0=y,\,x^j_n=\gamma _j(0),\,y^j_n=\gamma _j(1). \end{aligned} \end{aligned}$$
(4)

A natural choice for the initial value is given by the point-wise mean of \((\gamma _j(0),\gamma _j(1))\), \(j=1,\cdots ,N\). In the sequel, we call \(\delta \) the functional-based \(L^2\)-metric.

Note that computations of the log map and mean with respect to \(\delta \) neither involve the curvature tensor nor decomposition in horizontal and vertical components.

2.3 Algorithmic Realization

In this section, we describe the algorithmic realization of the computation of the variational time-discrete shortest path and the mean geodesic with respect to the proposed \(L^2\)-metric, denoted by \(\delta \).

Discrete shortest paths are determined as minimizers of the discrete path energy \(E_n\) given in (2). Before we turn to the general case, let us first consider solutions of \(E_2\), i.e. discrete 2-geodesics in TM. Assuming endpoints to be fixed, i.e. seeking solutions of the boundary value problem, the search space for solutions of \(E_2\) reduces to a single M-geodesic. Indeed, given a quadrature rule for t-discretization, the estimation of 2-geodesics numerically coincides with (weighted) geodesic regression. Explicitly, we minimize over \(x,y\in M\)

$$\begin{aligned}&E_2((x_-,y_-),(x,y),(x_+,y_+))\\&\quad =\sum _{i=0}^k\omega _id^2(\varPhi (x,y,t_i),\varPhi (x_{-},y_{-},t_i))\\&\qquad +d^2(\varPhi (x,y,t_i),\varPhi (x_{+},y_{+},t_i)))), \end{aligned}$$

where discrete values \(t_i\) and weights \(\omega _i\) are determined by the chosen quadrature rule. This analogue allows us to reuse algorithms and implementations for geodesic regression that are already required during the first stage of our hierarchical model. Moreover, as the solutions to \(E_2\) are also midpoints of the corresponding 2-geodesic, we can interpret them as discrete averages. This allows to derive an optimization procedure for general \(E_n\) by adopting a discrete path shortening flow from the realm of polygonal path processing. In particular, we employ an iterative averaging approach detailed in Algorithm 1 that consecutively updates each of the inner nodes by replacing it with the average of its neighbours as determined by regression, i.e. estimation of a 2-geodesic (\(n=2\)). One verifies that this algorithm converges, as each averaging step weakly decreases the discrete path energy \(E_n\).

figure d

Computation of the discrete mean geodesic \({\bar{\gamma }}\) requires minimization of Eq. (4) yielding a simultaneous optimization over N discrete paths \((x_i^j,y_i^j)_{i=0,\ldots ,n}\) connecting the input geodesics \(\gamma _j\) and \({\bar{\gamma }}\). While interior nodes of each path have to form 2-geodesics with their neighbours, the coupling at the centre is described as minimizer of \(G_1\), where input geodesics are the first interior nodes \((x^j_1,y^j_1)\) along the discrete paths. Again, the optimization of \(G_1\) w.r.t. the centre \({\bar{\gamma }}\) can be considered as an instance of geodesic regression following a similar argument as for \(E_2\). This motivates an alternating optimization procedure that—in every iteration—first relaxes the estimate for \({\bar{\gamma }}\) and subsequently updates the interior nodes of the paths each time solving a geodesic regression problem. To improve efficiency, we further employ a cascadic approach in time by solving the sequence of related problems \((\min _\gamma G_k)_{k=1,\ldots ,n}\) using the k-th solution to initialize the subsequent problem.

3 Kendall’s Shape Space

In the following, we present a brief overview of Kendall’s shape space and its tangent bundle as well as main quantities which will be employed for geodesic analysis and statistics. Note that our approach is not limited to the Kendall’s shape space and can be employed to any Riemannian manifold.

For a comprehensive introduction to Kendall’s shape space and details on the subjects of this section, we refer to [26, 33]. For the relevant tools from Riemannian geometry, we refer to [34, 35].

Let M(mk) denote the space of real \(m\times k\) matrices endowed with its canonical scalar product given by \(\langle x,y\rangle ={{\,\mathrm{trace}\,}}(xy^t)\), and \(\Vert \cdot \Vert \) the induced Frobenius norm. We call the set of k-ad of landmarks in \({\mathbb {R}}^m\) after removing translations and scaling the pre-shape space and identify it with

$$\begin{aligned} {\mathcal {S}}_m^k:=\left\{ x\in M(m,k):\, \sum _{i=1}^kx_i=0,\,\Vert x\Vert =1\right\} \end{aligned}$$

endowed with the standard spherical distance given by \({\mathbf {d}}(x,y)=\arccos (\langle x,y\rangle )\).

A shape is a pre-shape with rotations removed. More precisely, the left action of \(\mathrm {SO}_m\) on \({\mathcal {S}}_m^k\) given by \((R,x)\mapsto Rx\) defines an equivalence relation given by \(x\sim y\) if and only if \(y=Rx\) for some \(R\in \mathrm {SO}_m\). Kendall’s shape space is defined as \(\varSigma ={\mathcal {S}}_m^k/\mathord {\sim }\). Now, denoting the canonical projection of \(\sim \) by \(\pi \), the induced distance between any two shapes \(\pi (x)\) and \(\pi (y)\) is the Procrustes distance given by

$$\begin{aligned} d(x,y)= \min _{R\in \mathrm {SO}_m} d(x,Ry)=\arccos \sum _{i=1}^m\lambda _i, \end{aligned}$$

where \(\lambda _1\ge \cdots \ge \lambda _{m-1}\ge |\lambda _m|\) denote the pseudo-singular values of \(yx^t\). Note that for simplicity of notation, we have identified shapes and their representing pre-shapes in the definition of \(d_{\varSigma }\). We call \(x,y\in {\mathcal {S}}_m^k\) well positioned and write \(x{\mathop {\sim }\limits ^{\omega }}y\) if and only if \(yx^t\) is symmetric and \({\mathbf {d}}(x,y)=d(x,y)\). For each \(x,y\in {\mathcal {S}}_m^k\), there exists an optimal rotation \(R\in \mathrm {SO}_m\) such that \(x{\mathop {\sim }\limits ^{\omega }}Ry\). The diameter of \(\sigma \) is \(\pi /2\).

Furthermore, the horizontal and vertical spaces at \(x\in {\mathcal {S}}_m^k\) read

$$\begin{aligned} \mathrm {Hor}_x&=\{u\in M(m,k-1):\,ux^t=xu^t\text { and } \langle x,u\rangle =0\},\\ \mathrm {Ver}_x&=\{Ax:\,A+A^t=0\}. \end{aligned}$$

A smooth curve is called horizontal if and only if its tangent field is horizontal. Geodesics in the shape space are equivalence classes of horizontal geodesics. For \(x{\mathop {\sim }\limits ^{\omega }}y\), the geodesic from x to y given by

$$\begin{aligned} \varPhi (x,y,t)=\frac{\sin ((1-t)\varphi )}{\sin \varphi }x+\frac{\sin (t\varphi )}{\sin \varphi }y \end{aligned}$$

with \(\varphi =\arccos (\langle x,y\rangle ),\,0\le t\le 1\), is horizontal. Hence, \(\varPhi \) realizes the shortest path from \(\pi (x)\) to \(\pi (y)\). Now, let \(\mathrm {Log}\) and \(\mathrm {Exp}\) denote the log and exponential map of the pre-shape sphere, i.e.

$$\begin{aligned} \mathrm {Log}_xy=\varphi \frac{y-\langle x,y\rangle x}{\Vert y-\langle x,y\rangle x \Vert }, \end{aligned}$$

and

$$\begin{aligned} \mathrm {Exp}_xv=\cos (\varphi )\cdot x+\frac{\sin (\varphi )}{\varphi }\cdot v, \end{aligned}$$

with \(\Vert v\Vert =\varphi ,\,\langle x,v\rangle =0\). Then, the Riemannian exponential map of the shape space, denoted by \(\widetilde{\exp }\), satisfies

$$\begin{aligned} \pi (\mathrm {Exp}_x u)=\widetilde{\exp }_{\pi (x)}(\mathrm {d}_x\pi (u))=\widetilde{\exp }_{\pi (x)}(\mathrm {d}_x\pi (u^h)), \end{aligned}$$

where \(u^h\) stands for the horizontal component of \(u\in T_x{\mathcal {S}}_m^k\). Furthermore, the log map of the shape space is represented at x by \((\mathrm {Log}_x)^h\).

We recall that pre-shapes with rank \(\ge m-1\), denoted by \({\mathcal {S}}\), constitute an open and dense subset of \({\mathcal {S}}_m^k\) and the restriction of the quotient map \(\pi \) to \({\mathcal {S}}\) is a Riemannian submersion with respect to the metric induced by the ambient Euclidean space. Moreover, for pre-shapes in \({\mathcal {S}}\), the optimal rotation is unique if and only if \(\lambda _{m-1}+\lambda _m\ne 0\). Denoting the covariant derivatives in the pre-shape and shape space by \(\nabla \), respectively \(\tilde{\nabla }\), for horizontal vector fields X and Y, we have

$$\begin{aligned} \nabla _XY=(\nabla _XY)^h+\frac{1}{2}[X,Y]^v. \end{aligned}$$

Here, the superscript v denotes the vertical component and [., .] the Lie bracket. Moreover, \((\nabla _XY)^h\) equals the horizontal lift of \(\tilde{\nabla }_{\mathrm {d}\pi X}\mathrm {d}\pi Y\). In particular,

$$\begin{aligned} \mathrm {d}\pi (\nabla _X Y)=(\tilde{\nabla }_{\mathrm {d}\pi X} \mathrm {d}\pi Y) \circ \pi . \end{aligned}$$

We recall that denoting the Euclidean derivative of a vector field W along a path \(\gamma \) in the pre-shape sphere (i.e. \(\Vert \gamma \Vert =1\)) by \({\dot{W}}\), we have

$$\begin{aligned} \nabla _{{\dot{\gamma }}}W={\dot{W}}-\langle {\dot{W}},\gamma \rangle \gamma . \end{aligned}$$

Key quantities of the shape space geometry such as parallel transport, Jacobi fields and Fréchet mean can be computed by horizontal lifting to \({\mathcal {S}}\) (and extension to \({\mathcal {S}}_m^k\)). We refer the reader to [26] for corresponding results.

4 Experiments

In this section, we provide experimental evaluations based on the proposed geodesic hierarchical model for synthetic as well as publicly available, longitudinal data. First, we perform a simulation study investigating the efficacy of the proposed method. Second, we provide a comparison of mean trends w.r.t. the Sasaki and our proposed metric, apply the approaches to planar shapes describing rat skulls and compare the results. Subsequently, we apply the derived scheme to the analysis of group differences in longitudinal femur shapes of subjects with incident and developing osteoarthritis (OA) versus normal controls. In particular, we estimate group-wise trends and perform a Hotelling \(T^2\) test to identify significant group differences. For two-dimensional visualization, we apply tangent PCA at the Fréchet mean of shapes representing the observed data. Time discrete computations are performed based on 2-geodesics—employing finer discretizations have been found to provide no further improvements for the datasets under study.

4.1 Synthetic Spherical Data

We performed simulation studies with responses on the 2-dimensional unit sphere \(S^2\). The objective of these studies is to illustrate the application of the proposed approach and to verify its efficacy. To this end, we fixed a geodesic \(\mu \) with initial- and endpoints \(\mu _0\) and \(\mu _1\), and employed two distributions of spherical geodesics with mean \(\mu \), to generate random geodesics. We call the distributions \(M^2\) and Sasaki. Let \(\sigma =\frac{\pi }{12}\). The \(M^2\) distribution uses the parametrization of geodesics over \(S^2\times S^2\). Therein a sample geodesic \(\gamma \) is given by initial- and endpoints \(\gamma _0\) and \(\gamma _1\) drawn from a normal distribution with standard deviation \(\sigma \) centred at \(\mu _0\) and \(\mu _1\), respectively. For the Sasaki distribution, we randomly generate vectors v and w in \(T_{\mu _0}S^2\) from a normal distribution with standard deviation \(\sigma \) and mean zero. We gain \(\gamma _0\) and \({\dot{\gamma }}(0)\) by applying the Sasaki exponential map at \((\mu _0,u)\) to (vw), where \(u={\dot{\mu }}(0)\).

We used \(\frac{1}{\pi }\max (d( \mu _0,\gamma _0),d(\mu _1,\gamma _1))\) with

\(d(x,y)=\arccos (\langle x,y\rangle )\), to measure the distance between \(\mu \) and \(\gamma \) and thus, quantify the accuracy of our estimations. The scaling factor \(\frac{1}{\pi }\) reflects the fact that the error is bounded above by \(\pi \), the diameter of the sphere.

In the first study, we repeatedly (100 times) estimated mean geodesics from 25 randomly generated geodesics. The resulting bias (average error) and median (95% confidence intervals) from the \(M^2\) distribution amount to 0.0269 and 0.0259 (0.0250–0.0288), and 0.0323 and 0.0304 (0.0297–0.0348) for the proposed and Sasaki metric, respectively. For both metrics, the Sasaki distribution yields almost identical values, viz. 0.0253 and 0.0234, with slightly different confidence intervals (0.0231–0.0275) and 0.0234 (0.0230–0.0277). Figure 2 shows the frequency distribution for this study with a fitted density function from a beta distribution. This choice of density function is motivated by the boundedness of the error function and the opening that its probability is unknown.

Fig. 2
figure 2

Frequency distribution of error for estimated means of 25 random geodesics for 100 draws from the \(M^2\) (top) and Sasaki distribution (button) for Sasaki (left) and proposed metric (right)

In the second study, in each experiment, we computed the error of the estimated mean w.r.t. both approaches, and increased the number of geodesics. We repeated the experiment 10 times and observed that for both approaches the error clearly exhibits an overall decreasing trend (as expected). For both distributions, the deviation between the proposed approach and the Sasaki one is reasonable. Figure 3 shows a plot of the average errors ± standard deviations against the number of geodesics. However, the overall trend for the Sasaki metric has slightly more fluctuations, while the error for proposed metric exhibits a relatively more uniform trend. The former approach uses numerical second order covariant derivatives and spherical parallel transport causing more computational sensibility, while the latter one is based on quadrature. This seems to be a reasonable explanation for the mentioned difference. Regarding computation time, there was no significant difference between the approaches.

Fig. 3
figure 3

Average error of mean estimates versus increasing number of geodesics for 10 draws from \(M^2\) (top) and Sasaki distribution (button). Shaded regions indicate standard deviations

We remark that one could use another distance function instead of the one induced by the canonical metric on \(S^2\times S^2\). However, we observed that the other natural choices, Sasaki distance or maximum of \(d(\mu (t),\gamma (t))\) over \(t\in I\), only slightly change the error values, but qualitatively have no remarkable effect on the considered essential statistical characteristics, i.e. frequency distribution or average trends.

A qualitative comparison between the shortest path and mean induced by the Sasaki and the proposed metric is shown in Figs. 4 and 5, respectively. For none of the shortest paths, footpoint curves constitute geodesics. However, the functional-based one is closer to the more intuitive shortest path given by the simple point-wise construction \(H(s,t)=\varPhi (\alpha (t),\beta (t),s)\).

Fig. 4
figure 4

Minimal geodesic in the tangent bundle identified as shortest path connecting two geodesics (red) with respect to Sasaki (left) and functional-based \(L^2\)-metric (right)

Fig. 5
figure 5

Red geodesics are generated by randomly perturbing the endpoints of the black one. Their mean geodesic with respect to Sasaki (left) and functional-based \(L^2\)-metric (right) are blue. Both approaches provide an adequate approximation of the true mean

4.2 Mean Trend for Rats Calvaria Data

Fig. 6
figure 6

Landmark configuration of rat braincase (laterally viewed and the skull is facing right) with bottom of the figure corresponding to the basicranium (source: [36])

As the first, open-accessFootnote 1 dataset we use Vilmann’s rat calvaria (skulls excluding the lower jaw) that have been obtained from X-ray images and were also studied in [36, pp. 408-414]. It consists of 8 landmarks in 2 dimensions for 18 individuals observed at ages of 7, 14, 21, 30, 40, 60, 90, 150 days. Our computations for these data yield a coefficient of determination of \(R^2=0.794\), in accordance with [37]. The 2D Kendall’s shape space is isometric to the complex projective space with its standard (Fubini-Study) metric, hence a symmetric space. Therefore, decomposition in vertical and horizontal parts, parallel transport, and the curvature tensor can be represented by closed form expressions (cf. Sect. 6.2). Hence, application of our approach to the rat skulls dataset provides a suitable opportunity for the comparison of the Sasaki and the proposed mean geodesic.

Fig. 7
figure 7

Top: Input shapes (blue) and endpoints of fitted geodesics (red) with zoomed view of input shape trajectories and fitted geodesics for landmark 1. Bottom: Fitted geodesics, Sasaki (black) and proposed (green) mean trend with zoomed view for landmark 1

Figure 7 visualizes landmark-wise, shape representations of the input and output corresponding to the first stage (top) and second stage (bottom) of the geodesic mixed-effects model applied to rats calvaria data. Figure 8 shows the result of tangent PCA to the fitted geodesics and their mean providing the average overall trend. The deviation between the two mean geodesics induced by the Sasaki and our proposed metric just amounts to \(10^{-4}\). Therefore, they are not distinguishable in the plots.

Fig. 8
figure 8

Two-dimensional visualization of fitted geodesics (red) with their Sasaki (black) and proposed (green) mean trend (visually indistinguishable)

4.3 Group-Wise Trends for Femoral Data and Hypothesis Tests

Our second dataset is derived from the Osteoarthritis Initiative (OAI), which is a longitudinal study of knee osteoarthritis comprising (among others) clinical evaluation data and radiological images from 4,796 men and women of age 45–79. The data are available for public access at http://www.oai.ucsf.edu/.

From the OAI database, we determined three groups of shapes trajectories: HH (healthy, i.e. no OA), HD (healthy to diseased, i.e. onset and progression to severe OA), and DD (diseased, i.e. OA at baseline) according to the Kellgren–Lawrence score [38] of grade 0 for all visits, an increase of at least 3 grades over the course of the study, and grade 3 or 4 for all visits, respectively. We extracted surfaces of the distal femora from the respective 3D weDESS MR images (0.37\(\times \)0.37 mm matrix, 0.7 mm slice thickness) using a state-of-the-art automatic segmentation approach [39]. For each group, we collected 22 trajectories (all available data for group DD minus a record that exhibited inconsistencies, and the same number for groups HD and HH, randomly selected), each of which comprises shapes of all acquired MR images, i.e. at baseline, the 1-, 2-, 3-, 4-, 6- and 8- year visits. In a supervised post-process, the quality of segmentations and the correspondence of the resulting meshes (8,988 vertices) were ensured.

Main goals in this application are the following. First, to understand how anatomical structures in the femur change over time, during aging and growth processes, and more critically during disease progression. Second, to compare and test how temporal changes in the anatomy of different groups significantly differ. We represented the data in Kendall’s shape space and applied the geodesic regression approach described above to the femoral trajectories.

We computed the \(R^2\)-values amounting to medians (95 confidence intervals) of 0.40 (0.33–0.46), 0.55 (0.48–0.63), and 0.51 (0.40–0.72) for group HH, DD, and HD, respectively. For details and computational aspects, we refer to [26]. Note that geodesic representation provides a less cluttered visualization of the trajectory population making it easier to identify trends within as well as across groups.

Fig. 9
figure 9

Two-dimensional visualization for group-Wise trends estimated as averages of fitted geodesics

For the statistical testing of group differences, we employ the manifold-valued Hotelling \(T^2\) test described in [7]. For the convenience of the reader, the formulae used therein are presented below. Let \(x=\{x_1,\cdots ,x_{n_1}\}\) and \(y=\{y_1,\cdots ,y_{n_2}\}\) two samples with corresponding Fréchet means \({\bar{x}}\) and \({\bar{y}}\), \(v_x=\log _{{\bar{x}}}{\bar{y}}\), \(v_y=\log _{{\bar{y}}}{\bar{x}}\). Then the individual group covariances are given by

$$\begin{aligned} W_x&=\frac{1}{n_1}\sum _{i=1}^{n_1}(\log _{{\bar{x}}}x_i)(\log _{{\bar{x}}}x_i)^t \\ W_y&=\frac{1}{n_2} \sum _{i=1}^{n_2}(\log _{{\bar{y}}}y_i)(\log _{{\bar{y}}}y_i)^t \end{aligned}$$

and the sample \(T^2\) statistic reads

$$\begin{aligned} t^2=\frac{1}{2}( v_x^tW^{-1}_x v_x + v_y^tW^{-1}_y v_y ). \end{aligned}$$

For the estimations of the log map and mean, we employed equations (2) and (4). We found \(t^2\)-values 0.0012, 0.000703, and 0.000591 for HH vs. DD, DD vs. HD, and HH vs. HD with corresponding p-values 0, 0.011, and 0.033. For the computation of the statistical significance, i.e. p-values, we randomly permuted group memberships of the subject-specific geodesic trends 1, 000 times. The results reveal clear differences between the group-wise average geodesics demonstrating the descriptiveness of the proposed approach. In particular, the results confirm the qualitative differences in group-average trends depicted in the low-dimensional visualization in Fig. 9.

5 Conclusion

We presented a modification of the geodesic hierarchical model introduced in [7] by employing a discrete geodesic for the tangent bundle of the shape space instead of Sasaki geodesics. Our approach does not involve the Riemannian curvature tensor and allows for simple and efficient approximation. Furthermore, we presented numerical examples, using synthetic random spherical data as well as 2D shapes representing longitudinal data for rat skulls, demonstrating the accuracy of our approach.

Moreover, we estimated average geodesics and group trends for the example application of femoral longitudinal 3D data using Kendall’s shape space, and employed a manifold-valued Hotelling \(T^2\) test, which confirmed that the model well distinguishes the groups.

There are several potential directions for future work. First, it would be interesting to derive efficient expressions for the estimation of Sasaki geodesics for 3D shape trends and to compare the result with our approach. Furthermore, we would like to extend our model to account for errors in covariates, as well as higher-dimensional parameters, which would allow to take further effects into account providing more insight on more complex phenomena. In the end, further investigation and understanding of distributions on manifolds is needed to obtain probabilistic, yet computationally tractable generalizations of mixed-effects models.Footnote 2