1 Introduction

In this paper, we present a convex optimization framework for total-variation regularized problems of the following form:

$$\begin{aligned} \begin{aligned} \inf _{ u : \Omega \rightarrow \Gamma } ~&\int _{\Omega } c(x, u_1(x), \ldots , u_k(x)) \, \textrm{d}x \\&+ \sum _{i=1}^k \lambda _i \text {TV}(u_i). \end{aligned} \end{aligned}$$
(1)

The set \(\Gamma = \{ (\gamma _1, \ldots , \gamma _k) \in \textbf{R}^N : \gamma _i \in \Gamma _i, i=1\ldots k \}\) is defined by k individual submanifolds \(\Gamma _i \subset \textbf{R}^{N_i}\) with \(N = N_1 + \cdots + N_k\). The individual \(\Gamma _i\) are required to be bounded subsets of \(\textbf{R}^{N_i}\).

Since the focus of this paper is on imaging applications, we assume \(\Omega \subset \textbf{R}^2\) to be a rectangular domain, but the approach is easily generalized to higher-dimensional or non-rectangular domains.

We make no special assumptions on the cost \(c : \Omega \times \Gamma \rightarrow \textbf{R}_{\ge 0}\) in (1) and allow it to be a general nonnegative nonconvex function. This turns (1) into an overall nonconvex optimization problem, which can be challenging to solve using standard gradient-based methods. Moreover, we do not assume that we are able to compute gradients, projections or proximal operators of the cost function \(c(x, u(x))\). Our approach only requires function evaluations. This allows us to consider degenerate costs that are out of reach for gradient-based approaches.

The regularizer in (1) is a separable total variation regularization \(\text {TV}(u_i)\) on the individual components \(u_i : \Omega \rightarrow \textbf{R}^{N_i}\), weighted by parameters \(\lambda _i > 0\). The total variation (TV) (Rudin et al. 1992; Chambolle et al. 2010) encourages a spatially smooth but edge-preserving solution. It can be defined as follows:

$$\begin{aligned} \text {TV}(u_i)&:= \sup _{\begin{array}{c} p : \Omega \rightarrow \textbf{R}^{N_i \times 2}\\ \Vert p(x) \Vert _* \le 1 \end{array}} \int _{\Omega } \langle {{\,\textrm{Div}\,}}_x p(x), u_i(x) \rangle \, \textrm{d}x \nonumber \\&= \int _{\Omega } \Vert \nabla u_i(x) \Vert \, \textrm{d}x, \end{aligned}$$
(2)

where by \(\nabla u_i(x) \in \textbf{R}^{N_i \times 2}\) we denote the Jacobian matrix and by \(\Vert \cdot \Vert _*\) the dual norm of \(\Vert \cdot \Vert \). The last equality in (2) holds for sufficiently smooth \(u_i\).
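For intuition, the following is a minimal NumPy sketch of a forward-difference discretization of the isotropic total variation of a scalar image (illustrative only; the discretization actually used in this paper is the finite-element scheme of Sect. 3.1):

```python
import numpy as np

def tv_isotropic(u, h=1.0):
    """Isotropic TV of a scalar image u via forward differences (Neumann boundary)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal forward differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical forward differences
    return h * np.sum(np.sqrt(dx**2 + dy**2))

u = np.zeros((64, 64))
u[:, 32:] = 1.0                  # image with a single vertical edge
print(tv_isotropic(u))           # 64.0: jump height times the length of the edge
```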

The convex relaxation approach we use in this paper works for general convex and nonconvex regularizers which depend on the Jacobian \(\nabla u_i\), see Pock et al. (2009, 2010), Möllenhoff and Cremers (2019), Vogt et al. (2020). However, the main focus of this paper is an efficient implementation of the data cost c, and therefore we consider only the separable total variation (2).

Motivation and applications

To motivate Problem (1), let us consider some practical applications in low-level vision and imaging. One example is the variational estimation of optical flow between two RGB images \(I_1, I_2 : \Omega \rightarrow \textbf{R}^3\), see Horn and Schunck (1981). In that case, \(\Gamma _1 = \Gamma _2 = [a, b] \subset \textbf{R}\) models the displacement between the two images and the cost function is given by a photometric error.

Often, \(\Gamma _i\) is a curved manifold, see, e.g., the applications presented by Lellmann et al. (2013), Weinmann et al. (2014). Examples include \(\Gamma _i = \mathbb {S}^2\) for normal field processing (Lellmann et al. 2013), \(\text {SO}(3)\) for motion estimation (Görlitz et al. 2019) or the circle \(\mathbb {S}^1\) for processing of cyclic data (Cremers and Strekalovskiy 2013; Steinke et al. 2010).

Many real-world applications require estimating multiple quantities jointly. This naturally leads to the product-space formulation considered in (1), where \(\Gamma = \Gamma _1 \times \cdots \times \Gamma _k\). In this case, each \(\Gamma _i\) models one quantity of interest that one aims to estimate.

Prominent approaches to joint optimization problems of this form are alternating procedures such as expectation maximization (Dempster et al. 1977), block-coordinate descent and alternating direction-type methods (Boyd et al. 2011). There, the idea is to estimate a single quantity while holding the other ones fixed. However, these approaches usually depend on a good initialization and can easily get stuck in poor local optima. Therefore, the goal of this paper is to instead consider a convex relaxation of Problem (1). The relaxed problem can then be solved to global optimality with standard proximal methods such as the primal-dual algorithm (Pock and Chambolle 2011). These methods are well parallelizable and can thus be efficiently implemented on GPUs, allowing large-scale problems to be solved in reasonable time.

Contributions The main difficulty with convex approaches to (1) is the large memory requirement inherent to the lifted problem formulation which renders the problem convex. To improve the memory efficiency of such relaxations, two disparate ideas have been considered in previous work: sublabel-accurate liftings (Möllenhoff et al. 2016) and product-space relaxations (Goldluecke et al. 2013). In this paper, we combine both approaches and present a sublabel-accurate implementation of Goldluecke et al. (2013). Unlike previous liftings (Möllenhoff et al. 2016; Möllenhoff and Cremers 2017; Vogt et al. 2020), our approach does not require epigraphical projections and can therefore be applied in a black-box fashion, requiring only evaluations of the cost c.

Our main contribution is a simple way to implement the resulting semi-infinite optimization problem with a cutting-plane method. Moreover, we show that using this method, we can achieve a lower energy than the product-space lifting (Goldluecke et al. 2013) on optical flow estimation and manifold-valued denoising problems.

This journal paper is an extended version of the conference paper (Ye et al. 2021). In particular, we offer the following contributions over the conference version:

  • We have added additional background and explanations on the basics of functional lifting to Sects. 2 and 3.

  • In Sect. 4 we added the detailed update equations for the primal-dual algorithm.

  • We added additional figures and explanations to Sect. 5 to illustrate and provide an intuition of our algorithm on a simple example.

  • We added additional experiments on optical flow and manifold-valued image denoising to Sect. 5 and evaluated our method on a larger set of images.

Overview of the paper

After this introduction, we provide in Sect. 2 an introduction to functional lifting methods for Problem (1), review existing works and explain our contributions relative to them. We summarize the convex relaxation for (1) and its discretization in Sect. 3. The proposed cutting-plane method and the sampling strategy which we use to implement the discretized relaxation are presented in Sect. 4. In Sect. 5 we evaluate our method on a toy problem and several real-world imaging applications. Conclusions are drawn in Sect. 6.

Fig. 1

Traditional optimization versus optimization over a space of probability distributions. In the top row, going from left to right, we illustrate a traditional gradient-based local optimization method on a nonconvex problem of the form (3). Given a bad initialization, the algorithm might get stuck in a poor local optimum. The bottom row illustrates an optimization procedure on the relaxed problem (4). Due to convexity of the objective and search space of probability measures, the solution concentrates at a Dirac distribution centered around a global optimum

2 An Introduction to Functional Lifting

Let us first consider a simplified version of Problem (1) where \(\Omega \) consists only of a single point, i.e., the nonconvex minimization of one data term:

$$\begin{aligned} \min _{\gamma \in \Gamma } ~ c(\gamma _1, \ldots , \gamma _k). \end{aligned}$$
(3)

A well-known approach to the global optimization of (3) is a lifting or stochastic relaxation procedure, which has been considered in diverse fields such as polynomial optimization (Lasserre 2000), continuous Markov random fields (Fix and Agarwal 2014; Peng et al. 2011; Bauermeister et al. 2021), variational methods (Pock et al. 2008), and black-box optimization (de Boer et al. 2005; Ollivier et al. 2017; Schaul 2011). The idea is to relax the search space in (3) from \(\gamma \in \Gamma \) to probability distributions \(\textbf{u}\in \mathcal {P}(\Gamma )\) and solve

$$\begin{aligned} \min _{\textbf{u}\in \mathcal {P}(\Gamma )} \int _{\Gamma } c(\gamma _1, \ldots , \gamma _k) \, \textrm{d}\textbf{u}(\gamma _1, \ldots , \gamma _k). \end{aligned}$$
(4)

Due to linearity of the integral wrt. \(\textbf{u}\) and convexity of the relaxed search space, this is a convex problem for arbitrary cost c. Moreover, the minimizers of (4) concentrate at the optima of c and can hence be identified with solutions to (3). However, if \(\Gamma \) is a continuum, this problem is infinite-dimensional and therefore challenging.

We illustrate the conceptual difference between the formulation (3) and (4) on a one-dimensional example in Fig. 1.

Discrete/traditional multilabeling

In the context of Markov random fields (Ishikawa 2003; Kappes et al. 2013) and multilabel optimization (Chambolle et al. 2012; Lellmann et al. 2009; Lellmann and Schnörr 2011; Zach et al. 2008) one typically discretizes \(\Gamma \) into a finite set of points (called the labels) \(\Gamma = \{ \textbf{v}_1, \ldots , \textbf{v}_{\ell } \}\). This turns (4) into a finite-dimensional linear program \(\min _{\textbf{u}\in \Delta ^\ell }\,\langle c', \textbf{u} \rangle \), where \(c' \in \textbf{R}^\ell _{\ge 0}\) denotes the label cost and \(\Delta ^\ell \subset \textbf{R}^\ell \) is the \((\ell -1)\)-dimensional unit simplex. If we evaluate the cost at the labels, this program upper bounds the continuous problem (3), since instead of all possible solutions one considers only a restricted subset determined by the labels. Since the solution will be attained at one of the labels, a fine meshing is typically needed. Similar to black-box and zero-order optimization methods, this strategy suffers from the curse of dimensionality: when each \(\Gamma _i\) is discretized into \(\ell \) labels, the overall number of labels is \(\ell ^k\), which quickly becomes intractable since many labels are required for a smooth solution. This limits the applicability of the method to practical problems. Additionally, for pairwise or regularizing terms, a large number of dual constraints often has to be implemented. In that context, the work of Lellmann et al. (2013) considers a constraint pruning strategy as an offline preprocessing.
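As an illustration of this finite-dimensional linear program, consider a hypothetical one-dimensional cost discretized with \(\ell = 5\) labels; since a linear objective over the simplex attains its minimum at a vertex, the minimizer is a one-hot vector located at one of the labels (the cost and labels below are assumptions for this sketch):

```python
import numpy as np

# Hypothetical nonconvex cost on Gamma = [0, 1], a truncated quadratic around 0.37.
c = lambda gamma: np.minimum((gamma - 0.37)**2, 0.05)

labels = np.linspace(0.0, 1.0, 5)          # ell = 5 labels v_1, ..., v_5
c_label = c(labels)                         # label cost vector c' in R^ell

u = np.zeros_like(c_label)                  # minimizer of <c', u> over the simplex
u[np.argmin(c_label)] = 1.0                 # one-hot at the cheapest label
print(labels[np.argmin(c_label)])           # 0.25: accurate only up to the label grid
```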

Sublabel-accurate multilabeling

The discrete-continuous MRF (Fix and Agarwal 2014; Zach 2013; Zach and Kohli 2012) and lifting methods (Laude et al. 2016; Möllenhoff et al. 2016; Möllenhoff and Cremers 2017) attempt to find a more label-efficient convex formulation. These approaches can be understood through duality (Fix and Agarwal 2014; Möllenhoff and Cremers 2017). Applied to (3), the idea is to replace the cost \(c : \Gamma \rightarrow \textbf{R}\) with a dual variable \(\textbf{q}: \Gamma \rightarrow \textbf{R}\):

$$\begin{aligned}&\min _{\textbf{u}\in \mathcal {P}(\Gamma )} \sup _{\textbf{q}: \Gamma \rightarrow \textbf{R}} ~ \int _{\Gamma } \textbf{q}(\gamma _1, \ldots , \gamma _k) \, \textrm{d}\textbf{u}(\gamma _1, \ldots , \gamma _k), \nonumber \\&~~\text {s.t.} ~\textbf{q}(\gamma ) \le c(\gamma ) ~\text { for all } \gamma \in \Gamma . \end{aligned}$$
(5)

The inner supremum in the formulation (5) maximizes the lower-bound \(\textbf{q}\). Additionally, if the dual variable is sufficiently expressive, this problem is actually equivalent to (4).

Approximating \(\textbf{q}\), for example with piecewise linear functions on \(\Gamma \), one arrives at a lower-bound to the nonconvex problem (3). It has been observed in a recent series of works (Laude et al. 2016; Möllenhoff et al. 2016; Möllenhoff and Cremers 2017; Vogt et al. 2020; Zach and Kohli 2012) that piecewise linear dual variables can lead to smooth solutions even when \(\textbf{q}\) (and therefore also \(\textbf{u}\)) is defined on a rather coarse mesh. As remarked by Fix and Agarwal (2014), Laude et al. (2016), Möllenhoff et al. (2016), for an affine dual variable this strategy corresponds to minimizing the convex envelope of the cost, \(\min _{\gamma \in \Gamma } c^{**}(\gamma )\), where \(c^{**}\) denotes the Fenchel biconjugate of c.

The implementation of the constraints in (5) can be challenging even in the case of piecewise-linear \(\textbf{q}\). This is partly due to the fact that Problem (5) is a semi-infinite optimization problem (Blankenship and Falk 1976), i.e., an optimization problem with infinitely many constraints. The works (Möllenhoff et al. 2016; Zach and Kohli 2012) implement the constraints via projections onto the epigraph of the (restricted) conjugate function of the cost within a proximal optimization framework. Such projections are only available in closed form for some choices of c and expensive to compute if the dimension is larger than one (Laude et al. 2016). This limits the applicability in a “plug-and-play” fashion.

Product-space liftings

The product-space lifting approach (Goldluecke et al. 2013) attempts to overcome the aforementioned exponential memory requirements of labeling methods in an orthogonal way to the sublabel-based methods. The main idea is to exploit the product-space structure in (1) and optimize over k marginal distributions of the probability measure \(\textbf{u}\in \mathcal {P}(\Gamma )\), which we denote by \(\textbf{u}_i \in \mathcal {P}(\Gamma _i)\). Applying (Goldluecke et al. 2013) to the single data term (3) one arrives at the following relaxation:

$$\begin{aligned}&\min _{ \{ \textbf{u}_i \in \mathcal {P}(\Gamma _i) \} } \sup _{ \{ \textbf{q}_i : \Gamma _i \rightarrow \textbf{R}\}} \sum _{i=1}^k \int _{\Gamma _i} \textbf{q}_i(\gamma _i) \, \textrm{d}\textbf{u}_i(\gamma _i) \nonumber \\&~~ ~ \text { s.t. } \sum _{i=1}^k \textbf{q}_i(\gamma _i) \le c(\gamma ) ~~ \text { for all } \gamma \in \Gamma . \end{aligned}$$
(6)

Since one only has to discretize the individual \(\Gamma _i\) this substantially reduces the memory requirements from \(\mathcal {O}(\ell ^N)\) to \(\mathcal {O}(\sum _{i=1}^k \ell ^{N_i})\). While at first glance it seems that the curse of dimensionality is lifted, the difficulty is moved to the dual, where we still have a large (or even infinite) number of constraints. A global implementation of the constraints with Lagrange multipliers as proposed in Goldluecke et al. (2013) again leads to the same exponential dependency on the dimension.
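For instance, with \(k=2\), \(N_1 = N_2 = 1\) (as in optical flow) and, say, \(\ell = 64\) labels per component, the full lifting requires \(\ell ^N = 64^2 = 4096\) unknowns for the discretized measure \(\textbf{u}\), whereas the product-space lifting only requires \(\ell ^{N_1} + \ell ^{N_2} = 64 + 64 = 128\).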

As a side note, readers familiar with optimal transport may notice that the supremum in (6) is a multi-marginal transportation problem (Carlier 2003; Villani 2008) with transportation cost c. This view is mentioned by Bach (2019) where relaxations of form (6) are analyzed under submodularity assumptions.

In summary, the sublabel-accurate lifting methods, discrete-continuous MRFs (Zach and Kohli 2012; Möllenhoff et al. 2016) and product-space liftings (Goldluecke et al. 2013) all share a common difficulty: implementation of an exponential or even infinite number of constraints on the dual variables.

Summary of our contribution

Our main contribution is a simple way to implement the dual constraints in an online fashion with a random sampling strategy which we present in Sect. 4. This allows a black-box implementation, which only requires an evaluation of the cost c and no epigraphical projection operations as in Möllenhoff et al. (2016), Zach and Kohli (2012). Moreover, the sampling approach allows us to propose and implement a sublabel-accurate variant of the product-space relaxation (Goldluecke et al. 2013) which we describe in the following section.

3 Product-Space Relaxation

Our starting point is the convex relaxation of (1) presented in Goldluecke et al. (2013), Strekalovskiy et al. (2014). In these works, \(\Gamma _i \subset \textbf{R}\) is chosen to be an interval. We first denote the Lagrangian as:

$$\begin{aligned}&\mathcal {L}( \{ \textbf{u}_i \} , \{ \textbf{q}_i \}, \{ \textbf{p}_i \} ) = \sum _{i=1}^k \int _{\Omega } \int _{\Gamma _i} \textbf{q}_i(x, \gamma _i) \end{aligned}$$
(7)
$$\begin{aligned}&\qquad - {{\,\textrm{Div}\,}}_x \textbf{p}_i(x, \gamma _i) \, \, \textrm{d}\textbf{u}_{i}^x(\gamma _i) \, \textrm{d}x. \end{aligned}$$
(8)

Following Vogt et al. (2020) we consider a generalization to manifolds \(\Gamma _i \subset \textbf{R}^{N_i}\) which leads us to the following relaxation:

$$\begin{aligned}&\min _{ \{ \textbf{u}_i : \Omega \rightarrow \mathcal {P}(\Gamma _i) \} } \sup _{ \begin{array}{c} \{ \textbf{q}_i : \Omega \times \Gamma _i \rightarrow \textbf{R}\} \\ \{ \textbf{p}_i : \Omega \times \Gamma _i \rightarrow \textbf{R}^2 \} \end{array} } \mathcal {L}( \{ \textbf{u}_i \} , \{ \textbf{q}_i \}, \{ \textbf{p}_i \} ), \nonumber \\&\text {s.t. }~ \Vert P_{T_{\gamma _i}}\nabla _{\gamma _i} \textbf{p}_i(x, \gamma _i) \Vert _* \le \lambda _i, ~ \forall i, x, \gamma _i, \end{aligned}$$
(9)
$$\begin{aligned}&\qquad \sum _{i=1}^k \textbf{q}_i(x, \gamma _i) \le c(x, \gamma ), ~ \forall x, \gamma \end{aligned}$$
(10)

This relaxation appears similar to (6) from the previous section, but there are two differences. First, we now have marginal distributions \(\textbf{u}_i(x)\) for every \(x \in \Omega \) since we no longer consider only a single data term. The notation \(\, \textrm{d}\textbf{u}_i^x\) in (8) denotes the integration against the probability measure \(\textbf{u}_i(x) \in \mathcal {P}(\Gamma _i)\). The variables \(\textbf{q}_i\) play the same role as in (6) and lower-bound the cost under constraint (10). The second difference is the introduction of additional dual variables \(\textbf{p}_i\) and the term \(-{{\,\textrm{Div}\,}}_x \textbf{p}_i\) in (8). Together with the constraint (9), this can be shown to implement the total variation regularization as in Lellmann et al. (2013), Vogt et al. (2020). Following Vogt et al. (2020), the derivative \(\nabla _{\gamma _i} \textbf{p}_i(x, \gamma _i)\) in (9) denotes the \((N_i \times 2)\)-dimensional Jacobian considered in the Euclidean sense and \(P_{T_{\gamma _i}}\) the projection onto the tangent space of \(\Gamma _i\) at the point \(\gamma _i\).

To build intuition for how (8) and (9) implement the total variation regularization, consider the case where the labeling variable \(\textbf{u}_i^x = \delta _{u(x)}\) is a Dirac measure at every point x. For simplicity, we consider \(k=1\). The term in (8) then simplifies to

$$\begin{aligned}&\int _{\Omega } -{{\,\textrm{Div}\,}}_x \textbf{p}(x, u(x)) \, \, \textrm{d}x \end{aligned}$$
(11)
$$\begin{aligned}&\quad = \int _{\Omega } \langle \nabla _{\gamma } \textbf{p}(x, u(x)), \nabla u(x) \rangle \, \, \textrm{d}x, \end{aligned}$$
(12)

which follows by applying the chain-rule and the fact that \(\textbf{p}\) has compact support. Finally, taking a point-wise supremum over \(\textbf{p}\) inside the integral in (12) under the dual-norm constraint (9) gives us the total variation of u: \(\int _{\Omega } \Vert \nabla u(x) \Vert \, \, \textrm{d}x\), which is the same as defined in (2).

3.1 Finite-Element Discretization

We approximate the infinite-dimensional problem (8) by restricting \(\textbf{u}_i\), \(\textbf{p}_i\) and \(\textbf{q}_i\) to finite-element functions on a discrete meshing of \(\Omega \times \Gamma _i\). The considered discretization is a standard finite-element approach and largely follows the work from Vogt et al. (2020). Unlike the forward-differences considered in Vogt et al. (2020) we use lowest-order Raviart–Thomas elements (see, e.g., Caillaud and Chambolle 2020, Section 5) in \(\Omega \), which are specifically tailored towards the considered total variation regularization.

Discrete mesh

We approximate each \(d_i\)-dimensional manifold \(\Gamma _i \subset \textbf{R}^{N_i}\) with a simplicial manifold \(\Gamma _i^h\), given by the union of a collection of \(d_i\)-dimensional simplices \(\mathcal {T}_{i}\). We denote the number of vertices (“labels”) in the triangulation of \(\Gamma _i\) as \(\ell _i\). The set of labels is denoted by \(\mathcal {L}_i = \{ \textbf{v}_{i,1}, \ldots , \textbf{v}_{i,\ell _i} \}\). As assumed, \(\Omega \subset \textbf{R}^2\) is a rectangle which we split into a set of faces \(\mathcal {F}\) of edge-length \(h_x\) with edge set \(\mathcal {E}\). The number of faces and edges are denoted by \(F = |\mathcal {F}|\), \(E = |\mathcal {E}|\).

Data term and the \(\textbf{u}_i\), \(\textbf{q}_i\) variables

We assume the cost \(c : \Omega \times \Gamma \rightarrow \textbf{R}_{\ge 0}\) is constant in \(x \in \Omega \) on each face and denote its value as \(c(x(f), \gamma )\) for \(f \in \mathcal {F}\), where \(x(f) \in \Omega \) denotes the midpoint of the face f. Similarly, we also assume the variables \(\textbf{u}_i\) and \(\textbf{q}_i\) to be constant in \(x \in \Omega \) on each face but continuous piecewise linear functions in \(\gamma _i\). They are represented by coefficient functions \(\textbf{u}_i^h, \textbf{q}_i^h \in \textbf{R}^{F \cdot \ell _i}\), i.e., we specify the values on the labels and linearly interpolate in between. This is done by the interpolation operator \(\textbf{W}_{i, f, \gamma _i} : \textbf{R}^{F \cdot \ell _i} \rightarrow \textbf{R}\) which, given an index \(1 \le i \le k\), a face f, and a (continuous) label position \(\gamma _i \in \Gamma _i\), computes the function value based on barycentric coordinates: \(\textbf{W}_{i, f, \gamma _i} \textbf{u}_i^h = \textbf{u}_i(x(f), \gamma _i)\). Note that after discretization, \(\textbf{u}_i\) is only defined on \(\Gamma _i^h\) but we can uniquely associate to each \(\gamma _i \in \Gamma _i^h\) a point on \(\Gamma _i\).
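A minimal sketch of such an interpolation operator for the one-dimensional case \(\Gamma _i = [0,1]\) (the general case uses barycentric coordinates on the simplices of \(\Gamma _i^h\); all names and values here are illustrative):

```python
import numpy as np

def interp_pl(coeffs, labels, gamma):
    """Evaluate the piecewise-linear function with nodal values `coeffs` on the
    sorted 1D label grid `labels` at a continuous (sublabel) position `gamma`."""
    j = np.clip(np.searchsorted(labels, gamma, side="right") - 1, 0, len(labels) - 2)
    w = (gamma - labels[j]) / (labels[j + 1] - labels[j])   # barycentric weight in [0, 1]
    return (1.0 - w) * coeffs[j] + w * coeffs[j + 1]

labels = np.linspace(0.0, 1.0, 5)             # ell_i = 5 labels
q_f = np.array([2.0, 1.0, 0.5, 1.0, 2.0])     # coefficients of q_i^h on one face f
print(interp_pl(q_f, labels, 0.3))            # 0.9, i.e., the value W_{i,f,0.3} q_i^h
```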

Divergence and \(\textbf{p}_i\) variables

Our variable \(\textbf{p}_i\) is represented by coefficients \(\textbf{p}_i^h \in \textbf{R}^{E \cdot \ell _i}\) which live on the edges in \(\Omega \) and the labels in \(\Gamma _i\). The vector \(\textbf{p}_i(x, \gamma _i) \in \textbf{R}^2\) is obtained by linearly interpolating the coefficients on the vertical and horizontal edges of the face and using the interpolated coefficients to evaluate the piecewise-linear function on \(\Gamma _i^h\). Under this approximation, the discrete divergence \({{\,\textrm{Div}\,}}_x^h : \textbf{R}^{E \cdot \ell _i} \rightarrow \textbf{R}^{F \cdot \ell _i}\) is given by \(({{\,\textrm{Div}\,}}_x^h \textbf{p}_i^h)(f) = \left( \textbf{p}_i^h(e_r) + \textbf{p}_i^h(e_t) - \textbf{p}_i^h(e_l) - \textbf{p}_i^h(e_b) \right) / h_x\) where \(e_r, e_t, e_l, e_b\) are the right, top, left and bottom edges of f, respectively.
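The following sketch evaluates this divergence for a single label layer, assuming the edge coefficients are stored in two arrays for vertical and horizontal edges (this storage layout is an assumption of the sketch, not our actual data structure):

```python
import numpy as np

def div_h(p_lr, p_bt, hx=1.0):
    """Discrete divergence on an (ny, nx) grid of faces.
    p_lr: (ny, nx+1) coefficients on vertical edges (left/right of each face),
    p_bt: (ny+1, nx) coefficients on horizontal edges (bottom/top of each face)."""
    return ((p_lr[:, 1:] - p_lr[:, :-1]) + (p_bt[1:, :] - p_bt[:-1, :])) / hx

ny, nx = 4, 4
p_lr = np.random.randn(ny, nx + 1)
p_bt = np.random.randn(ny + 1, nx)
print(div_h(p_lr, p_bt).shape)    # (4, 4): one divergence value per face f
```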

Total variation constraint

Computing the operator \(P_{T_{\gamma _i}} \nabla _{\gamma _i}\) is largely inspired by Vogt et al. (2020), Section 2.2. It is implemented by a linear map \(\textbf{D}_{i, f, \alpha , t} : \textbf{R}^{E \cdot \ell _i} \rightarrow \textbf{R}^{d_i \times 2}\). Here, \(f \in \mathcal {F}\) and \(\alpha \in [0,1]^2\) correspond to a point \(x \in \Omega \) while \(t \in \mathcal {T}_i\) is the simplex containing the point corresponding to \(\gamma _i \in \Gamma _i\). First, the operator computes coefficients in \(\textbf{R}^{\ell _i}\) of two piecewise-linear functions on the manifold by linearly interpolating the values on the edges based on the face index \(f \in \mathcal {F}\) and \(\alpha \in [0,1]^2\). For each function, the derivative in simplex \(t \in \mathcal {T}_i\) on the triangulated manifold is given by the gradient of an affine extension. Projecting the resulting vector onto the \(d_i\)-dimensional tangent space for both functions leads to a \(d_i \times 2\)-matrix which approximates \(P_{T_{\gamma _i}} \nabla _{\gamma _i} \textbf{p}_i(x, \gamma _i)\).

Final discretized problem

Plugging our discretized \(\textbf{u}_i\), \(\textbf{q}_i\), \(\textbf{p}_i\) into (8), we arrive at the following finite-dimensional optimization problem:

$$\begin{aligned}&\min _{\{\textbf{u}_i^h \in \textbf{R}^{F \cdot \ell _i} \}} \max _{\begin{array}{c} \{\textbf{p}_i^h \in \textbf{R}^{E \cdot \ell _i} \},\\ \{\textbf{q}_i^h \in \textbf{R}^{F \cdot \ell _i} \} \end{array}} h_x^2 \cdot \sum _{i=1}^k \langle \textbf{u}_i^h, \textbf{q}_i^h - {{\,\textrm{Div}\,}}_x^h \textbf{p}_i^h \rangle \nonumber \\&\qquad + \sum _{i=1}^k \sum _{f \in \mathcal {F}} \textbf{i}\{\textbf{u}_i^h(f) \in \Delta ^{\ell _i}\}, \end{aligned}$$
(13)
$$\begin{aligned}&\text {s.t. } ~~ \Vert \textbf{D}_{i, f, \alpha , t} \textbf{p}_i^h \Vert _* \le \lambda _i, \nonumber \\&\qquad \qquad \forall i \in [k], f \in \mathcal {F}, \alpha \in \{ 0, 1 \}^2, t \in \mathcal {T}_i, \end{aligned}$$
(14)
$$\begin{aligned}&\sum _{i=1}^k \textbf{W}_{i, f, \gamma _i}\textbf{q}_i^h \le c \left( x(f), \gamma \right) , \forall f \in \mathcal {F}, \gamma \in \Gamma , \end{aligned}$$
(15)

where \(\textbf{i}\{\cdot \}\) is the indicator function. In our applications, we found that it is sufficient to enforce the constraint (14) at the corners of each face which corresponds to choosing \(\alpha \in \{ 0, 1 \}^2\). Apart from the infinitely many constraints in (15), this is a finite-dimensional convex-concave saddle-point problem and can be tackled by many numerical optimization algorithms.

3.2 Solution Recovery

Before presenting in the next section how we propose to implement the constraints (15), we briefly discuss how a primal solution \(\{ \textbf{u}_i^h \}\) of the above problem is turned into an approximate solution to (1). To that end, we follow Lellmann et al. (2013), Vogt et al. (2020) and compute the Riemannian center of mass via an iteration \(\tau =1,\dots ,T\):

$$\begin{aligned} V^\tau _{j}&= \log _{u_i^\tau }(\textbf{v}_{i,j}), \end{aligned}$$
(16)
$$\begin{aligned} v^\tau&= \sum _{j=1}^{\ell _i} ~ \textbf{u}_i^h(f, j) V^\tau _j, \end{aligned}$$
(17)
$$\begin{aligned} \textbf{u}_i^{\tau +1}&= \exp _{u_i^\tau }(v^\tau ). \end{aligned}$$
(18)

Here, \(u_i^0 \in \Gamma _i\) is initialized by the label with the highest probability according to \(\textbf{u}_i^h(f, \cdot )\). \(\log _{u_i^\tau }\) and \(\exp _{u_i^\tau }\) denote the logarithmic and exponential maps between \(\Gamma _i^h\) and its tangent space at \(u_i^\tau \in \Gamma _i\), which are both available in closed form for the manifolds we consider here. In our case, \(T=20\) iterations were enough to reach convergence. For flat manifolds, \(T=1\) suffices, as the logarithmic and exponential maps reduce to vector subtraction and addition, so that (18) computes a weighted Euclidean mean in a single step.
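For illustration, a minimal sketch of the recovery (16)–(18) for the circle \(\Gamma _i = \mathbb {S}^1\), parametrized by angles, where the logarithmic map is the angle difference wrapped to \([-\pi , \pi )\) and the exponential map is addition of angles (a standard parametrization; the code below is a sketch, not our implementation):

```python
import numpy as np

def log_s1(u, v):
    """Logarithmic map on S^1 (angles): signed angle from u to v, wrapped to [-pi, pi)."""
    return np.mod(v - u + np.pi, 2.0 * np.pi) - np.pi

def riemannian_mean_s1(labels, weights, T=20):
    """Weighted Riemannian center of mass (16)-(18) of circle-valued labels (angles)."""
    u = labels[np.argmax(weights)]           # initialize at the most probable label
    for _ in range(T):
        V = log_s1(u, labels)                # (16): lift all labels to the tangent space at u
        v = np.sum(weights * V)              # (17): weighted average in the tangent space
        u = np.mod(u + v, 2.0 * np.pi)       # (18): exponential map back onto S^1
    return u

labels = np.array([0.1, 6.2, 3.0])           # label angles in [0, 2*pi)
weights = np.array([0.5, 0.45, 0.05])        # marginal weights u_i^h(f, .)
print(riemannian_mean_s1(labels, weights))   # ~0.16: stays near the cluster at 0 = 2*pi
```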

In general, there is no theory which shows that \(u^T(x)=(u_1^T(x), \ldots , u_k^T(x))\) from (18) is a global minimizer of (1). Tightness of the relaxation in the special case \(k = 1\) and \(\Gamma \subset \textbf{R}\) is shown in Pock et al. (2010). For higher-dimensional \(\Gamma \), the tightness of related relaxations is ongoing research; see Ghoussoub et al. (2021) for results on the Dirichlet energy. By computing a posteriori optimality gaps, solutions of (8) were shown to be typically near the global optimum of Problem (1); see, e.g., Goldluecke et al. (2013).

4 Implementation of the Constraints

Though the optimization variables in (13) are finite-dimensional, the energy is still difficult to optimize because of the infinitely many constraints in (15).

Before we present our approach, let us first describe what we refer to as the baseline method for the remainder of this paper. As the baseline, we consider the direct solution of (13) where we implement the constraints only at the label/discretization points \(\mathcal {L}_1 \times \cdots \times \mathcal {L}_k\) via Lagrange multipliers [this strategy is also employed by the global variant of the product-space approach (Goldluecke et al. 2013)]. This baseline corresponds to a single outer iteration (\(N_{it}=1\)) of the proposed Algorithm 1 with a large number \(M_{it}\) of inner iterations.

Algorithm 1

We aim for a framework that allows for solving a better approximation of (15) than the baseline while being of similar memory complexity. To this end, Algorithm 1 alternates the following two steps:

(1) Sampling Based on the current solution, we prune previously considered but now feasible constraints and sample a new subset of the infinitely many constraints in (15). From all currently sampled constraints, we keep the most violated ones for each face, add one sample at the current solution and discard the rest.

(2) Solving the subsampled problem Considering the current finite subset of constraints, we solve Problem (13) using a primal-dual method (Chambolle and Pock 2011) with diagonal preconditioning (Pock and Chambolle 2011). Both constraints (14) and (15) are implemented using Lagrange multipliers.

These two phases are performed in an alternating fashion, with the aim of eventually approaching the solution of the semi-infinite problem (13)–(15). In practice, a fixed number of outer iterations \(N_{it}\) is set. While we do not prove convergence of the overall algorithm, convergence results for related procedures exist; see, e.g., Blankenship and Falk (1976), Theorem 2.4.

The detailed procedure is given in Algorithm 1. The cost matrix \(\textbf{C}\) is constructed by evaluating \(c(x(f), \gamma )\) at the proposed samples \(\mathcal {S}_f\). We denote by \(\xi \) and \(\nu \) the Lagrange multipliers. The multiplier \(\xi \) is initialized by a warm-start strategy, i.e., \(\xi ^{it}\) is kept if the same sample was already proposed in the previous outer iteration. The \(\text {prox}\) of a function g with step size \(\tau \) is defined as:

$$\begin{aligned} \text {prox}_{\tau g}(x) = {{\,\mathrm{arg\,min}\,}}_y ~ \frac{1}{2\tau } \Vert x-y \Vert ^2 + g(y). \end{aligned}$$
(19)
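For example, the prox of the simplex indicator \(\textbf{i}\{\textbf{u}_i^h(f) \in \Delta ^{\ell _i}\}\) appearing in (13) is the Euclidean projection onto the unit simplex, for which a standard sort-based routine can be used; a minimal sketch (illustrative only):

```python
import numpy as np

def proj_simplex(x):
    """prox of the indicator of the unit simplex, i.e., the Euclidean projection."""
    u = np.sort(x)[::-1]                       # sort in descending order
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)     # threshold such that the result sums to 1
    return np.maximum(x - theta, 0.0)

print(proj_simplex(np.array([0.7, 0.6, -0.2])))   # [0.55 0.45 0.  ]
```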

Our constraint sampling strategy is detailed in Algorithm 2. For each face in \(\mathcal {F}\), it generates a finite set of “sublabels” \(\mathcal {S}_f \subset \Gamma \) at which we implement the constraints (15). Next, we provide the motivation behind each line in the algorithm.

Algorithm 2
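The following is a minimal Python sketch of this sampling routine for a one-dimensional label space \(\Gamma = [lo, hi]\), with comments marking the lines referred to below; the helper names and the `violation` callable (which evaluates \(\sum _i \textbf{q}_i(\gamma _i) - c(x(f), \gamma )\) for the current dual variables) are assumptions of this sketch, not our actual interface:

```python
import numpy as np

def sample_constraints(violation, u_cur, old_samples, lo, hi,
                       n=10, r=1, delta=0.1, rng=np.random.default_rng(0)):
    """Sampling strategy (Algorithm 2) for a single face; violation(g) > 0 means the
    constraint sum_i q_i(g) <= c(x(f), g) is violated at the sublabel g."""
    new = rng.uniform(lo, hi, size=n)                                  # Line 1: uniform samples
    new = np.append(new, np.clip(u_cur + delta * rng.standard_normal(n), lo, hi))  # Line 2: local perturbation
    cand = np.append(old_samples, new)                                 # pool old and new samples
    viol = np.array([violation(g) for g in cand])
    cand, viol = cand[viol > 0], viol[viol > 0]                        # Line 3: prune feasible constraints
    keep = cand[np.argsort(viol)[::-1][:r]]                            # Line 4: keep the r most violated
    return np.append(keep, u_cur)                                      # Line 5: always sample at u

# Hypothetical 1D face: current lower bound q(g) = 0.5, cost c(g) = (g - 0.3)^2 + 0.45
viol = lambda g: 0.5 - ((g - 0.3)**2 + 0.45)
print(sample_constraints(viol, u_cur=0.25, old_samples=np.array([0.0, 1.0]), lo=0.0, hi=1.0))
```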

Random uniform sampling (Line 1) To obtain a global view of the cost function, we consider uniform sampling on the label space \(\Gamma \). The parameter \(n > 0\) determines the number of samples per face.

Local perturbation around the mean (Line 2) Besides the global information, we apply local perturbation around the current solution \(\textbf{u}\). In case the current solution is close to the optimal one, this strategy allows us to refine it with these samples. The parameter \(\delta > 0\) determines the size of the local neighbourhood. In our experiments, we always used a Gaussian perturbation with \(\delta =0.1\).

Pruning strategy (Lines 3–4) Most samples from previous iterations are discarded because the corresponding constraints are already satisfied. We prune all currently feasible constraints as in Blankenship and Falk (1976). Similarly, the two random sampling strategies (Lines 1 and 2) might return some samples for which the constraints are already fulfilled. Therefore, we only consider the samples with violated constraints and pick the r most violated among them. This pruning strategy is essential for a memory-efficient implementation, as shown later.

Sampling at u (Line 5) Finally, we add one sample which is exactly at the current solution \(u \in \Gamma \) to have at least one guaranteed sample per face. In the next section, we illustrate the behavior of Algorithm 1 on a toy problem, and evaluate its performance on real-world imaging problems.

5 Numerical Validation

Our approach and the baseline are implemented in PyTorch. Code for reproducing the following experiments can be found here: https://github.com/zhenzhangye/sublabel_meets_product_space. Note that one of the runtime bottlenecks of our sampling strategy is creating the samples and picking the r most violated ones, as shown in Algorithm 2. Additionally, the sparse matrix operations and the PDHG updates can be implemented more efficiently in CUDA, since all PDHG updates can be executed in a single CUDA kernel instead of PyTorch’s multiple kernel calls. Therefore, a specialized implementation as in Goldluecke et al. (2013) can be expected to improve the runtime by a factor of \(10-100\times \).

5.1 Illustration of the Algorithm

First of all, we consider a simple minimization problem with a single nonconvex data term:

$$\begin{aligned} c(u) = {\left\{ \begin{array}{ll} -4 u + 2.4, &{} u \in [0.1, 0.35),\\ 4 u - 0.4, &{} u \in [0.35, 0.6] ,\\ 2, &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$
(20)

with 5 labels, to illustrate the behavior of both the baseline algorithm and our sampling strategies.
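A direct implementation of the toy data term (20), assuming the 5 labels are placed uniformly on \([0,1]\) (used only for this illustration):

```python
import numpy as np

def c_toy(u):
    """Nonconvex toy cost (20): a V-shaped valley with its minimum c(0.35) = 1 at u = 0.35."""
    return np.where((u >= 0.1) & (u < 0.35), -4.0 * u + 2.4,
           np.where((u >= 0.35) & (u <= 0.6), 4.0 * u - 0.4, 2.0))

labels = np.linspace(0.0, 1.0, 5)   # the 5 labels used in this example
print(c_toy(labels))                # [2.  1.4 1.6 2.  2. ]: no label attains the minimum
```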

Fig. 2

Illustration of the baseline algorithm. a Five samples (red dots) from the labels are considered. b The dual variable \(\textbf{q}\) satisfies the constraints on the samples. However, \(\textbf{q}\) is not globally optimal as it violates the constraint on the optimal solution (“green > blue”) (Color figure online)

Fig. 3

Benefit of sampling at u (Line 5, Algorithm 2). a Because all of the proposed samples (gray dots) fulfill the constraint, they are all pruned (Line 3, Algorithm 2). b The updated \(\textbf{q}\) deteriorates because the subproblem is unconstrained in it. c At least one sample is taken (namely \(u^{it}\), red dot). d This sample constrains \(\textbf{q}\) and thus prevents a degenerate solution (Color figure online)

Figure 2 depicts the baseline’s behavior. Since it only evaluates the energy at the labels, five samples are considered, as illustrated by the red dots in Fig. 2a. Figure 2b shows the dual variable \(\textbf{q}_i^h\) after a number of iterations. Since the algorithm maximizes \(\textbf{q}_i^h\), the green and red dots should coincide (e.g., at the second label) at convergence. Despite convergence, the resulting \(\textbf{q}^h\) significantly violates the constraints close to the optimal solution (“green > blue”). Therefore, to attain the globally optimal solution, the baseline approach needs more labels, which requires more memory.

The motivation of random uniform sampling (Line 1, Algorithm 2) and local perturbation around the mean (Line 2, Algorithm 2) in our sampling strategy is intuitively clear. However, as demonstrated later in the experiment with a truncated quadratic energy, sampling at u (Line 5, Algorithm 2) is critical for the stability of our method. The comparison in Fig. 3 shows the necessity of this strategy. We ran two experiments with identical settings except for the sampling at u. After a given number of iterations, the dual variable \(\textbf{q}^h\) is approximately optimal, as indicated in Fig. 3a. Our pruning strategy (Line 3, Algorithm 2) removes all of the proposed samples since all of them satisfy the constraints. As a result, the subproblem becomes unconstrained in the dual variable \(\textbf{q}^h\) and its update is no longer meaningful, Fig. 3b. To solve this problem, we propose to always keep at least one sample at u, even when \(\textbf{q}^h\) is nearly optimal, cf. Fig. 3c. As illustrated in Fig. 3d, this avoids the degeneration of \(\textbf{q}^h\) as it remains constrained.

Fig. 4

Illustration of our proposed sampling strategies. a Two samples (red dots) are considered, leading to the shown optimal dual variable \(\textbf{q}\) after running primal-dual iterations. b The two samples are pruned because their constraints are satisfied. Several random samples are proposed (gray dots) and only one of them is kept (red dot). c One more sample at \(u^{it}\) is added and \(\textbf{q}\) is refined (Color figure online)

Finally, the complete sampling strategy is illustrated in Fig. 4. As shown in Fig. 4a, the primal-dual method obtains the optimal \(\textbf{q}^h\) for the sampled subproblem. Our sampling strategy then provides the necessary samples and prunes the feasible ones, cf. Fig. 4b. These few but meaningful samples lead \(\textbf{q}^h\) to global optimality, cf. Fig. 4c.

5.2 Truncated Quadratic Energy

In this section, we study the numerical effect of each line in Algorithm 2. We evaluate our method on the truncated quadratic energy \(c(x, u(x)) = \min \{(u(x) - f(x))^2, \nu \}\), where \(f : \Omega \rightarrow \textbf{R}\) is the input data as shown in Fig. 5. For this specific experiment, we generate a \(64 \times 64\) grayscale image degraded with Gaussian noise of standard deviation \(\sigma =0.05\) and 5% salt-and-pepper noise. The parameters are chosen as \(\nu =0.025\), \(\lambda =0.25\), \(N_{it}=10\), \(M_{it}=200\), \(n=10\) and \(r=1\). To reduce the effect of randomness, we run each algorithm 20 times and report the mean and standard deviation of the final energy for different numbers of labels in Table 1. Note that using more labels benefits both the baseline and our algorithm. Nevertheless, the proposed approach reaches lower energies with the same number of labels and similar memory requirements.
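For reference, a one-line sketch of this data term (names and values below are illustrative):

```python
import numpy as np

def c_trunc(u, f, nu=0.025):
    """Truncated quadratic data term: quadratic near f, capped at nu for outliers."""
    return np.minimum((u - f)**2, nu)

print(c_trunc(0.25, np.array([0.2, 0.9])))   # [0.0025 0.025 ]: the outlier at 0.9 is capped
```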

Fig. 5

a The original \(64 \times 64\) grayscale image. b Degraded with Gaussian noise of standard deviation \(\sigma =0.05\) and 5% salt-and-pepper noise

Table 1 Ablation study indicating the effect of individual lines in Algorithm 2

As can be seen in this table, adding uniform sampling and picking the most violated constraint per face (Lines 1 and 4 of Algorithm 2) already decreases the final energy significantly. We also consider local exploration around the current solution (Line 2), which helps to find better energies at the expense of higher memory requirements. The pruning strategy (Line 3) circumvents this memory issue; however, the energy deteriorates dramatically because some faces can end up with no samples after pruning. Keeping the current solution as a sample per face (Line 5) therefore prevents the energy from degrading. With all of these sampling strategies combined, the proposed method achieves the best energy and runtime at a memory usage comparable to the baseline method.

Fig. 6

Comparison between the baseline and our approach on a \(64\times 64\) grayscale image shown in Fig. 5, degraded with Gaussian and salt-and-pepper noise. Our approach finds lower energies in fewer iterations and less time than the baseline, which implements the constraints only at the label points

Figure 6 further compares the baseline and our proposed method in terms of the number of iterations and runtime. Due to the replacement of samples, the energy exhibits a peak right after each sampling phase. It quickly settles again, however, leading to an overall decreasing trend.

Additionally, we compare our method to the baseline on the more realistic CBSD dataset from Martin et al. (2001). This dataset contains 68 images, which are degraded with additive white Gaussian noise (5% in this experiment). The number of labels is 7 for both methods. 5K iterations are performed for the baseline method, while we set \(N_{it}=10\) and \(M_{it}=500\) to obtain a fair comparison. The other parameters are chosen as \(\lambda =0.25\), \(n=50\) and \(r=1\). The results are shown in Fig. 7. Our method outperforms the baseline on all images in terms of both energy and peak signal-to-noise ratio (PSNR).

5.3 Manifold-Valued Denoising

To show the flexibility of our algorithm, we next evaluate it on a manifold-valued denoising problem in HSV color space. The hue component of this space lives on a circle, i.e., \(\Gamma _1 =\mathbb {S}^1\), while \(\Gamma _2 = \Gamma _3 = [0,1]\).

The data term of this experiment is still a truncated quadratic energy, where for the hue component the distance is taken on the circle \(\mathbb {S}^1\). The input images (Baker et al. 2011; Martin et al. 2001) are degraded with the same setting as above.

Both the baseline and our method are implemented with 7 labels. First, we evaluate the impact of the number r of most violated samples. As shown in Fig. 8, the maximum difference in energy and memory is only 0.8% and 0.06%, respectively, i.e., both are almost constant wrt. r. We therefore pick \(r=5\). To get an equal number of total iterations, 30K iterations are performed for the baseline, while we set \(N_{it}=100\) outer iterations with \(M_{it}=300\) inner primal-dual steps for our method. The other parameters are chosen as \(\lambda =0.015\) and \(n=30\). As shown in Fig. 9, our method achieves a lower energy than the baseline. Qualitatively, since our method implements the constraints not only at the labels but also in between, the results exhibit less label bias.

Fig. 7

Quantitative results on the CBSD dataset (Martin et al. 2001). We order the results of both methods from best to worst. The difference is calculated using the formula \(\frac{\text {ours - baseline}}{\text {baseline}}\). Our method consistently outperforms the baseline across all images

Fig. 8

Effect of the number of most violated constraints r on energy and memory: a change in energy with \(r=1\) as the reference; b change in memory with \(r=1\) as the reference. Both the energy and the memory vary very little (by at most 0.8% and 0.06%, respectively) across different r

Fig. 9

Denoising of images (Baker et al. 2011; Martin et al. 2001) in HSV color space (\(\Gamma _1 = \mathbb {S}^1\), \(\Gamma _2=\Gamma _3=[0,1]\)) using our method and the baseline with 7 labels. The hue component plot demonstrates that our approach is able to handle the manifold setting, where the jump from \(2\pi \) to 0 is permitted. Since our approach implements the constraints adaptively in between the labels (gray lines), it reaches a lower energy with less label bias

Table 2 We compute the optical flow on the Middlebury dataset (Baker et al. 2011) using our method and the baseline for a varying amount of labels

5.4 Optical Flow

Given two input images \(I_1\), \(I_2\), we compute the optical flow \(u : \Omega \rightarrow \textbf{R}^2\) for the label space \(\Gamma = [a,b]^2\). We use a simple \(\ell _1\)-norm for the data term, i.e. \(c(x, u(x)) = \Vert I_2(x) - I_1(x+u(x))\Vert _1\) and set the regularization weight as \(\lambda =0.04\). The baseline approach runs for 50K iterations, while we set \(N_{it}=50\) and \(M_{it}=1000\) for a fair comparison. Additionally, the parameters are chosen as \(n=20\) and \(r=1\) in Algorithm 2.
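For concreteness, a sketch of how one such data-term evaluation could look for a single sampled flow value, using bilinear interpolation for the warped image (illustrative only; our implementation batches these evaluations over all faces and samples, and all names here are assumptions):

```python
import numpy as np

def bilinear(I, x, y):
    """Bilinearly interpolate image I of shape (H, W, C) at continuous coordinates (x, y)."""
    H, W = I.shape[:2]
    x0 = int(np.clip(np.floor(x), 0, W - 2)); y0 = int(np.clip(np.floor(y), 0, H - 2))
    a, b = x - x0, y - y0
    return ((1 - a) * (1 - b) * I[y0, x0] + a * (1 - b) * I[y0, x0 + 1]
            + (1 - a) * b * I[y0 + 1, x0] + a * b * I[y0 + 1, x0 + 1])

def flow_cost(I1, I2, x, y, u):
    """l1 photometric error c(x, u(x)) = ||I2(x) - I1(x + u(x))||_1 for one sampled flow u."""
    return np.abs(I2[y, x] - bilinear(I1, x + u[0], y + u[1])).sum()

I1, I2 = np.random.rand(32, 32, 3), np.random.rand(32, 32, 3)
print(flow_cost(I1, I2, x=10, y=12, u=np.array([0.6, -1.3])))
```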

Fig. 10

Visualization of the optical flow on Grove3 and RubberWhale from the Middlebury dataset (Baker et al. 2011), using the baseline, the method from Laude et al. (2016) and our method for a varying number of labels. OOM stands for out of memory. More qualitative results can be found in the appendix

We consider three methods for this experiment: the baseline, the method from Laude et al. (2016) and ours. The method from Laude et al. (2016) approximates the data term in a piecewise convex manner and requires a specific epigraph projection for the data term. Though it can attain lower energies, our approach requires less memory and can tackle any cost function in a simple black-box fashion. Table 2 summarizes the quantitative results obtained on the Middlebury dataset (Baker et al. 2011), while the detailed absolute numbers can be found in the appendix. This table reports how our approach performs relative to the baseline, i.e., the ratio \(\frac{\text {ours}}{\text {baseline}}\): e.g., for three labels our energy is 50.91% of the baseline energy for all 8 datasets, while using 99.84% of the baseline’s memory and reaching an average end point error (aep) and average angular error (aae) of 91.94% and 82.34% of the baseline’s error metrics, respectively. To enable a qualitative comparison, we visualize in Fig. 10 the results on two of the datasets. The remaining qualitative results on the Middlebury dataset (Baker et al. 2011) are shown in the appendix. Our method outperforms the baseline approach in terms of energy for the same number of labels while requiring the same amount of memory. Because Laude et al. (2016) use a tighter relaxation of the label space, they can achieve lower energies with a smaller number of labels. However, their method easily runs out of memory, while the proposed method scales better wrt. memory consumption.

6 Conclusion and Limitations

In this paper we made functional lifting methods more scalable by combining two advances, namely product-space relaxations (Goldluecke et al. 2013) and sublabel-accurate discretizations (Möllenhoff and Cremers 2017; Vogt et al. 2020). This combination is enabled by adapting a cutting-plane method from semi-infinite programming (Blankenship and Falk 1976). This allows an implementation of sublabel-accurate methods without difficult epigraphical projections.

Moreover, our approach makes sublabel-accurate functional-lifting methods applicable to any cost function in a simple black-box fashion. In experiments, we demonstrated the effectiveness of the approach over a baseline based on the product-space relaxation (Goldluecke et al. 2013) and provided a proof-of-concept experiment showcasing the method in the manifold-valued setting.

Future work will concentrate on applying and adapting the presented framework to solve large inverse problems in computer vision with multiple data terms, different regularizers and several manifold-valued optimization variables in a joint fashion. However, it is not obvious whether the presented cutting-plane approach is directly applicable to such large problems or whether novel ideas have to be pursued.