Accelerated Alternating Descent Methods for Dykstra-Like Problems

Chambolle, Antonin; Tan, Pauline; Vaiter, Samuel

doi:10.1007/s10851-017-0724-6

Published: 17 March 2017

Journal of Mathematical Imaging and Vision, volume 59, pages 481–497 (2017)

Abstract

This paper extends recent results by the first author and T. Pock (ICG, TU Graz, Austria) on the acceleration of alternating minimization techniques for quadratic plus nonsmooth objectives depending on two variables. We discuss here the strongly convex situation, and how ‘fast’ methods can be derived by adapting the overrelaxation strategy of Nesterov for projected gradient descent. We also investigate slightly more general alternating descent methods, where several descent steps in each variable are alternately performed.

Notes

Note however that, since both implementations are here the same C program, in which either the descent step or the exact minimization is called depending on our choice, this should not be a bug. This is confirmed both by the fact that for $\varepsilon >0$, when the subproblems are easier and the descent steps are hence even more likely to converge to the exact solution in a few iterations, the exact and inexact methods need nearly the same number of iterations, and by the fact that increasing the number of descent steps eventually yields a number of outer iterations equal to that of the (AAMM) algorithm.
Image belongs to the authors.

References

1. Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
2. Beck, A.: On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim. 25(1), 185–209 (2015)
3. Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
4. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
5. Boyle, J.P., Dykstra, R.L.: A method for finding projections onto the intersection of convex sets in Hilbert spaces. In: Dykstra, R., Robertson, T., Wright, F.T. (eds.) Advances in Order Restricted Statistical Inference (Iowa City, Iowa, 1985), vol. 37 of Lecture Notes in Statistics, pp. 28–47. Springer, Berlin (1986)
6. Braides, A.: $\Gamma$-Convergence for Beginners. Number 22 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, Oxford (2002)
7. Cai, J.-F., Dong, B., Osher, S., Shen, Z.: Image restoration: total variation, wavelet frames, and beyond. J. Am. Math. Soc. 25(4), 1033–1089 (2012)
8. Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)
9. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
10. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159, 253–287 (2016)
11. Chambolle, A., Pock, T.: A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions. SMAI J. Comput. Math. 1, 29–54 (2015)
12. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
13. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, vol. 49 of Springer Optimization and Applications, pp. 185–212. Springer, New York (2011)
14. Dal Maso, G.: An Introduction to $\Gamma$-Convergence. Birkhäuser, Boston (1993)
15. Demengel, F., Temam, R.: Convex functions of a measure and applications. Indiana Univ. Math. J. 33(5), 673–709 (1984)
16. Deutsch, F., Hundal, H.: The rate of convergence of Dykstra’s cyclic projections algorithm: the polyhedral case. Numer. Funct. Anal. Optim. 15(5–6), 537–565 (1994)
17. Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton (1992)
18. Nemirovski, A.S., Yudin, D.: Informational complexity of mathematical programming. Izv. Akad. Nauk SSSR Tekhn. Kibernet. 1, 88–117 (1983)
19. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 of Applied Optimization. Kluwer, Boston (2004)
20. Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)
21. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
22. Ziemer, W.P.: Weakly Differentiable Functions. Sobolev Spaces and Functions of Bounded Variation. Springer, New York (1989)

Acknowledgements

This work is supported by the ANR via the international project ‘EANOI’ (Efficient Algorithms for Nonsmooth Optimization in Imaging), FWF No. I1148 / ANR-12-IS01-0003. A. Chambolle also benefits from the support of the ‘Programme Gaspard Monge pour l’Optimisation et la Recherche Opérationnelle’ (PGMO), through the ‘MAORI’ group, as well as the ‘GdR MIA’ of the CNRS. He also warmly thanks Churchill College and DAMTP, Centre for Mathematical Sciences, University of Cambridge, for their kind hospitality during the completion of this work, made possible by the support of the French Embassy in the UK and the Cantab Capital Institute for Mathematics of Information.

Author information

Authors and Affiliations

CMAP, CNRS, Ecole Polytechnique, 91128, Palaiseau, France
Antonin Chambolle & Pauline Tan
IMB, CNRS, Université de Bourgogne, 9 Ave Alain Savary, 21000, Dijon, France
Samuel Vaiter

Corresponding author

Correspondence to Antonin Chambolle.

Appendix: An Approximation Result

In this appendix, we show that, although this is not totally obvious at first glance, the discrete energy $J_\varepsilon (\mathbf {u})$ is an approximation of the isotropic total variation. More precisely, the result is as follows. To simplify, we work in the domain $\Omega =(0,1)^2$ (the extension to more general regular domains is not difficult) and we define, for an integer $N\ge 1$ and for $u\in L^1(\Omega )$, the functional

$$\mathcal{F}_{\varepsilon,N}(u) = \begin{cases} \frac{1}{N}J_{\varepsilon/N}^{N,N}(\mathbf{u}) & \text{if } \mathbf{u}=(u_{i,j})_{1\le i,j\le N},\ u(x)= \sum_{i=1}^N\sum_{j=1}^N u_{i,j}\, \chi_{\left(\frac{i-1}{N},\frac{i}{N}\right)\times\left(\frac{j-1}{N},\frac{j}{N}\right)}(x) \text{ a.e.,} \\ +\infty & \text{else.} \end{cases}$$

Here, $J_{\varepsilon /N}^{N,N}$ denotes the energy (44) in the case $m=n=N$ (with the smoothing parameter $\varepsilon /N$). We also denote $\Phi _\varepsilon (p):= |p|^2/(2\varepsilon )$ if $|p|\le \varepsilon $ and $\Phi _\varepsilon (p):=|p|-\varepsilon /2$ otherwise, and recall that for $u\in BV(\Omega )$, a function with bounded variation $|Du|(\Omega )<+\infty $ [17, 22], one has $\int _\Omega \Phi _\varepsilon (Du)= \int _\Omega \Phi _\varepsilon (\nabla u)\hbox {d}x + |D^su|(\Omega )$, where $Du=\nabla u \hbox {d}x +D^su$ is the Radon-Nikodym decomposition of $Du$ into an absolutely continuous part and a singular part, see [15]. We introduce the functional
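As a small illustration of the definition of $\Phi_\varepsilon$ just given, the following Python sketch evaluates it on a vector $p$. The function name `huber` and the convention that $\varepsilon=0$ returns the plain Euclidean norm are assumptions made for this example, not notation from the article.

```python
import math


def huber(p, eps):
    """Phi_eps(p): |p|^2 / (2 eps) if |p| <= eps, and |p| - eps/2 otherwise.

    `p` is any sequence of floats. For eps == 0 the Euclidean norm |p| is
    returned (an assumed convention, matching the limit eps -> 0).
    """
    norm = math.sqrt(sum(c * c for c in p))
    if eps > 0 and norm <= eps:
        return norm * norm / (2.0 * eps)
    return norm - eps / 2.0
```

Both branches take the value $\varepsilon/2$ at $|p|=\varepsilon$, so the function is continuous, and $\Phi_\varepsilon(p)\ge |p|-\varepsilon/2$ for every $p$ since $(|p|-\varepsilon)^2\ge 0$.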

$$\mathcal{F}_\varepsilon(u) = \begin{cases} \int_\Omega \Phi_\varepsilon(Du) & \text{if } u\in BV(\Omega),\\ +\infty & \text{if } u\in L^1(\Omega)\setminus BV(\Omega). \end{cases}$$

Then, one can show that $\mathcal {F}_\varepsilon $ can also be defined by duality, as follows:

$$\mathcal{F}_\varepsilon(u) = \sup\left\{ \int_\Omega u(x)\,\text{div}\,\varphi(x)\,\hbox{d}x - \frac{\varepsilon}{2}\int_\Omega |\varphi(x)|^2\,\hbox{d}x \,:\, \varphi\in C_c^\infty(\Omega;\mathbb{R}^2),\ |\varphi(x)|\le 1\quad \forall x\in\Omega \right\} \tag{47}$$
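The duality formula (47) has a pointwise counterpart for $\varepsilon>0$, namely $\Phi_\varepsilon(p)=\max_{|q|\le 1}\left(q\cdot p-\frac{\varepsilon}{2}|q|^2\right)$, attained at $q=p/\max(\varepsilon,|p|)$. The sketch below checks this identity numerically. The helper names `huber`, `dual_value` and `best_q` are hypothetical and chosen for this example only.

```python
import math


def huber(p, eps):
    """Phi_eps(p) for eps > 0, as defined in the text."""
    norm = math.sqrt(sum(c * c for c in p))
    if norm <= eps:
        return norm * norm / (2.0 * eps)
    return norm - eps / 2.0


def dual_value(p, q, eps):
    """Value of q.p - (eps/2)|q|^2 for a dual vector q."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot - 0.5 * eps * sum(b * b for b in q)


def best_q(p, eps):
    """Maximizer over the unit ball: q = p / max(eps, |p|), for eps > 0."""
    norm = math.sqrt(sum(c * c for c in p))
    scale = max(eps, norm)
    return tuple(c / scale for c in p)


if __name__ == "__main__":
    for p in [(3.0, 4.0), (0.3, 0.4)]:
        q = best_q(p, 1.0)
        print(p, huber(p, 1.0), dual_value(p, q, 1.0))
```

For $|p|\le\varepsilon$ the unconstrained maximizer $q=p/\varepsilon$ is feasible, which gives the quadratic branch; otherwise the maximum sits on the unit sphere at $q=p/|p|$, which gives the linear branch.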

One has the following result:

Theorem 4

As $N\rightarrow \infty $, $\mathcal {F}_{\varepsilon ,N}$ $\Gamma $-converges to $\mathcal {F}_\varepsilon $. Moreover, if for some sequence $(u^N)\in L^1(\Omega )^{\mathbb {N}}$, $\mathcal {F}_{\varepsilon ,N}(u^N)\le C<+\infty $, then there exist $u\in BV(\Omega )$, a subsequence $(u^{N_k})_k$ and a sequence of constants $(a_k)_k$ such that $u^{N_k}-a_k\rightarrow u$ in $L^1(\Omega )$.

For the proper definition and main properties of $\Gamma $-convergence, see for instance [6, 14]. The theorem establishes that, if $N$ is large, images minimizing $J_{\varepsilon /N}^{N,N}$ (plus other terms such as a quadratic penalization) should be close to minimizers of the isotropic ‘Huber-total variation’ $\mathcal {F}_\varepsilon $ in the continuum. The proof is easy, but, as far as we know, it is not found in this form in the literature. The closest results are perhaps the $\Gamma $-convergence theorems of Cai et al. [7] in the context of wavelet-based approximations of the total variation.

Proof

It is enough to prove: (i) that if $u^N\in L^1(\Omega )$ is such that $\ell =\liminf _N \mathcal {F}_{\varepsilon ,N}(u^N)<\infty $, then not only one can extract $u^{N_k}$ which converges to some u, but in addition $\mathcal {F}_\varepsilon (u)\le \ell $; (ii) that given u with finite total variation, one can build a sequence $u^N$ with $\limsup _N \mathcal {F}_{\varepsilon ,N}(u^N)\le \mathcal {F}_\varepsilon (u)$.

For point (i), we first consider a subsequence $(u^{N_k})$ such that $\ell = \lim _k \mathcal {F}_{\varepsilon ,N_k}(u^{N_k})$. Then, we see that since for all k (large enough) $\mathcal {F}_{\varepsilon ,N_k}(u^{N_k})<+\infty $, by definition $u^{N_k}$ is piecewise constant and can be written

$$u^{N_k}(x)= \sum_{i=1}^{N_k}\sum_{j=1}^{N_k} u^k_{i,j}\, \chi_{\left(\frac{i-1}{N_k},\frac{i}{N_k}\right)\times\left(\frac{j-1}{N_k},\frac{j}{N_k}\right)}(x)$$

for some matrix $\mathbf {u}^k=(u^k_{i,j})_{1\le i,j\le N_k}$. Then we observe that for some constant $\sigma >0$,

$$\begin{aligned} \mathcal{F}_{\varepsilon,N_k}(u^{N_k}) +\frac{\varepsilon}{2} &\ge \mathcal{F}_{0,N_k}(u^{N_k})\\ &\ge \sigma \frac{1}{N_k}\sum_{i,j} \left( |u^{k}_{i+1,j}-u^{k}_{i,j}| +|u^{k}_{i,j+1}-u^{k}_{i,j}|\right) \\ &= \sigma |Du^{N_k}|(\Omega). \end{aligned}$$

Hence $|Du^{N_k}|(\Omega )$ is bounded, showing that $(u^{N_k}-a_k)_k$ is precompact in $L^1(\Omega )$, where $a_k$ is the average of the function $u^{N_k}$ in $\Omega $. Without loss of generality, we assume $a_k=0$ and we denote by u the limit of a subsequence (which for convenience we do not relabel). We must now show that $\mathcal {F}_\varepsilon (u)\le \ell $.

Let $\delta >0$, and let $\varphi =(\varphi ^1,\varphi ^2)\in C_c^\infty (\Omega ;\mathbb {R}^2)$ be a smooth vector field with $|\varphi (x)|^2=\varphi ^1(x)^2+\varphi ^2(x)^2\le 1-\delta $ for all $x\in \Omega $. Observe that

$$\begin{aligned} \int_\Omega u^{N_k}(x)\,\text{div}\,\varphi(x)\,\hbox{d}x &= \sum_{i,j} u^k_{i,j} \int_{\left(\frac{i-1}{N_k},\frac{i}{N_k}\right)\times\left(\frac{j-1}{N_k},\frac{j}{N_k}\right)}\text{div}\,\varphi(x)\,\hbox{d}x\\ &= \sum_{i,j} \left( u^k_{i+1,j}-u^k_{i,j}\right) \varphi^1_{i+\frac{1}{2},j} + \left( u^k_{i,j+1}-u^k_{i,j}\right) \varphi^2_{i,j+\frac{1}{2}} \end{aligned}$$

where $\varphi ^1_{i+\frac{1}{2},j}$ is the flux of $\varphi $ through the vertical segment $\{\frac{i}{N_k}\}\times (\frac{j-1}{N_k},\frac{j}{N_k})$ and $\varphi ^2_{i,j+\frac{1}{2}}$ is the flux through the horizontal segment $(\frac{i-1}{N_k},\frac{i}{N_k})\times \{\frac{j}{N_k}\}$.

Assume that $i$ and $j$ are both odd or both even, and let ${\bar{x}}= (i/N_k,j/N_k)$. As $\varphi $ is smooth, one clearly has $N_k\varphi ^1_{i+1/2,j} = \varphi ^1({\bar{x}})+ O(1/N_k)$, etc., and, in fact,

$$\begin{aligned} N_k^2\mathcal{N}_{i,j}^2 &:= \left( N_k\varphi^1_{i+\frac{1}{2},j}\right)^2 +\left( N_k\varphi^2_{i,j+\frac{1}{2}}\right)^2 +\left( N_k\varphi^1_{i+\frac{1}{2},j+1}\right)^2+ \left( N_k\varphi^2_{i+1,j+\frac{1}{2}}\right)^2\\ &\le 2(1-\delta) + O\left( \frac{1}{N_k^2}\right) \le 2 \end{aligned}$$

if $N_k$ is large enough. As a consequence

$$\begin{aligned} &\left( u^k_{i+1,j}-u^k_{i,j}\right) \varphi^1_{i+\frac{1}{2},j} + \left( u^k_{i,j+1}-u^k_{i,j}\right) \varphi^2_{i,j+\frac{1}{2}}\\ &\qquad + \left( u^k_{i+1,j+1}-u^k_{i,j+1}\right) \varphi^1_{i+\frac{1}{2},j+1} + \left( u^k_{i+1,j+1}-u^k_{i+1,j}\right) \varphi^2_{i+1,j+\frac{1}{2}} - \frac{\varepsilon}{2}\mathcal{N}_{i,j}^2\\ &\quad \le \frac{1}{N_k} TV^{4,\varepsilon/N_k}_{i,j}\left( \mathbf{u}^k\right). \end{aligned} \tag{48}$$

Thanks to the smoothness of $\varphi $, one can check easily that

$$\begin{aligned} \sum _{(i,j)\text { even}} {\mathcal {N}}_{i,j}^2 + \sum _{(i,j)\text { odd}} {\mathcal {N}}_{i,j}^2 \rightarrow \int _\Omega |\varphi (x)|^2 \hbox {d}x \end{aligned}$$

as $k\rightarrow \infty $, hence, summing (48) over all (i, j) both odd or both even, we find (using also the fact that $\varphi $ has compact support) that

$$\int_\Omega u^{N_k}(x)\,\text{div}\,\varphi(x)\,\hbox{d}x - \frac{\varepsilon}{2}\int_\Omega |\varphi(x)|^2\,\hbox{d}x + o(1) \le \frac{1}{N_k}J^{N_k,N_k}_{\varepsilon/N_k}\left( \mathbf{u}^k\right) = \mathcal{F}_{\varepsilon,N_k}\left( u^{N_k}\right).$$

In the limit, we find that

$$\begin{aligned} \int _\Omega u(x)\text {div}\,\varphi (x)\hbox {d}x - \frac{\varepsilon }{2}\int _\Omega |\varphi (x)|^2 \hbox {d}x \le \ell . \end{aligned}$$

Thanks to (47), we deduce that $\mathcal {F}_\varepsilon (u)\le \ell $.
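The Riemann-sum limit $\sum \mathcal{N}_{i,j}^2\rightarrow \int_\Omega|\varphi|^2\,\hbox{d}x$ used in point (i) can be checked numerically. The sketch below makes two simplifying assumptions that are not in the article: each flux is replaced by the midpoint value of the corresponding component of $\varphi$ times the segment length $1/N$, and the test field $\varphi(x,y)=(0.6,\,0.5)\,\sin^2(\pi x)\sin^2(\pi y)$ only vanishes on $\partial\Omega$ instead of being compactly supported. All function names are hypothetical.

```python
import math


def bump(t):
    """sin^2(pi t): smooth, vanishing with its derivative at t = 0 and t = 1."""
    return math.sin(math.pi * t) ** 2


def phi(x, y, a=0.6, c=0.5):
    """Assumed test field phi = (a, c) * bump(x) * bump(y), with |phi| < 1."""
    s = bump(x) * bump(y)
    return a * s, c * s


def checkerboard_sum(N):
    """Sum of N_{i,j}^2 over pairs (i, j) that are both odd or both even."""
    total = 0.0
    for i in range(1, N):
        for j in range(1, N):
            if (i + j) % 2:
                continue  # skip pairs of mixed parity
            # Midpoint approximations of the four fluxes around node (i/N, j/N).
            f1 = phi(i / N, (j - 0.5) / N)[0] / N  # phi^1_{i+1/2, j}
            f2 = phi((i - 0.5) / N, j / N)[1] / N  # phi^2_{i, j+1/2}
            f3 = phi(i / N, (j + 0.5) / N)[0] / N  # phi^1_{i+1/2, j+1}
            f4 = phi((i + 0.5) / N, j / N)[1] / N  # phi^2_{i+1, j+1/2}
            total += f1 * f1 + f2 * f2 + f3 * f3 + f4 * f4
    return total


# Exact integral of |phi|^2 over the unit square: (a^2 + c^2) * (3/8)^2,
# because the integral of sin^4(pi t) over (0, 1) equals 3/8.
EXACT = (0.6 ** 2 + 0.5 ** 2) * (3.0 / 8.0) ** 2

if __name__ == "__main__":
    for N in (20, 50, 100):
        print(N, checkerboard_sum(N), EXACT)
```

Each $\mathcal{N}_{i,j}^2$ is about $2|\varphi({\bar{x}})|^2/N^2$ and there are about $N^2/2$ admissible pairs $(i,j)$, which is why the sum approaches the integral rather than twice its value.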

We must now prove (ii). We only sketch the proof, which is very simple. One first observes that any $u\in BV(\Omega )$ can be approximated by a sequence $(u_n)$ with $u_n\in C^\infty (\overline{\Omega })$, $u_n\rightarrow u$ in $L^1(\Omega )$ and $\int _\Omega \Phi _\varepsilon (\nabla u_n(x))\hbox {d}x\rightarrow \mathcal {F}_\varepsilon (u)$; it is therefore enough to show the result for a smooth function and then use a diagonal argument.

But if u is smooth, letting simply for each N, $u^N_{i,j}=u((i-1/2)/N,(j-1/2)/N)$, one first observes that

$$u^N(x):=\sum_{i,j} u^N_{i,j}\,\chi_{\left(\frac{i-1}{N},\frac{i}{N}\right)\times\left(\frac{j-1}{N},\frac{j}{N}\right)}(x) \rightarrow u(x)$$

uniformly in $\Omega $, and then that $\mathcal {F}_{\varepsilon ,N}(u^N)$ is a finite-difference approximation of $\int _\Omega \Phi _\varepsilon (\nabla u(x))\hbox {d}x$, to which it converges as $N\rightarrow \infty $. $\square $
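To illustrate the scaling in point (ii), the following sketch samples a smooth function at the cell centres and evaluates a discrete Huber energy with smoothing parameter $\varepsilon/N$ and prefactor $1/N$. The energy used here is a simplified forward-difference stand-in, not the energy (44) of the article (which is not reproduced in this appendix), and all function names are hypothetical. The key identity is $\Phi_{\varepsilon/N}(p/N)=\Phi_\varepsilon(p)/N$, so that $\frac{1}{N}\sum_{i,j}\Phi_{\varepsilon/N}$ of the differences behaves like a Riemann sum for $\int_\Omega\Phi_\varepsilon(\nabla u)\,\hbox{d}x$.

```python
import math


def huber_norm(t, eps):
    """Phi_eps evaluated on a nonnegative norm t (eps == 0 gives t itself)."""
    if eps > 0 and t <= eps:
        return t * t / (2.0 * eps)
    return t - eps / 2.0


def sample_at_cell_centres(f, N):
    """u_{i,j} = f((i - 1/2)/N, (j - 1/2)/N), stored with 0-based indices."""
    return [[f((i + 0.5) / N, (j + 0.5) / N) for j in range(N)]
            for i in range(N)]


def discrete_energy(u, N, eps):
    """(1/N) * sum of Phi_{eps/N}(|forward difference|) over interior cells.

    Simplified stand-in for illustration only, NOT the article's energy (44).
    """
    total = 0.0
    for i in range(N - 1):
        for j in range(N - 1):
            dx = u[i + 1][j] - u[i][j]
            dy = u[i][j + 1] - u[i][j]
            total += huber_norm(math.hypot(dx, dy), eps / N)
    return total / N


if __name__ == "__main__":
    f = lambda x, y: 3.0 * x + 4.0 * y  # gradient (3, 4), so Phi_1 = 4.5
    for N in (10, 50, 200):
        print(N, discrete_energy(sample_at_cell_centres(f, N), N, 1.0))
```

For this linear test function the differences are exact, and the discrete energy equals $\left(\frac{N-1}{N}\right)^2\Phi_\varepsilon(\nabla u)$, which tends to the continuum value as $N\rightarrow\infty$; the missing factor comes only from the last row and column of cells, which have no forward neighbour.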

About this article

Cite this article

Chambolle, A., Tan, P. & Vaiter, S. Accelerated Alternating Descent Methods for Dykstra-Like Problems. J Math Imaging Vis 59, 481–497 (2017). https://doi.org/10.1007/s10851-017-0724-6

Received: 12 July 2016
Accepted: 04 March 2017
Published: 17 March 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s10851-017-0724-6
