Accelerated Alternating Descent Methods for Dykstra-Like Problems

Abstract

This paper extends recent results by the first author and T. Pock (ICG, TU Graz, Austria) on the acceleration of alternating minimization techniques for quadratic plus nonsmooth objectives depending on two variables. We discuss here the strongly convex situation and how ‘fast’ methods can be derived by adapting the overrelaxation strategy of Nesterov for projected gradient descent. We also investigate slightly more general alternating descent methods, where several descent steps in each variable are performed alternately.

Notes

  1. Note, however, that our implementations are the same C program in which, depending on our choice, either the descent step or the exact minimization is called, so this should not be a bug. This is confirmed both by the fact that for \(\varepsilon >0\), when the subproblems are easier and it is hence even more likely that the descent steps converge to the exact solution in a few iterations, the exact and inexact methods need nearly the same number of iterations, and by the fact that increasing the number of descent steps eventually yields a number of outer iterations equal to that of the (AAMM) algorithm.

  2. Image belongs to the authors.

References

  1. Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)

  2. Beck, A.: On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim. 25(1), 185–209 (2015)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)

  5. Boyle, J.P., Dykstra, R.L.: A method for finding projections onto the intersection of convex sets in Hilbert spaces. In: Dykstra, R., Robertson, T., Wright, F.T. (eds.) Advances in Order Restricted Statistical Inference (Iowa City, Iowa, 1985), vol. 37 of Lecture Notes in Statistics, pp. 28–47. Springer, Berlin (1986)

  6. Braides, A.: \(\Gamma \)-Convergence for Beginners. Number 22 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, Oxford (2002)

  7. Cai, J.-F., Dong, B., Osher, S., Shen, Z.: Image restoration: total variation, wavelet frames, and beyond. J. Am. Math. Soc. 25(4), 1033–1089 (2012)

  8. Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)

  9. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  10. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159, 253–287 (2016)

  11. Chambolle, A., Pock, T.: A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions. SMAI J. Comput. Math. 1, 29–54 (2015)

  12. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  13. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, vol. 49 of Springer Optimization and Its Applications, pp. 185–212. Springer, New York (2011)

  14. Dal Maso, G.: An Introduction to \(\Gamma \)-Convergence. Birkhäuser, Boston (1993)

  15. Demengel, F., Temam, R.: Convex functions of a measure and applications. Indiana Univ. Math. J. 33(5), 673–709 (1984)

  16. Deutsch, F., Hundal, H.: The rate of convergence of Dykstra’s cyclic projections algorithm: the polyhedral case. Numer. Funct. Anal. Optim. 15(5–6), 537–565 (1994)

  17. Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton (1992)

  18. Nemirovski, A.S., Yudin, D.: Informational complexity of mathematical programming. Izv. Akad. Nauk SSSR Tekhn. Kibernet. 1, 88–117 (1983)

  19. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 of Applied Optimization. Kluwer, Boston (2004)

  20. Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)

  21. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  22. Ziemer, W.P.: Weakly Differentiable Functions. Sobolev Spaces and Functions of Bounded Variation. Springer, New York (1989)

Acknowledgements

This work is supported by the ANR via the international project ‘EANOI’ (Efficient Algorithms for Nonsmooth Optimization in Imaging), FWF No. I1148 / ANR-12-IS01-0003. A. Chambolle also benefits from support of the ‘Programme Gaspard Monge pour l’Optimisation et la Recherche Opérationnelle’ (PGMO), through the ‘MAORI’ group, as well as the ‘GdR MIA’ of the CNRS. He also warmly thanks Churchill College and DAMTP, Centre for Mathematical Sciences, University of Cambridge, for their kind hospitality during the completion of this work, with support from the French Embassy in the UK and the Cantab Capital Institute for the Mathematics of Information.

Author information

Correspondence to Antonin Chambolle.

Appendix: An Approximation Result

In this appendix, we show that, although this is not totally obvious at first glance, the discrete energy \(J_\varepsilon (\mathbf {u})\) is an approximation of the isotropic total variation. More precisely, the result is as follows. To simplify, we work in the domain \(\Omega =(0,1)^2\) (the extension to more general regular domains is not difficult) and define, for an integer \(N\ge 1\), the following functional on \(L^1(\Omega )\):

$$\begin{aligned} \mathcal {F}_{\varepsilon ,N}(u) = {\left\{ \begin{array}{ll} \frac{1}{N}J_{\varepsilon /N}^{N,N}(\mathbf {u}) &{}\quad \text { if } u(x)= \sum \nolimits _{i=1}^N\sum \nolimits _{j=1}^N u_{i,j}\, \chi _{(\frac{i-1}{N},\frac{i}{N})\times (\frac{j-1}{N},\frac{j}{N})}(x) \text { a.e., for some } \mathbf {u}=(u_{i,j})_{1\le i,j\le N}, \\ +\infty &{}\quad \text { else.} \end{array}\right. } \end{aligned}$$

Here, \(J_{\varepsilon /N}^{N,N}\) denotes the energy (44) in the case \(m=n=N\) (and with smoothing parameter \(\varepsilon /N\)). We also define \(\Phi _\varepsilon (p):= |p|^2/(2\varepsilon )\) if \(|p|\le \varepsilon \) and \(\Phi _\varepsilon (p):= |p|-\varepsilon /2\) otherwise, and recall that for a function \(u\in BV(\Omega )\) with bounded variation \(|Du|(\Omega )<+\infty \) [17, 22], \(\int _\Omega \Phi _\varepsilon (Du)= \int _\Omega \Phi _\varepsilon (\nabla u)\hbox {d}x + |D^su|(\Omega )\), where \(Du=\nabla u \hbox {d}x +D^su\) is the Radon–Nikodym decomposition of Du into an absolutely continuous and a singular part, see [15]. We introduce the functional

$$\begin{aligned} \mathcal {F}_\varepsilon (u) = {\left\{ \begin{array}{ll} \Phi _\varepsilon (Du)(\Omega ) &{}\quad \text {if } u\in BV(\Omega ),\\ +\infty &{}\quad \text {if } u\in L^1(\Omega )\setminus BV(\Omega ). \end{array}\right. } \end{aligned}$$
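To make these objects concrete, here is a minimal numerical sketch (ours, not from the paper) of the Huber function \(\Phi _\varepsilon \) and of a discrete Huber-type total variation on an \(N\times N\) image. Since the four-point energy (44) defining \(J^{N,N}_{\varepsilon /N}\) is given earlier in the paper and not reproduced in this appendix, the sketch uses a standard forward-difference discretization as a stand-in; the scaling matches that of \(\mathcal {F}_{\varepsilon ,N}\) above.

```python
import numpy as np

def huber(p_norm, eps):
    """Phi_eps: |p|^2/(2 eps) where |p| <= eps, and |p| - eps/2 elsewhere."""
    return np.where(p_norm <= eps, p_norm**2 / (2.0 * eps), p_norm - eps / 2.0)

def huber_tv(u, eps):
    """Discrete isotropic Huber total variation of an N x N image u on the
    grid of spacing 1/N (a forward-difference stand-in for the paper's
    energy (44), scaled as in F_{eps,N})."""
    N = u.shape[0]
    dx = np.diff(u, axis=0, append=u[-1:, :]) * N   # approximates d_1 u
    dy = np.diff(u, axis=1, append=u[:, -1:]) * N   # approximates d_2 u
    return huber(np.hypot(dx, dy), eps).sum() / N**2   # Riemann sum over (0,1)^2

# Example: evaluate the energy of a sampled smooth function.
N, eps = 128, 0.1
x = (np.arange(N) + 0.5) / N
X, Y = np.meshgrid(x, x, indexing="ij")
print(huber_tv(np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y), eps))
```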

Then, one can show that \(\mathcal {F}_\varepsilon \) can also be defined by duality, as follows:

$$\begin{aligned} \mathcal {F}_\varepsilon (u) = \sup \left\{ \int _\Omega u(x)\,\text {div}\,\varphi (x)\,\hbox {d}x - \frac{\varepsilon }{2}\int _\Omega |\varphi (x)|^2 \,\hbox {d}x \,:\, \varphi \in C_c^\infty (\Omega ;\mathbb {R}^2),\ |\varphi (x)|\le 1\quad \forall x\in \Omega \right\} \end{aligned}$$
(47)
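
The identity behind (47) is the pointwise Legendre-type duality \(\Phi _\varepsilon (p)=\sup _{|q|\le 1}\, p\cdot q - \frac{\varepsilon }{2}|q|^2\), integrated against \(\varphi \). A quick numerical sanity check of this pointwise identity (ours, by brute force over a fine grid of admissible q):

```python
import numpy as np

# Check: Phi_eps(p) = sup_{|q| <= 1} <p, q> - (eps/2)|q|^2, pointwise.
eps = 0.1
qs = np.mgrid[-1:1:401j, -1:1:401j].reshape(2, -1).T
qs = qs[np.hypot(qs[:, 0], qs[:, 1]) <= 1.0]          # admissible q, |q| <= 1

for p in [np.array([0.03, 0.04]), np.array([0.5, -1.2])]:
    dual = np.max(qs @ p - 0.5 * eps * np.hypot(qs[:, 0], qs[:, 1])**2)
    pn = np.hypot(*p)
    primal = pn**2 / (2 * eps) if pn <= eps else pn - eps / 2
    print(primal, dual)   # agree up to the resolution of the grid in q
```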

One has the following result:

Theorem 4

As \(N\rightarrow \infty \), \(\mathcal {F}_{\varepsilon ,N}\) \(\Gamma \)-converges to \(\mathcal {F}_\varepsilon \). Moreover, if for some sequence \((u^N)\in L^1(\Omega )^{\mathbb {N}}\), \(\mathcal {F}_{\varepsilon ,N}(u^N)\le C<+\infty \), then there exist \(u\in BV(\Omega )\), a subsequence \((u^{N_k})_k\) and a sequence of constants \((a_k)_k\) such that \(u^{N_k}-a_k\rightarrow u\) in \(L^1(\Omega )\).

For the proper definition and main properties of \(\Gamma \)-convergence, see for instance [6, 14]. The theorem establishes that images minimizing \(J_{\varepsilon /N}^{N,N}\) (plus other terms, such as a quadratic penalization) should, for large N, be close to minimizers of the isotropic ‘Huber total variation’ \(\mathcal {F}_\varepsilon \) in the continuum. The proof is easy, but as far as we know it is not found in this form in the literature. The closest results are perhaps the \(\Gamma \)-convergence theorems of Cai et al. [7] in the context of wavelet-based approximations of the total variation.

Proof

It is enough to prove: (i) that if \(u^N\in L^1(\Omega )\) is such that \(\ell =\liminf _N \mathcal {F}_{\varepsilon ,N}(u^N)<\infty \), then not only can one extract a subsequence \(u^{N_k}\) converging to some u, but in addition \(\mathcal {F}_\varepsilon (u)\le \ell \); (ii) that given u with finite total variation, one can build a sequence \(u^N\) with \(\limsup _N \mathcal {F}_{\varepsilon ,N}(u^N)\le \mathcal {F}_\varepsilon (u)\).

For point (i), we first consider a subsequence \((u^{N_k})\) such that \(\ell = \lim _k \mathcal {F}_{\varepsilon ,N_k}(u^{N_k})\). Since \(\mathcal {F}_{\varepsilon ,N_k}(u^{N_k})<+\infty \) for all k large enough, by definition \(u^{N_k}\) is piecewise constant and can be written

$$\begin{aligned} u^{N_k}(x)= \sum _{i=1}^{N_k}\sum _{j=1}^{N_k} u^k_{i,j}\, \chi _{\left( \frac{i-1}{N_k},\frac{i}{N_k}\right) \times \left( \frac{j-1}{N_k},\frac{j}{N_k}\right) }(x) \end{aligned}$$

for some matrix \(\mathbf {u}^k=(u^k_{i,j})_{1\le i,j\le N_k}\). Then we observe that for some constant \(\sigma >0\),

$$\begin{aligned}&\mathcal {F}_{\varepsilon ,N_k}(u^{N_k}) +\frac{\varepsilon }{2} \ge \mathcal {F}_{0,N_k}(u^{N_k})\\&\qquad \ge \sigma \frac{1}{N_k}\sum _{i,j} \left( |u^{k}_{i+1,j}-u^{k}_{i,j}| +|u^{k}_{i,j+1}-u^{k}_{i,j}|\right) \\&\quad = \sigma |Du^{N_k}|(\Omega ). \end{aligned}$$

Hence \(|Du^{N_k}|(\Omega )\) is bounded, showing that \((u^{N_k}-a_k)_k\) is precompact in \(L^1(\Omega )\), where \(a_k\) is the average of the function \(u^{N_k}\) in \(\Omega \). Without loss of generality, we assume \(a_k=0\) and we denote by u the limit of a subsequence (which for convenience we do not relabel). We must now show that \(\mathcal {F}_\varepsilon (u)\le \ell \).

Let \(\delta >0\), and let \(\varphi =(\varphi ^1,\varphi ^2)\in C_c^\infty (\Omega ;\mathbb {R}^2)\) be a smooth vector field with \(|\varphi (x)|^2=\varphi ^1(x)^2+\varphi ^2(x)^2\le 1-\delta \) for all \(x\in \Omega \). Observe that

$$\begin{aligned} \int _\Omega u^{N_k}(x)\,\text {div}\,\varphi (x)\,\hbox {d}x&= \sum _{i,j} u^k_{i,j} \int _{\left( \frac{i-1}{N_k},\frac{i}{N_k}\right) \times \left( \frac{j-1}{N_k},\frac{j}{N_k}\right) }\text {div}\,\varphi (x)\,\hbox {d}x\\&= \sum _{i,j} \left( u^k_{i+1,j}-u^k_{i,j}\right) \varphi ^1_{i+\frac{1}{2},j} + \left( u^k_{i,j+1}-u^k_{i,j}\right) \varphi ^2_{i,j+\frac{1}{2}} \end{aligned}$$

where \(\varphi ^1_{i+\frac{1}{2},j}\) is the flux of \(\varphi \) through the vertical segment \(\{\frac{i}{N_k}\}\times (\frac{j-1}{N_k},\frac{j}{N_k})\) and \(\varphi ^2_{i,j+\frac{1}{2}}\) is the flux through the horizontal segment \((\frac{i-1}{N_k},\frac{i}{N_k})\times \{\frac{j}{N_k}\}\).

Assume i and j are both odd or both even. Let \({\bar{x}}= (i/N_k,j/N_k)\): as \(\varphi \) is smooth, one clearly has \(N_k\varphi ^1_{i+1/2,j} = \varphi ^1({\bar{x}})+ O(1/N_k)\), etc., and, in fact,

$$\begin{aligned}&N_k^2{\mathcal {N}}_{i,j}^2:= \left( N_k\varphi ^1_{i+\frac{1}{2},j}\right) ^2 +\left( N_k\varphi ^2_{i,j+\frac{1}{2}}\right) ^2\\&\qquad +\,\left( N_k\varphi ^1_{i+\frac{1}{2},j+1}\right) ^2+ \left( N_k\varphi ^2_{i+1,j+\frac{1}{2}}\right) ^2\\&\quad \le 2(1-\delta ) + O\left( \frac{1}{N_k^2}\right) \le 2 \end{aligned}$$

if \(N_k\) is large enough. As a consequence

$$\begin{aligned}&\left( u^k_{i+1,j}-u^k_{i,j}\right) \varphi ^1_{i+\frac{1}{2},j} \nonumber \\&\qquad +\, \left( u^k_{i,j+1}-u^k_{i,j}\right) \varphi ^2_{i,j+\frac{1}{2}}\nonumber \\&\qquad +\, \left( u^k_{i+1,j+1}-u^k_{i,j+1}\right) \varphi ^1_{i+\frac{1}{2},j+1} \nonumber \\&\qquad +\, \left( u^k_{i+1,j+1}-u^k_{i+1,j}\right) \varphi ^2_{i+1,j+\frac{1}{2}} - \frac{\varepsilon }{2}\mathcal {N}_{i,j}^2\nonumber \\&\quad \le \frac{1}{N_k} TV^{4,\varepsilon /N_k}_{i,j}\left( \mathbf {u}^k\right) . \end{aligned}$$
(48)

Thanks to the smoothness of \(\varphi \), one can check easily that

$$\begin{aligned} \sum _{(i,j)\text { even}} {\mathcal {N}}_{i,j}^2 + \sum _{(i,j)\text { odd}} {\mathcal {N}}_{i,j}^2 \rightarrow \int _\Omega |\varphi (x)|^2 \hbox {d}x \end{aligned}$$

as \(k\rightarrow \infty \); hence, summing (48) over all (i, j) both odd or both even, we find (using also the fact that \(\varphi \) has compact support) that

$$\begin{aligned}&\int _\Omega u^{N_k}(x)\text {div}\,\varphi (x)\hbox {d}x - \frac{\varepsilon }{2}\int _\Omega |\varphi (x)|^2 \hbox {d}x\\&\quad +\,o(1)\le \frac{1}{N_k}J^{N_k,N_k}_{\varepsilon /N_k}\left( \mathbf {u}^k\right) = \mathcal {F}_{\varepsilon ,N_k}\left( u^{N_k}\right) . \end{aligned}$$

In the limit, we find that

$$\begin{aligned} \int _\Omega u(x)\text {div}\,\varphi (x)\hbox {d}x - \frac{\varepsilon }{2}\int _\Omega |\varphi (x)|^2 \hbox {d}x \le \ell . \end{aligned}$$

Thanks to (47), we deduce that \(\mathcal {F}_\varepsilon (u)\le \ell \).

We now must prove (ii). We only sketch the proof, which is very simple: one first observes that, as any \(u\in BV(\Omega )\) can be approximated by a sequence \((u_n)\) with \(u_n\in C^\infty (\overline{\Omega })\), \(u_n\rightarrow u\) in \(L^1(\Omega )\) and \(\int _\Omega \Phi _\varepsilon (\nabla u_n(x))\hbox {d}x\rightarrow \mathcal {F}_\varepsilon (u)\), it is enough to show the result for a smooth function and then use a diagonal argument.

But if u is smooth, letting, for each N, \(u^N_{i,j}=u((i-1/2)/N,(j-1/2)/N)\), one first observes that

$$\begin{aligned} u^N(x):=\sum _{i,j} u^N_{i,j}\,\chi _{\left( \frac{i-1}{N},\frac{i}{N}\right) \times \left( \frac{j-1}{N},\frac{j}{N}\right) }(x) \rightarrow u(x) \end{aligned}$$

uniformly in \(\Omega \), and then that \(\mathcal {F}_{\varepsilon ,N}(u^N)\) is a finite-difference approximation of \(\int _\Omega \Phi _\varepsilon (\nabla u(x))\hbox {d}x\), which converges to this limit as \(N\rightarrow \infty \). \(\square \)
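
Point (ii) is also easy to observe numerically: sampling a smooth u at the cell centers, as in the proof, the discrete energies stabilize as N grows. A minimal illustration, reusing the forward-difference stand-in huber_tv sketched after the definition of \(\mathcal {F}_\varepsilon \) (again ours, not the paper's energy (44)):

```python
import numpy as np  # with huber_tv as defined in the earlier sketch

# Recovery sequence of the proof: u^N_{ij} = u((i - 1/2)/N, (j - 1/2)/N).
# F_{eps,N}(u^N) approaches the continuum Huber energy of the smooth u.
eps = 0.1
u_fun = lambda X, Y: np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)

for N in [32, 64, 128, 256, 512]:
    x = (np.arange(N) + 0.5) / N
    X, Y = np.meshgrid(x, x, indexing="ij")
    print(N, huber_tv(u_fun(X, Y), eps))   # values converge as N grows
```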

Cite this article

Chambolle, A., Tan, P. & Vaiter, S. Accelerated Alternating Descent Methods for Dykstra-Like Problems. J Math Imaging Vis 59, 481–497 (2017). https://doi.org/10.1007/s10851-017-0724-6
