Abstract
This paper extends recent results by the first author and T. Pock (ICG, TU Graz, Austria) on the acceleration of alternating minimization techniques for quadratic plus nonsmooth objectives depending on two variables. We discuss here the strongly convex situation, and how ‘fast’ methods can be derived by adapting the overrelaxation strategy of Nesterov for projected gradient descent. We also investigate slightly more general alternating descent methods, where several descent steps in each variable are performed alternately.
Notes
Note, however, that since both implementations are the same C program, in which either the descent step or the exact minimization is called depending on our choice, this should not be a bug. This is confirmed both by the fact that for \(\varepsilon >0\), when the subproblems are easier and the descent steps are therefore even more likely to converge to the exact solution in a few iterations, the exact and inexact methods need nearly the same number of iterations, and by the fact that increasing the number of descent steps eventually yields a number of outer iterations equal to that of the (AAMM) algorithm.
Image belongs to the authors.
References
Aujol, J.-F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)
Beck, A.: On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim. 25(1), 185–209 (2015)
Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Boyle, J.P., Dykstra, R.L.: A method for finding projections onto the intersection of convex sets in Hilbert spaces. In: Dykstra, R., Robertson, T., Wright, F.T. (eds) Advances in Order Restricted Statistical Inference (Iowa City, Iowa, 1985), vol. 37 of Lecture Notes in Statistics, pp. 28–47. Springer, Berlin (1986)
Braides, A.: \(\Gamma \)-Convergence for Beginners. Number 22 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, Oxford (2002)
Cai, J.-F., Dong, B., Osher, S., Shen, Z.: Image restoration: total variation, wavelet frames, and beyond. J. Am. Math. Soc. 25(4), 1033–1089 (2012)
Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159, 253–287 (2016)
Chambolle, A., Pock, T.: A remark on accelerated block coordinate descent for computing the proximity operators of a sum of convex functions. SMAI J. Comput. Math. 1, 29–54 (2015)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, vol. 49 of Springer Optimization and Applications, pp. 185–212. Springer, New York (2011)
Dal Maso, G.: An Introduction to \(\Gamma \)-Convergence. Birkhäuser, Boston (1993)
Demengel, F., Temam, R.: Convex functions of a measure and applications. Indiana Univ. Math. J. 33(5), 673–709 (1984)
Deutsch, F., Hundal, H.: The rate of convergence of Dykstra’s cyclic projections algorithm: the polyhedral case. Numer. Funct. Anal. Optim. 15(5–6), 537–565 (1994)
Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton (1992)
Nemirovski, A.S., Yudin, D.: Informational complexity of mathematical programming. Izv. Akad. Nauk SSSR Tekhn. Kibernet. 1, 88–117 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 of Applied Optimization. Kluwer, Boston (2004)
Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Ziemer, W.P.: Weakly Differentiable Functions. Sobolev Spaces and Functions of Bounded Variation. Springer, New York (1989)
Acknowledgements
This work is supported by the ANR via the international project ‘EANOI’ (Efficient Algorithms for Nonsmooth Optimization in Imaging), FWF No. I1148 / ANR-12-IS01-0003. A. Chambolle also benefits from the support of the ‘Programme Gaspard Monge pour l’Optimisation et la Recherche Opérationnelle’ (PGMO), through the ‘MAORI’ group, as well as the ‘GdR MIA’ of the CNRS. He also warmly thanks Churchill College and DAMTP, Centre for Mathematical Sciences, University of Cambridge, for their kind hospitality during the completion of this work, made possible by the support of the French Embassy in the UK and the Cantab Capital Institute for Mathematics of Information.
Appendix: An Approximation Result
In this appendix, we show that, although it is not obvious at first glance, the discrete energy \(J_\varepsilon (\mathbf {u})\) is an approximation of the isotropic total variation. More precisely, the result is as follows. To simplify, we work in the domain \(\Omega =(0,1)^2\) (the extension to more general regular domains is not difficult) and we define, for an integer \(N\ge 1\) and for \(u\in L^1(\Omega )\), the functional
Here, \(J_{\varepsilon /N}^{N,N}\) denotes the energy (44) in the case \(m=n=N\) (and with the smoothing parameter \(\varepsilon /N\)). We also denote \(\Phi _\varepsilon (p):= |p|^2/(2\varepsilon )\) if \(|p|\le \varepsilon \), and \(\Phi _\varepsilon (p):= |p|-\varepsilon /2\) otherwise, and recall that for \(u\in BV(\Omega )\), a function with bounded variation \(|Du|(\Omega )<+\infty \) [17, 22], one has \(\int _\Omega \Phi _\varepsilon (Du)= \int _\Omega \Phi _\varepsilon (\nabla u)\hbox {d}x + |D^su|(\Omega )\), where \(Du=\nabla u \hbox {d}x +D^su\) is the Radon-Nikodym decomposition of Du into its absolutely continuous and singular parts, see [15]. We introduce the functional
Then, one can show that \(\mathcal {F}_\varepsilon \) can also be defined by duality, as follows:
One has the following result:
Theorem 4
As \(N\rightarrow \infty \), \(\mathcal {F}_{\varepsilon ,N}\) \(\Gamma \)-converges to \(\mathcal {F}_\varepsilon \). Moreover, if for some sequence \((u^N)\in L^1(\Omega )^{\mathbb {N}}\), \(\mathcal {F}_{\varepsilon ,N}(u^N)\le C<+\infty \), then there exist \(u\in BV(\Omega )\), a subsequence \((u^{N_k})_k\) and a sequence of constants \((a_k)_k\) such that \(u^{N_k}-a_k\rightarrow u\) in \(L^1(\Omega )\).
For the proper definition and main properties of \(\Gamma \)-convergence, see for instance [6, 14]. The theorem establishes that, if N is large, images minimizing \(J_{\varepsilon /N}^{N,N}\) (\(+\) other terms such as a quadratic penalization) should be close to minimizers of the isotropic ‘Huber-total variation’ \(\mathcal {F}_\varepsilon \) in the continuum. The proof is easy but, as far as we know, is not found in this form in the literature. The closest results are perhaps the \(\Gamma \)-convergence theorems of Cai et al. [7] in the context of wavelet-based approximations of the total variation.
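As a small illustration of the integrand \(\Phi _\varepsilon \) defined above, the following Python sketch evaluates it on a vector of \(\mathbb {R}^2\). The function name `huber` is ours and purely illustrative. The sketch also lets one check numerically the scaling identity \(\Phi _\varepsilon (p)=N\,\Phi _{\varepsilon /N}(p/N)\), which follows directly from the definition and seems consistent with the use of the smoothing parameter \(\varepsilon /N\) on a grid of step 1/N.

```python
def huber(p, eps):
    """Phi_eps(p) for a 2-vector p: quadratic near 0, linear beyond radius eps.

    The name `huber` is an illustrative choice, not taken from the paper.
    """
    norm = (p[0] ** 2 + p[1] ** 2) ** 0.5
    if norm <= eps:
        return norm ** 2 / (2.0 * eps)
    return norm - eps / 2.0


def scaling_gap(p, eps, n):
    """Absolute difference between Phi_eps(p) and n * Phi_{eps/n}(p/n)."""
    scaled = n * huber((p[0] / n, p[1] / n), eps / n)
    return abs(huber(p, eps) - scaled)
```

For instance, `huber((3.0, 4.0), 1.0)` falls in the linear regime (norm 5, value 5 - 1/2), and `scaling_gap` should vanish up to rounding for any choice of `p`, `eps` and `n`.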
Proof
It is enough to prove: (i) that if \(u^N\in L^1(\Omega )\) is such that \(\ell =\liminf _N \mathcal {F}_{\varepsilon ,N}(u^N)<\infty \), then not only can one extract a subsequence \(u^{N_k}\) which converges to some u, but in addition \(\mathcal {F}_\varepsilon (u)\le \ell \); (ii) that given u with finite total variation, one can build a sequence \(u^N\) with \(\limsup _N \mathcal {F}_{\varepsilon ,N}(u^N)\le \mathcal {F}_\varepsilon (u)\).
For point (i), we first consider a subsequence \((u^{N_k})\) such that \(\ell = \lim _k \mathcal {F}_{\varepsilon ,N_k}(u^{N_k})\). Then, we see that since for all k (large enough) \(\mathcal {F}_{\varepsilon ,N_k}(u^{N_k})<+\infty \), by definition \(u^{N_k}\) is piecewise constant and can be written
for some matrix \(\mathbf {u}^k=(u^k_{i,j})_{1\le i,j\le N_k}\). Then we observe that for some constant \(\sigma >0\),
Hence \(|Du^{N_k}|(\Omega )\) is bounded, showing that \((u^{N_k}-a_k)_k\) is precompact in \(L^1(\Omega )\), where \(a_k\) is the average of the function \(u^{N_k}\) in \(\Omega \). Without loss of generality, we assume \(a_k=0\) and we denote by u the limit of a subsequence (which for convenience we do not relabel). We must now show that \(\mathcal {F}_\varepsilon (u)\le \ell \).
Let \(\delta >0\), and let \(\varphi =(\varphi ^1,\varphi ^2)\in C_c^\infty (\Omega ;\mathbb {R}^2)\) be a smooth vector field with \(|\varphi (x)|^2=\varphi ^1(x)^2+\varphi ^2(x)^2\le 1-\delta \) for all \(x\in \Omega \). Observe that
where \(\varphi ^1_{i+\frac{1}{2},j}\) is the flux of \(\varphi \) through the vertical segment \(\{\frac{i}{N_k}\}\times (\frac{j-1}{N_k},\frac{j}{N_k})\) and \(\varphi ^2_{i,j+\frac{1}{2}}\) is the flux through the horizontal segment \((\frac{i-1}{N_k},\frac{i}{N_k})\times \{\frac{j}{N_k}\}\).
Assume that i and j are both odd or both even, and let \({\bar{x}}= (i/N_k,j/N_k)\). As \(\varphi \) is smooth, one clearly has that \(N_k\varphi ^1_{i+1/2,j} = \varphi ^1({\bar{x}})+ O(1/N_k)\), etc., and, in fact,
if \(N_k\) is large enough. As a consequence
Thanks to the smoothness of \(\varphi \), one can check easily that
as \(k\rightarrow \infty \), hence, summing (48) over all (i, j) both odd or both even, we find (using also the fact that \(\varphi \) has compact support) that
In the limit, we find that
Thanks to (47), we deduce that \(\mathcal {F}_\varepsilon (u)\le \ell \).
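The estimate \(N_k\varphi ^1_{i+1/2,j} = \varphi ^1({\bar{x}})+ O(1/N_k)\) used in the proof of (i) can be checked numerically. The sketch below is only an illustration: the field component `phi1` is a hypothetical smooth function chosen by us (it is not compactly supported, which does not matter for this local estimate), and the flux is computed with a midpoint quadrature rule.

```python
import math


def phi1(x, y):
    # Hypothetical smooth first component of a vector field (illustration only).
    return math.sin(2.0 * x) * math.cos(3.0 * y)


def flux_vertical(i, j, n, samples=50):
    """Midpoint-rule flux of phi1 through the segment {i/n} x ((j-1)/n, j/n)."""
    x = i / n
    y0 = (j - 1) / n
    h = (1.0 / n) / samples
    return sum(phi1(x, y0 + (s + 0.5) * h) for s in range(samples)) * h


def scaled_flux_error(n):
    """|n * flux - phi1(x_bar)| at a node x_bar = (i/n, j/n) near the centre."""
    i = j = n // 2
    return abs(n * flux_vertical(i, j, n) - phi1(i / n, j / n))
```

Evaluating `scaled_flux_error` for increasing grid sizes, for example 10 and 100, should show the error decreasing roughly in proportion to 1/n.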
We must now prove (ii). We only sketch the proof, which is very simple: one first observes that, since any \(u\in BV(\Omega )\) can be approximated by a sequence \((u_n)\) with \(u_n\in C^\infty (\overline{\Omega })\), \(u_n\rightarrow u\) in \(L^1(\Omega )\) and \(\int _\Omega \Phi _\varepsilon (\nabla u_n(x))\hbox {d}x\rightarrow \mathcal {F}_\varepsilon (u)\), it is enough to show the result for a smooth function and then use a diagonal argument.
But if u is smooth, letting simply for each N, \(u^N_{i,j}=u((i-1/2)/N,(j-1/2)/N)\), one first observes that
uniformly in \(\Omega \), and then that \(\mathcal {F}_{\varepsilon ,N}(u^N)\) is a finite-difference approximation of \(\int _\Omega \Phi _\varepsilon (\nabla u(x))\hbox {d}x\), which converges to this limit as \(N\rightarrow \infty \). \(\square \)
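The idea behind point (ii) can be illustrated with a short Python sketch. It samples a smooth u at the cell centres \(((i-1/2)/N,(j-1/2)/N)\), as in the proof, and evaluates a Riemann sum of \(\Phi _\varepsilon \) applied to forward differences. Note the assumption: this generic forward-difference energy is a stand-in chosen by us for illustration and is not the energy (44) of the paper; the names `discrete_huber_tv` and `huber_norm` are likewise hypothetical.

```python
def huber_norm(px, py, eps):
    """Phi_eps applied to the vector (px, py)."""
    norm = (px * px + py * py) ** 0.5
    if norm <= eps:
        return norm * norm / (2.0 * eps)
    return norm - eps / 2.0


def discrete_huber_tv(u, n, eps):
    """Riemann sum of Phi_eps(forward-difference gradient) on an n x n grid.

    Assumption: generic forward differences, set to zero across the last
    row and column; this is NOT the paper's energy (44).
    """
    # Sample u at the cell centres, as in the proof of (ii).
    grid = [[u((i - 0.5) / n, (j - 0.5) / n) for j in range(1, n + 1)]
            for i in range(1, n + 1)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            dx = n * (grid[i + 1][j] - grid[i][j]) if i + 1 < n else 0.0
            dy = n * (grid[i][j + 1] - grid[i][j]) if j + 1 < n else 0.0
            total += huber_norm(dx, dy, eps)
    return total / (n * n)


# Example: u(x, y) = x has |grad u| = 1, so the continuum value is 1 - eps/2.
energy = discrete_huber_tv(lambda x, y: x, 100, 0.1)
```

For this linear example the discrete value differs from \(1-\varepsilon /2\) only through the boundary column, where the forward difference is set to zero, so the gap shrinks like 1/N.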
Cite this article
Chambolle, A., Tan, P. & Vaiter, S. Accelerated Alternating Descent Methods for Dykstra-Like Problems. J Math Imaging Vis 59, 481–497 (2017). https://doi.org/10.1007/s10851-017-0724-6
DOI: https://doi.org/10.1007/s10851-017-0724-6