
On the Acceleration of Forward-Backward Splitting via an Inexact Newton Method

Chapter in: Splitting Algorithms, Modern Operator Theory, and Applications

Abstract

We propose a Forward-Backward Truncated-Newton method (FBTN) for minimizing the sum of two convex functions, one of which is smooth. Unlike other proximal Newton methods, our approach does not employ variable metrics, but is instead based on a reformulation of the original problem as the unconstrained minimization of a continuously differentiable function, the forward-backward envelope (FBE). We introduce a generalized Hessian for the FBE that symmetrizes the generalized Jacobian of the nonlinear system of equations encoding the optimality conditions of the problem. This enables the use of the conjugate gradient (CG) method for efficiently solving the resulting (regularized) linear systems, which can be done inexactly. Using CG avoids forming full (generalized) Jacobians, since only (generalized) directional derivatives are required. The resulting algorithm is globally (subsequentially) convergent; it converges Q-linearly under an error bound condition, and up to Q-superlinearly and Q-quadratically under regularity assumptions at the possibly non-isolated limit point.
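For a concrete picture of the ingredients named above, the following self-contained NumPy/SciPy sketch works through a LASSO instance, f(x) = 0.5‖Ax − b‖² and g(x) = λ‖x‖₁. It is an illustrative toy, not the authors' FBTN implementation: the FBE, residual, and generalized-Hessian formulas follow the forward-backward envelope literature cited below (e.g. [69, 75]), and all names (A, b, lam, T, R, fbe, newton_direction) are ad hoc choices for this sketch.

    # Illustrative toy (not the authors' FBTN code): LASSO instance of  min f(x) + g(x)
    #   f(x) = 0.5*||Ax - b||^2   (smooth, L_f = ||A||_2^2),   g(x) = lam*||x||_1.
    # Following the forward-backward envelope (FBE) literature cited below (e.g. [69, 75]):
    #   T(x)   = prox_{gamma*g}(x - gamma*grad_f(x))                 forward-backward step
    #   R(x)   = (x - T(x)) / gamma                                  FB residual
    #   FBE(x) = f(x) + g(T(x)) + <grad_f(x), T(x) - x> + ||T(x) - x||^2 / (2*gamma)
    # The Newton-type direction is obtained with matrix-free CG on a regularized,
    # symmetric generalized-Hessian system; no full (generalized) Jacobian is formed.
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(0)
    m, n, lam = 40, 100, 0.1
    A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
    L_f = np.linalg.norm(A, 2) ** 2          # Lipschitz modulus of grad f
    gamma = 0.5 / L_f                        # step size in (0, 1/L_f)

    f      = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    grad_f = lambda x: A.T @ (A @ x - b)
    g      = lambda x: lam * np.abs(x).sum()
    prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

    T   = lambda x: prox_g(x - gamma * grad_f(x))        # forward-backward operator
    R   = lambda x: (x - T(x)) / gamma                    # forward-backward residual
    fbe = lambda x: (f(x) + g(T(x)) + grad_f(x) @ (T(x) - x)
                     + np.linalg.norm(T(x) - x) ** 2 / (2 * gamma))

    def newton_direction(x, reg=1e-8, maxiter=50):
        # Matrix-free CG on (H + reg*I) d = -grad_FBE(x), where
        #   Q v = v - gamma * A.T @ (A @ v)      (I - gamma * Hessian of f) applied to v
        #   P v = mask * v                       derivative of soft-thresholding at the
        #                                        forward point x - gamma*grad_f(x)
        #   H v = (Q v - Q P Q v) / gamma        symmetric generalized Hessian of the FBE
        fwd = x - gamma * grad_f(x)
        mask = (np.abs(fwd) > gamma * lam).astype(float)
        Q = lambda v: v - gamma * (A.T @ (A @ v))
        H = lambda v: (Q(v) - Q(mask * Q(v))) / gamma + reg * v
        grad_fbe = Q(R(x))                               # gradient of the FBE
        d, _ = cg(LinearOperator((n, n), matvec=H, dtype=float), -grad_fbe, maxiter=maxiter)
        return d

    x = np.zeros(n)
    for _ in range(10):                                  # a few plain FBS warm-up steps
        x = T(x)
    d = newton_direction(x)
    print("FBE before:", fbe(x), "  after Newton-type step:", fbe(x + d))

The snippet only shows how a direction can be computed from (generalized) directional derivatives alone; the full FBTN scheme additionally includes a safeguarding/line-search step on the FBE (not shown here) to obtain the global and fast local convergence claimed above.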


Notes

  1. Due to the apparent similarity with gradient descent iterations, the residual \(R_\gamma=\gamma^{-1}(\operatorname{id}-T_\gamma)\) appearing in FBS is also referred to as the (generalized) gradient mapping, see, e.g., [17]. In particular, if g = 0 then \(R_\gamma=\nabla f\), whereas if f = 0 then \(R_\gamma=\nabla g^\gamma\) (see the display after these notes). The analogy will be supported by further evidence in the next section, where we will see that, up to a change of metric, \(R_\gamma\) is indeed the gradient of the forward-backward envelope function.

  2. As detailed in the proof, under the assumptions the limit point indeed exists.

  3. From the chain rule of differentiation it follows that \(T_\gamma\) is strictly differentiable at \(x_\star\) if \(\operatorname {prox}_{\gamma g}\) is strictly differentiable at \(x_\star-\gamma\nabla f(x_\star)\) (strict differentiability is closed under composition).

  4. In the case of complex-valued matrices, functions of this form are known as unitarily invariant [34].

  5. Consistent with the definition in [66], the polyhedron P can equivalently be expressed by means of inequalities only as , resulting indeed in .
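For Note 1, assuming the standard forward-backward notation \(T_\gamma=\operatorname{prox}_{\gamma g}\circ(\operatorname{id}-\gamma\nabla f)\) with residual \(R_\gamma=\gamma^{-1}(\operatorname{id}-T_\gamma)\) (consistent with Lemma 3 in the appendix), the two special cases read

$$\displaystyle g\equiv 0:\;\; T_\gamma=\operatorname{id}-\gamma\nabla f,\;\; R_\gamma=\nabla f; \qquad\qquad f\equiv 0:\;\; T_\gamma=\operatorname{prox}_{\gamma g},\;\; R_\gamma=\gamma^{-1}\big(\operatorname{id}-\operatorname{prox}_{\gamma g}\big)=\nabla g^\gamma, $$

where \(g^\gamma\) denotes the Moreau envelope of g.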

References

  1. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming 137(1), 91–129 (2013). DOI 10.1007/s10107-011-0484-9
  2. Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics. Springer (2017). DOI 10.1007/978-3-319-48311-5
  3. Bauschke, H.H., Noll, D., Phan, H.M.: Linear and strong convergence of algorithms involving averaged nonexpansive operators. Journal of Mathematical Analysis and Applications 421(1), 1–20 (2015)
  4. Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2017). DOI 10.1137/1.9781611974997
  5. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2(1), 183–202 (2009). DOI 10.1137/080716542
  6. Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. In: Advances in Neural Information Processing Systems, pp. 2618–2626 (2012)
  7. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press, Boston (1982)
  8. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific (2015)
  9. Bhatia, R.: Matrix Analysis. Graduate Texts in Mathematics. Springer New York (1997)
  10. Bochnak, J., Coste, M., Roy, M.F.: Real Algebraic Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics. Springer Berlin Heidelberg (2013)
  11. Bolte, J., Daniilidis, A., Lewis, A.: Tame functions are semismooth. Mathematical Programming 117(1), 5–19 (2009). DOI 10.1007/s10107-007-0166-9
  12. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization 3(3), 538–543 (1993). DOI 10.1137/0803026
  13. Chen, X., Fukushima, M.: Proximal quasi-Newton methods for nondifferentiable convex optimization. Mathematical Programming 85(2), 313–334 (1999). DOI 10.1007/s101070050059
  14. Chen, X., Qi, H., Tseng, P.: Analysis of nonsmooth symmetric-matrix-valued functions with applications to semidefinite complementarity problems. SIAM Journal on Optimization 13(4), 960–985 (2003). DOI 10.1137/S1052623400380584
  15. Clarke, F.H.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics (1990). DOI 10.1137/1.9781611971309
  16. Combettes, P.L., Pesquet, J.C.: Proximal Splitting Methods in Signal Processing, pp. 185–212. Springer New York, New York, NY (2011). DOI 10.1007/978-1-4419-9569-8_10
  17. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Mathematics of Operations Research (2018)
  18. Eldén, L.: Matrix Methods in Data Mining and Pattern Recognition. Society for Industrial and Applied Mathematics (2007). DOI 10.1137/1.9780898718867
  19. Facchinei, F., Pang, J.S.: Finite-dimensional variational inequalities and complementarity problems, vol. II. Springer (2003)
  20. Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis, Stanford University (2002)
  21. Fazel, M., Hindi, H., Boyd, S.P.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the 2001 American Control Conference, vol. 6, pp. 4734–4739 (2001). DOI 10.1109/ACC.2001.945730
  22. Fazel, M., Hindi, H., Boyd, S.P.: Rank minimization and applications in system theory. In: Proceedings of the 2004 American Control Conference, vol. 4, pp. 3273–3278 (2004). DOI 10.23919/ACC.2004.1384521
  23. Fukushima, M.: Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Mathematical Programming 53(1), 99–110 (1992). DOI 10.1007/BF01585696
  24. Giselsson, P., Fält, M.: Envelope functions: Unifications and further properties. Journal of Optimization Theory and Applications (2018). DOI 10.1007/s10957-018-1328-z
  25. Gowda, M.S.: Inverse and implicit function theorems for H-differentiable and semismooth functions. Optimization Methods and Software 19(5), 443–461 (2004). DOI 10.1080/10556780410001697668
  26. Güler, O.: New proximal point algorithms for convex minimization. SIAM Journal on Optimization 2(4), 649–664 (1992). DOI 10.1137/0802032
  27. Han, J., Sun, D.: Newton and quasi-Newton methods for normal maps with polyhedral sets. Journal of Optimization Theory and Applications 94(3), 659–676 (1997). DOI 10.1023/A:1022653001160
  28. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Grundlehren Text Editions. Springer Berlin Heidelberg (2004)
  29. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press (1994)
  30. Kanzow, C., Ferenczi, I., Fukushima, M.: On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM Journal on Optimization 20(1), 297–320 (2009). DOI 10.1137/060657662
  31. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with O(1/ε) iteration-complexity for cone programming. Mathematical Programming 126(1), 1–29 (2011). DOI 10.1007/s10107-008-0261-6
  32. Lee, J.D., Sun, Y., Saunders, M.: Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization 24(3), 1420–1443 (2014). DOI 10.1137/130921428
  33. Lemaréchal, C., Sagastizábal, C.: Practical aspects of the Moreau-Yosida regularization: Theoretical preliminaries. SIAM Journal on Optimization 7(2), 367–385 (1997). DOI 10.1137/S1052623494267127
  34. Lewis, A.S.: The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis 2(1), 173–183 (1995)
  35. Lewis, A.S.: Convex analysis on the Hermitian matrices. SIAM Journal on Optimization 6(1), 164–177 (1996). DOI 10.1137/0806009
  36. Lewis, A.S.: Derivatives of spectral functions. Mathematics of Operations Research 21(3), 576–588 (1996)
  37. Lewis, A.S., Sendov, H.S.: Twice differentiable spectral functions. SIAM Journal on Matrix Analysis and Applications 23(2), 368–386 (2001). DOI 10.1137/S089547980036838X
  38. Li, W., Peng, J.: Exact penalty functions for constrained minimization problems via regularized gap function for variational inequalities. Journal of Global Optimization 37(1), 85–94 (2007). DOI 10.1007/s10898-006-9038-8
  39. Li, X., Sun, D., Toh, K.C.: On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope. ArXiv e-prints (2017)
  40. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis 16(6), 964–979 (1979). DOI 10.1137/0716071
  41. Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications 31(3), 1235–1256 (2010). DOI 10.1137/090755436
  42. Lu, Z.: Randomized block proximal damped Newton method for composite self-concordant minimization. SIAM Journal on Optimization 27(3), 1910–1942 (2017). DOI 10.1137/16M1082767
  43. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research 46(1), 157–178 (1993). DOI 10.1007/BF02096261
  44. Maratos, N.: Exact penalty function algorithms for finite dimensional and control optimization problems (1978)
  45. Martinet, B.: Brève communication. Régularisation d'inéquations variationnelles par approximations successives. Revue française d'informatique et de recherche opérationnelle. Série rouge 4(R3), 154–158 (1970)
  46. Meng, F.: Moreau-Yosida regularization of Lagrangian-dual functions for a class of convex optimization problems. Journal of Global Optimization 44(3), 375 (2008). DOI 10.1007/s10898-008-9333-7
  47. Meng, F., Sun, D., Zhao, G.: Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Mathematical Programming 104(2), 561–581 (2005). DOI 10.1007/s10107-005-0629-9
  48. Meng, F., Zhao, G., Goh, M., De Souza, R.: Lagrangian-dual functions and Moreau-Yosida regularization. SIAM Journal on Optimization 19(1), 39–61 (2008). DOI 10.1137/060673746
  49. Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization 15(6), 959–972 (1977). DOI 10.1137/0315061
  50. Mifflin, R., Qi, L., Sun, D.: Properties of the Moreau-Yosida regularization of a piecewise C^2 convex function. Mathematical Programming 84(2), 269–281 (1999). DOI 10.1007/s10107980029a
  51. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société Mathématique de France 93, 273–299 (1965)
  52. Morita, T., Kanade, T.: A sequential factorization method for recovering shape and motion from image streams. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(8), 858–867 (1997). DOI 10.1109/34.608289
  53. Nesterov, Y.: Introductory lectures on convex optimization: A basic course, vol. 87. Springer (2003)
  54. Nesterov, Y.: Gradient methods for minimizing composite functions. Mathematical Programming 140(1), 125–161 (2013). DOI 10.1007/s10107-012-0629-5
  55. Pang, J.S.: Error bounds in mathematical programming. Mathematical Programming 79(1), 299–332 (1997). DOI 10.1007/BF02614322
  56. Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239 (2014). DOI 10.1561/2400000003
  57. Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: IEEE Conference on Decision and Control, pp. 2358–2363 (2013)
  58. Patrinos, P., Sopasakis, P., Sarimveis, H.: A global piecewise smooth Newton method for fast large-scale model predictive control. Automatica 47(9), 2016–2022 (2011)
  59. Patrinos, P., Stella, L., Bemporad, A.: Forward-backward truncated Newton methods for convex composite optimization. ArXiv e-prints (2014)
  60. Qi, L., Sun, J.: A nonsmooth version of Newton's method. Mathematical Programming 58(1), 353–367 (1993). DOI 10.1007/BF01581275
  61. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501 (2010). DOI 10.1137/070697835
  62. Rennie, J.D.M., Srebro, N.: Fast maximum margin matrix factorization for collaborative prediction. In: Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp. 713–719. ACM, New York, NY, USA (2005). DOI 10.1145/1102351.1102441
  63. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
  64. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization 14(5), 877–898 (1976). DOI 10.1137/0314056
  65. Rockafellar, R.T., Wets, R.J.B.: Variational analysis, vol. 317. Springer Science & Business Media (2011)
  66. Scholtes, S.: Piecewise Differentiable Functions, pp. 91–111. Springer New York, New York, NY (2012). DOI 10.1007/978-1-4614-4340-7_4
  67. Sopasakis, P., Freris, N., Patrinos, P.: Accelerated reconstruction of a compressively sampled data stream. In: 2016 24th European Signal Processing Conference (EUSIPCO), pp. 1078–1082 (2016). DOI 10.1109/EUSIPCO.2016.7760414
  68. Srebro, N.: Learning with matrix factorizations. Ph.D. thesis, Cambridge, MA, USA (2004)
  69. Stella, L., Themelis, A., Patrinos, P.: Forward-backward quasi-Newton methods for nonsmooth optimization problems. Computational Optimization and Applications 67(3), 443–487 (2017). DOI 10.1007/s10589-017-9912-y
  70. Stella, L., Themelis, A., Patrinos, P.: Newton-type alternating minimization algorithm for convex optimization. IEEE Transactions on Automatic Control (2018). DOI 10.1109/TAC.2018.2872203
  71. Stella, L., Themelis, A., Sopasakis, P., Patrinos, P.: A simple and efficient algorithm for nonlinear model predictive control. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1939–1944 (2017). DOI 10.1109/CDC.2017.8263933
  72. Sun, D., Fukushima, M., Qi, L.: A computable generalized Hessian of the D-gap function and Newton-type methods for variational inequality problems. In: Ferris, M.C., Pang, J.S. (eds.) Complementarity and Variational Problems: State of the Art, pp. 452–472. SIAM, Philadelphia, PA (1997)
  73. Sun, D., Sun, J.: Semismooth matrix-valued functions. Mathematics of Operations Research 27(1), 150–169 (2002). DOI 10.1287/moor.27.1.150.342
  74. Themelis, A., Patrinos, P.: Douglas-Rachford splitting and ADMM for nonconvex optimization: tight convergence results. ArXiv e-prints (2017)
  75. Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: Further properties and nonmonotone linesearch algorithms. SIAM Journal on Optimization 28(3), 2274–2303 (2018). DOI 10.1137/16M1080240
  76. Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision 9(2), 137–154 (1992). DOI 10.1007/BF00129684
  77. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Tech. rep. (2008)
  78. Ulbrich, M.: Optimization Methods in Banach Spaces, pp. 97–156. Springer Netherlands, Dordrecht (2009). DOI 10.1007/978-1-4020-8839-1_2
  79. Yamashita, N., Taji, K., Fukushima, M.: Unconstrained optimization reformulations of variational inequality problems. Journal of Optimization Theory and Applications 92(3), 439–456 (1997). DOI 10.1023/A:1022660704427
  80. Yang, Z.: A study on nonsymmetric matrix-valued functions. Master's thesis, National University of Singapore (2009)
  81. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 68(1), 49–67 (2006)
  82. Zhou, G., Qi, L.: On the convergence of an inexact Newton-type method. Operations Research Letters 34(6), 647–652 (2006). DOI 10.1016/j.orl.2005.11.001
  83. Zhou, G., Toh, K.C.: Superlinear convergence of a Newton-type algorithm for monotone equations. Journal of Optimization Theory and Applications 125(1), 205–221 (2005). DOI 10.1007/s10957-004-1721-7


Acknowledgements

This work was supported by the Research Foundation Flanders (FWO) research projects G086518N and G086318N; KU Leuven internal funding StG/15/043; Fonds de la Recherche Scientifique—FNRS and the Fonds Wetenschappelijk Onderzoek—Vlaanderen under EOS Project no 30468160 (SeLMA).


Correspondence to Panagiotis Patrinos.


Appendix: Auxiliary Results


Lemma 1

Any proper lsc convex function with nonempty and bounded set of minimizers is level bounded.

Proof

Let h be such a function; to avoid trivialities we assume that \( \operatorname {{\mathrm {dom}}} h\) is unbounded. Fix \(x_\star \in \operatorname *{\mathrm {argmin}} h\) and let R > 0 be such that the open ball \(B\mathrel{\mathop:}=\{x : \|x-x_\star\|<R\}\) contains \( \operatorname *{\mathrm {argmin}} h\). Since \( \operatorname {{\mathrm {dom}}} h\) is convex and unbounded and h is lsc, h attains a minimum on the compact set \(\operatorname {bdry} B\), call it m, which is strictly larger than \(h(x_\star)\) (since \(\operatorname {dist}( \operatorname *{\mathrm {argmin}} h,\operatorname {bdry} B)>0\) due to compactness of \( \operatorname *{\mathrm {argmin}} h\) and openness of B). For x∉B, let \(s_x=x_\star +R\tfrac {x-x_\star }{\|x-x_\star \|}\) denote its projection onto \(\operatorname {bdry} B\), and let \(t_x\mathrel{\mathop:}= \tfrac {\|x-x_\star \|}{R}\geq 1\). Then,

$$\displaystyle \begin{aligned} h(x) {}={} & h\big(x_\star+t_x(s_x-x_\star)\big) {}\geq{} h(x_\star)+t_x\big(h(s_x)-h(x_\star)\big) {}\geq{} h(x_\star)+t_x\big(m-h(x_\star)\big), \end{aligned} $$

where the first inequality follows from convexity of h together with \(t_x\geq 1\) (see the display below). Since \(m-h(x_\star)>0\) and \(t_x\to\infty\) as ∥x∥→∞, we conclude that h is coercive, and thus level bounded. □
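Indeed, since \(t_x\geq 1\), the point \(s_x\) is the convex combination \(s_x=\big(1-\tfrac{1}{t_x}\big)x_\star+\tfrac{1}{t_x}x\), so convexity of h gives

$$\displaystyle h(s_x) \;\leq\; \Big(1-\tfrac{1}{t_x}\Big)h(x_\star)+\tfrac{1}{t_x}h(x) \qquad\Longleftrightarrow\qquad h(x) \;\geq\; h(x_\star)+t_x\big(h(s_x)-h(x_\star)\big). $$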

Lemma 2

Let \(H\in \operatorname {S}_+(\mathbb {R}^n)\) with \(\lambda _{\max }(H)\leq 1\). Then \(H-H^2\in \operatorname {S}_+(\mathbb {R}^n)\) with

$$\displaystyle \begin{aligned} \lambda_{\min}\big(H-H^2\big) {}\geq{} \min\big\{\lambda_{\min}(H)-\lambda_{\min}(H)^2,\;\lambda_{\max}(H)-\lambda_{\max}(H)^2\big\}. \end{aligned} $$
(15.122)

Proof

Consider the spectral decomposition \(H=SDS^\top\) for some orthogonal matrix S and diagonal D. Then, \(H-H^2=S\tilde D S^\top\) where \(\tilde D=D-D^2\). Clearly, \(\tilde D\) is diagonal, hence the eigenvalues of \(H-H^2\) are exactly \(\{\lambda-\lambda^2 : \lambda\in\operatorname{eigs}(H)\}\). The function λ↦λ − λ² is concave, hence its minimum over \(\operatorname{eigs}(H)\subseteq[\lambda_{\min}(H),\lambda_{\max}(H)]\) is attained at one of the extrema, that is, either at \(\lambda=\lambda_{\min}(H)\) or at \(\lambda=\lambda_{\max}(H)\), which proves the claim. □
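As a quick numerical illustration (a standalone NumPy snippet, not part of the chapter), one can check the eigenvalue bound as reconstructed in (15.122) on a random symmetric matrix rescaled so that \(\lambda_{\max }(H)\le 1\):

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((6, 6))
    H = B @ B.T                                   # random symmetric PSD matrix
    H /= np.linalg.eigvalsh(H).max()              # rescale so that lambda_max(H) <= 1

    eig_H = np.linalg.eigvalsh(H)
    lo, hi = eig_H.min(), eig_H.max()
    bound = min(lo - lo**2, hi - hi**2)           # right-hand side of (15.122)
    lhs = np.linalg.eigvalsh(H - H @ H).min()     # lambda_min(H - H^2)
    print(lhs >= bound - 1e-12)                   # True: the claimed lower bound holds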

Lemma 3

For any \(\gamma\in(0,2/L_f)\) the forward-backward operator \(T_\gamma\) in (15.22) is nonexpansive (in fact, \(\frac {2}{4-\gamma L_f}\)-averaged), and the residual \(R_\gamma=\gamma^{-1}(\operatorname{id}-T_\gamma)\) is Lipschitz continuous with modulus \(\frac {4}{\gamma (4-\gamma L_f)}\).

Proof

By combining [2, Prop. 4.39 and Cor. 18.17] it follows that the gradient-descent operator \(\operatorname{id}-\gamma\nabla f\) is \(\gamma L_f/2\)-averaged. Moreover, since the proximal mapping \(\operatorname{prox}_{\gamma g}\) is 1∕2-averaged [2, Prop. 12.28], we conclude from [2, Prop. 4.44] that the forward-backward operator \(T_\gamma\) is α-averaged with \(\alpha =\frac {2}{4-\gamma L_f}\), and thus nonexpansive [2, Rem. 4.34(i)]. Therefore, by definition of α-averagedness there exists a nonexpansive (1-Lipschitz) operator S such that \(T_\gamma =(1-\alpha )\operatorname {id}+\alpha S\), and consequently the residual \(R_\gamma=\gamma^{-1}(\operatorname{id}-T_\gamma)=\frac{\alpha}{\gamma}(\operatorname{id}-S)\) is (2α∕γ)-Lipschitz continuous, since \(\operatorname{id}-S\) is 2-Lipschitz. □
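For the record, the averagedness constant follows from the standard composition rule for averaged operators (cf. [2, Prop. 4.44]): the composition of an \(\alpha_1\)-averaged with an \(\alpha_2\)-averaged operator is α-averaged with \(\alpha=\frac{\alpha_1+\alpha_2-2\alpha_1\alpha_2}{1-\alpha_1\alpha_2}\). Plugging in \(\alpha_1=\gamma L_f/2\) and \(\alpha_2=1/2\) gives

$$\displaystyle \alpha \;=\; \frac{\tfrac{\gamma L_f}{2}+\tfrac12-\tfrac{\gamma L_f}{2}}{1-\tfrac{\gamma L_f}{4}} \;=\; \frac{2}{4-\gamma L_f}, \qquad\qquad \frac{2\alpha}{\gamma} \;=\; \frac{4}{\gamma(4-\gamma L_f)}, $$

matching the moduli in the statement of the lemma.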


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Themelis, A., Ahookhosh, M., Patrinos, P. (2019). On the Acceleration of Forward-Backward Splitting via an Inexact Newton Method. In: Bauschke, H., Burachik, R., Luke, D. (eds) Splitting Algorithms, Modern Operator Theory, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-25939-6_15
