Almost sure convergence of stochastic composite objective mirror descent for non-convex non-smooth optimization

Liang, Yuqing; Xu, Dongpo; Zhang, Naimin; Mandic, Danilo P.

doi:10.1007/s11590-023-01972-3

Original Paper
Published: 18 January 2023

Optimization Letters (2023)

Yuqing Liang¹,
Dongpo Xu ORCID: orcid.org/0000-0002-9663-9743¹,
Naimin Zhang² &
Danilo P. Mandic³

Abstract

Stochastic composite objective mirror descent (SCOMID) is an effective method for solving large-scale stochastic composite problems in machine learning. This method can efficiently use the geometric properties of a problem through a general distance function. However, most existing analyses rely on the convexity of the problem and on the assumption that the stochastic gradient is unbiased. In addition, the convergence results are obtained only in expectation. To address this, we present an almost sure convergence analysis of SCOMID with biased gradient estimation in the non-convex non-smooth setting. For this general case, the analysis shows that the minimum of the squared generalized projected gradient norm arbitrarily converges to zero with probability one. We also obtain the almost sure convergence of function values for SCOMID with time-varying stepsizes in the non-convex and non-smooth setting. Numerical experiments support our theoretical findings.
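The abstract does not reproduce the update rule, so the following is only a minimal sketch of one common special case, not the authors' implementation. It assumes the Euclidean distance function psi(x) = 0.5 * ||x||^2 and an l1 regularizer r(x) = lam * ||x||_1, in which case the composite mirror descent step reduces to a stochastic proximal gradient step with soft-thresholding. The toy non-convex loss, the bias and noise levels, the stepsize schedule eta0 / t^0.75, and every function name are illustrative assumptions. The generalized projected gradient is taken here as (x - x_plus) / eta evaluated with the exact gradient, which may differ from the definition used in the paper.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def composite_step(x, g, eta, lam):
    """One composite step, assuming psi(x) = 0.5*||x||^2 and r(x) = lam*||x||_1."""
    v = [xi - eta * gi for xi, gi in zip(x, g)]
    return soft_threshold(v, eta * lam)


def generalized_gradient(x, x_plus, eta):
    """Generalized projected gradient (x - x_plus) / eta for this special case."""
    return [(xi - xpi) / eta for xi, xpi in zip(x, x_plus)]


def exact_grad(x):
    """Gradient of the toy non-convex loss f(x) = sum_i x_i^2 / (1 + x_i^2)."""
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def biased_noisy_grad(x, rng, bias=0.01, sigma=0.1):
    """Hypothetical biased estimator: exact gradient + constant bias + Gaussian noise."""
    return [gi + bias + rng.gauss(0.0, sigma) for gi in exact_grad(x)]


def run(x0, steps=200, eta0=0.5, lam=0.1, seed=0):
    """Iterate with time-varying stepsizes; track the running minimum of ||P_t||^2."""
    rng = random.Random(seed)
    x = list(x0)
    best = math.inf
    history = []
    for t in range(1, steps + 1):
        eta = eta0 / t ** 0.75  # assumed schedule: sum eta = inf, sum eta^2 < inf
        # Stationarity measure, computed with the exact gradient at the current x.
        p = generalized_gradient(x, composite_step(x, exact_grad(x), eta, lam), eta)
        best = min(best, sum(pi * pi for pi in p))
        history.append(best)
        # Actual update uses the biased stochastic gradient.
        x = composite_step(x, biased_noisy_grad(x, rng), eta, lam)
    return x, history
```

Calling `run([1.0, -2.0])` returns the final iterate together with the running minimum of the squared generalized projected gradient norm, the quantity whose almost sure convergence the abstract describes. A non-Euclidean distance function would replace the subtraction inside `composite_step` with the corresponding mirror map.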

Data Availability Statement

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Notes

\({\mathcal {F}}_{t}\) denotes the \(\sigma\)-algebra generated by random variables \(x_{1}\), \(x_{2}\), \(\cdots\), \(x_{t}\), i.e., \({\mathcal {F}}_{t}=\sigma (x_{1},\,x_{2},\,\cdots ,\,x_{t})\).
\(f(x)\sim g(x)\): there exists \(x_{0}\) such that \(\lim _{x\rightarrow x_{0}} f(x)/g(x)=1\).
LIBSVM website: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

Acknowledgements

The authors wish to thank the anonymous reviewers for their insightful and very helpful expert comments and suggestions. This work was funded in part by the National Key R&D Program of China (No. 2021YFA1003400), in part by the National Natural Science Foundation of China (No. 62176051), and in part by the Fundamental Research Funds for the Central Universities of China (No. 2412020FZ024).

Author information

Authors and Affiliations

Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, People’s Republic of China
Yuqing Liang & Dongpo Xu
College of Mathematics and Physics, Wenzhou University, Wenzhou, 325035, People’s Republic of China
Naimin Zhang
Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, UK
Danilo P. Mandic

Corresponding author

Correspondence to Dongpo Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, Y., Xu, D., Zhang, N. et al. Almost sure convergence of stochastic composite objective mirror descent for non-convex non-smooth optimization. Optim Lett (2023). https://doi.org/10.1007/s11590-023-01972-3

Received: 22 August 2022
Accepted: 05 January 2023
Published: 18 January 2023
DOI: https://doi.org/10.1007/s11590-023-01972-3
