Abstract
We consider the problems of numerical stability and model density growth when training a sparse linear model from massive data. We focus on scalable algorithms that minimize a loss function by gradient descent with either ℓ0 or ℓ1 regularization. We observe numerical stability problems in several existing methods, leading to divergence and low accuracy. In addition, these methods typically exercise weak control over sparsity, so model density grows faster than necessary. We propose a framework that addresses both problems. First, its update rule is numerically stable, carries a convergence guarantee, and yields more reasonable models. Second, besides ℓ1 regularization, it exploits the sparsity of the data distribution and achieves a higher degree of sparsity, with a PAC generalization error bound. Lastly, it is parallelizable and suitable for training large-margin classifiers on huge datasets. Experiments show that the proposed method converges consistently and, using only 10% of the features, outperforms the baselines by as much as a 6% reduction in error rate on average. Datasets and software are available from the authors.
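To make the class of methods the abstract refers to concrete, the sketch below shows a generic ℓ1-regularized stochastic gradient update with soft thresholding, in the spirit of truncated-gradient approaches. It is not the paper's proposed update rule; the logistic loss, the function name l1_sgd_step, and the step size eta and regularization weight lam are illustrative assumptions.

```python
import numpy as np

def l1_sgd_step(w, x, y, eta, lam):
    """One stochastic gradient step for an L1-regularized linear model.

    Logistic loss is used purely for illustration; the paper's own update
    rule, loss, and sparsity control differ. The soft-thresholding step
    after the gradient update is what keeps the model sparse: it shrinks
    every weight toward zero and truncates those that fall below eta*lam.
    """
    margin = y * np.dot(w, x)
    # Gradient of the logistic loss -log(sigmoid(y * w.x)) with respect to w.
    grad = -y * x / (1.0 + np.exp(margin))
    w = w - eta * grad
    # Soft thresholding: shrink toward zero, then zero out small weights.
    return np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)

if __name__ == "__main__":
    # Synthetic data: only the first 5 of 100 features are informative.
    rng = np.random.default_rng(0)
    d, n = 100, 1000
    w_true = np.zeros(d)
    w_true[:5] = 1.0
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

    w = np.zeros(d)
    for i in range(n):
        w = l1_sgd_step(w, X[i], y[i], eta=0.1, lam=0.01)
    print("non-zero weights:", np.count_nonzero(w), "of", d)
```

With a single pass over the synthetic stream, the truncation keeps most of the 100 weights exactly zero, which illustrates why this style of update is attractive for massive, high-dimensional data even though stronger sparsity and stability guarantees require the kind of machinery the paper develops.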
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Xie, S., Fan, W., Verscheure, O., Ren, J. (2010). Efficient and Numerically Stable Sparse Learning. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_31
DOI: https://doi.org/10.1007/978-3-642-15939-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8