Abstract
We consider the problems of numerical stability and model density growth when training a sparse linear model from massive data. We focus on scalable algorithms that minimize a loss function by gradient descent with either ℓ0 or ℓ1 regularization. We observe numerical stability problems in several existing methods, leading to divergence and low accuracy. In addition, these methods typically exercise weak control over sparsity, so model density grows faster than necessary. We propose a framework that addresses both problems. First, its update rule is numerically stable, carries a convergence guarantee, and yields more reasonable models. Second, besides ℓ1 regularization, it exploits the sparsity of the data distribution and achieves a higher degree of sparsity, with a PAC generalization error bound. Lastly, it is parallelizable and suitable for training large-margin classifiers on huge datasets. Experiments show that the proposed method converges consistently and, using only 10% of the features, outperforms the baselines by as much as a 6% reduction in error rate on average. Datasets and software are available from the authors.
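To make the class of methods the abstract refers to concrete, the sketch below shows a generic ℓ1-regularized stochastic gradient update with soft thresholding, in the spirit of truncated-gradient approaches. It is not the paper's proposed update rule; the logistic loss, the function name l1_sgd_step, and the step size eta and regularization weight lam are illustrative assumptions.

```python
import numpy as np

def l1_sgd_step(w, x, y, eta, lam):
    """One stochastic gradient step for an L1-regularized linear model.

    Logistic loss is used purely for illustration; the paper's own update
    rule, loss, and sparsity control differ. The soft-thresholding step
    after the gradient update is what keeps the model sparse: it shrinks
    every weight toward zero and truncates those that fall below eta*lam.
    """
    margin = y * np.dot(w, x)
    # Gradient of the logistic loss -log(sigmoid(y * w.x)) with respect to w.
    grad = -y * x / (1.0 + np.exp(margin))
    w = w - eta * grad
    # Soft thresholding: shrink toward zero, then zero out small weights.
    return np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)

if __name__ == "__main__":
    # Synthetic data: only the first 5 of 100 features are informative.
    rng = np.random.default_rng(0)
    d, n = 100, 1000
    w_true = np.zeros(d)
    w_true[:5] = 1.0
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))

    w = np.zeros(d)
    for i in range(n):
        w = l1_sgd_step(w, X[i], y[i], eta=0.1, lam=0.01)
    print("non-zero weights:", np.count_nonzero(w), "of", d)
```

With a single pass over the synthetic stream, the truncation keeps most of the 100 weights exactly zero, which illustrates why this style of update is attractive for massive, high-dimensional data even though stronger sparsity and stability guarantees require the kind of machinery the paper develops.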
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Xie, S., Fan, W., Verscheure, O., Ren, J. (2010). Efficient and Numerically Stable Sparse Learning. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15939-8_31
DOI: https://doi.org/10.1007/978-3-642-15939-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15938-1
Online ISBN: 978-3-642-15939-8