Abstract
Proximal methods have recently been shown to provide effective optimization procedures for solving the variational problems that define ℓ1 regularization algorithms. The goal of this paper is twofold. First, we discuss how proximal methods can be applied to a large class of machine learning algorithms that extend ℓ1 regularization, namely structured sparsity regularization. For all of these algorithms, it is possible to derive an optimization procedure that takes the form of an iterative projection algorithm. Second, we discuss the effect of preconditioning the optimization procedure by adding a strictly convex functional to the objective function. Structured sparsity algorithms typically minimize a convex, but not strictly convex, objective function, which can lead to undesirable unstable behavior. We show that perturbing the objective function with a small strictly convex term often substantially reduces the number of required computations without affecting the prediction performance of the obtained solution.
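To make the two ingredients of the abstract concrete, the sketch below implements a basic forward-backward (ISTA-type) proximal iteration for the plain ℓ1 case, with an optional quadratic term (μ/2)‖w‖² playing the role of the small strictly convex perturbation. This is a minimal illustration under our own assumptions, not the authors' algorithm for general structured penalties; the function name and parameter values (`ista_lasso`, `tau`, `mu`) are illustrative.

```python
import numpy as np


def ista_lasso(X, y, tau, mu=0.0, step=None, n_iter=1000, tol=1e-8):
    """Forward-backward (ISTA-type) iteration for
    min_w  1/(2n) ||Xw - y||^2 + tau ||w||_1 + (mu/2) ||w||^2.

    mu > 0 adds a small strictly convex perturbation; mu = 0 gives plain l1.
    """
    n, d = X.shape
    if step is None:
        # step = 1/L, where L is the Lipschitz constant of the smooth part's gradient
        L = np.linalg.norm(X, 2) ** 2 / n + mu
        step = 1.0 / L
    w = np.zeros(d)
    for _ in range(n_iter):
        # forward (gradient) step on the smooth part of the objective
        grad = X.T @ (X @ w - y) / n + mu * w
        z = w - step * grad
        # backward step: the prox of step*tau*||.||_1 is coordinate-wise soft thresholding
        w_new = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)
        if np.linalg.norm(w_new - w) <= tol * max(1.0, np.linalg.norm(w)):
            w = w_new
            break
        w = w_new
    return w


# toy usage: recover a 5-sparse vector from noisy linear measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = ista_lasso(X, y, tau=0.05, mu=1e-3)
print(np.nonzero(np.abs(w_hat) > 1e-6)[0])  # indices of the selected variables
```

With μ > 0 the objective becomes strongly convex, so the minimizer is unique and the iteration is better conditioned; this is the elastic-net-style perturbation the abstract refers to.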
Keywords
- Reproducing Kernel Hilbert Space
- Multiple Kernel Learning
- Group Lasso
- Proximal Method
- Regularization Algorithm
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Mosci, S., Rosasco, L., Santoro, M., Verri, A., Villa, S. (2010). Solving Structured Sparsity Regularization with Proximal Methods. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15883-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15882-7
Online ISBN: 978-3-642-15883-4
eBook Packages: Computer Science, Computer Science (R0)