
Optimization with Sparsity-Inducing Penalties

Published: 01 January 2012

Abstract

Sparse estimation methods aim at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection, but numerous extensions have since emerged, such as structured sparsity and kernel selection. Many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate nonsmooth norms. The goal of this monograph is to present, from a general perspective, optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments comparing various algorithms from a computational point of view.
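
The template shared by several of the methods named above is a gradient step on the smooth empirical risk followed by the proximal operator of the penalty. As a concrete illustration, here is a minimal Python/NumPy sketch, not taken from the monograph, of the proximal-gradient method (ISTA) applied to the lasso; the soft-thresholding function is the proximal operator of the ℓ1 norm, and the synthetic data and parameter values are hypothetical.

```python
# Minimal illustrative sketch (not code from the monograph) of a
# proximal-gradient method (ISTA) for the lasso problem
#     min_w  0.5 * ||y - X w||_2^2 + lam * ||w||_1 .
# Soft-thresholding is the proximal operator of the l1 norm; the data
# and parameter values below are hypothetical.
import numpy as np

def soft_threshold(v, t):
    # Prox of t * ||.||_1: shrink each coordinate toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    # The gradient of the smooth term is Lipschitz with constant equal
    # to the squared spectral norm of X.
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                   # gradient step ...
        w = soft_threshold(w - grad / L, lam / L)  # ... then proximal step
    return w

# Hypothetical usage on synthetic sparse data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [1.5, -2.0, 0.8]
y = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = ista(X, y, lam=0.1)   # estimate with only a few nonzero entries
```

Swapping the soft-thresholding step for the proximal operator of another norm (e.g., a group norm) yields the corresponding structured-sparsity variant, which is how proximal methods extend to the richer penalties the monograph covers.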

Published In

Foundations and Trends® in Machine Learning, Volume 4, Issue 1
January 2012, 108 pages
ISSN: 1935-8237
EISSN: 1935-8245

Publisher

Now Publishers Inc., Hanover, MA, United States
