
Optimization with Sparsity-Inducing Penalties

Published: 01 January 2012

Abstract

Sparse estimation methods aim at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection, but numerous extensions have since emerged, such as structured sparsity and kernel selection. Many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate nonsmooth norms. The goal of this monograph is to present, from a general perspective, optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments comparing various algorithms from a computational point of view.
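
The template shared by several of the methods named above is a gradient step on the smooth empirical risk followed by the proximal operator of the penalty. As a concrete illustration, here is a minimal Python/NumPy sketch, not taken from the monograph, of the proximal-gradient method (ISTA) applied to the lasso; the soft-thresholding function is the proximal operator of the ℓ1 norm, and the synthetic data and parameter values are hypothetical.

```python
# Minimal illustrative sketch (not code from the monograph) of a
# proximal-gradient method (ISTA) for the lasso problem
#     min_w  0.5 * ||y - X w||_2^2 + lam * ||w||_1 .
# Soft-thresholding is the proximal operator of the l1 norm; the data
# and parameter values below are hypothetical.
import numpy as np

def soft_threshold(v, t):
    # Prox of t * ||.||_1: shrink each coordinate toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    # The gradient of the smooth term is Lipschitz with constant equal
    # to the squared spectral norm of X.
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                   # gradient step ...
        w = soft_threshold(w - grad / L, lam / L)  # ... then proximal step
    return w

# Hypothetical usage on synthetic sparse data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [1.5, -2.0, 0.8]
y = X @ w_true + 0.01 * rng.standard_normal(100)
w_hat = ista(X, y, lam=0.1)   # estimate with only a few nonzero entries
```

Swapping the soft-thresholding step for the proximal operator of another norm (e.g., a group norm) yields the corresponding structured-sparsity variant, which is how proximal methods extend to the richer penalties the monograph covers.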

Published In

Foundations and Trends® in Machine Learning, Volume 4, Issue 1
January 2012, 108 pages
ISSN: 1935-8237
EISSN: 1935-8245

Publisher

Now Publishers Inc., Hanover, MA, United States
