Spectral Regularization Algorithms for Learning Large Incomplete Matrices

Published: 01 August 2010

Abstract

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm SOFT-IMPUTE iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity of order linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices; for example SOFT-IMPUTE takes a few hours to compute low-rank approximations of a 10^6 × 10^6 incomplete matrix with 10^7 observed entries, and fits a rank-95 approximation to the full Netflix training set in 3.3 hours. Our methods achieve good training and test errors and exhibit superior timings when compared to other competitive state-of-the-art techniques.
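
The iteration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small dense matrix, uses a full SVD from NumPy rather than the structure-exploiting low-rank SVD the abstract refers to, and works with the penalized form of the problem (half the squared error on observed entries plus lam times the nuclear norm). The function names, the boolean `observed` mask, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold_svd(Z, lam):
    """Shrink every singular value of Z by lam, flooring at zero."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt


def soft_impute(X, observed, lam, Z0=None, max_iter=500, tol=1e-6):
    """Illustrative SOFT-IMPUTE-style loop (hypothetical helper).

    X        : 2-D float array; values at unobserved positions are ignored
    observed : boolean mask, True where X holds an observed entry
    lam      : regularization parameter (shrinkage applied to singular values)
    Z0       : optional warm start for the low-rank estimate
    """
    Z = np.zeros_like(X, dtype=float) if Z0 is None else Z0
    for _ in range(max_iter):
        # Replace the missing entries with the current estimate ...
        filled = np.where(observed, X, Z)
        # ... then take a soft-thresholded SVD of the filled-in matrix.
        Z_new = soft_threshold_svd(filled, lam)
        change = np.linalg.norm(Z_new - Z) ** 2
        scale = max(np.linalg.norm(Z) ** 2, 1e-12)
        Z = Z_new
        if change / scale < tol:  # assumed relative-change stopping rule
            break
    return Z


def regularization_path(X, observed, lambdas):
    """Solve on a decreasing grid of lam values, warm-starting each solve."""
    Z = None
    path = []
    for lam in sorted(lambdas, reverse=True):
        Z = soft_impute(X, observed, lam, Z0=Z)
        path.append((lam, Z))
    return path
```

A call such as `regularization_path(X, observed, [1.0, 0.5, 0.1])` returns one estimate per value of lam, from most to least regularized. The dense SVD here costs far more than the approach the abstract describes, so this sketch is only suitable for small matrices.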

Information

Published In

The Journal of Machine Learning Research, Volume 11, 3/1/2010
3637 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org
