
Solving Dense Linear Systems Faster Than via Preconditioning

Published: 11 June 2024

Abstract

We give a stochastic optimization algorithm that solves a dense n × n real-valued linear system Ax = b, returning x such that ||Ax − b|| ≤ ε||b|| in time Õ((n^2 + nk^{ω−1}) log(1/ε)), where k is the number of singular values of A larger than O(1) times its smallest positive singular value, ω < 2.372 is the matrix multiplication exponent, and Õ hides a factor poly-logarithmic in n. When k = O(n^{1−θ}) (namely, A has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly and the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an Õ(n^2) runtime when k = O(n^{0.729}). We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is to simultaneously ensure good convergence and fast per-iteration time. In our analysis, we use the theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and we accelerate the block coordinate descent update via matrix sketching.
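The algorithmic core described above, iteratively projecting the current iterate onto a randomly sampled block of equations, can be illustrated with a minimal sketch. The Python code below is only a toy block Kaczmarz (sketch-and-project) iteration with uniform block sampling; it is not the paper's algorithm, which samples blocks via a determinantal point process and accelerates each update with matrix sketching. The function name block_kaczmarz and all parameters are illustrative.

```python
# A minimal, illustrative block coordinate descent (block Kaczmarz) sketch for Ax = b.
# Assumption: uniform block sampling stands in for the DPP sampling used in the paper.
import numpy as np

def block_kaczmarz(A, b, block_size=32, max_iters=2000, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n_rows = A.shape[0]
    x = np.zeros(A.shape[1])
    b_norm = np.linalg.norm(b)
    for _ in range(max_iters):
        # Sample a random block of rows (the paper samples via a determinantal point process).
        S = rng.choice(n_rows, size=min(block_size, n_rows), replace=False)
        A_S, b_S = A[S], b[S]
        # Project onto the affine subspace {x : A_S x = b_S}: x += A_S^+ (b_S - A_S x),
        # computed as the minimum-norm least-squares solution of the block system.
        x += np.linalg.lstsq(A_S, b_S - A_S @ x, rcond=None)[0]
        if np.linalg.norm(A @ x - b) <= tol * b_norm:
            break
    return x
```

In this toy version, the block size trades off per-iteration cost against the convergence rate; balancing the two simultaneously, under the spectral assumption on A stated in the abstract, is exactly the tension the paper's DPP sampling and sketched updates are designed to resolve.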


Cited By

  • (2024) A subspace constrained randomized Kaczmarz method for structure or external knowledge exploitation. Linear Algebra and its Applications. DOI: 10.1016/j.laa.2024.06.010. Online publication date: June 2024.



Published In

STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
June 2024
2049 pages
ISBN:9798400703836
DOI:10.1145/3618260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Linear Systems
  2. Matrices and Tensors
  3. Randomized Linear Algebra
  4. Sketching and Sampling
  5. Subspace Embedding

Qualifiers

  • Research-article

Conference

STOC '24: 56th Annual ACM Symposium on Theory of Computing
June 24-28, 2024
Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%




