
Solving Dense Linear Systems Faster Than via Preconditioning

Published: 11 June 2024

Abstract

We give a stochastic optimization algorithm that solves a dense n × n real-valued linear system Ax = b, returning x such that ||Ax − b|| ≤ ε||b|| in time Õ((n^2 + nk^{ω−1}) log(1/ε)), where k is the number of singular values of A larger than O(1) times its smallest positive singular value, ω < 2.372 is the matrix multiplication exponent, and Õ hides a factor poly-logarithmic in n. When k = O(n^{1−θ}) (namely, A has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly and the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an Õ(n^2) runtime when k = O(n^{0.729}). We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is to simultaneously ensure good convergence and fast per-iteration time. In our analysis, we use the theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and we accelerate the block coordinate descent update via matrix sketching.
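The algorithmic core described above, iteratively projecting the current iterate onto a randomly sampled block of equations, can be illustrated with a minimal sketch. The Python code below is only a toy block Kaczmarz (sketch-and-project) iteration with uniform block sampling; it is not the paper's algorithm, which samples blocks via a determinantal point process and accelerates each update with matrix sketching. The function name block_kaczmarz and all parameters are illustrative.

```python
# A minimal, illustrative block coordinate descent (block Kaczmarz) sketch for Ax = b.
# Assumption: uniform block sampling stands in for the DPP sampling used in the paper.
import numpy as np

def block_kaczmarz(A, b, block_size=32, max_iters=2000, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n_rows = A.shape[0]
    x = np.zeros(A.shape[1])
    b_norm = np.linalg.norm(b)
    for _ in range(max_iters):
        # Sample a random block of rows (the paper samples via a determinantal point process).
        S = rng.choice(n_rows, size=min(block_size, n_rows), replace=False)
        A_S, b_S = A[S], b[S]
        # Project onto the affine subspace {x : A_S x = b_S}: x += A_S^+ (b_S - A_S x),
        # computed as the minimum-norm least-squares solution of the block system.
        x += np.linalg.lstsq(A_S, b_S - A_S @ x, rcond=None)[0]
        if np.linalg.norm(A @ x - b) <= tol * b_norm:
            break
    return x
```

In this toy version, the block size trades off per-iteration cost against the convergence rate; balancing the two simultaneously, under the spectral assumption on A stated in the abstract, is exactly the tension the paper's DPP sampling and sketched updates are designed to resolve.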


Cited By

  • (2024) A subspace constrained randomized Kaczmarz method for structure or external knowledge exploitation. Linear Algebra and its Applications. DOI: 10.1016/j.laa.2024.06.010. Online publication date: June 2024.



Published In

STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
June 2024
2049 pages
ISBN:9798400703836
DOI:10.1145/3618260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Linear Systems
  2. Matrices and Tensors
  3. Randomized Linear Algebra
  4. Sketching and Sampling
  5. Subspace Embedding

Qualifiers

  • Research-article

Conference

STOC '24: 56th Annual ACM Symposium on Theory of Computing
June 24-28, 2024
Vancouver, BC, Canada

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%




