Abstract
Consider the classical problem of solving a general linear system of equations \(Ax=b\). It is well known that the (successively over-relaxed) Gauss–Seidel scheme and many of its variants may not converge when A is neither diagonally dominant nor symmetric positive definite. Can we have a linearly convergent G–S type algorithm that works for any A? In this paper we answer this question affirmatively by proposing a doubly stochastic G–S algorithm that is provably linearly convergent (in the mean-square-error sense) for any feasible linear system of equations. The key to the algorithm design is to introduce a nonuniform doubly stochastic scheme for picking the equation and the variable in each update step, together with a stepsize rule. These techniques also generalize to certain iterative alternating projection algorithms for solving the linear feasibility problem \(Ax\le b\) with an arbitrary A, as well as to high-dimensional minimization problems for training over-parameterized models in machine learning. Our results demonstrate that a carefully designed randomization scheme can make an otherwise divergent G–S algorithm converge.
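To make the "pick an equation, pick a variable, take a damped step" idea concrete, here is a minimal sketch of a doubly stochastic single-coordinate iteration for a consistent system \(Ax=b\). The function name, the specific sampling probabilities (rows by squared row norm, columns within a row by squared entry magnitude), and the damped correction are illustrative assumptions, not the paper's exact update rule or stepsize.

```python
import numpy as np

def doubly_stochastic_gs(A, b, steps=20000, gamma=0.5, seed=0):
    """Illustrative doubly stochastic G-S-style sketch for Ax = b.

    Each step draws a row i with probability proportional to ||a_i||^2
    and a column j (within row i) with probability proportional to
    A[i, j]^2, then applies a damped correction to the single
    coordinate x[j]. This is a hypothetical rule for illustration only.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    row_sq = (A ** 2).sum(axis=1)          # squared row norms
    row_p = row_sq / row_sq.sum()          # nonuniform row probabilities
    x = np.zeros(n)
    for _ in range(steps):
        i = rng.choice(m, p=row_p)
        col_p = A[i] ** 2 / row_sq[i]      # nonuniform column probabilities
        j = rng.choice(n, p=col_p)
        r = A[i] @ x - b[i]                # residual of the chosen equation
        x[j] -= gamma * r / A[i, j]        # damped single-coordinate correction
    return x
```

With this particular column distribution, the expected move over j equals a gamma-damped Kaczmarz projection onto the chosen equation, which is one simple way to see why nonuniform sampling plus a damping stepsize can stabilize a scheme that would diverge under naive deterministic sweeps; the paper's analysis and update rule should be consulted for the actual algorithm.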
This research is supported by NSFC Grants 61731018 and 61571384 and by the Peacock project of SRIBD.
Cite this article
Razaviyayn, M., Hong, M., Reyhanian, N. et al. A linearly convergent doubly stochastic Gauss–Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems. Math. Program. 176, 465–496 (2019). https://doi.org/10.1007/s10107-019-01404-0
Keywords
- Gauss–Seidel algorithm
- Linear systems of equations
- Nonuniform block coordinate descent algorithm
- Over-parameterized optimization