Two Symmetrized Coordinate Descent Methods Can Be $O(n^2)$ Times Slower Than the Randomized Version

Published: 01 January 2021

Abstract

Block decomposition algorithms decompose the variable vector into multiple blocks and, at each iteration, update a single block with the other blocks fixed. In each epoch, the blocks are updated according to a certain update order. There are at least two classes of deterministic update orders: nonsymmetric and symmetric. A basic nonsymmetric update order is the cyclic order ($1, 2, \ldots, n$); i.e., the first, second, $\ldots,$ $n$th blocks are updated sequentially in each round. A typical symmetric update order is the symmetric Gauss--Seidel rule ($1, 2, \ldots, n-1, n, n-1, \ldots, 1$). Another example is the Gaussian back substitution rule, which starts with the natural order ($1, 2, \ldots, n$) as a prediction step and is followed by a correction step. Recently, coordinate descent with the cyclic order was shown to be $\mathcal{O}(n^2)$ times slower than randomized versions in the worst case. A natural question arises: can the symmetrized update orders achieve faster convergence rates than the cyclic order, or even get close to the randomized versions? In this paper, we give a negative answer to this question. We show that both the Gaussian back substitution and symmetric Gauss--Seidel orders suffer from the same slow convergence issue as the cyclic order in the worst case. In particular, we prove that for unconstrained problems, coordinate descent with these two symmetric update orders can be $\mathcal{O}(n^2)$ times slower than randomized coordinate descent. Beyond unconstrained problems, we also study linearly constrained problems with a quadratic objective: we empirically demonstrate that the convergence speed of alternating direction method of multipliers (ADMM) algorithms with these two update orders can be roughly $\mathcal{O}(n^2)$ times slower than that of randomly permuted ADMM on some problem instances.
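
To make the three update orders concrete, below is a minimal sketch (ours, not from the paper) of exact coordinate descent on an unconstrained quadratic $f(x) = \frac{1}{2}x^\top A x - b^\top x$, with the cyclic, symmetric Gauss--Seidel, and randomly permuted orders as interchangeable epoch schedules. The quadratic used here is an arbitrary well-conditioned test instance, not one of the worst-case constructions analyzed in the paper, and all names (`coordinate_descent`, `order_fn`, etc.) are hypothetical.

```python
import numpy as np

def coordinate_descent(A, b, order_fn, epochs=100, seed=0):
    """Exact coordinate descent on f(x) = 0.5 * x^T A x - b^T x.

    order_fn(n, rng) returns the sequence of coordinate indices
    visited in one epoch.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    for _ in range(epochs):
        for i in order_fn(n, rng):
            # Exact minimization over coordinate i with all others fixed:
            # x_i <- (b_i - sum_{j != i} A_ij x_j) / A_ii.
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

# The three epoch schedules discussed in the abstract (0-based indices).
cyclic    = lambda n, rng: range(n)                                     # 1, 2, ..., n
sym_gs    = lambda n, rng: list(range(n)) + list(range(n - 2, -1, -1))  # 1, ..., n, n-1, ..., 1
rand_perm = lambda n, rng: rng.permutation(n)                           # fresh permutation each epoch

if __name__ == "__main__":
    n = 50
    M = np.random.default_rng(1).standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)   # strongly convex, well-conditioned test matrix
    b = np.ones(n)
    x_star = np.linalg.solve(A, b)
    for name, order in [("cyclic", cyclic), ("sym-GS", sym_gs), ("random", rand_perm)]:
        err = np.linalg.norm(coordinate_descent(A, b, order) - x_star)
        print(f"{name:7s} ||x - x*|| = {err:.3e}")
```

Note that one symmetric Gauss--Seidel epoch performs $2n-1$ coordinate updates versus $n$ for the other two schedules, so epoch counts should be normalized by per-epoch cost when comparing rates. On benign instances like the one above, all three orders converge quickly; the $\mathcal{O}(n^2)$ gap only emerges on adversarial instances such as those constructed in the paper.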

Published In

SIAM Journal on Optimization, Volume 31, Issue 4
DOI: 10.1137/sjope8.31.4

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 01 January 2021

Author Tags

  1. convex optimization
  2. coordinate descent
  3. alternating direction method of multipliers
  4. worst-case efficiency estimates
  5. 90C25
  6. 65K05
  7. 68Q25

Qualifiers

  • Research-article
