Block Kaczmarz Method with Inequalities

Published in: Journal of Mathematical Imaging and Vision

Abstract

The randomized Kaczmarz method is an iterative algorithm that solves systems of linear equations. Recently, the randomized method was extended to systems of equalities and inequalities by Leventhal and Lewis. Even more recently, Needell and Tropp provided an analysis of a block version of this randomized method for systems of linear equations. This paper considers the use of a block-type method for systems of mixed equalities and inequalities, bridging these two bodies of work. We show that utilizing a matrix paving over the equalities of the system can lead to significantly improved convergence, and prove a linear convergence rate as in the standard block method. We also demonstrate that using blocks of inequalities offers similar improvement only when the system satisfies a certain geometric property. We support the theoretical analysis with several experimental results.
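For intuition, the single-row randomized step of Leventhal and Lewis for a mixed system can be sketched as follows. This is an illustrative sketch only, not the paper's block algorithm; the function name, the toy system, and all numbers are invented for illustration. An equality row \(\langle \varvec{a}, \varvec{x}\rangle = b\) is handled by projecting onto its hyperplane; an inequality row \(\langle \varvec{a}, \varvec{x}\rangle \le b\) triggers a projection only when it is violated.

```python
# Sketch of the Leventhal-Lewis randomized Kaczmarz step for a mixed system
# of equalities and inequalities (single-row blocks). Hypothetical example,
# not the block method analyzed in this paper.
import random

def kaczmarz_mixed(rows, x, iters=2000, seed=0):
    """rows: list of (a, b, kind) with kind in {"eq", "le"}; a is a list."""
    rng = random.Random(seed)
    for _ in range(iters):
        a, b, kind = rng.choice(rows)
        r = sum(ai * xi for ai, xi in zip(a, x)) - b   # residual <a, x> - b
        if kind == "le":
            r = max(r, 0.0)       # e(.): only a violated inequality moves x
        norm2 = sum(ai * ai for ai in a)
        x = [xi - (r / norm2) * ai for ai, xi in zip(a, x)]
    return x

# Toy system: x0 + x1 = 1 (equality), x0 <= 0.25 (inequality).
rows = [([1.0, 1.0], 1.0, "eq"), ([1.0, 0.0], 0.25, "le")]
x = kaczmarz_mixed(rows, [5.0, -3.0])
```

Alternating the two projections contracts the distance to the feasible set geometrically, so after a few thousand random steps the iterate satisfies both constraints to machine precision.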


Notes

  1. This assumption is made both for notational convenience and because the use of matrix pavings discussed below only holds for standardized matrices. In practice, one can employ pre-conditioning on non-standardized systems, or extend the construction of matrix pavings to non-standardized systems [37].

  2. The standard definition of a row paving also includes a constant \(\alpha \) which serves as a lower bound on the smallest singular value. We ignore that parameter here since it will not be utilized.
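The pre-conditioning mentioned in Note 1 can be sketched concretely: scaling each row \(\varvec{a}_i\) (and the corresponding entry \(b_i\)) by \(1/\Vert \varvec{a}_i\Vert _2\) yields an equivalent standardized system with unit-norm rows and an unchanged solution set. A minimal sketch, with a hypothetical example system:

```python
# Row standardization as a pre-conditioning step: divide each row and its
# right-hand side by the row's Euclidean norm. The solution set is unchanged.
import math

def standardize(A, b):
    """Return (A', b') with unit-norm rows of A'; same solution set."""
    A_std, b_std = [], []
    for row, bi in zip(A, b):
        norm = math.sqrt(sum(v * v for v in row))
        A_std.append([v / norm for v in row])
        b_std.append(bi / norm)
    return A_std, b_std

# Hypothetical system: rows (3, 4) and (0, 2) have norms 5 and 2.
A, b = [[3.0, 4.0], [0.0, 2.0]], [5.0, 2.0]
A2, b2 = standardize(A, b)
```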

References

  1. Bourgain, J., Tzafriri, L.: Invertibility of “large” submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math. 57(2), 137–224 (1987). doi:10.1007/BF02772174

  2. Bourgain, J., Tzafriri, L.: On a problem of Kadison and Singer. J. Reine Angew. Math. 420, 1–43 (1991)

  3. Byrne, C.L.: Applied Iterative Methods. A K Peters Ltd., Wellesley (2008)

  4. Censor, Y., Eggermont, P.P.B., Gordon, D.: Strong underrelaxation in Kaczmarz’s method for inconsistent systems. Numer. Math. 41(1), 83–92 (1983)

  5. Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23(4), 444–466 (1981)

  6. Chen, X., Powell, A.: Almost sure convergence of the Kaczmarz algorithm with random measurements. J. Fourier Anal. Appl. 1–20 (2012). doi:10.1007/s00041-012-9237-2

  7. Chrétien, S., Darses, S.: Invertibility of random submatrices via tail decoupling and a matrix Chernoff inequality. Stat. Probab. Lett. 82(7), 1479–1487 (2012)

  8. Eggermont, P.P.B., Herman, G.T., Lent, A.: Iterative algorithms for large partitioned linear systems, with applications to image reconstruction. Linear Algebra Appl. 40, 37–67 (1981). doi:10.1016/0024-3795(81)90139-7

  9. Elfving, T.: Block-iterative methods for consistent and inconsistent linear equations. Numer. Math. 35(1), 1–12 (1980). doi:10.1007/BF01396365

  10. Feichtinger, H.G., Cenker, C., Mayer, M., Steier, H., Strohmer, T.: New variants of the POCS method using affine subspaces of finite codimension with applications to irregular sampling. In: Applications in Optical Science and Engineering, pp. 299–310. International Society for Optics and Photonics, Bellingham (1992)

  11. Feichtinger, H.G., Strohmer, T.: A Kaczmarz-based approach to nonperiodic sampling on unions of rectangular lattices. In: SampTA ’95: 1995 Workshop on Sampling Theory and Applications, pp. 32–37. Jurmala, Latvia (1995)

  12. Gordon, R., Bender, R., Herman, G.T.: Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. J. Theor. Biol. 29, 471–481 (1970)

  13. Hamaker, C., Solmon, D.C.: The angles between the null spaces of X-rays. J. Math. Anal. Appl. 62(1), 1–23 (1978)

  14. Hanke, M., Niethammer, W.: On the acceleration of Kaczmarz’s method for inconsistent linear systems. Linear Algebra Appl. 130, 83–98 (1990)

  15. Herman, G., Meyer, L.: Algebraic reconstruction techniques can be made computationally efficient. IEEE Trans. Med. Imaging 12(3), 600–609 (1993)

  16. Herman, G.T.: Fundamentals of Computerized Tomography: Image Reconstruction from Projections. Springer, New York (2009)

  17. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Nat. Bur. Stand. 49, 263–265 (1952)

  18. Kaczmarz, S.: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Int. Acad. Polon. Sci. Lett. Ser. A 35, 335–357 (1937)

  19. Leventhal, D., Lewis, A.S.: Randomized methods for linear constraints: convergence rates and conditioning. Math. Oper. Res. 35(3), 641–654 (2010). doi:10.1287/moor.1100.0456

  20. Liu, J., Wright, S.J., Sridhar, S.: An asynchronous parallel randomized Kaczmarz algorithm (2014). arXiv:1401.4780

  21. Natterer, F.: The Mathematics of Computerized Tomography. Classics in Applied Mathematics, vol. 32. SIAM, Philadelphia (2001). doi:10.1137/1.9780898719284. Reprint of the 1986 original

  22. Needell, D.: Randomized Kaczmarz solver for noisy linear systems. BIT 50(2), 395–403 (2010). doi:10.1007/s10543-010-0265-5

  23. Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent and the randomized Kaczmarz algorithm (2013). Submitted

  24. Needell, D., Tropp, J.A.: Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl. 441, 199–221 (2014)

  25. Needell, D., Zhao, R., Zouzias, A.: Randomized block Kaczmarz method with projection for solving least squares (2014). Submitted

  26. Popa, C.: Block-projections algorithms with blocks containing mutually orthogonal rows and columns. BIT 39(2), 323–338 (1999). doi:10.1023/A:1022398014630

  27. Popa, C.: A fast Kaczmarz–Kovarik algorithm for consistent least-squares problems. Korean J. Comput. Appl. Math. 8(1), 9–26 (2001)

  28. Popa, C.: A Kaczmarz–Kovarik algorithm for symmetric ill-conditioned matrices. An. Ştiinţ. Univ. Ovidius Constanţa Ser. Mat. 12(2), 135–146 (2004)

  29. Recht, B., Ré, C.: Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case studies, and consequences. In: Proceedings of the 25th Annual Conference on Learning Theory, Edinburgh (2012)

  30. Sezan, M.I., Stark, H.: Applications of convex projection theory to image recovery in tomography and related areas. In: Stark, H. (ed.) Image Recovery: Theory and Application, pp. 155–270. Elsevier, Amsterdam (1987)

  31. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15(2), 262–278 (2009). doi:10.1007/s00041-008-9030-4

  32. Tanabe, K.: Projection method for solving a singular system of linear equations and its applications. Numer. Math. 17(3), 203–214 (1971)

  33. Tropp, J.A.: The random paving property for uniformly bounded matrices. Studia Math. 185(1), 67–82 (2008). doi:10.4064/sm185-1-4

  34. Tropp, J.A.: Column subset selection, matrix factorization, and eigenvalue optimization. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 978–986. SIAM, Philadelphia (2009)

  35. Tropp, J.A.: Improved analysis of the subsampled randomized Hadamard transform. Adv. Adapt. Data Anal. 3(1–2), 115–126 (2011)

  36. Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12(4), 389–434 (2012)

  37. Vershynin, R.: John’s decompositions: selecting a large part. Israel J. Math. 122(1), 253–277 (2001)

  38. Vershynin, R.: Random sets of isomorphism of linear operators on Hilbert space. In: High Dimensional Probability, vol. 51, pp. 148–154. Inst. Math. Statist., Beachwood, OH (2006). doi:10.1214/074921706000000815

  39. Whitney, T.M., Meany, R.K.: Two algorithms related to the method of steepest descent. SIAM J. Numer. Anal. 4(1), 109–118 (1967)

  40. Xu, J., Zikatanov, L.: The method of alternating projections and the method of subspace corrections in Hilbert space. J. Amer. Math. Soc. 15(3), 573–597 (2002). doi:10.1090/S0894-0347-02-00398-3

  41. Zouzias, A., Freris, N.M.: Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 34(2), 773–793 (2013)

Corresponding author: Deanna Needell.

Proof of Theorem 2

Proof

Fix an iteration \(j\) of Algorithm 3. As in the proof of Theorem 1, if a block of equalities is selected this iteration, then we again have (11). Next, we consider the case in which a block of inequalities is selected; denote this block by \(\tau '\) and its pruned subset by \(\sigma \). Set \(\varvec{s} = P_S\varvec{x}_{j-1}\), where again \(P_S\) denotes the orthogonal projection onto the solution set \(S\). If we write \(\tilde{H}_i = \{\varvec{x} : \langle \varvec{a_i}, \varvec{x}\rangle \le b_i\}\) and \({H}_i = \{\varvec{x} : \langle \varvec{a_i}, \varvec{x}\rangle = b_i\}\), then by their definitions we have

$$\begin{aligned} \varvec{s} \in \cap _{i\in \sigma }\tilde{H}_i,\quad \varvec{x}_{j-1} \in \cap _{i\in \sigma }\tilde{H}_i^c,\quad \text {and}\quad \varvec{x}_j = P_{\cap _{i\in \sigma }{H}_i}\varvec{x}_{j-1}. \end{aligned}$$

Then since \(\sigma \) is part of an obtuse paving, the angle between \(\varvec{x}_j - \varvec{x}_{j-1}\) and \(\varvec{s} -\varvec{x}_{j-1}\) must be obtuse. There thus exists a point \(\varvec{t}\) on the line segment \(L = \{\gamma \varvec{x}_{j-1} + (1-\gamma )\varvec{s} : 0 \le \gamma \le 1\}\) such that \(\varvec{x}_{j-1} - \varvec{x}_j\) and \(\varvec{t} - \varvec{x}_j\) are orthogonal (see Fig. 4).

Fig. 4: Geometry of the system

Now since \(\varvec{t} \in L\), we have \(||{\varvec{t} - \varvec{x}_{j-1}} ||_2 \le ||{\varvec{x}_{j-1} - \varvec{s}} ||_2\), and thus letting \(\theta \) denote the angle between \(\varvec{x}_j - \varvec{x}_{j-1}\) and \(\varvec{t} - \varvec{x}_{j-1}\) (see Fig. 4), we have

$$\begin{aligned} ||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2&\le ||{\varvec{x}_{j-1} - \varvec{s}} ||_2\cdot \frac{||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2 }{||{\varvec{t} - \varvec{x}_{j-1}} ||_2}\\&= ||{\varvec{x}_{j-1} - \varvec{s}} ||_2\cdot \cos \theta \\&= \frac{||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2\cdot ||{\varvec{s} - \varvec{x}_{j-1}} ||_2\cdot \cos \theta }{||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2}\\&= \frac{\langle \varvec{s} - \varvec{x}_{j-1}, \varvec{x}_j - \varvec{x}_{j-1}\rangle }{||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2}\\&= \frac{-\langle \varvec{x}_{j-1} - \varvec{s}, \varvec{x}_j - \varvec{x}_{j-1}\rangle }{||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2}. \end{aligned}$$

Thus, we have that

$$\begin{aligned} \langle \varvec{x}_{j-1} - \varvec{s}, \varvec{x}_{j} - \varvec{x}_{j-1}\rangle \le -||{\varvec{x}_j - \varvec{x}_{j-1}} ||_2^2. \end{aligned}$$

By the definition of \(\varvec{x}_j\), this means that

$$\begin{aligned} \langle \varvec{x}_{j-1} - \varvec{s}, \varvec{A}_{\sigma }^\dagger (\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1})\rangle \le - ||{{\varvec{A}_{\sigma }^\dagger }(\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1})} ||_2^2. \end{aligned}$$
(12)

Using this along with the paving properties we see that

$$\begin{aligned} ||{\varvec{x}_{j} - \varvec{s}} ||_2^2&= ||{\varvec{x}_{j-1} - \varvec{s} + {\varvec{A}_{\sigma }^\dagger }(\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1})} ||_2^2\\&= ||{\varvec{x}_{j-1} \!-\! \varvec{s}} ||_2^2 \!+\! 2\langle \varvec{x}_{j-1} \!-\! \varvec{s}, \varvec{A}_{\sigma }^\dagger (\varvec{b}_{\sigma } \!-\! \varvec{A}_{\sigma }\varvec{x}_{j-1})\rangle \\&\quad + ||{\varvec{A}_{\sigma }^\dagger (\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1})} ||_2^2\\&\le ||{\varvec{x}_{j-1} - \varvec{s}} ||_2^2 - ||{\varvec{A}_{\sigma }^\dagger (\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1})} ||_2^2\\&= d(\varvec{x}_{j-1}, S)^2 - ||{\varvec{A}_{\sigma }^\dagger (\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1})} ||_2^2\\&\le d(\varvec{x}_{j-1}, S)^2 - \frac{1}{\beta '}||{\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1}} ||_2^2. \end{aligned}$$
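As a numerical sanity check of the final two inequalities (a hypothetical 2-D instance, not from the paper): for a block consisting of a single unit-norm row \(\varvec{a}\), the pseudoinverse \(\varvec{A}_{\sigma }^\dagger \) is simply \(\varvec{a}^\top \), \(\beta ' = 1\) is a valid paving constant, and the update reduces to the usual Kaczmarz projection.

```python
# Check the per-iteration bound ||x_j - s||^2 <= ||x_{j-1} - s||^2 - r^2
# for a single unit-norm inequality row (so A^+ = a^T and beta' = 1).
# All numbers below are hypothetical.
a = [3.0 / 5.0, 4.0 / 5.0]        # unit-norm row
b = 1.0
x_prev = [2.0, 2.0]               # violates <a, x> <= b (value 2.8)
s = [0.0, 0.0]                    # feasible point: <a, s> = 0 <= b

r = b - sum(ai * xi for ai, xi in zip(a, x_prev))    # residual b - <a, x>
x_next = [xi + r * ai for ai, xi in zip(a, x_prev)]  # project onto hyperplane

lhs = sum((xn - si) ** 2 for xn, si in zip(x_next, s))        # ||x_j - s||^2
rhs = sum((xp - si) ** 2 for xp, si in zip(x_prev, s)) - r * r
```

Here the bound holds strictly, since \(\varvec{s}\) lies in the interior of the half-space rather than on the boundary hyperplane.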

Thus, taking the expectation (over the choice of \(\tau '\), conditioned on the previous choices) yields

$$\begin{aligned} \mathbb {E}[d(\varvec{x}_{j}, S)^2]&\le \mathbb {E}||{\varvec{x}_{j} - \varvec{s}} ||_2^2\\&\le d(\varvec{x}_{j-1}, S)^2 - \frac{1}{\beta '}\mathbb {E}||{\varvec{b}_{\sigma } - \varvec{A}_{\sigma }\varvec{x}_{j-1}} ||_2^2\\&= d(\varvec{x}_{j-1}, S)^2 - \frac{1}{\beta '}\mathbb {E}||{e(\varvec{b}_{\tau '} - \varvec{A}_{\tau '}\varvec{x}_{j-1})} ||_2^2\\&=\! d(\varvec{x}_{j-1}, S)^2 \!-\! \frac{1}{m'\beta '}\sum _{\tau '\in T'}||{e(\varvec{b}_{\tau '} \!-\! \varvec{A}_{\tau '}\varvec{x}_{j-1})} ||_2^2\\&= d(\varvec{x}_{j-1}, S)^2 - \frac{1}{m'\beta '}||{e(\varvec{b}_{\le } - \varvec{A}_{\le }\varvec{x}_{j-1})} ||_2^2. \end{aligned}$$

Combining this with (11) and letting \(E_=\) and \(E_\le \) denote the events that a block from \(T\) and a block from \(T'\) is selected, respectively, we have

$$\begin{aligned}&\mathbb {E}\left[ d(\varvec{x}_j,S)^2\right] = p \cdot \mathbb {E}[ d(\varvec{x}_{j},S)^2 | E_=] + (1-p) \cdot \mathbb {E}[ d(\varvec{x}_{j},S)^2 | E_{\le }] \\&\le p\left[ d(\varvec{x}_{j-1}, S)^2 - \frac{1}{\beta m} \sum _{i \in I_=} e(\varvec{A}_{=} \varvec{x}_{j-1} - \varvec{b}_=)_{i}^2\right] \\&\;\;+ (1-p) \left[ d(\varvec{x}_{j-1}, S)^2 - \frac{1}{m'\beta '}||{e(\varvec{b}_{\le } - \varvec{A}_{\le }\varvec{x}_{j-1})} ||_2^2\right] \\&= d(\varvec{x}_{j-1}, S)^2 - p \cdot \frac{1}{\beta m} \sum _{i \in I_=} e(\varvec{A}_{=} \varvec{x}_{j-1} - \varvec{b}_=)_{i}^2\\&\;\;- (1-p) \cdot \frac{1}{m'\beta '}||{e(\varvec{b}_{\le } - \varvec{A}_{\le }\varvec{x}_{j-1})} ||_2^2. \end{aligned}$$

Since \(p = \frac{\beta m}{\beta ' m' + \beta m}\), we have \(\frac{p}{\beta m} = \frac{1-p}{\beta ' m'} = \frac{1}{\beta ' m'+\beta m}\), and we can simplify

$$\begin{aligned} \mathbb {E}\left[ d(\varvec{x}_j,S)^2\right]&\le d(\varvec{x}_{j-1}, S)^2\\&- \frac{1}{\beta ' m' + \beta m} \Big [\sum _{i \in I_=} e(\varvec{A}_{=} \varvec{x}_{j-1} - \varvec{b}_=)_{i}^2 \\&\;\;+ ||{e(\varvec{b}_{\le } - \varvec{A}_{\le }\varvec{x}_{j-1})} ||_2^2 \Big ] \\&= d(\varvec{x}_{j-1}, S)^2 - \frac{1}{\beta ' m' + \beta m} ||{e(\varvec{A} \varvec{x}_{j-1} - \varvec{b})} ||_2^2 \\&\le d(\varvec{x}_{j-1}, S)^2 - \frac{1}{L^2(\beta ' m' + \beta m)} \cdot d(\varvec{x}_{j-1},S)^2 \\&= \left[ 1 - \frac{1}{L^2(\beta ' m' + \beta m)}\right] d(\varvec{x}_{j-1},S)^2, \end{aligned}$$

where we have utilized the Hoffman bound (7) in the second inequality.

Iterating this relation and using the independence of the random control completes the proof. \(\square \)
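The sampling-weight identity used in the final simplification can be verified with exact arithmetic for arbitrary placeholder values of \(\beta , m, \beta ', m'\):

```python
# Exact check of the identity used above: with p = beta*m/(beta'*m' + beta*m),
# both p/(beta*m) and (1-p)/(beta'*m') equal 1/(beta'*m' + beta*m).
# The parameter values are arbitrary placeholders.
from fractions import Fraction

beta, m = Fraction(3), Fraction(7)       # equality-block paving parameters
beta_p, m_p = Fraction(5), Fraction(4)   # inequality-block paving parameters

p = beta * m / (beta_p * m_p + beta * m)
target = 1 / (beta_p * m_p + beta * m)
```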

Cite this article

Briskman, J., Needell, D. Block Kaczmarz Method with Inequalities. J Math Imaging Vis 52, 385–396 (2015). https://doi.org/10.1007/s10851-014-0539-7
