
Choose Your Path Wisely: Gradient Descent in a Bregman Distance Framework

Published: 01 January 2021

Abstract

We propose an extension of a special form of gradient descent, known in the literature as the linearized Bregman iteration, to a larger class of nonconvex functions. We replace the classical (squared) two-norm metric in the gradient descent setting with a generalized Bregman distance, based on a proper, convex, and lower semicontinuous function. The algorithm's global convergence is proven for functions that satisfy the Kurdyka--Łojasiewicz property. Examples illustrate that features of different scale are introduced throughout the iteration, transitioning from coarse to fine. This coarse-to-fine approach with respect to scale allows us to recover solutions of nonconvex optimization problems that are superior to those obtained with conventional gradient descent, or even projected and proximal gradient descent. The effectiveness of the linearized Bregman iteration in combination with early stopping is illustrated for the applications of parallel magnetic resonance imaging, blind deconvolution, and image classification with neural networks.
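To connect the abstract to a concrete update rule, the classical convex instance of the linearized Bregman iteration for $\ell_1$-regularized least squares alternates a gradient step on a dual (subgradient) variable with a soft-thresholding step, i.e., the map induced by the Bregman function $J(x) = \lambda\|x\|_1 + \tfrac{1}{2}\|x\|^2$. The following is a minimal sketch of that well-known special case only, not the paper's generalized nonconvex algorithm; the step size tau, the weight lam, the iteration count, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def linearized_bregman(A, b, lam=1.0, tau=None, iters=500):
    """Sketch of the classical linearized Bregman iteration for
    lam*||x||_1 + 0.5*||x||^2 with data-fidelity term 0.5*||A x - b||^2.

    The dual variable v accumulates negative gradients of the fidelity term;
    the primal iterate x is recovered by soft thresholding (shrinkage)."""
    m, n = A.shape
    if tau is None:
        # conservative step size: 1 / ||A||_2^2 (squared spectral norm)
        tau = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    v = np.zeros(n)
    for _ in range(iters):
        # gradient step on the dual/subgradient variable
        v = v - tau * (A.T @ (A @ x - b))
        # soft thresholding: gradient of the convex conjugate of J
        x = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    return x

# illustrative usage on synthetic sparse-recovery data (assumed, not from the paper)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200)
x_true[:5] = rng.standard_normal(5)
b = A @ x_true
x_hat = linearized_bregman(A, b, lam=0.5, iters=2000)
```

In this sketch the number of iterations acts as the regularization parameter: early iterates are sparse and coarse, while later iterates add finer-scale components, which is the coarse-to-fine behavior (and the role of early stopping) described in the abstract.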


Cited By

  • (2023) An alternating structure-adapted Bregman proximal gradient descent algorithm for constrained nonconvex nonsmooth optimization problems and its inertial variant, Journal of Global Optimization, 87:1, pp. 277-300, https://doi.org/10.1007/s10898-023-01300-0. Online publication date: 1-Sep-2023.
  • (2023) An abstract convergence framework with application to inertial inexact forward–backward methods, Computational Optimization and Applications, 84:2, pp. 319-362, https://doi.org/10.1007/s10589-022-00441-4. Online publication date: 1-Mar-2023.



Published In

SIAM Journal on Imaging Sciences, Volume 14, Issue 2
2021
358 pages
EISSN: 1936-4954
DOI: 10.1137/sjisbi.14.2

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 01 January 2021

Author Tags

  1. nonconvex optimization
  2. nonsmooth optimization
  3. gradient descent
  4. Bregman iteration
  5. linearized Bregman iteration
  6. parallel MRI
  7. blind deconvolution
  8. deep learning

Author Tags

  1. 49M37
  2. 65K05
  3. 65K10
  4. 90C26
  5. 90C30

Qualifiers

  • Research-article

