
Peaceman–Rachford splitting for a class of nonconvex optimization problems

Computational Optimization and Applications

Abstract

We study the applicability of the Peaceman–Rachford (PR) splitting method for solving nonconvex optimization problems. When applied to minimizing the sum of a strongly convex Lipschitz differentiable function and a proper closed function, we show that if the strongly convex function has a large enough strong convexity modulus and the step-size parameter is chosen below a threshold that is computable, then any cluster point of the sequence generated, if exists, will give a stationary point of the optimization problem. We also give sufficient conditions guaranteeing boundedness of the sequence generated. We then discuss one way to split the objective so that the proposed method can be suitably applied to solving optimization problems with a coercive objective that is the sum of a (not necessarily strongly) convex Lipschitz differentiable function and a proper closed function; this setting covers a large class of nonconvex feasibility problems and constrained least squares problems. Finally, we illustrate the proposed algorithm numerically.

Notes

  1. This slightly improves [25, Theorem 4], which assumed the stronger condition that \(f\) and \(g\) are bounded below and one of them is coercive.

  2. We refer the readers to, for example, [1, 2, 7, 8], for the definition and examples of KL functions. In particular, if f and g are proper closed semi-algebraic functions, then \({\mathfrak {P}}_\gamma \) is a KL function for any \(\gamma > 0\).

  3. One natural choice of \(\beta \) is to set \(\beta = 5\), so that the maximum \(\max _{\beta > 2}\frac{\beta -2}{(\beta + 1)^2L_F} = \frac{1}{12L_F}\) is attained. However, we observe in our numerical experiments that a smaller \(\beta > 2\), coupled with a suitable heuristic for updating \(\gamma \), leads to faster convergence.
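     For completeness, \(\beta = 5\) indeed maximizes \(\frac{\beta -2}{(\beta +1)^2}\) over \(\beta > 2\): differentiating gives

$$\begin{aligned} \frac{d}{d\beta }\,\frac{\beta -2}{(\beta +1)^2} = \frac{(\beta +1) - 2(\beta -2)}{(\beta +1)^3} = \frac{5-\beta }{(\beta +1)^3}, \end{aligned}$$

     which is positive for \(\beta < 5\) and negative for \(\beta > 5\); the maximum value is thus \(\frac{3}{36} = \frac{1}{12}\), giving \(\frac{1}{12L_F}\) after dividing by \(L_F\).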

  4. We choose \(tol = 10^{-8}\), and we report \(\frac{1}{2}\Vert Az^t - b\Vert ^2\) for both methods.

  5. We choose \(tol = 10^{-5}\), and we report \(\frac{1}{2}\Vert Az^t - b\Vert ^2\) for both methods.

  6. For both methods, we report \(\frac{1}{2} d_C^2(z^t)\).

  7. We declare a failure if the function value at termination is above \(10^{-6}\), and a success if the value is below \(10^{-12}\).

References

  1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems. An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  2. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

  3. Bauschke, H.H., Borwein, J.M.: On the convergence of von Neumann’s alternating projection algorithm for two sets. Set-Valued Anal. 1, 185–212 (1993)

  4. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38, 367–426 (1996)

  5. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)

  6. Bogdan, M., van den Berg, E., Su, W., Candès, E.: Statistical estimation and testing via the sorted L1 norm. Preprint (2013). Available at arXiv:1310.1969

  7. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)

  8. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)

  9. Borwein, J.M., Li, G., Yao, L.J.: Analysis of the convergence rate for the cyclic projection algorithm applied to basic semialgebraic convex sets. SIAM J. Optim. 24, 498–527 (2014)

  10. Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51, 34–81 (2009)

  11. Candès, E., Tao, T.: The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann. Statist. 35, 2313–2351 (2007)

  12. Combettes, P.L.: Iterative construction of the resolvent of a sum of maximal monotone operators. J. Convex Anal. 16, 727–748 (2009)

  13. Combettes, P.L., Pesquet, J.-C.: A Douglas–Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE J. Sel. Top. Signal Process. 1(4), 564–574 (2007)

  14. Dobra, A.: Variable selection and dependency networks for genomewide data. Biostatistics 10, 621–639 (2009)

  15. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two or three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)

  16. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)

  17. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  18. Giselsson, P., Boyd, S.: Diagonal scaling in Douglas–Rachford splitting and ADMM. In: Proceedings of the 53rd IEEE Conference on Decision and Control, pp. 5033–5039 (2014)

  19. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)

  20. Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23, 2397–2419 (2013)

  21. Hong, M., Luo, Z.-Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26, 337–364 (2016)

  22. Knight, K., Fu, W.: Asymptotics for the lasso-type estimators. Ann. Stat. 28, 1356–1378 (2000)

  23. Kyrillidis, A., Becker, S., Cevher, V., Koch, C.: Sparse projections onto the simplex. JMLR W&CP 28, 235–243 (2013)

  24. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25, 2434–2460 (2015)

  25. Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159, 371–401 (2016)

  26. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)

  27. Lu, Z., Pong, T.K., Zhang, Y.: An alternating direction method for finding Dantzig selectors. Comput. Stat. Data Anal. 56, 4037–4046 (2012)

  28. Luke, D.R.: Finding best approximation pairs relative to a convex and a prox-regular set in a Hilbert space. SIAM J. Optim. 19, 714–739 (2008)

  29. Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: complexity estimates and accelerated variants. In: Proceedings of the 53rd IEEE Conference on Decision and Control, pp. 4234–4239 (2014)

  30. Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3, 28–41 (1955)

  31. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

  32. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser B 58, 267–288 (1996)

  33. Wang, H., Banerjee, A.: Bregman alternating direction method of multipliers. NIPS 27, 2816–2824 (2014)

  34. Yeung, K.Y., Bumgarner, R.E., Raftery, A.E.: Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21, 2394–2402 (2005)

Author information

Corresponding author

Correspondence to Ting Kei Pong.

Additional information

G. Li was partially supported by a research grant from the Australian Research Council. T. Liu was partly supported by the AMSS-PolyU Joint Research Institute Postdoctoral Scheme. T. K. Pong was partly supported by Hong Kong Research Grants Council PolyU253008/15p.

Appendix: concrete numerical examples

In this appendix, we provide some simple and concrete examples illustrating the different behaviors of the classical PR splitting method, the classical DR splitting method and our proposed PR splitting method (33).

The first example shows that, even in the convex setting, the classical PR splitting method can be faster than the classical DR splitting method, and that our proposed PR method can outperform the classical DR method for particular choices of the parameter \(\gamma \). The second example, a nonconvex feasibility problem, shows that the classical PR method can fail to converge, while our proposed PR method converges linearly to a solution of the feasibility problem.

Example 1

(Classical DR splitting method vs. classical/proposed PR method) Consider \(f(x)=\Vert x\Vert ^2\) and \(g(x)=0\) for all \(x \in \mathrm{I}\!\mathrm{R}^n\). Then, a direct verification shows that, for any \(\gamma >0\),

$$\begin{aligned} \mathrm{prox}_{\gamma f}(z) = \mathop {\hbox {arg min}}_u\left\{ \gamma \Vert u\Vert ^2 + \frac{1}{2} \Vert u - z\Vert ^2\right\} =\frac{z}{2\gamma +1} \end{aligned}$$

and

$$\begin{aligned} \mathrm{prox}_{\gamma g}(z) = \mathop {\hbox {arg min}}_u\left\{ \frac{1}{2} \Vert u - z\Vert ^2\right\} =z. \end{aligned}$$
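As a quick numerical sanity check (ours, not part of the paper), the first closed-form mapping can be confirmed by brute force; the Python sketch below uses our own variable names and a random test point.

```python
import numpy as np

# Sketch (assumptions: f(x) = ||x||^2, g = 0; names are ours, not the paper's).
def prox_f(z, gamma):
    # Setting the gradient of u -> gamma*||u||^2 + 0.5*||u - z||^2 to zero
    # gives (2*gamma + 1)*u = z, i.e. prox_{gamma f}(z) = z/(2*gamma + 1).
    return z / (2 * gamma + 1)

rng = np.random.default_rng(0)
z, gamma = rng.standard_normal(5), 0.3
u = prox_f(z, gamma)

def prox_objective(v):
    return gamma * v @ v + 0.5 * (v - z) @ (v - z)

# The closed-form point should beat small random perturbations of itself.
assert all(prox_objective(u) <= prox_objective(u + 1e-3 * rng.standard_normal(5))
           for _ in range(100))
```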

Thus, the classical DR method reads

$$\begin{aligned} x^{t+1}&=\frac{I+(2\mathrm{prox}_{\gamma g} - I)\circ (2\mathrm{prox}_{\gamma f} - I)}{2}(x^t)\\&=\frac{1}{2\gamma +1} x^t =\cdots = \left( \frac{1}{2\gamma +1}\right) ^{t+1} x^0, \end{aligned}$$

while the classical PR method reads

$$\begin{aligned} x^{t+1}=(2\mathrm{prox}_{\gamma g} - I)\circ (2\mathrm{prox}_{\gamma f} - I)(x^t)=\frac{1-2\gamma }{2\gamma +1} x^t =\cdots = \left( \frac{1-2\gamma }{2\gamma +1}\right) ^{t+1} x^0. \end{aligned}$$

Thus, for this example, the classical PR method converges faster than the classical DR method when \(\gamma \in (0,1)\).
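To see the two rates side by side, here is a small Python sketch (our illustration; the step-size \(\gamma = 0.5\) is our choice) running both fixed-point iterations:

```python
import numpy as np

# DR vs. classical PR on f(x) = ||x||^2, g = 0 (a sketch; names are ours).
def reflect_f(x, gamma):
    # 2*prox_{gamma f} - I with prox_{gamma f}(x) = x/(2*gamma + 1),
    # i.e. multiplication by (1 - 2*gamma)/(2*gamma + 1).
    return (1 - 2 * gamma) / (2 * gamma + 1) * x

gamma = 0.5
x_dr = x_pr = np.ones(3)
for t in range(20):
    x_dr = 0.5 * (x_dr + reflect_f(x_dr, gamma))  # DR step (prox_{gamma g} = I)
    x_pr = reflect_f(x_pr, gamma)                 # PR step (2*prox_{gamma g} - I = I)

# DR contracts by 1/(2*gamma + 1) = 1/2 per step; the PR factor
# (1 - 2*gamma)/(2*gamma + 1) vanishes at gamma = 0.5, so PR hits x = 0 at once.
print(np.linalg.norm(x_dr), np.linalg.norm(x_pr))
```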

Moreover, let \(\beta =2.5\) and \(\gamma <\frac{\beta -2}{(\beta +1)^2L_F}=\frac{1}{49}\), where \(L_F = 2\) is the Lipschitz constant of \(\nabla f\) for \(f(x)=\Vert x\Vert ^2\). Then, the proposed PR method (33) reads

$$\begin{aligned} \left\{ \begin{array}{l} y^{t+1}=\mathop {\hbox {arg min}}_{y} \left\{ \frac{7}{2}\Vert y\Vert ^2 + \frac{1}{2\gamma }\Vert y - x^t\Vert ^2\right\} =\frac{1}{1+7\gamma } x^t, \\ z^{t+1}=\mathop {\hbox {arg min}}_{z} \left\{ - \frac{5}{2}\Vert z\Vert ^2 + \frac{1}{2\gamma }\Vert 2y^{t+1} - x^t - z\Vert ^2\right\} =\frac{1}{1-5\gamma }(2y^{t+1}-x^t),\\ x^{t+1}=x^t+2(z^{t+1}- y^{t+1})=\left( 1-\frac{4 \gamma }{(1-5 \gamma )(1+7\gamma )}\right) x^t. \end{array}\right. \end{aligned}$$
(50)

Note that, for \(\gamma =0.01<\frac{\beta -2}{(\beta +1)^2L_F}=\frac{1}{49}\), we have

$$\begin{aligned} 0<1-\frac{4 \gamma }{(1-5 \gamma )(1+7\gamma )} \le 0.97 < \frac{1}{2\gamma +1}. \end{aligned}$$

Thus, for \(\gamma =0.01\), our proposed PR method (33) is faster than the classical DR method for this example.
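The contraction factor of (50) is easy to confirm numerically; the following sketch (ours) checks it for \(\gamma = 0.01\):

```python
import numpy as np

# Sketch of iteration (50) with beta = 2.5 and L_F = 2, so gamma < 1/49.
def proposed_pr_step(x, gamma):
    y = x / (1 + 7 * gamma)              # y-update
    z = (2 * y - x) / (1 - 5 * gamma)    # z-update
    return x + 2 * (z - y)               # x-update

gamma, x = 0.01, np.ones(4)
observed = np.linalg.norm(proposed_pr_step(x, gamma)) / np.linalg.norm(x)
predicted = 1 - 4 * gamma / ((1 - 5 * gamma) * (1 + 7 * gamma))
# Both are about 0.9606, below the DR factor 1/(2*gamma + 1) ~ 0.9804.
print(observed, predicted)
```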

Example 2

(Classical PR method vs. the proposed PR method) Let \(C=\{(0,0)\}\) and \(D=\big (\{0\} \times \mathrm{I}\!\mathrm{R}\big ) \cup \big (\mathrm{I}\!\mathrm{R}\times \{0\}\big )\). We consider the feasibility problem of finding a point in the intersection of C and D. We start with the initial point \(x^0=(a,0)\) with \(a\ne 0\). Then, the classical PR splitting method applies (6) to \(f(x)=\delta _C(x)\) and \(g(x)=\delta _D(x)\) for all \(x \in \mathrm{I}\!\mathrm{R}^2\), and reduces to

$$\begin{aligned} x^{t+1}=(2\mathrm{prox}_{\gamma g} - I)\circ (2\mathrm{prox}_{\gamma f} - I)(x^t)=(2P_D - I)\circ (2P_C - I)(x^t)=-x^{t}. \end{aligned}$$

Thus, the classical PR splitting method fails to converge: it cycles between the two points \((a,0)\) and \((-a,0)\). On the other hand, let \(\beta =5\) and \(\gamma \in \left( 0,\frac{1}{12}\right) \), and consider the proposed PR method (49) for feasibility problems. This algorithm reads

$$\begin{aligned} \left\{ \begin{array}{l} y^{t+1} = \frac{\gamma P_C\left( \frac{x^t}{1 + \beta \gamma }\right) + x^t}{(1 + \beta )\gamma + 1}=\frac{x^t}{6\gamma + 1}, \\ z^{t+1}\in P_D\left( \frac{2y^{t+1} - x^t}{1 - \beta \gamma }\right) =\left\{ \frac{2y^{t+1} - x^t}{1 - 5\gamma }\right\} ,\\ x^{t+1}=x^t+2(z^{t+1}- y^{t+1})=\left( 1-\frac{2\gamma }{(1-5\gamma )(6\gamma +1)}\right) x^t, \end{array}\right. \end{aligned}$$
(51)

where the formula for the z-update follows from the fact that \(x^t\) and \(y^{t+1}\) lie in \(\mathrm{I}\!\mathrm{R}\times \{0\}\subset D\) (by induction on \(t\)), and hence so does \(2y^{t+1}-x^t\). Consequently, the proposed PR method (51) converges linearly to \((0,0) \in C \cap D\) in this case.
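The contrast between the two methods on this example can be reproduced in a few lines of Python; the sketch below (ours, with \(\gamma = 0.05\) and hand-coded projections) runs both iterations from \(x^0 = (1,0)\):

```python
import numpy as np

# Example 2 sketch: C = {(0,0)}, D = the union of the two coordinate axes.
def proj_C(x):
    return np.zeros(2)

def proj_D(x):
    # Project onto the nearer axis (ties broken toward the horizontal axis).
    return np.array([x[0], 0.0]) if abs(x[0]) >= abs(x[1]) else np.array([0.0, x[1]])

def classical_pr(x):
    r = 2 * proj_C(x) - x                 # reflect through C
    return 2 * proj_D(r) - r              # reflect through D

def proposed_pr(x, gamma, beta=5.0):      # iteration (51)
    y = (gamma * proj_C(x / (1 + beta * gamma)) + x) / ((1 + beta) * gamma + 1)
    z = proj_D((2 * y - x) / (1 - beta * gamma))
    return x + 2 * (z - y)

gamma = 0.05                              # any gamma in (0, 1/12) works
x_cl = x_pr = np.array([1.0, 0.0])
for t in range(30):
    x_cl, x_pr = classical_pr(x_cl), proposed_pr(x_pr, gamma)

# x_cl keeps cycling between (1, 0) and (-1, 0); x_pr shrinks geometrically
# toward (0, 0) with per-step factor 1 - 2*gamma/((1 - 5*gamma)(6*gamma + 1)).
print(x_cl, np.linalg.norm(x_pr))
```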

Cite this article

Li, G., Liu, T. & Pong, T.K. Peaceman–Rachford splitting for a class of nonconvex optimization problems. Comput Optim Appl 68, 407–436 (2017). https://doi.org/10.1007/s10589-017-9915-8
