Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications

Journal of Global Optimization

Abstract

In this paper, we study a class of nonconvex nonsmooth optimization problems with bilinear constraints, which have wide applications in machine learning and signal processing. We propose an algorithm based on the alternating direction method of multipliers, and rigorously analyze its convergence properties (to the set of stationary solutions). To test the performance of the proposed method, we specialize it to the nonnegative matrix factorization problem and a sparse principal component analysis problem. Extensive experiments on real and synthetic data sets demonstrate the effectiveness and broad applicability of the proposed methods.

Notes

  1. MULT is coded by the authors of this paper.

  2. Code is available at https://sites.google.com/a/umn.edu/huang663/publications.

  3. Code is available at http://smallk.github.io/

  4. Code is available at http://www.math.ucla.edu/%7Ewotaoyin/papers/bcu/matlab.html.

References

  1. Ames, B., Hong, M.: Alternating directions method of multipliers for l1-penalized zero variance discriminant analysis and principal component analysis (2014). arXiv:1401.5492v2

  2. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)

  3. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014)

  4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (2011)

  5. Brunet, J.P., Tamayo, P., Golub, T.R., Mesirov, J.P.: Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. 101(12), 4164–4169 (2004)

  6. d’Aspremont, A., Bach, F., Ghaoui, L.: Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9, 1269–1294 (2008)

  7. d’Aspremont, A., Ghaoui, L.E., Jordan, M., Lanckriet, G.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49, 434–448 (2007)

  8. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization (1989). Ph.D Thesis, Operations Research Center, MIT

  9. Eckstein, J., Yao, W.: Augmented lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Research Reports 32 (2012)

  10. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)

  11. Giannakis, G.B., Ling, Q., Mateos, G., Schizas, I.D., Zhu, H.: Decentralized learning for wireless communications and networking. arXiv preprint arXiv:1503.08855 (2015)

  12. Gillis, N.: The why and how of nonnegative matrix factorization (2015). Book Chapter available at arXiv:1401.5226v2

  13. Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 9, 41–76 (1975)

  14. Hajinezhad, D., Chang, T.H., Wang, X., Shi, Q., Hong, M.: Nonnegative matrix factorization using admm: algorithm and convergence analysis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016)

  15. Hajinezhad, D., Hong, M.: Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2015)

  16. Hajinezhad, D., Hong, M., Zhao, T., Wang, Z.: Nestt: a nonconvex primal-dual splitting method for distributed and stochastic optimization. Adv. Neural Inf. Process. Syst. 29, 3207–3215 (2016)

  17. Hestenes, M.R.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)

  18. Hong, M., Chang, T.H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization (2013). Preprint, available online arXiv:1401.7079

  19. Hong, M., Hajinezhad, D., Zhao, M.M.: Prox-PDA: The proximal primal-dual algorithm for fast distributed nonconvex optimization and learning over networks. In: Precup, D., Teh, Y.W. (eds) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 70, pp. 1529–1538. PMLR, International Convention Centre, Sydney, Australia (2017)

  20. Hong, M., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems (2014). arXiv:1410.1390v1

  21. Huang, K., Sidiropoulos, N., Liavas, A.P.: A flexible and efficient algorithmic framework for constrained matrix and tensor factorization. arXiv preprint arXiv:1506.04209 (2015)

  22. Jeffers, J.: Two case studies in the application of principal component analysis. Appl. Stat. 16, 225–236 (1967)

  23. Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. arXiv preprint arXiv:1605.02408 (2016)

  24. Jolliffe, I.: Principal Component Analysis. Springer, New York (2002)

  25. Jolliffe, I., Trendafilov, N., Uddin, M.: A modified principal component technique based on the lasso. J. Comput. Graph. Stat. 12, 531–547 (2003)

  26. Journee, M., Nesterov, Y., Richtarik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)

  27. Kim, H., Park, H.: Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23(12), 1495–1502 (2007)

  28. Kim, J., Park, H.: Fast nonnegative matrix factorization: an active-set-like method and comparisons. SIAM J. Sci. Comput. 33(6), 3261–3281 (2011)

  29. Laboratories, A.: Cambridge orl database of faces. http://www.uk.research.att.com/facedatabase.html

  30. Lee, D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems 13, pp. 556–562. MIT Press (2001)

  31. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

  32. Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)

  33. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992)

  34. Mackey, L.: Deflation methods for sparse PCA. Adv. Neural Inf. Process. Syst. 21, 1017–1024 (2008)

  35. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (2006)

  36. Mordukhovich, B.S., Nam, N.M., Yen, N.D.: Fréchet subdifferential calculus and optimality conditions in nondifferentiable programming (2005). Mathematics Research Reports. Paper 29

  37. Pauca, V.P., Shahnaz, F., Berry, M.W., Plemmons, R.J.: Text Mining using Non-Negative Matrix Factorizations, chap. 45, pp. 452–456

  38. Rahmani, M., Atia, G.: High dimensional low rank plus sparse matrix decomposition. arXiv preprint arXiv:1502.00182 (2015)

  39. Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)

  40. Richtarik, P., Takac, M., Ahipasaoglu, S.D.: Alternating maximization: unifying framework for 8 sparse pca formulations and efficient parallel codes (2012). arXiv:1212.4137

  41. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)

  42. Sani, A., Vosoughi, A.: Distributed vector estimation for power-and bandwidth-constrained wireless sensor networks. IEEE Trans. Signal Process. 64(15), 3879–3894 (2016)

  43. Schizas, I., Ribeiro, A., Giannakis, G.: Consensus in ad hoc wsns with noisy links—part I: distributed estimation of deterministic signals. IEEE Trans. Signal Process. 56(1), 350–364 (2008)

  44. Shen, H., Huang, J.: Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034 (2008)

  45. Song, D., Meyer, D.A., Min, M.R.: Fast nonnegative matrix factorization with rank-one admm. NIPS 2014 Workshop on Optimization for Machine Learning (OPT2014) (2014)

  46. Sun, D.L., Fevotte, C.: Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In: The Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2014)

  47. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  48. Turkmen, A.C.: A review of nonnegative matrix factorization methods for clustering (2015). Preprint, available at arXiv:1507.03194v2

  49. Wang, F., Xu, Z., Xu, H.K.: Convergence of bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625 (2014)

  50. Wang, Y., Yin, W., Zeng, J.: Global convergence of admm in nonconvex nonsmooth optimization. arXiv preprint arXiv:1511.06324 (2015)

  51. Wen, Z., Yang, C., Liu, X., Marchesini, S.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Prob. 28(11), 1–18 (2012)

  52. Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix completion with nonnegative factors. J. Front. Math. China pp. 365–384 (2011)

  53. Zdunek, R.: Alternating direction method for approximating smooth feature vectors in nonnegative matrix factorization. In: Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on, pp. 1–6 (2014)

  54. Zhang, R., Kwok, J.T.: Asynchronous distributed admm for consensus optimization. In: Proceedings of the 31st International Conference on Machine Learning (2014)

  55. Zhang, Y.: An alternating direction algorithm for nonnegative matrix factorization (2010). Preprint

  56. Zhao, Q., Meng, D., Xu, Z., Gao, C.: A block coordinate descent approach for sparse principal component analysis. J. Neurocomput. 153, 180–190 (2015)

  57. Zlobec, S.: On the liu-floudas convexification of smooth programs. J. Glob. Optim. 32(3), 401–407 (2005)

  58. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc.: Ser. B (Statistical Methodology) 67, 301–320 (2005)

  59. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286 (2006)

Acknowledgements

We would like to thank Dr. Mingyi Hong for sharing his pearls of wisdom with us during this research, and we also thank the anonymous reviewers for their careful and insightful reviews.

Author information

Corresponding author

Correspondence to Davood Hajinezhad.

Additional information

The conference versions of this work appear in [14, 15].

Appendix

1.1 Proof of Lemma 1

Since \(\left( X, Z\right) ^{r+1}\) is an optimal solution of (22b), it satisfies the optimality condition with respect to Z, which is given by:

$$\begin{aligned} \nabla f(Z^{r+1}) + {\varLambda }^r + \rho (Z^{r+1} - X^{r+1}Y^{r+1}) =0. \end{aligned}$$
(38)

Combining Eq. (38) with the dual variable update (22c), we have:

$$\begin{aligned} {\varLambda }^{r+1} = -\nabla f(Z^{r+1}). \end{aligned}$$
(39)

Applying Eq. (23), we obtain

$$\begin{aligned} \Vert {\varLambda }^{r+1}-{\varLambda }^r\Vert _F^2&= \Vert \nabla f(Z^{r+1}) - \nabla f(Z^{r})\Vert _F^2\le L^2\Vert Z^{r+1}-Z^r\Vert _F^2. \end{aligned}$$
(40)

The lemma is proved. Q.E.D.
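The two identities in this proof can be checked numerically on a scalar toy problem. The sketch below is only an illustration under stated assumptions: it takes the hypothetical loss \(f(z)=\frac{1}{2}(z-m)^2\) (so that \(L=1\)), treats the product \(XY\) as a given scalar at each iteration, and uses the scalar forms of (38) and of the dual update implied by (38) and (39). The names `z_update`, `dual_update` and the sample values are ours, not the paper's.

```python
# Scalar sanity check of Lemma 1.
# Assumption: f(z) = 0.5 * (z - m)**2, whose gradient is 1-Lipschitz (L = 1).

def grad_f(z, m):
    """Gradient of the assumed loss f(z) = 0.5 * (z - m)**2."""
    return z - m

def z_update(xy, lam, rho, m):
    """Solve grad_f(z) + lam + rho * (z - xy) = 0 for z (scalar form of Eq. (38))."""
    return (m - lam + rho * xy) / (1.0 + rho)

def dual_update(lam, z, xy, rho):
    """Dual step lam + rho * (z - xy), the scalar analogue of (22c)."""
    return lam + rho * (z - xy)

m, rho, L = 2.0, 5.0, 1.0
lam = 0.3                      # arbitrary starting multiplier
products = [0.5, 1.7, 2.4]     # hypothetical values of x*y at consecutive iterations

history = []                   # (z, lam) pairs after each pair of updates
for xy in products:
    z = z_update(xy, lam, rho, m)
    lam = dual_update(lam, z, xy, rho)
    history.append((z, lam))

# Eq. (39): the new multiplier equals minus the gradient at the new z.
eq39_gap = max(abs(lam_k + grad_f(z_k, m)) for z_k, lam_k in history)

# Eq. (40): squared dual change is bounded by L^2 times the squared change in z.
eq40_holds = all(
    (l1 - l0) ** 2 <= L ** 2 * (z1 - z0) ** 2 + 1e-12
    for (z0, l0), (z1, l1) in zip(history, history[1:])
)
```

For this quadratic loss the bound (40) holds with equality up to rounding, which is why a small tolerance is added in the comparison.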

1.2 Proof of Lemma 2

Part 1 First we prove that \(r_1(X^{r+1})\le u_1(X^{r+1},X^r)\), where \(r_1\) and \(u_1\) are given in (18) and (19); the bound \(r_2(Y^{r+1})\le u_2(Y^{r+1},Y^r)\) follows similarly. When \(r_1\) is convex we have \(u_1(X,X^r)=r_1(X)\), so setting \(X=X^{r+1}\) gives \(u_1(X^{r+1},X^r)=r_1(X^{r+1})\), and the bound holds with equality. In the second case, when \(r_1\) is concave, we have

$$\begin{aligned} u_1(X,X^r)=r_1(X^r) + h'_1(l_1(X^r))\left[ l_1(X) - l_1(X^r)\right] . \end{aligned}$$

For simplicity, let us make the change of variable \(Z:=l_1(X)\), so that \(r_1(X)=h_1(l_1(X))=h_1(Z).\) By the basic property of a concave function (its linear approximation is a global upper bound on the function), we have for every \(Z,\hat{Z}\in \text{ dom }\,(h_1)\)

$$\begin{aligned} h_1(Z)\le h_1(\hat{Z}) + h'_1(\hat{Z})(Z-\hat{Z}). \end{aligned}$$

If we substitute \(Z:=l_1(X)\) and \(\hat{Z}:=l_1(\hat{X})\) back into the above inequality, we have

$$\begin{aligned} h_1(l_1(X))\le h_1(l_1(\hat{X}))+ h'_1(l_1(\hat{X}))\big [l_1(X)-l_1(\hat{X})\big ]. \end{aligned}$$

Now if we set \(X=X^{r+1}\), and \(\hat{X}=X^r\), we reach

$$\begin{aligned} h_1(l_1(X^{r+1}))\le h_1(l_1(X^r))+ h'_1(l_1(X^r))\big [l_1(X^{r+1})-l_1(X^r)\big ] \end{aligned}$$

which is equivalent to \(r_1(X^{r+1})\le u_1(X^{r+1},X^r).\)
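The concave-case bound can be illustrated numerically. The sketch below assumes a specific pair that is not taken from the paper: the concave, non-decreasing outer function \(h(t)=\log (1+t)\) and the convex inner function \(l(x)=|x|\), both scalar. It builds the linearized surrogate \(u(x,x^r)\) exactly as in the display above and checks on a grid that it never falls below \(r(x)=h(l(x))\).

```python
import math

def l(x):
    """Assumed convex inner function l(x) = |x|."""
    return abs(x)

def h(t):
    """Assumed concave, non-decreasing outer function h(t) = log(1 + t), t >= 0."""
    return math.log1p(t)

def h_prime(t):
    """Derivative of h."""
    return 1.0 / (1.0 + t)

def r(x):
    """Composite regularizer r(x) = h(l(x))."""
    return h(l(x))

def u(x, x_r):
    """Surrogate obtained by linearizing h around l(x_r)."""
    return r(x_r) + h_prime(l(x_r)) * (l(x) - l(x_r))

grid = [k / 10.0 for k in range(-50, 51)]   # points in [-5, 5]

# r(x) - u(x, x_r) should never be positive: u is a global upper bound.
worst_violation = max(r(x) - u(x, x_r) for x in grid for x_r in grid)

# The surrogate is tight at the linearization point.
touches_at_anchor = all(abs(r(x_r) - u(x_r, x_r)) < 1e-15 for x_r in grid)
```

A grid check of this kind is not a proof; it only confirms that the surrogate behaves as the concavity argument predicts for this particular choice of \(h\) and \(l\).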

Next, for simplicity let us define \(W^r:=(Y^{r}, (X, Z)^{r}; {\varLambda }^{r})\). Then, by adding and subtracting the term \(L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]\), the successive difference of the augmented Lagrangian can be written as

$$\begin{aligned} L_\rho [W^{r+1}]-L_\rho [W^{r}]=&L_\rho [W^{r+1}]-L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}] \nonumber \\&+L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [W^{r}]. \end{aligned}$$
(41)

First we bound \(L_\rho [W^{r+1}]-L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]\). Using the dual update (22c) and Lemma 1 we have

$$\begin{aligned}&L_\rho [W^{r+1}]-L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]\nonumber \\&\quad =\langle {\varLambda }^{r+1} - {\varLambda }^{r}, Z^{r+1}-X^{r+1}Y^{r+1}\rangle = \frac{1}{\rho }\Vert {\varLambda }^{r+1} - {\varLambda }^{r}\Vert _F^2 \nonumber \\&\quad {\mathop {\le }\limits ^{(40)}}\frac{L^2}{\rho }\Vert Z^{r+1}-Z^{r}\Vert _F^2. \end{aligned}$$
(42)

Next let us bound \(L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [W^{r}]\).

$$\begin{aligned}&L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [W^{r}]\nonumber \\&\quad =L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]\nonumber \\&\quad +L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]-L_\rho [W^{r}] \end{aligned}$$
(43)

Suppose that \(\xi _X\in \partial _X H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]\) is a subgradient of the function \(H[(X,Z); X^r, Y^{r+1}, {\varLambda }^r]\) with respect to X at the point \((X,Z)^{r+1}\). Then we have

$$\begin{aligned}&L_\rho [Y^{r+1}, (X, Z)^{r+1}; {\varLambda }^{r}]-L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}] \nonumber \\&\quad =H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]-H[(X,Z)^{r}; X^r, Y^{r+1}, {\varLambda }^r]-\frac{\beta }{2}\Vert X^{r+1}-X^r\Vert _F^2\nonumber \\&\qquad +\left[ r_1(X^{r+1})-u_1(X^{r+1},X^r)\right] \nonumber \\&\quad {\mathop {\le }\limits ^\mathrm{(i)}}H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]-H[(X,Z)^{r}; X^r, Y^{r+1}, {\varLambda }^r]-\frac{\beta }{2}\Vert X^{r+1}-X^r\Vert _F^2\nonumber \\&\quad {\mathop {\le }\limits ^\mathrm{(ii)}}\left\langle \xi _X, X^{r+1}-X^r\right\rangle +\langle \nabla _Z H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r], Z^{r+1}-Z^r \rangle \nonumber \\&\qquad -\frac{\gamma _x}{2}\Vert X^{r+1}-X^r\Vert _F^2-\frac{\gamma _z}{2}\Vert Z^{r+1}-Z^r\Vert _F^2-\frac{\beta }{2}\Vert X^{r+1}-X^r\Vert _F^2\nonumber \\&\quad {\mathop {=}\limits ^\mathrm{(iii)}}-\bigg (\frac{\gamma _x}{2}+\frac{\beta }{2}\bigg )\Vert X^{r+1}-X^r\Vert _F^2-\frac{\gamma _z}{2}\Vert Z^{r+1}-Z^r\Vert _F^2, \end{aligned}$$
(44)

where \(\mathrm{(i)}\) is true because from Eqs. (18) and (19) we conclude that \(r_1(X^{r+1})\le u_1(X^{r+1},X^{r})\), \(\mathrm{(ii)}\) comes from the strong convexity of \(H[(X,Z); X^r, Y^{r+1}, {\varLambda }^r]\) with respect to X and Z with moduli \(\gamma _x\) and \(\gamma _z\) respectively, and \(\mathrm{(iii)}\) follows from the optimality condition for problem (22b), which is given by

$$\begin{aligned}&\nabla _Z H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r]=0, \,\text {and}\quad 0\in \partial _X H[(X,Z)^{r+1}; X^r, Y^{r+1}, {\varLambda }^r], \end{aligned}$$

thus, both inner-product terms in the second inequality vanish, because we can choose \(\xi _X=0\).

Next suppose \(\xi _Y\in \partial _Y G[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]\). Similarly we bound \(L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]-L_\rho [W^{r}]\) as follows

$$\begin{aligned} L_\rho [Y^{r+1}, (X, Z)^{r}; {\varLambda }^{r}]-L_\rho [W^{r}]&= G[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]-G[Y^r; (X,Z)^r, Y^r,{\varLambda }^r]\nonumber \\&\quad -\frac{\beta }{2}\Vert Y^{r+1}-Y^r\Vert _F^2 +\left[ r_2(Y^{r+1})-u_2(Y^{r+1},Y^r)\right] \nonumber \\&{\mathop {\le }\limits ^{\mathrm{(i)}}}G[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]-G[Y^r; (X,Z)^r, Y^r,{\varLambda }^r]\nonumber \\&\quad -\frac{\beta }{2}\Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&{\mathop {\le }\limits ^{\mathrm{(ii)}}}\left\langle \xi _Y, Y^{r+1}-Y^r\right\rangle -\frac{\gamma _y}{2}\Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&\quad -\frac{\beta }{2}\Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&{\mathop {=}\limits ^{\mathrm{(iii)}}}-\left( \frac{\gamma _y}{2}+\frac{\beta }{2}\right) \Vert Y^{r+1}-Y^r\Vert _F^2, \end{aligned}$$
(45)

where \({\mathrm{(i)}}\) is true because from Eqs. (18) and (19) we conclude that \(r_2(Y^{r+1})\le u_2(Y^{r+1},Y^{r})\), \({\mathrm{(ii)}}\) comes from the strong convexity of \(G[Y;(X,Z)^r,Y^r, {\varLambda }^r]\) with respect to Y with modulus \(\gamma _y\), and \(\mathrm{(iii)}\) follows from the optimality condition for problem (22a), which is \(0\in \partial _YG[Y^{r+1}; (X,Z)^r, Y^r,{\varLambda }^r]\), so that we can choose \(\xi _Y=0\). Combining Eqs. (42), (44) and (45), we have

$$\begin{aligned}&L_\rho [W^{r+1}]-L_\rho [W^{r}]\nonumber \\&\quad \le -\left( \frac{\gamma _{z}}{2}-\frac{L^2}{\rho }\right) \Vert Z^{r+1}-Z^r\Vert ^2_F-\left( \frac{\gamma _y}{2}+\frac{\beta }{2}\right) \Vert Y^{r+1}-Y^r\Vert _F^2\nonumber \\&\quad -\left( \frac{\gamma _x}{2}+\frac{\beta }{2}\right) \Vert X^{r+1}-X^r\Vert ^2_F. \end{aligned}$$
(46)

To complete the proof we only need to set \(C_z=\frac{\gamma _z}{2}-\frac{L^2}{\rho }\), \(C_y=\frac{\gamma _y+\beta }{2}\), and \(C_x=\frac{\gamma _x+\beta }{2}\). Furthermore, since the subproblems (22a) and (22b) are strongly convex with moduli \(\gamma _y\ge 0\) and \(\gamma _x\ge 0\), respectively, we have \(C_y\ge 0\) and \(C_x\ge 0\). Moreover, when \(\rho \ge \frac{2L^2}{\gamma _{z}}\) we also have \(C_z\ge 0\). Thus, the augmented Lagrangian function is nonincreasing along the iterates.
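As a small arithmetic illustration of these constants, the following sketch evaluates \(C_x\), \(C_y\), \(C_z\) and the threshold \(2L^2/\gamma _z\) for hypothetical parameter values. The function names and the numbers are ours and serve only to show how the sign of \(C_z\) flips at the threshold; in an actual instance \(\gamma _z\) is determined by the subproblem and may itself depend on \(\rho \).

```python
def descent_constants(gamma_x, gamma_y, gamma_z, beta, lipschitz, rho):
    """Return (C_x, C_y, C_z) as defined at the end of Part 1."""
    c_x = (gamma_x + beta) / 2.0
    c_y = (gamma_y + beta) / 2.0
    c_z = gamma_z / 2.0 - lipschitz ** 2 / rho
    return c_x, c_y, c_z

def rho_threshold(gamma_z, lipschitz):
    """Smallest rho for which C_z >= 0, namely 2 * L^2 / gamma_z."""
    return 2.0 * lipschitz ** 2 / gamma_z

# Hypothetical values chosen only for illustration.
gamma_x, gamma_y, gamma_z, beta, lipschitz = 0.0, 0.0, 1.0, 1.0, 2.0

rho_min = rho_threshold(gamma_z, lipschitz)
at_threshold = descent_constants(gamma_x, gamma_y, gamma_z, beta, lipschitz, rho_min)
below_threshold = descent_constants(gamma_x, gamma_y, gamma_z, beta, lipschitz, rho_min / 2)
```

With these values `at_threshold` has a zero last entry while `below_threshold` has a negative one, so the descent estimate (46) gives no control on \(\Vert Z^{r+1}-Z^r\Vert _F^2\) once \(\rho \) drops below the threshold.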

Part 2 Now we show that the augmented Lagrangian function is lower bounded:

$$\begin{aligned} L_\rho [W^{r+1}]&= f(Z^{r+1}) +r_1(X^{r+1})+r_2(Y^{r+1})+ \langle {\varLambda }^{r+1}, Z^{r+1}-X^{r+1}Y^{r+1} \rangle \nonumber \\&\quad +\frac{\rho }{2}\Vert Z^{r+1}-X^{r+1}Y^{r+1}\Vert ^2_F\nonumber \\&{\mathop {=}\limits ^\mathrm{(i)}} f(Z^{r+1})+r_1(X^{r+1})+r_2(Y^{r+1})+\langle \nabla f(Z^{r+1}), X^{r+1}Y^{r+1}-Z^{r+1} \rangle \nonumber \\&\quad +\frac{\rho }{2}\Vert Z^{r+1}-X^{r+1}Y^{r+1}\Vert ^2_F \nonumber \\&{\mathop {\ge }\limits ^\mathrm{(ii)}}f(Z^{r+1})+r_1(X^{r+1})+r_2(Y^{r+1})+\langle \nabla f(Z^{r+1}), X^{r+1}Y^{r+1}-Z^{r+1} \rangle \nonumber \\&\quad +\frac{L}{2}\Vert Z^{r+1}-X^{r+1}Y^{r+1}\Vert ^2_F \nonumber \\&{\mathop {\ge }\limits ^\mathrm{(iii)}}f(X^{r+1}Y^{r+1}) + r_1(X^{r+1})+r_2(Y^{r+1}) \nonumber \\&= g(X^{r+1}, Y^{r+1}, X^{r+1}Y^{r+1}), \end{aligned}$$
(47)

where in \(\mathrm{(i)}\) we use Eq. (39), \(\mathrm{(ii)}\) is true because we have picked \(\rho \ge L\), and \(\mathrm{(iii)}\) comes from the fact that for a function f with L-Lipschitz continuous gradient we have

$$\begin{aligned} f(x)\le f(y)+\langle \nabla f(y), x-y \rangle +\frac{L}{2}\Vert x-y\Vert ^2\quad \forall x,y \in \text {Dom(f)}. \end{aligned}$$

Due to the lower boundedness of \(g(X, Y, Z)\) (Assumption A) we have \(g(X^{r+1}, Y^{r+1}, X^{r+1}Y^{r+1})\ge \underline{g}\). Together with Eq. (47), we can set \(\underline{L}=\underline{g}\).

The proof is complete. Q.E.D.
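The chain of inequalities in (47) can be checked numerically in the scalar case. The sketch below is an illustration under assumptions of ours: it takes the logistic loss \(f(z)=\log (1+e^{z})\), whose gradient is Lipschitz with \(L=1/4\), sets \(r_1=r_2=0\), writes `p` for the product \(XY\), and fixes the multiplier to \(-\nabla f(z)\) as in Eq. (39). It then verifies on a grid that the augmented Lagrangian is bounded below by \(f(p)\) whenever \(\rho \ge L\).

```python
import math

L = 0.25   # Lipschitz constant of the gradient of the assumed loss below

def f(z):
    """Assumed smooth loss: logistic loss log(1 + exp(z))."""
    return math.log1p(math.exp(z))

def grad_f(z):
    """Gradient of the logistic loss (the sigmoid function)."""
    return 1.0 / (1.0 + math.exp(-z))

def aug_lagrangian_at_dual(z, p, rho):
    """Augmented Lagrangian with multiplier -grad_f(z), as in Eq. (39).

    p stands for the product X*Y, and r_1 = r_2 = 0 is assumed.
    """
    lam = -grad_f(z)
    return f(z) + lam * (z - p) + 0.5 * rho * (z - p) ** 2

grid = [k / 2.0 for k in range(-8, 9)]   # points in [-4, 4]
rho = 1.0                                # any rho >= L suffices for this check

# Eq. (47): the augmented Lagrangian should dominate f evaluated at the product.
smallest_gap = min(
    aug_lagrangian_at_dual(z, p, rho) - f(p) for z in grid for p in grid
)
```

The gap is zero when \(z=p\) and positive elsewhere on the grid, which matches steps (ii) and (iii) of (47).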

1.3 Proof of Theorem 1

Part 1 Clearly, Eq. (29) together with the fact that the augmented Lagrangian is lower bounded implies

$$\begin{aligned} Z^{r+1}-Z^r\rightarrow 0;~ Y^{r+1}-Y^r\rightarrow 0;~ X^{r+1}-X^r\rightarrow 0. \end{aligned}$$
(48)

Further, applying Lemma 1 we have

$$\begin{aligned} {\varLambda }^{r+1}-{\varLambda }^r\rightarrow 0. \end{aligned}$$

Utilizing the update equation (22c) for the dual variable, we obtain

$$\begin{aligned} X^{r+1}Y^{r+1}-Z^{r+1}\rightarrow 0 . \end{aligned}$$
(49)

Part 1 is proved, and condition (26d) follows from it.

Part 2 Since \(Y^{r+1}\) minimizes problem (22a), we have

$$\begin{aligned} 0\in h'_2(l_2(Y^r))\partial l_2(Y^{r+1})-\rho (X^r)^\top \left( Z^r-X^{r}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\right) +\beta \left( Y^{r+1}-Y^r\right) . \end{aligned}$$

Thus, there exists \(\xi \in \partial l_2(Y^{r+1})\) such that for every Y

$$\begin{aligned} \langle Y-Y^{r+1} , {\varPhi }_2^r\xi - \rho (X^r)^\top \left( Z^r-X^{r}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\right) +\beta (Y^{r+1}-Y^r) \rangle \ge 0, \end{aligned}$$
(50)

where we have set \({\varPhi }_2^r = h'_2(l_2(Y^r))\) for notational simplicity. Because \(l_2(Y)\) is a convex function we have the following inequality for every \(\xi \in \partial l_2(Y^{r+1})\)

$$\begin{aligned} l_2(Y) - l_2(Y^{r+1})\ge \langle \xi , Y- Y^{r+1}\rangle ;\quad \forall ~Y. \end{aligned}$$
(51)

Further, since we assumed that \(h_2\) is non-decreasing, we have \( h'_2(l_2(Y^r))\ge 0\). Combining this fact with (51) yields

$$\begin{aligned} {\varPhi }_2^rl_2(Y) - {\varPhi }_2^rl_2(Y^{r+1})\ge \langle {\varPhi }_2^r\xi , Y- Y^{r+1}\rangle ;\quad \forall ~Y. \end{aligned}$$
(52)

If we plug Eq. (52) into (50) we obtain

$$\begin{aligned}&{\varPhi }_2^rl_2(Y) - {\varPhi }_2^rl_2(Y^{r+1}) \nonumber \\&+ \bigg \langle Y^{r+1}-Y , \rho (X^r)^\top \Big (Z^r-X^{r}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\Big )-\beta (Y^{r+1}-Y^r)\bigg \rangle \ge 0. \end{aligned}$$
(53)

Next, taking the limit in (53) and utilizing the facts that \(\Vert Y^{r+1}-Y^r\Vert _F\rightarrow 0\) and \(\Vert X^{r+1}Y^{r+1}-Z^{r+1}\Vert _F\rightarrow 0\), we obtain the following

$$\begin{aligned} {\varPhi }_2^*l_2(Y) - {\varPhi }_2^*l_2(Y^{*})+ \langle Y^*-Y , (X^*)^\top {\varLambda }^* \rangle \ge 0;\quad \forall ~Y. \end{aligned}$$
(54)

where \({\varPhi }_2^* = h'_2(l_2(Y^*))\). From Eq. (54) we can conclude that

$$\begin{aligned}&{\varPhi }_2^*l_2(Y) + \langle Y^*-Y , (X^*)^\top {\varLambda }^* \rangle \ge {\varPhi }_2^*l_2(Y^{*})+\langle Y^*-Y^{*} , (X^*)^\top {\varLambda }^* \rangle ; \quad \forall ~Y, \end{aligned}$$
(55)

which further implies

$$\begin{aligned} Y^*\in \mathop {\mathrm{argmin}}_{Y}\left( {\varPhi }_2^*l_2(Y)+\langle Y^*-Y, (X^*)^\top {\varLambda }^* \rangle \right) . \end{aligned}$$

This is equivalent to

$$\begin{aligned} (X^*)^\top {\varLambda }^*\in h'_2\left( l_2(Y^*)\right) \partial l_2(Y^*). \end{aligned}$$

Applying the chain rule for the Clarke subdifferential given in Eq. (27), we have

$$\begin{aligned} (X^*)^\top {\varLambda }^*\in \partial ^c (h_2\circ l_2)(Y^*) = \partial ^c r_2(Y^*). \end{aligned}$$

From this equation we can simply conclude that

$$\begin{aligned} \text {dist}(\partial ^c r_2(Y^*),(X^*)^\top {\varLambda }^*)=0 \end{aligned}$$

This proves Eq. (26b).

Now let us consider the (X, Z) step (22b). Similarly, let us define \({\varPhi }_1 = h'_1(l_1(X^{r}))\). Since \((X,Z)^{r+1}\) minimizes (22b), we have

$$\begin{aligned}&\nabla f(Z^{r+1})+{\varLambda }^{r}+\rho \left( Z^{r+1}-X^{r+1}Y^{r+1}\right) = 0; \end{aligned}$$
(56)
$$\begin{aligned}&\quad 0\in {\varPhi }_1\partial l_1(X^{r+1})-\rho \left( Z^{r+1}-X^{r+1}Y^{r+1}+\frac{1}{\rho }{\varLambda }^r\right) (Y^{r+1})^\top +\beta (X^{r+1}-X^r). \end{aligned}$$
(57)

Let us take the limit in Eqs. (56) and (57). Then, invoking (48) and (49) and following the same process used to prove (26b), it follows that

$$\begin{aligned}&\nabla f(Z^*)+{\varLambda }^* = 0; \end{aligned}$$
(58)
$$\begin{aligned}&\quad \text {dist}\left( \partial ^c r_1(X^*),{\varLambda }^*(Y^*)^\top \right) =0 \end{aligned}$$
(59)

which verifies (26a) and (26c).

The theorem is proved. Q.E.D.
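To see the statements of Lemma 2 and Theorem 1 at work, one can run the iteration on a scalar toy instance. The sketch below is not the authors' implementation: it assumes \(f(z)=\frac{1}{2}(z-m)^2\), \(r_1=r_2=0\), scalar \(x, y, z\), and update rules reconstructed from the optimality conditions displayed in this appendix (the Y-step condition preceding (50), the conditions (38) and (57), and the dual step implied by (38) and (39)). The function name `admm_scalar`, the starting point and the parameter values \(\rho =5\), \(\beta =1\) are hypothetical.

```python
def admm_scalar(m=2.0, rho=5.0, beta=1.0, iters=200):
    """Toy run: minimize 0.5*(z - m)^2 subject to z = x*y, with r_1 = r_2 = 0."""
    x, y, z = 1.0, 1.0, 1.0
    lam = -(z - m)   # start consistent with Eq. (39)

    def aug_lagr(x, y, z, lam):
        res = z - x * y
        return 0.5 * (z - m) ** 2 + lam * res + 0.5 * rho * res ** 2

    values = [aug_lagr(x, y, z, lam)]
    for _ in range(iters):
        # Y-step: minimize rho/2*(z - x*y + lam/rho)^2 + beta/2*(y - y_old)^2 over y.
        y = (rho * x * (z + lam / rho) + beta * y) / (rho * x * x + beta)

        # (X, Z)-step: joint minimizer of a strongly convex quadratic in (z, x),
        # obtained from its 2x2 stationarity system by Cramer's rule.
        a11, a12, a22 = 1.0 + rho, -rho * y, rho * y * y + beta
        b1, b2 = m - lam, lam * y + beta * x
        det = a11 * a22 - a12 * a12
        z_new = (b1 * a22 - a12 * b2) / det
        x_new = (a11 * b2 - a12 * b1) / det
        x, z = x_new, z_new

        # Dual step.
        lam = lam + rho * (z - x * y)
        values.append(aug_lagr(x, y, z, lam))
    return x, y, z, lam, values

x, y, z, lam, values = admm_scalar()
residual = abs(z - x * y)                      # constraint violation, cf. (49)
dual_gap = abs(lam + (z - 2.0))                # cf. (58), with grad f(z) = z - m
monotone = all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
```

In this run the augmented Lagrangian values are nonincreasing and the constraint residual vanishes, in line with (46) and (49). The behaviour of the sketch for other losses, regularizers or parameter choices is not covered by this check.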


Cite this article

Hajinezhad, D., Shi, Q. Alternating direction method of multipliers for a class of nonconvex bilinear optimization: convergence analysis and applications. J Glob Optim 70, 261–288 (2018). https://doi.org/10.1007/s10898-017-0594-x


  • DOI: https://doi.org/10.1007/s10898-017-0594-x
