Abstract
This paper proposes a mechanism for constructing equivalent Lipschitz surrogates for zero-norm and rank optimization problems by means of the global exact penalty for their equivalent mathematical programs with an equilibrium constraint (MPECs). Specifically, we reformulate these combinatorial problems as equivalent MPECs via the variational characterization of the zero-norm and rank functions, show that the penalized problems obtained by moving the equilibrium constraint into the objective are global exact penalties, and obtain the equivalent Lipschitz surrogates by eliminating the dual variable in the global exact penalty. These surrogates, which include the popular SCAD function in statistics, are also differences of two convex functions (D.C.) if the function and constraint set involved in the zero-norm and rank optimization problems are convex. As an application, we design a multi-stage convex relaxation approach to the rank plus zero-norm regularized minimization problem.
References
Bach, F.R.: Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9, 1179–1225 (2008)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Bi, J.S., Liu, X.L., Pan, S.H.: Exact penalty decomposition method for zero-norm minimization based on MPEC formulations. SIAM J. Sci. Comput. 36, A1451–A1477 (2014)
Bi, J.S., Pan, S.H.: Multi-stage convex relaxation approach to rank regularized minimization problems based on equivalent MPGCCs. SIAM J. Control Optim. 55, 2493–2518 (2017)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
Bühlmann, P., van de Geer, S.: Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Berlin (2011)
Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(l_1\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)
Candès, E.J., Plan, Y.: Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inf. Theory 57, 2342–2359 (2011)
Chen, X.J., Lu, Z.S., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
Donoho, D.L., Stark, P.B.: Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49, 906–931 (1989)
Donoho, D.L., Logan, B.F.: Signal recovery and the large sieve. SIAM J. Appl. Math. 52, 577–591 (1992)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings—A View from Variational Analysis. Springer, Berlin (2009)
Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis, Stanford University (2002)
Fazel, M., Hindi, H., Boyd, S.: Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In: Proceedings of the 2003 American Control Conference, vol. 3, pp. 2156–2162 (2003)
Feng, M.B., Mitchell, J.E., Pang, J.S., Shen, X., Wächter, A.: Complementarity formulations of \(\ell_0\)-norm optimization problems. Industrial Engineering and Management Sciences. Technical Report. Northwestern University, USA (2013)
Golbabaee, M., Vandergheynst, P.: Hyperspectral image compressed sensing via low-rank and joint sparse matrix recovery. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, pp. 2741–2744 (2011)
Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, Cambridge (1991)
Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56, 2980–2998 (2010)
Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52, 509–535 (2012)
Lewis, A.S.: The convex analysis of unitarily invariant matrix functions. J. Convex Anal. 2, 173–183 (1995)
Li, P., Rangapuram, S.S., Slawski, M.: Methods for sparse and low-rank recovery under simplex constraints. arXiv:1605.00507 (2016)
Lai, M.J., Xu, Y.Y., Yin, W.T.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell_q\) minimization. SIAM J. Numer. Anal. 51, 927–957 (2013)
Luo, Z.Q., Pang, J.S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996)
Miao, W.M., Pan, S.H., Sun, D.F.: A rank-corrected procedure for matrix completion with fixed basis coefficients. Math. Program. 159, 289–338 (2016)
Mohan, K., Fazel, M.: Iterative reweighted algorithm for matrix rank minimization. J. Mach. Learn. Res. 13, 3441–3473 (2012)
Negahban, S., Wainwright, M.J.: Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Stat. 39, 1069–1097 (2011)
Negahban, S., Wainwright, M.J.: Restricted strong convexity and weighted matrix completion: optimal bounds with noise. J. Mach. Learn. Res. 13, 1665–1697 (2012)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)
Oymak, S., Jalali, A., Fazel, M., Eldar, Y.C., Hassibi, B.: Simultaneously structured models with application to sparse and low-rank matrices. IEEE Trans. Inf. Theory 61, 2886–2908 (2015)
Peng, Y.G., Ganesh, A., Wright, J., Xu, W.L., Ma, Y.: RASL: robust alignment via sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233–2246 (2012)
Pietersz, R., Groenen, P.J.F.: Rank reduction of correlation matrices by majorization. Quant. Finance 4, 649–662 (2004)
Richard, E., Savalle, P., Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1351–1358 (2012)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Salakhutdinov, R., Srebro, N.: Collaborative filtering in a non-uniform world: learning with the weighted trace norm. Adv. Neural Inf. Process. Syst. (NIPS) 23, 2056–2064 (2010)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tropp, J.: Just relax: convex programming methods for identifying sparse signals. IEEE Trans. Inf. Theory 52, 1030–1051 (2006)
Toh, K.C., Yun, S.W.: An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pac. J. Optim. 6, 615–640 (2010)
Usman, M., Prieto, C., Schaeffter, T., Batchelor, P.G.: \(k-t\) Group sparse: a method for accelerating dynamic MRI. Magn. Reson. Med. 66, 1163–1176 (2011)
Ye, J.J., Zhu, D.L.: Optimality conditions for bilevel programming problems. Optimization 33, 9–27 (1995)
Ye, J.J., Zhu, D.L., Zhu, Q.J.: Exact penalization and necessary optimality conditions for generalized bilevel programming problems. SIAM J. Optim. 7, 481–507 (1997)
Ye, J.J., Ye, X.Y.: Necessary optimality conditions for optimization problems with variational inequality constraints. Math. Oper. Res. 22, 977–997 (1997)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)
Zhou, Z.H., Li, X.D., Wright, J., Candès, E.J., Ma, Y.: Stable principal component pursuit. In: IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 1518–1522 (2010)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zou, H.: The adaptive Lasso and its Oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Acknowledgements
The authors would like to thank the referee for his/her helpful comments, which led to significant improvements in the presentation of this paper.
Additional information
This work is supported by the National Natural Science Foundation of China under Project Nos. 11571120 and 11701186, and by the Natural Science Foundation of Guangdong Province under Project Nos. 2015A030313214 and 2017A030310418.
Appendices
Appendix A
The following lemma characterizes an important property of the function family \(\Phi \).
Lemma 1
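Let \(\phi \in \Phi \). Then, there exists \(t_0\in [0,1)\) such that \(\frac{1}{1-t^*}\in \partial \phi (t_0)\), and for any given \(\omega \ge 0\) and \(\varrho >0\), the optimal value \(\upsilon ^*:=\min _{t\in [0,1]}\{\phi (t)+\varrho \omega (1-t)\}\) satisfies
\[
\left\{ \begin{array}{ll}
\upsilon ^*=1 & \text{if } \varrho \omega \ge \phi _{-}'(1);\\
\upsilon ^*\ge \frac{\varrho \omega (1-t_0)}{\phi _{-}'(1)(1-t^*)} & \text{if } \varrho \omega \in \big [\frac{1}{1-t^*},\phi _{-}'(1)\big ];\\
\upsilon ^*\ge \varrho \omega (1-t_0) & \text{if } \varrho \omega \in \big [0,\frac{1}{1-t^*}\big ).
\end{array}\right.
\]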
Proof
If \(\phi '(t)\) is a constant for \(t\in [t^*,1]\), then \(\phi '(t)=\frac{\phi (1)-\phi (t^*)}{1-t^*}=\frac{1}{1-t^*}\) for all \(t\in [t^*,1]\), which means that any \(t_0\in [t^*,1)\) satisfies the requirement. Otherwise, there must exist a point \(\overline{t}\in (t^*,1)\) such that \(\phi _{-}'(\overline{t})<\phi _{-}'(1)\). Together with the convexity of \(\phi \) in [0, 1], we have \(\phi _{-}'(t)\le \phi _{-}'(\overline{t})\) for \(t\in [t^*,\overline{t}]\). By [37, Corollary 24.2.1], it follows that
\[
1=\phi (1)-\phi (t^*)=\int _{t^*}^{1}\phi _{-}'(t)\,dt\le \phi _{-}'(\overline{t})(\overline{t}-t^*)+\phi _{-}'(1)(1-\overline{t})<\phi _{-}'(1)(1-t^*).
\]
Also, by the convexity of \(\phi \) in [0, 1], \(1=\phi (1)\ge \phi (t^*)+\phi _{+}'(t^*)(1-t^*)=\phi _{+}'(t^*)(1-t^*)\). Thus, \(a:=\frac{1}{1-t^*}\in [\phi _{+}'(t^*),\phi _{-}'(1))\subseteq [\phi _{+}'(0),\phi _{-}'(1)]\), which further implies that
\[
(\partial \phi )^{-1}(a)\cap [0,1]\ne \emptyset .
\]
Notice that \((\partial \phi )^{-1}(a)\cap [0,1)\ne \emptyset \) (if not, \(a\in \partial \phi (1)=[\phi _{-}'(1),\phi _{+}'(1)]\), which is impossible since \(a<\phi _{-}'(1)\)). Therefore, \(t_0\in (\partial \phi )^{-1}(a)\cap [0,1)\) satisfies the requirement.
When \(\varrho \omega \ge \phi _{-}'(1)\), clearly, \(\upsilon ^*=\phi (1)=1\) since \(\phi (t)+\varrho \omega (1\!-t)\) is nonincreasing in [0, 1]. When \(\varrho \omega \in \big [0,\frac{1}{1-t^*}\big )\), since \(\phi _{-}'(t)\ge \phi _{+}'(t_0)>\varrho \omega \) for \(t>t_0\), the optimal solution \(\widehat{t}\) of \(\min _{t\in [0,1]}\{\phi (t)+\varrho \omega (1-t)\}\) satisfies \(\widehat{t}\le t_0\). Along with the convexity of \(\phi \) in [0, 1],
This shows that \(\upsilon ^*\ge \varrho \omega (1-t_0)\) for this case. When \(\varrho \omega \in \big [\frac{1}{1-t^*},\phi _{-}'(1)\big ]\), it follows that
\[
\upsilon ^*\ge \min _{t\in [0,1]}\Big \{\phi (t)+\frac{1}{1-t^*}(1-t)\Big \}. \tag{42}
\]
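If \(t_0>0\), from the fact that \(t_0<1\) and \(\frac{1}{1-t^*}\in \partial \phi (t_0)\), it immediately follows that
\[
\min _{t\in [0,1]}\Big \{\phi (t)+\frac{1}{1-t^*}(1-t)\Big \}=\phi (t_0)+\frac{1-t_0}{1-t^*}\ge \frac{1-t_0}{1-t^*}.
\]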
Together with (42) and \(\varrho \omega \le \phi _{-}'(1)\), we have \(\upsilon ^*\ge \frac{1-t_0}{1-t^*} \ge \frac{\varrho \omega (1-t_0)}{\phi _{-}'(1)(1-t^*)}\). If \(t_0=0\), from \(\frac{1}{1-t^*}\in \partial \phi (t_0)\) we have \( \phi _{+}'(0)\ge \frac{1}{1-t^*}\ge 1=\phi (1)\ge \phi (0)+\phi _{+}'(0)\ge \phi _{+}'(0), \) where the third inequality is due to the convexity of \(\phi \) on [0, 1]. Then, for any \(t\in [0,1]\),
\[
\phi (t)+\frac{1}{1-t^*}(1-t)\ge \phi (0)+\phi _{+}'(0)t+\frac{1-t}{1-t^*}\ge \frac{t}{1-t^*}+\frac{1-t}{1-t^*}=\frac{1}{1-t^*},
\]
where the first inequality uses the convexity of \(\phi \) on [0, 1]. Together with (42), it follows that \(\upsilon ^*\ge \frac{1}{1-t^*}\ge \frac{\varrho \omega }{\phi _{-}'(1)(1-t^*)} \ge \frac{\varrho \omega (1-t_0)}{\phi _{-}'(1)(1-t^*)}\). The proof is completed. \(\square \)
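The three bounds established in this proof can be sanity-checked numerically on a concrete member of \(\Phi \). The sketch below is only an illustration, not part of the original argument: it takes the quadratic \(\phi \) of Example 5 in Appendix B with the assumed value \(a=3.7\), for which \(t^*=0\), \(t_0=1/2\) (the solution of \(\phi '(t_0)=1\)) and \(\phi _{-}'(1)=2a/(a+1)\), and approximates \(\upsilon ^*\) on a uniform grid. The function names and the grid size are hypothetical choices.

```python
def phi(t, a=3.7):
    """Example 5's phi: ((a-1)/2 * t^2 + t) / ((a+1)/2), so phi(0)=0 and phi(1)=1."""
    return ((a - 1.0) / 2.0 * t * t + t) / ((a + 1.0) / 2.0)


def upsilon_star(rho_omega, a=3.7, grid=2001):
    """Grid approximation of min over t in [0,1] of phi(t) + rho*omega*(1-t).

    The grid contains both endpoints, and a grid minimum can only
    overestimate the true minimum, so lower bounds remain valid.
    """
    best = float("inf")
    for i in range(grid):
        t = i / (grid - 1)
        best = min(best, phi(t, a) + rho_omega * (1.0 - t))
    return best


def lemma1_lower_bound(rho_omega, a=3.7):
    """Lower bound on upsilon^* from the three cases treated in the proof."""
    t_star = 0.0                        # minimizer of phi over [0,1]
    t0 = 0.5                            # phi'(t0) = 1/(1 - t_star) = 1
    dphi_left_at_1 = 2.0 * a / (a + 1.0)  # left derivative of phi at 1
    if rho_omega >= dphi_left_at_1:
        return 1.0                      # here upsilon^* equals 1 exactly
    if rho_omega >= 1.0 / (1.0 - t_star):
        return rho_omega * (1.0 - t0) / (dphi_left_at_1 * (1.0 - t_star))
    return rho_omega * (1.0 - t0)
```

Calling `upsilon_star(r)` and `lemma1_lower_bound(r)` for several values of `r` spanning all three regimes gives a quick check that the grid minimum never falls below the bound.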
Lemma 2
Let \(\phi \in \Phi \). For any given \(\varrho >0\), the function \(\Theta _{\varrho }(X):={\textstyle \sum _{i=1}^{n_1}}\psi ^*(\varrho \sigma _i(X))\) is lsc and convex in \(\mathbb {R}^{n_1\times n_2}\), where \(\psi ^*\) is the conjugate of \(\psi \), defined by (1) with \(\phi \).
Proof
Let \(\widehat{\Psi }(x):=\sum _{i=1}^{n_1}\widehat{\psi }(x_i)\) for \(x\in \mathbb {R}^{n_1}\), where \(\widehat{\psi }(t):=\psi (|t|)\) for \(t\in \mathbb {R}\). Clearly, \(\widehat{\Psi }\) is absolutely symmetric, i.e., \(\widehat{\Psi }(x)=\widehat{\Psi }(Px)\) for any signed permutation matrix \(P\in \mathbb {R}^{n_1\times n_1}\). Moreover, by the definitions of \(\psi \) and \(\widehat{\Psi }\), the conjugate function \(\widehat{\Psi }^*\) of \(\widehat{\Psi }\) satisfies
\[
\widehat{\Psi }^*(z)=\sum _{i=1}^{n_1}\widehat{\psi }^*(z_i)=\sum _{i=1}^{n_1}\psi ^*(|z_i|)\quad \forall z\in \mathbb {R}^{n_1}.
\]
By the definition of \(\Theta _\varrho \), we have \(\Theta _\varrho (X)\equiv (\widehat{\Psi }^*\circ \sigma )(\varrho X)\). Notice that \(\widehat{\Psi }^*\) is lsc and convex. By [23, Lemma 2.3(b) & Corollary 2.6], \(\Theta _\varrho \) is convex and lsc on \(\mathbb {R}^{n_1\times n_2}\). \(\square \)
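As a small numerical illustration of Lemma 2 (a sketch under stated assumptions, not a substitute for the proof), the code below evaluates \(\Theta _{\varrho }\) on \(2\times 2\) matrices and tests midpoint convexity on random pairs. It assumes \(\phi (t)=t\) from Example 1, for which \(\psi ^*(s)=\max (s-1,0)\), and uses the closed-form singular values of a \(2\times 2\) matrix so that no linear algebra library is needed. All function names, the sampling range and the seed are hypothetical.

```python
import math
import random


def singular_values_2x2(m):
    """Singular values of a 2x2 matrix [[p, q], [r, s]].

    Uses (sigma_1 + sigma_2)^2 = ||M||_F^2 + 2|det M| and
    (sigma_1 - sigma_2)^2 = ||M||_F^2 - 2|det M|.
    """
    (p, q), (r, s) = m
    s_plus = math.hypot(p + s, q - r)
    s_minus = math.hypot(p - s, q + r)
    return (s_plus + s_minus) / 2.0, abs(s_plus - s_minus) / 2.0


def psi_conj(s):
    """Conjugate psi^* for phi(t) = t: sup over t in [0,1] of (s - 1) * t."""
    return max(s - 1.0, 0.0)


def theta(m, rho):
    """Theta_rho(X) = sum_i psi^*(rho * sigma_i(X))."""
    return sum(psi_conj(rho * sv) for sv in singular_values_2x2(m))


def min_midpoint_gap(trials=1000, rho=2.0, seed=0):
    """Smallest value of (Theta(X) + Theta(Y))/2 - Theta((X+Y)/2) over random pairs."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [[rng.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
        y = [[rng.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
        mid = [[(x[i][j] + y[i][j]) / 2.0 for j in range(2)] for i in range(2)]
        gap = (theta(x, rho) + theta(y, rho)) / 2.0 - theta(mid, rho)
        worst = min(worst, gap)
    return worst
```

A nonnegative result from `min_midpoint_gap()` (up to rounding) is what convexity of \(\Theta _{\varrho }\) predicts; a random test of this kind can only fail to find a counterexample, it cannot prove convexity.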
Appendix B
Example 1
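Let \(\phi (t):=t\) for \(t\in \mathbb {R}\). Clearly, \(\phi \in \Phi \) with \(t^*=0\). After a calculation,
\[
\psi ^*(s)=\sup _{t\in [0,1]}\{st-t\}=\left\{ \begin{array}{ll}
s-1 & \text{if } s>1;\\
0 & \text{if } s\le 1.
\end{array}\right.
\]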
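For this choice of \(\phi \), the conjugate works out to \(\psi ^*(s)=\sup _{t\in [0,1]}(s-1)t=\max (s-1,0)\), so each surrogate term \(\varrho |x_i|-\psi ^*(\varrho |x_i|)\) (the form quoted at the end of this appendix) reduces to the capped function \(\min (\varrho |x_i|,1)\). The short sketch below makes this concrete; the function names and the sample vector are hypothetical and chosen only for illustration.

```python
def psi_conj_linear(s):
    """Conjugate psi^* for phi(t) = t restricted to [0,1]."""
    return max(s - 1.0, 0.0)


def surrogate_linear(x, rho):
    """Sum over i of rho*|x_i| - psi^*(rho*|x_i|), i.e. sum of min(rho*|x_i|, 1)."""
    total = 0.0
    for xi in x:
        s = rho * abs(xi)
        total += s - psi_conj_linear(s)
    return total


def zero_norm(x):
    """Number of nonzero entries of x."""
    return sum(1 for xi in x if xi != 0)
```

On a vector such as `[0.0, 0.5, -2.0, 0.0, 0.01]`, the surrogate agrees with the zero-norm as soon as \(\varrho \) is at least the reciprocal of the smallest nonzero magnitude, and it underestimates the zero-norm for smaller \(\varrho \).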
Example 2
Let \(\phi (t):=\frac{\varphi (t)}{\varphi (1)}\) with \(\varphi (t)=-t-\frac{q-1}{q}(1-t+\epsilon )^{\frac{q}{q-1}}+\epsilon +\frac{q-1}{q}\,(0<q<1)\) for \(t\in (-\infty , 1+\epsilon ]\), where \(\epsilon \in (0,1)\) is a fixed constant. Now one has \(\psi ^*(s)=\!\frac{h(\varphi (1)s)}{\varphi (1)}\) with
Example 3
Let \(\phi (t):=\frac{\varphi (t)}{\varphi (1)}\) with \(\varphi (t)=-t-\ln (1-t+\epsilon )+\epsilon \) for \(t\in (-\infty ,1+\epsilon )\), where \(\epsilon \in (0,1)\) is a fixed constant. Clearly, \(\phi \in \Phi \) with \(t^*=\epsilon \). Now \(\psi ^*(s)=\frac{1}{\varphi (1)}h(\varphi (1)s)\) with
Example 4
Let \(\phi (t):=\frac{\varphi (t)}{\varphi (1)}\) with \(\varphi (t)=(1+\epsilon )\arctan \left( \sqrt{\frac{t}{1-t+\epsilon }}\right) -\sqrt{t(1-t+\epsilon )}\) for \(t\in [0,1]\), where \(\epsilon \in (0,1)\) is a fixed constant. Now one has \(\psi ^*(s)=\frac{1}{\varphi (1)}h(\varphi (1)s)\) with
Example 5
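Let \(\phi (t):=\frac{\varphi (t)}{\varphi (1)}\) with \(\varphi (t)=\frac{a-1}{2}t^2+t\) for \(t\in \mathbb {R}\), where \(a>1\) is a fixed constant. Clearly, \(\phi \in \Phi \) with \(t^*=0\). For such \(\phi \), one has \(\psi ^*(s)=\frac{1}{\varphi (1)}h(\varphi (1)s)\) with
\[
h(s):=\sup _{t\in [0,1]}\{st-\varphi (t)\}=\left\{ \begin{array}{ll}
0 & \text{if } s\le 1;\\
\frac{(s-1)^2}{2(a-1)} & \text{if } 1<s\le a;\\
s-\frac{a+1}{2} & \text{if } s>a.
\end{array}\right.
\]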
Now the objective function in (9) with \(m=n\) and \(J_i=\{i\}\), i.e., \(\sum _{i=1}^n\big [\varrho |x_i|-\psi ^*(\varrho |x_i|)\big ]\), is exactly the SCAD function [15]. This shows that minimizing the SCAD function is an equivalent surrogate for the zero-norm problem under a mild condition.
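The identification with SCAD can be checked numerically. The sketch below assumes the closed form \(h(u)=\sup _{t\in [0,1]}\{ut-\varphi (t)\}\), which evaluates to \(0\) for \(u\le 1\), \((u-1)^2/(2(a-1))\) for \(1<u\le a\) and \(u-(a+1)/2\) for \(u>a\), and compares a single surrogate term with the SCAD penalty of Fan and Li [15]. Under this derivation, \(\varrho |x|-\psi ^*(\varrho |x|)\) equals the SCAD penalty with \(\lambda =1\) evaluated at \(\varphi (1)\varrho |x|\) and divided by \(\varphi (1)\), so the match holds up to this rescaling. Function names and the sample values of \(a\) and \(\varrho \) are hypothetical.

```python
def h(u, a):
    """sup over t in [0,1] of u*t - ((a-1)/2 * t^2 + t), in closed form."""
    if u <= 1.0:
        return 0.0                       # maximizer t = 0
    if u <= a:
        return (u - 1.0) ** 2 / (2.0 * (a - 1.0))  # maximizer t = (u-1)/(a-1)
    return u - (a + 1.0) / 2.0           # maximizer t = 1


def psi_conj_scad(s, a):
    """psi^*(s) = h(c*s)/c with c = varphi(1) = (a+1)/2."""
    c = (a + 1.0) / 2.0
    return h(c * s, a) / c


def surrogate_term(x, rho, a):
    """One summand rho*|x| - psi^*(rho*|x|) of the surrogate objective."""
    s = rho * abs(x)
    return s - psi_conj_scad(s, a)


def scad_penalty(t, lam, a):
    """SCAD penalty with parameters lam > 0 and a > 1."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam * lam / 2.0


def rescaled_scad(x, rho, a):
    """SCAD with lam = 1 at c*rho*|x|, divided by c, where c = (a+1)/2."""
    c = (a + 1.0) / 2.0
    return scad_penalty(c * rho * abs(x), 1.0, a) / c
```

Comparing `surrogate_term(x, rho, a)` with `rescaled_scad(x, rho, a)` over a range of `x` covers all three pieces of the penalty (linear, quadratic and constant).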
Cite this article
Liu, Y., Bi, S. & Pan, S. Equivalent Lipschitz surrogates for zero-norm and rank optimization problems. J Glob Optim 72, 679–704 (2018). https://doi.org/10.1007/s10898-018-0675-5
DOI: https://doi.org/10.1007/s10898-018-0675-5