Abstract
The composite \(L_q~(0<q<1)\) minimization problem over a general polyhedron arises in various applications in machine learning, wireless communications, image restoration, signal reconstruction, etc. This paper provides a theoretical study of this problem. First, we derive the Karush–Kuhn–Tucker (KKT) optimality conditions for local minimizers of the problem. Second, we propose a smoothing sequential quadratic programming framework for solving this problem. The framework requires an (approximate) solution of a convex quadratic program at each iteration. Finally, we analyze the worst-case iteration complexity of the framework for returning an \(\epsilon \)-KKT point, i.e., a feasible point that satisfies a perturbed version of the derived KKT optimality conditions. To the best of our knowledge, the proposed framework is the first with a worst-case iteration complexity guarantee for solving composite \(L_q\) minimization over a general polyhedron.
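At a high level, the framework smooths the non-Lipschitz \(L_q\) term and repeatedly solves convex quadratic programming subproblems while the smoothing parameter is driven down. The following Python sketch is only a schematic of this loop, not the paper's Algorithm: the stopping test and the callables `grad_smooth_obj` and `solve_qp_step` are illustrative assumptions.

```python
import numpy as np

def smoothing_sqp(x0, grad_smooth_obj, solve_qp_step, mu0=1.0, eps=1e-3, shrink=0.5, max_iter=200):
    """Schematic smoothing-SQP loop (a sketch, not the paper's Algorithm): each iteration takes
    the step returned by an (approximate) convex QP subproblem built from the mu-smoothed
    objective, then reduces the smoothing parameter mu."""
    x, mu = np.asarray(x0, dtype=float), mu0
    for _ in range(max_iter):
        g = grad_smooth_obj(x, mu)        # gradient of the mu-smoothed objective at x
        d = solve_qp_step(x, g, mu)       # step from the (approximate) convex QP subproblem
        if np.linalg.norm(d) <= eps and mu <= eps:
            break                         # crude stand-in for an eps-KKT test
        x, mu = x + d, max(shrink * mu, eps)
    return x

# Toy usage with hypothetical placeholder callables:
grad = lambda x, mu: 2.0 * x + mu * np.sign(x)   # pretend gradient of a smoothed objective
qp_step = lambda x, g, mu: -0.2 * g              # pretend QP step (here just a scaled gradient step)
print(smoothing_sqp(np.array([1.0, -2.0]), grad, qp_step))
```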
References
Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. MPS–SIAM Series on Optimization. SIAM, Philadelphia (2001)
Berg, E.V.D., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2), 890–912 (2008)
Bertsekas, D.P.: Convex Analysis and Optimization. Athena Scientific, Massachusetts (2003)
Bian, W., Chen, X.: Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization. SIAM J. Optim. 23(3), 1718–1741 (2013)
Bian, W., Chen, X.: Smoothing quadratic regularization methods for box constrained non-Lipschitz optimization in image restoration. Technical report, Hong Kong Polytechnic University (2014)
Bian, W., Chen, X., Ye, Y.: Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Math. Program. 149(1–2), 301–327 (2015)
Birbil, S.I., Fang, S.C., Frenk, J.B.G., Zhang, S.: Recursive approximation of the high dimensional max function. Oper. Res. Lett. 33(5), 450–458 (2005)
Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4), 1196–1211 (2000)
Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)
Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009)
Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)
Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008)
Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)
Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Math. Program. 130(2), 295–319 (2011)
Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)
Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007)
Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 24(3), 1–14 (2008)
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3869–3872 (2008)
Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134(1), 71–99 (2012)
Chen, X., Ge, D., Wang, Z., Ye, Y.: Complexity of unconstrained \(l_2\)-\(l_p\) minimization. Math. Program. 143(1–2), 371–383 (2014)
Chen, X., Ng, M.K., Zhang, C.: Non-Lipschitz \(l_{{p}}\)-regularization and box constrained model for image restoration. IEEE Trans. Image Process. 21(12), 4709–4721 (2012)
Chen, X., Niu, L., Yuan, Y.: Optimality conditions and a smoothing trust region Newton method for non-Lipschitz optimization. SIAM J. Optim. 23(3), 1528–1552 (2013)
Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of \(\ell _2\)-\(\ell _p\) minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)
Chen, X., Zhou, W.: Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 3(4), 765–790 (2010)
Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley, New York (1983)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22(1), 1–10 (2002)
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1359 (2001)
Foucart, S., Lai, M.J.: Sparsest solutions of underdetermined linear systems via \(\ell _q\)-minimization for \(0 < q \le 1\). Appl. Comput. Harmon. Anal. 26(3), 395–407 (2009)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York (1979)
Garmanjani, R., Vicente, L.N.: Smoothing and worst case complexity for direct-search methods in non-smooth optimization. IMA J. Numer. Anal. 33(3), 1008–1028 (2013)
Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(l_{p}\) minimization. Math. Program. 129(2), 285–299 (2011)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. (2015). doi:10.1007/s10107-015-0871-8
Gould, N.I.M., Toint, P.L.: Preprocessing for quadratic programming. Math. Program. 100(1), 95–132 (2004)
Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation applied to compressed sensing: implementation and numerical experiments. J. Comput. Math. 28(2), 170–194 (2010)
Huang, J., Ma, S., Xie, H., Zhang, C.H.: A group bridge approach for variable selection. Biometrika 96(2), 339–355 (2009)
Ji, S., Sze, K.F., Zhou, Z., So, A.M.C., Ye, Y.: Beyond convex relaxation: A polynomial-time non-convex optimization approach to network localization. In: IEEE Conference on Computer Communications (INFOCOM), pp. 2499–2507 (2013)
Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. (2015). doi:10.1007/s10107-014-0816-7
Jiang, B., Zhang, S.: Iteration bounds for finding \(\epsilon \)-stationary points of structured nonconvex optimization. Technical report, University of Minnesota (2014)
Lai, M.J., Wang, J.: An unconstrained \(\ell _q\) minimization with \(0<q\le 1\) for sparse solution of underdetermined linear systems. SIAM J. Optim. 21(1), 82–101 (2011)
Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(l_{q}\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)
Liu, Y.F., Dai, Y.H., Luo, Z.Q.: Joint power and admission control via linear programming deflation. IEEE Trans. Signal Process. 61(6), 1327–1338 (2013)
Liu, Y.F., Dai, Y.H., Ma, S.: Joint power and admission control: non-convex \(l_q\) approximation and an effective polynomial time deflation approach. IEEE Trans. Signal Process. 63(14), 3641–3656 (2015)
Lu, Z.: Iterative reweighted minimization methods for \(l_p\) regularized unconstrained nonlinear programming. Math. Program. 147(1–2), 277–307 (2014)
Mitliagkas, I., Sidiropoulos, N.D., Swami, A.: Joint power and admission control for ad-hoc and cognitive underlay networks: convex approximation and distributed implementation. IEEE Trans. Wireless Commun. 10(12), 4110–4121 (2011)
Mourad, N., Reilly, J.P.: Minimizing nonconvex functions for sparse vector reconstruction. IEEE Trans. Signal Process. 58(7), 3485–3496 (2010)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nikolova, M., Ng, M.K., Zhang, S., Ching, W.K.: Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 1(1), 2–25 (2008)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Massachusetts (1994)
Rao, B.D., Kreutz-Delgado, K.: An affine scaling methodology for best basis selection. IEEE Trans. Signal Process. 47(1), 187–200 (1999)
Sun, Q.: Recovery of sparsest signals via \(\ell _q\)-minimization. Appl. Comput. Harmon. Anal. 32(3), 329–341 (2012)
Sun, W., Yuan, Y.: Optimization Theory and Methods: Nonlinear Programming. Springer, New York (2006)
Vazirani, V.V.: Approximation Algorithms. Springer, New York (2001)
Wagner, M., Meller, J., Elber, R.: Large-scale linear programming techniques for the design of protein folding potentials. Math. Program. 101(2), 301–318 (2004)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
Ye, Y.: Interior Point Algorithms: Theory and Analysis. Wiley, New York (1997)
Yun, S., Toh, K.C.: A coordinate gradient descent method for \(l_1\)-regularized convex minimization. Comput. Optim. Appl. 48(2), 273–307 (2011)
Acknowledgments
We would like to thank Prof. Xiaojun Chen and Dr. Wei Bian for many insightful comments, which helped us in improving the results in this paper. We thank Dr. Qingna Li and Dr. Xin Liu for many useful discussions on an early version of this paper. We also thank Prof. Alexander Shapiro, the anonymous Associate Editor and referees for their constructive comments, which significantly improved the presentation of the paper.
Additional information
Y.-F. Liu was partially supported by NSFC Grants 11331012 and 11301516. S. Ma was partially supported by the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513). Y.-H. Dai was partially supported by the China National Funds for Distinguished Young Scientists Grant 11125107, the Key Project of Chinese National Programs for Fundamental Research and Development Grant 2015CB856000, NSFC Grant 71331001, and the CAS Program for Cross & Cooperative Team of the Science & Technology Innovation. S. Zhang was partially supported by NSF Grant CMMI-1462408.
Appendices
Appendix 1: Three motivating applications
Support Vector Machine [11, 28]. The support vector machine (SVM) is a state-of-the-art classification method introduced by Boser, Guyon, and Vapnik in 1992 in [11]. Consider a database \(\left\{ s_m\in \mathbb {R}^{N-1},\,y_m\in \mathbb {R}\right\} _{m=1}^M,\) where \(s_m\) is called a pattern or example and \(y_m\) is the label associated with \(s_m.\) For convenience, we assume the labels are \(+1\) for positive examples and \(-1\) for negative examples. If the data are linearly separable, the task of the SVM is to find a linear discriminant function of the form \(\ell (s)=\hat{s}^Tx\) with \(\hat{s}=[s^T,1]^T\in \mathbb {R}^{N}\) such that all data are correctly classified and, at the same time, the margin of the hyperplane \(\ell \) that separates the two classes of examples is maximized. Mathematically, the above problem can be formulated as
In practice, data are often not linearly separable. In this case, problem (6.1) is not feasible, and the following problem can be solved instead:
where the constant \(\rho \ge 0\) balances the relative importance of minimizing the classification errors and maximizing the margin. Problem (6.2) with \(q=1\) is called the soft-margin SVM in [28]. It is clear that problem (6.2) is a special instance of (1.1) with
Here, e is the all-one vector of dimension M.
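For a concrete feel for how the classification errors enter as a composite \(L_q\) term, the following Python sketch evaluates an \(L_q\)-penalized soft-margin objective of the kind described above. The specific form used here (the squared-norm margin term plus \(\rho \) times the q-th powers of the hinge losses) is an illustrative assumption; the exact statement is problem (6.2).

```python
import numpy as np

def lq_soft_margin_objective(x, S_hat, y, rho, q):
    """Assumed L_q soft-margin objective (cf. (6.2)):
    0.5*||x||^2 + rho * sum_m max{1 - y_m * s_hat_m^T x, 0}^q, with 0 < q <= 1."""
    margins = y * (S_hat @ x)                # y_m * \hat{s}_m^T x for each example
    hinge = np.maximum(1.0 - margins, 0.0)   # per-example classification errors
    return 0.5 * np.dot(x, x) + rho * np.sum(hinge ** q)

# Toy usage: four examples in R^2, each augmented with a trailing 1 to absorb the bias.
S_hat = np.array([[1.0, 2.0, 1.0], [2.0, 1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(lq_soft_margin_objective(np.array([0.5, 0.5, 0.0]), S_hat, y, rho=1.0, q=0.5))
```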
Joint Power and Admission Control [46, 49]. Consider a wireless network consisting of K interfering links (a link corresponds to a transmitter/receiver pair) with channel gains \(g_{kj}\ge 0\) (from the transmitter of link j to the receiver of link k), noise power \(\eta _k>0,\) signal-to-interference-plus-noise-ratio (SINR) target \(\gamma _k>0,\) and power budget \(\bar{p}_k>0\) for \(k, j=1,2,\ldots ,K.\) Denoting the transmission power of transmitter k by \(x_k\), the SINR at the kth receiver can be expressed as
Due to the existence of mutual interferences among different links [which correspond to the term \(\sum _{j\ne k}g_{kj}x_j\) in (6.3)], the linear system
may not be feasible. The joint power and admission control problem aims at supporting a maximum number of links at their specified SINR targets while using a minimum total transmission power. Assuming without loss of generality that \(g_{kk}=\gamma _k=\bar{p}_k=1\) for all \(k=1,2,\dots ,K,\) the joint power and admission control problem can be formulated as follows (see [46])
where \(\rho >0\) is a parameter, \(b=[\eta _1,\eta _2,\ldots ,\eta _K]^T,\) and \(A=[a_{kj}]\in \mathbb {R}^{K\times K}\) with
By utilizing the special structure of A, i.e., all of its diagonal entries are positive and its off-diagonal entries are nonpositive, it is shown in [47, Theorem 1] that the solution of problem (6.4) maximizes the number of supported links using a minimum total transmission power as long as q is chosen sufficiently small (but not necessarily zero). Clearly, (6.4) is a special case of (1.1) with
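The following Python sketch only illustrates the ingredients just described; the precise definition of \(a_{kj}\) and the statement of (6.4) are given in the paper. In particular, the construction of A below (unit diagonal, \(-g_{kj}\) off the diagonal, under the normalization \(g_{kk}=\gamma _k=\bar{p}_k=1\)) and the form of the penalized objective are illustrative assumptions.

```python
import numpy as np

def jpac_objective(x, A, b, rho, q):
    """Assumed L_q joint power and admission control objective (cf. (6.4)):
    rho * sum_k max{b_k - (A x)_k, 0}^q + sum_k x_k, i.e. penalized SINR-constraint
    violations plus the total transmission power, over 0 <= x <= 1."""
    violation = np.maximum(b - A @ x, 0.0)   # unmet SINR targets
    return rho * np.sum(violation ** q) + np.sum(x)

# Toy instance with K = 3 links under the normalization g_kk = gamma_k = pbar_k = 1.
G = np.array([[1.0, 0.3, 0.2],
              [0.4, 1.0, 0.1],
              [0.2, 0.5, 1.0]])                    # channel gains g_kj
A = np.eye(3) - (G - np.diag(np.diag(G)))          # assumed a_kk = 1, a_kj = -g_kj (j != k)
b = np.array([0.4, 0.4, 0.4])                      # noise powers eta_k (some targets unmet here)
x = np.array([0.5, 0.5, 0.5])                      # transmission powers in [0, 1]
print(jpac_objective(x, A, b, rho=10.0, q=0.5))
```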
Linear Decoding Problem [13]. Given the coding matrix \(C\in \mathbb {R}^{K_1\times K_2}\) and corrupted measurement \(c=Cx+e_u\in \mathbb {R}^{K_1},\) where \(e_u\) is an unknown vector of errors, the linear decoding problem is to recover x from c. It is shown in [13] that, if C satisfies the restricted isometry property, x can be exactly recovered by solving the convex minimization problem
provided that \(e_u\) is sparse. By [33, Theorem 4.10], \(L_q\) (\(q\in (0,1)\)) minimization
has a better capability of recovering x than \(L_1\) minimization. By using the identity \(|a|=\max \left\{ a,0\right\} +\max \left\{ -a,0\right\} ,\) it is easy to see that problem (6.5) is a special case of (1.1) with
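The identity \(|a|=\max \{a,0\}+\max \{-a,0\}\) (and hence \(|a|^q=\max \{a,0\}^q+\max \{-a,0\}^q,\) since at most one of the two terms is nonzero) is easy to verify numerically; the sketch below checks it on a random instance, with the concrete mapping of (6.5) into (1.1) left to the paper.

```python
import numpy as np

# Check that ||Cx - c||_q^q equals the composite-of-max form obtained from
# |a|^q = max{a,0}^q + max{-a,0}^q (at most one of the two terms is nonzero).
rng = np.random.default_rng(0)
C = rng.standard_normal((8, 3))                              # coding matrix
x = rng.standard_normal(3)
e_u = np.array([0.0, 0.5, 0.0, 0.0, -1.2, 0.0, 0.0, 0.0])    # sparse error vector
c = C @ x + e_u                                              # corrupted measurement
q = 0.5
r = C @ x - c
lhs = np.sum(np.abs(r) ** q)                                       # ||Cx - c||_q^q
rhs = np.sum(np.maximum(r, 0.0) ** q + np.maximum(-r, 0.0) ** q)   # composite max form
print(np.isclose(lhs, rhs))   # True
```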
Appendix 2: Proof of Lemma 3.2
Let \(\bar{x}\) be any local minimizer of problem (3.2) with \(\mathcal{I}_{\bar{x}},\,\mathcal{J}_{\bar{x}},\) and \(\mathcal{K}_{\bar{x}}\) given in (3.1). For convenience, we denote \(\mathcal{I}_{\bar{x}},\,\mathcal{J}_{\bar{x}},\,\mathcal{K}_{\bar{x}}\) as \(\mathcal{I},\,\mathcal{J},\,\mathcal{K}\) in this proof. We prove that \(\bar{x}\) is a local minimizer of problem (1.1) by dividing the proof into two parts. The first one is the easy case where \(\mathcal{K}=\emptyset \) and the second one deals with the complicated case where \(\mathcal{K}\ne \emptyset .\)
Part 1: \(\mathcal{K}=\emptyset .\) In this case, \(\bar{x}\) is a local minimizer of problem
By definition, \(\bar{x}\) is a local minimizer of problem (1.1).
Part 2: \(\mathcal{K}\ne \emptyset .\) Consider the feasible direction cone \(\mathcal {D}_{\bar{x}}\) of problem (1.1) at point \(\bar{x},\) i.e.,
For simplicity, we use \(\mathcal {D}\) to denote \(\mathcal {D}_{\bar{x}}\) in the subsequent proof. For any subset \(\mathcal{K}_{p}\) of \(\mathcal{K}\) indexed by \(p=1,2,\ldots ,P:=2^{|\mathcal{K}|},\) let \(\mathcal{K}_{p}^{c}=\mathcal{K}\setminus \mathcal{K}_{p},\) and define
By the Minkowski–Weyl Theorem [5, Proposition 3.2.1], there exist \(d_p^1, d_p^2, \ldots , d_p^{g_p} \in \mathcal{D}_p\), such that
and thus
Without loss of generality, assume \(\Vert d_p^{j}\Vert =1\) for all \(j=1,2,\ldots ,g_p,~p=1,2,\ldots ,P.\) For any \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \subseteq \mathcal{D}\), define
Next, we consider the two cases where \(\overleftarrow{\mathcal{K}}^{d}\) is nonempty and empty, respectively. The former happens when d is not a feasible direction of problem (3.2) at point \(\bar{x};\) while the latter happens when d is a feasible direction of problem (3.2) at point \(\bar{x}.\)
Case 1: \(\overleftarrow{\mathcal{K}}^{d}\ne \emptyset .\) Since \(d\in \mathcal{D},\) there must exist \(\epsilon _0^d\) so that \(\bar{x}+\epsilon d\in \mathcal{X}\) holds for all \(0\le \epsilon \le \epsilon _0^d.\) Define
Choose \(\epsilon _1^d\) small enough such that
and
Therefore, for \(0\le \epsilon \le \min \left\{ \epsilon _0^d,\epsilon _1^d\right\} \), we obtain
where (6.10) is due to (3.1), (6.6), and (6.8); (6.11) is due to (6.7); (6.12) is due to the concavity of the function \(z^q\) with respect to \(z>0;\) and (6.13) is due to (6.9) and the definition of \(\overrightarrow{\mathcal{J}}^{d}\) in (6.7). Moreover, by (1.3) and Taylor's expansion, for any \(0\le \epsilon \le 1\), there exists \(\xi \in (0,1)\) such that
Combining (6.13) with (6.14), for any \(0\le \epsilon \le \min \left\{ \epsilon _0^d,\epsilon _1^d,1\right\} ,\) we obtain
where
Define
and
From the above analysis, we can conclude that, for any \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \) with \(\overleftarrow{\mathcal{K}}^{d}\ne \emptyset ,\) \(F(\bar{x}+\epsilon d)\ge F(\bar{x})\) holds for all \(\epsilon \in [0,\bar{\epsilon }^d].\)
Case 2: \(\overleftarrow{\mathcal{K}}^{d}=\emptyset .\) Recall the definition of \(\overleftarrow{\mathcal{K}}^{d}\) (cf. (6.6)). \(\overleftarrow{\mathcal{K}}^{d}=\emptyset \) implies that d is a feasible direction of problem (3.2) at point \(\bar{x}.\) From the assumption that \(\bar{x}\) is a local minimizer of problem (3.2), we know that there exists an \(\tilde{\epsilon }>0\) such that for all \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \) with \(\overleftarrow{\mathcal{K}}^{d}=\emptyset ,\) there holds \(F(\bar{x}+\epsilon d)\ge F(\bar{x})\) for all \(\epsilon \in [0,\tilde{\epsilon }].\)
We now combine the above two cases: Case 1 and Case 2. Since there are finitely many directions \(\bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} ,\) it follows that
and
hold true for all \(\epsilon \in [0, \bar{\epsilon }].\)
Let \({\text{ Conv }}_p(\bar{x},\bar{\epsilon })\) denote the convex hull spanned by points \(\bar{x}\) and \(\bar{x}+\bar{\epsilon }d_p^{j},~j=1,2,\ldots ,g_p.\) Then, for any \(x\in \bigcup _{p=1}^P{\text{ Conv }}_p(\bar{x},\bar{\epsilon }),\) we have \(F(x)\ge F(\bar{x})\) by (6.15) and the fact that f(x) is concave in \({\text{ Conv }}_p(\bar{x},\bar{\epsilon })\). Furthermore, one can always choose a sufficiently small but fixed \(\epsilon >0\) such that \(B(\bar{x},\epsilon )\bigcap \mathcal{X}\subseteq \bigcup _{p=1}^P{\text{ Conv }}_p(\bar{x},\bar{\epsilon }).\) Therefore, \(\bar{x}\) is a local minimizer of problem (1.1). \(\square \)
Appendix 3: Proof of Lemma 4.1
We show the three items of Lemma 4.1 separately.
- (i) of Lemma 4.1: it follows directly from the inequality
$$\begin{aligned} \theta ^q(t)\le \theta ^q(t,\mu )\le \left( \theta (t)+\frac{\mu }{2}\right) ^q\le \theta ^q(t)+\left( \frac{\mu }{2}\right) ^q,\quad \forall ~t\le \mu . \end{aligned}$$
- (ii) of Lemma 4.1: When \(t\ne 0\) and \(t\ne \mu ,\) \(\theta ^q(t,\mu )\) is twice continuously differentiable with respect to t. Recall \(\theta ^q(t,\mu )\ge \left( \mu /2\right) ^q\) for all t (cf. (4.2)). Then it follows from (4.4) that
$$\begin{aligned} \left| \left[ \theta ^q(t,\mu )\right] ''\right| \le 4q\mu ^{q-2},\quad \forall ~t\notin \left\{ 0,\mu \right\} . \end{aligned}$$
This further implies (ii) of Lemma 4.1.
- (iii) of Lemma 4.1: By the mean-value theorem [27, Theorem 2.3.7], we have
$$\begin{aligned} \theta ^q(t,\mu )= \theta ^q(\hat{t},\mu )+\left[ \theta ^q(\hat{t},\mu )\right] '\left( t-\hat{t}\right) +\frac{\upsilon }{2}\left( t-\hat{t}\right) ^2, \end{aligned}$$ (6.16)
where \(\upsilon \in \partial _{t}\left( \left[ \theta ^q(\xi \hat{t}+(1-\xi )t,\mu )\right] '\right) \) and \(\xi \in [0,1].\) We consider the following three cases.
- Case \(\hat{t}>2\mu :\) Since \(t-\hat{t}\ge {-\hat{t}}/{2},\) it follows for any \(\xi \in [0,1]\) that
$$\begin{aligned} \xi t+(1-\xi )\hat{t}=\hat{t}+\xi (t-\hat{t})\ge \hat{t}/2>\mu . \end{aligned}$$
This, together with (4.4) and (4.5), implies that the \(\upsilon \) in (6.16) satisfies
$$\begin{aligned} \upsilon \le 0= \kappa (\hat{t},\mu ). \end{aligned}$$
- Case \(\hat{t}\in [-\mu , 2\mu ]:\) From (ii) of Lemma 4.1, \(|\upsilon |\) is uniformly bounded by \(\kappa (\hat{t},\mu )={{4q}\mu ^{q-2}}.\) Combining this with (6.16) yields (4.6).
- Case \(\hat{t}<-\mu :\) Since \(t-\hat{t}\le \mu ,\) for any \(\xi \in [0,1],\) it follows that
$$\begin{aligned} \xi t+(1-\xi )\hat{t}=\hat{t}+\xi (t-\hat{t})< 0. \end{aligned}$$
This completes the proof of Lemma 4.1. \(\square \)
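As a purely numerical illustration of item (i), the sketch below checks the inequality \(\theta ^q(t)\le \theta ^q(t,\mu )\le \theta ^q(t)+(\mu /2)^q\) for \(t\le \mu \) using an assumed piecewise-quadratic smoothing of \(\theta (t)=\max \{t,0\};\) the paper's \(\theta (t,\mu )\) is the one defined in Section 4 and may differ in detail from this stand-in.

```python
import numpy as np

def theta_smooth(t, mu):
    """Assumed stand-in smoothing of theta(t) = max{t, 0}: equal to mu/2 for t <= 0,
    t^2/(2*mu) + mu/2 on [0, mu], and t for t >= mu (C^1, twice differentiable off {0, mu})."""
    return np.where(t <= 0.0, mu / 2.0,
                    np.where(t <= mu, t * t / (2.0 * mu) + mu / 2.0, t))

mu, q = 0.1, 0.5
t = np.linspace(-1.0, mu, 1001)                      # item (i) concerns t <= mu
theta = np.maximum(t, 0.0)
lower, middle, upper = theta ** q, theta_smooth(t, mu) ** q, theta ** q + (mu / 2.0) ** q
print(np.all(lower <= middle + 1e-12), np.all(middle <= upper + 1e-12))   # True True for this stand-in
```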
About this article
Cite this article
Liu, YF., Ma, S., Dai, YH. et al. A smoothing SQP framework for a class of composite \(L_q\) minimization over polyhedron. Math. Program. 158, 467–500 (2016). https://doi.org/10.1007/s10107-015-0939-5
Keywords
- Composite \(L_q\) minimization
- \(\epsilon \)-KKT point
- Nonsmooth nonconvex non-Lipschitzian optimization
- Optimality condition
- Smoothing approximation
- Worst-case iteration complexity