[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

A smoothing SQP framework for a class of composite \(L_q\) minimization over polyhedron

  • Full Length Paper
  • Series A
  • Published:
Mathematical Programming Submit manuscript

Abstract

The composite \(L_q~(0<q<1)\) minimization problem over a general polyhedron has received various applications in machine learning, wireless communications, image restoration, signal reconstruction, etc. This paper aims to provide a theoretical study on this problem. First, we derive the Karush–Kuhn–Tucker (KKT) optimality conditions for local minimizers of the problem. Second, we propose a smoothing sequential quadratic programming framework for solving this problem. The framework requires a (approximate) solution of a convex quadratic program at each iteration. Finally, we analyze the worst-case iteration complexity of the framework for returning an \(\epsilon \)-KKT point; i.e., a feasible point that satisfies a perturbed version of the derived KKT optimality conditions. To the best of our knowledge, the proposed framework is the first one with a worst-case iteration complexity guarantee for solving composite \(L_q\) minimization over a general polyhedron.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Similar content being viewed by others

Notes

  1. Here, the differences between two norms (\(\Vert \cdot \Vert _{\infty }\) in (5.40) and \(\Vert \cdot \Vert \) in (5.42)) are neglected.

References

  1. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

    Article  MathSciNet  MATH  Google Scholar 

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. MPS–SIAM Series on Optimization. SIAM, Philadelphia (2001)

    Book  MATH  Google Scholar 

  4. Berg, E.V.D., Friedlander, M.P.: Probing the pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31(2), 890–912 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  5. Bertsekas, D.P.: Convex Analyis and Optimization. Athena Scientific, Massachusetts (2003)

    Google Scholar 

  6. Bian, W., Chen, X.: Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization. SIAM J. Optim. 23(3), 1718–1741 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  7. Bian, W., Chen, X.: Smoothing quadratic regularization methods for box constrained non-Lipschitz optimization in image restoration. Technical report, Hong Kong Polytechnic University (2014)

  8. Bian, W., Chen, X., Ye, Y.: Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization. Math. Program. 149(1–2), 301–327 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  9. Birbil, S.I., Fang, S.C., Frenk, J.B.G., Zhang, S.: Recursive approximation of the high dimensional max function. Oper. Res. Lett. 33(5), 450–458 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  10. Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4), 1196–1211 (2000)

    Article  MathSciNet  MATH  Google Scholar 

  11. Boser, B.E., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: The Fifth Annual Workshop of Computational Learning Theory, pp. 144–152 (1992)

  12. Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 51(1), 34–81 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  13. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  14. Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  15. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  16. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Math. Program. 130(2), 295–319 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  17. Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  18. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007)

    Article  Google Scholar 

  19. Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Probl. 24(3), 1–14 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  20. Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: Internal Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3869–3872 (2008)

  21. Chen, X.: Smoothing methods for nonsmooth, nonconvex minimization. Math. Program. 134(1), 71–99 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  22. Chen, X., Ge, D., Wang, Z., Ye, Y.: Complexity of unconstrained \(l_2\)-\(l_p\) minimization. Math. Program. 143(1–2), 371–383 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  23. Chen, X., Ng, M.K., Zhang, C.: Non-Lipschitz \(l_{{p}}\)-regularization and box constrained model for image restoration. IEEE Trans. Image Process. 21(12), 4709–4721 (2012)

    Article  MathSciNet  Google Scholar 

  24. Chen, X., Niu, L., Yuan, Y.: Optimality conditions and a smoothing trust region newton method for nonLipschitz optimization. SIAM J. Optim. 23(3), 1528–1552 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  25. Chen, X., Xu, F., Ye, Y.: Lower bound theory of nonzero entries in solutions of \(\ell _2\)-\(\ell _p\) minimization. SIAM J. Sci. Comput. 32(5), 2832–2852 (2010)

    Article  MathSciNet  Google Scholar 

  26. Chen, X., Zhou, W.: Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 3(4), 765–790 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  27. Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley, New York (1983)

    MATH  Google Scholar 

  28. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

    MATH  Google Scholar 

  29. Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22(1), 1–10 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  30. Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.S.: Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 63(1), 1–38 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  31. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1359 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  32. Foucart, S., Lai, M.J.: Sparsest solutions of underdetermined linear systems via \(\ell _q\)-minimization for \(0 < q \le 1\). Appl. Comput. Harmon. Anal. 26(3), 395–407 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  33. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013)

    Book  MATH  Google Scholar 

  34. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York (1979)

    MATH  Google Scholar 

  35. Garmanjani, R., Vicente, L.N.: Smoothing and worst case complexity for direct-search methods in non-smooth optimization. IMA J. Numer. Anal. 33(3), 1008–1028 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  36. Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(l_{p}\) minimization. Math. Program. 129(2), 285–299 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  37. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Progam. (2015). doi:10.1007/s10107-015-0871-8

  38. Gould, N.I.M., Toint, P.L.: Preprocessing for quadratic programming. Math. Program. 100(1), 95–132 (2004)

    MathSciNet  MATH  Google Scholar 

  39. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation applied to compressed sensing: implemetation and numerical experiments. J. Comput. Math. 28(2), 170–194 (2010)

    MathSciNet  MATH  Google Scholar 

  40. Huang, J., Ma, S., Xie, H., Zhang, C.H.: A group bridge approach for variable selection. Biometrika 96(2), 339–355 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  41. Ji, S., Sze, K.F., Zhou, Z., So, A.M.C., Ye, Y.: Beyond convex relaxation: A polynomial-time non-convex optimization approach to network localization. In: IEEE Conference on Computer Communications (INFOCOM), pp. 2499–2507 (2013)

  42. Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on stiefel manifold. Math. Program. (2015). doi:10.1007/s10107-014-0816-7

  43. Jiang, B., Zhang, S.: Iteration bounds for finding \(\epsilon \)-stationary points of structured nonconvex optimization. Technical report, University of Minnesota (2014)

  44. Lai, M.J., Wang, J.: An unconstrained \(\ell _q\) minimization with \(0<q\le 1\) for sparse solution of underdetermined linear systems. SIAM J. Optim. 21(1), 82–101 (2011)

    Article  MathSciNet  Google Scholar 

  45. Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(l_{q}\) minimization. SIAM J. Numer. Anal. 51(2), 927–957 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  46. Liu, Y.F., Dai, Y.H., Luo, Z.Q.: Joint power and admission control via linear programming deflation. IEEE Trans. Signal Process. 61(6), 1327–1338 (2013)

    Article  MathSciNet  Google Scholar 

  47. Liu, Y.F., Dai, Y.H., Ma, S.: Joint power and admission control: non-convex \(l_q\) approximation and an effective polynomial time deflation approach. IEEE Trans. Signal Process. 63(14), 3641–3656 (2015)

    Article  MathSciNet  Google Scholar 

  48. Lu, Z.: Iterative reweighted minimization methods for \(l_p\) regularized unconstrained nonlinear programming. Math. Program. 147(1–2), 277–307 (2014)

  49. Mitliagkas, I., Sidiropoulos, N.D., Swami, A.: Joint power and admission control for ad-hoc and cognitive underlay networks: convex approximation and distributed implementation. IEEE Trans. Wireless Commun. 10(12), 4110–4121 (2011)

    Article  Google Scholar 

  50. Mourad, N., Reilly, J.P.: Minimizing nonconvex functions for sparse vector reconstruction. IEEE Trans. Signal Process. 58(7), 3485–3496 (2010)

    Article  MathSciNet  Google Scholar 

  51. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(\text{ O }(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)

    MATH  Google Scholar 

  52. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  53. Nikolova, M., Ng, M.K., Zhang, S., Ching, W.K.: Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization. SIAM J. Imaging Sci. 1(1), 2–25 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  54. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

    MATH  Google Scholar 

  55. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, Massachusetts (1994)

    MATH  Google Scholar 

  56. Rao, B.D., Kreutz-delgado, K.: An affine scaling methodology for best basis selection. IEEE Trans. Signal Process. 47(1), 187–200 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  57. Sun, Q.: Recovery of sparsest signals via \(\ell _q\)-minimization. Appl. Comput. Harmon. Anal. 32(3), 329–341 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  58. Sun, W., Yuan, Y.: Optimization Theory and Methods: Nonlinear Programming. Springer, New York (2006)

    MATH  Google Scholar 

  59. Vazirani, V.V.: Approximation Algorithms. Springer, New York (2001)

    MATH  Google Scholar 

  60. Wagner, M., Meller, J., Elber, R.: Large-scale linear programming techniques for the design of protein folding potentials. Math. Program. 101(2), 301–318 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  61. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)

    Article  MathSciNet  Google Scholar 

  62. Ye, Y.: Interior Point Algorithms-Theory and Analysis. Wiley, New York (1997)

    Book  MATH  Google Scholar 

  63. Yun, S., Toh, K.C.: A coordinate gradient descent method for \(l_1\)-regularized convex minimization. Comput. Optim. Appl. 48(2), 273–307 (2011)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgments

We would like to thank Prof. Xiaojun Chen and Dr. Wei Bian for many insightful comments, which helped us in improving the results in this paper. We thank Dr. Qingna Li and Dr. Xin Liu for many useful discussions on an early version of this paper. We also thank Prof. Alexander Shapiro, the anonymous Associate Editor and referees for their constructive comments, which significantly improved the presentation of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ya-Feng Liu.

Additional information

Y.-F. Liu was partially supported by NSFC Grants 11331012 and 11301516. S. Ma was partially supported by the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513). Y.-H. Dai was partially supported by the China National Funds for Distinguished Young Scientists Grant 11125107, the Key Project of Chinese National Programs for Fundamental Research and Development Grant 2015CB856000, NSFC Grant 71331001, and the CAS Program for Cross & Cooperative Team of the Science & Technology Innovation. S. Zhang was partially supported by NSF Grant CMMI-1462408.

Appendices

Appendix 1: Three motivating applications

Support Vector Machine [11, 28]. The support vector machine (SVM) is a state-of-the-art classification method introduced by Boser, Guyon, and Vapnik in 1992 in [11]. Given a database \(\left\{ s_m\in \mathbb {R}^{N-1},\,y_m\in \mathbb {R}\right\} _{m=1}^M,\) where \(s_m\) is called pattern or example and \(y_m\) is the label associated with \(s_m.\) For convenience, we assume the labels are \(+1\) for positive examples and \(-1\) for negative examples. If the data are linearly separable, the task of SVM is to find a linear discriminant function of the form \(\ell (s)=\hat{s}^Tx\) with \(\hat{s}=[s^T,1]^T\in \mathbb {R}^{N}\) such that all data are correctly classified and at the same time the margin of the hyperplane \(\ell \) that separates the two classes of examples is maximized. Mathematically, the above problem can be formulated as

$$\begin{aligned} \begin{array}{l@{\quad }l} \displaystyle \min _{x} &{} \displaystyle \frac{1}{2}\sum _{n=1}^{N-1}x_n^2 \\ \text{ s.t. } &{} \displaystyle y_m\hat{s}_m^Tx\ge 1,\quad m=1,2,\ldots ,M. \end{array} \end{aligned}$$
(6.1)

In practice, data are often not linearly separable. In this case, problem (6.1) is not feasible, and the following problem can be solved instead:

$$\begin{aligned} \displaystyle \min _{x} \displaystyle \sum _{m=1}^M\max \left\{ 1-y_m\hat{s}_m^Tx,0\right\} ^q+\frac{\rho }{2}\sum _{n=1}^{N-1}x_n^2, \end{aligned}$$
(6.2)

where the constant \(\rho \ge 0\) balances the relative importance of minimizing the classification errors and maximizing the margin. Problem (6.2) with \(q=1\) is called the soft-margin SVM in [28]. It is clear that problem (6.2) is a special instance of (1.1) with

$$\begin{aligned} A=\left[ \begin{array}{c} y_1\hat{s}_1^T \\ \vdots \\ y_M\hat{s}_M^T \\ \end{array} \right] ,~b=e,~h(x)=\frac{\rho }{2}\sum _{n=1}^{N-1}x_n^2,\quad \text {and}~\mathcal{X}=\mathbb {R}^N. \end{aligned}$$

Here, e is the all-one vector of dimension M.

Joint Power and Admission Control [46, 49]. Consider a wireless network consisting of K interfering links (a link corresponds to a transmitter/receiver pair) with channel gains \(g_{kj}\ge 0\) (from the transmitter of link j to the receiver of link k), noise power \(\eta _k>0,\) signal-to-interference-plus-noise-ratio (SINR) target \(\gamma _k>0,\) and power budget \(\bar{p}_k>0\) for \(k, j=1,2,\ldots ,K.\) Denoting the transmission power of transmitter k by \(x_k\), the SINR at the kth receiver can be expressed as

$$\begin{aligned} \displaystyle \text {SINR}_k=\frac{g_{kk}x_k}{\eta _k+\displaystyle \sum _{j\ne k}g_{kj}x_j},\quad k=1,2,\ldots ,K. \end{aligned}$$
(6.3)

Due to the existence of mutual interferences among different links [which correspond to the term \(\sum _{j\ne k}g_{kj}x_j\) in (6.3)], the linear system

$$\begin{aligned} \displaystyle \text {SINR}_k\ge \gamma _k,~\bar{p}_k\ge x_k\ge 0,\quad k=1,2,\ldots ,K \end{aligned}$$

may not be feasible. The joint power and admission control problem aims at supporting a maximum number of links at their specified SINR targets while using a minimum total transmission power. Assuming without loss of generality that \(g_{kk}=\gamma _k=\bar{p}_k=1\) for all \(k=1,2,\dots ,K,\) the joint power and admission control problem can be formulated as follows (see [46])

$$\begin{aligned} \begin{array}{ll} \displaystyle \min _{x} &{}\quad \left\| \max \left\{ b-Ax,{0}\right\} \right\| _q^q+\rho e^Tx \\ \text{ s.t. } &{}\quad \displaystyle {0}\le x\le e, \end{array} \end{aligned}$$
(6.4)

where \(\rho >0\) is a parameter, \(b=[\eta _1,\eta _2,\ldots ,\eta _K]^T,\) and \(A=[a_{kj}]\in \mathbb {R}^{K\times K}\) with

$$\begin{aligned} a_{kj}=\left\{ \begin{array}{ll} 1,&{}\quad \text {if }k=j;\\ - g_{kj},\quad &{}\quad \text {if }k\ne j. \end{array} \right. \end{aligned}$$

By utilizing the special structure of A,  i.e., all of its diagonal entries are positive and off-diagonal entries are nonpositive, it is shown in [47, Theorem 1] that the solution of problem (6.4) can maximize the number of supported links using a minimum total transmission power as long as q is chosen to be sufficiently small (but not necessarily to be zero). Clearly, (6.4) is a special case of (1.1) with

$$\begin{aligned} M=K,~N=K,~h(x)=\rho e^Tx,\quad \text {and}~\mathcal{X}=\left\{ x\,|\,{0}\le x\le e\right\} \subseteq \mathbb {R}^{N}. \end{aligned}$$

Linear Decoding Problem [13]. Given the coding matrix \(C\in \mathbb {R}^{K_1\times K_2}\) and corrupted measurement \(c=Cx+e_u\in \mathbb {R}^{K_1},\) where \(e_u\) is an unknown vector of errors, the linear decoding problem is to recover x from c. It is shown in [13] that, if C satisfies the restricted isometry property, x can be exactly recovered by solving the convex minimization problem

$$\begin{aligned} \min _{x}\Vert c-Cx\Vert _1 \end{aligned}$$

provided that \(e_u\) is sparse. By [33, Theorem 4.10], \(L_q\) (\(q\in (0,1)\)) minimization

$$\begin{aligned} \min _{x}\Vert c-Cx\Vert _q^q \end{aligned}$$
(6.5)

has a better capability of recovering x than \(L_1\) minimization. By using the equation \(|a|=\max \left\{ a,0\right\} +\max \left\{ -a,0\right\} ,\) it is simple to see problem (6.5) is a special case of (1.1) with

$$\begin{aligned} M=2K_1,\quad N=K_2,\quad A=\left[ \begin{array}{c} C \\ \\ -C \\ \end{array} \right] ,\quad b=\left[ \begin{array}{c} c \\ \\ -c \\ \end{array} \right] ,\quad h(x)=0,\quad \text {and}~\mathcal{X}=\mathbb {R}^N. \end{aligned}$$

Appendix 2: Proof of Lemma 3.2

Let \(\bar{x}\) be any local minimizer of problem (3.2) with \(\mathcal{I}_{\bar{x}},\,\mathcal{J}_{\bar{x}},\) and \(\mathcal{K}_{\bar{x}}\) given in (3.1). For convenience, we denote \(\mathcal{I}_{\bar{x}},\,\mathcal{J}_{\bar{x}},\,\mathcal{K}_{\bar{x}}\) as \(\mathcal{I},\,\mathcal{J},\,\mathcal{K}\) in this proof. We prove that \(\bar{x}\) is a local minimizer of problem (1.1) by dividing the proof into two parts. The first one is the easy case where \(\mathcal{K}=\emptyset \) and the second one deals with the complicated case where \(\mathcal{K}\ne \emptyset .\)

Part 1: \(\mathcal{K}=\emptyset .\) In this case, \(\bar{x}\) is a local minimizer of problem

$$\begin{aligned} \begin{array}{l@{\quad }l} \displaystyle \min _{x} &{} \displaystyle \Vert (b-Ax)_{{\mathcal{J}}}\Vert _q^q + h(x) \\ \text{ s.t. } &{} x\in \mathcal{X}. \end{array} \end{aligned}$$

By the definition, \(\bar{x}\) is a local minimizer of problem (1.1).

Part 2: \(\mathcal{K}\ne \emptyset .\) Consider the feasible direction cone \(\mathcal {D}_{\bar{x}}\) of problem (1.1) at point \(\bar{x},\) i.e.,

$$\begin{aligned} \mathcal{D}_{\bar{x}}=\left\{ \,d\,|\,\bar{x}+\alpha d\in \mathcal{X}~\text {for some}~\alpha >0\right\} . \end{aligned}$$

For simplicity, we use \(\mathcal {D}\) to denote \(\mathcal {D}_{\bar{x}}\) in the subsequent proof. For any subset \(\mathcal{K}_{p}\) of \(\mathcal{K}\) indexed by \(p=1,2,\ldots ,P:=2^{|\mathcal{K}|},\) let \(\mathcal{K}_{p}^{c}=\mathcal{K}\setminus \mathcal{K}_{p},\) and define

$$\begin{aligned} \mathcal{D}_p=\left\{ d\,|\,(Ad)_{\mathcal{K}_{p}}\le 0,(Ad)_{\mathcal{K}_{p}^{c}}\ge 0\right\} \bigcap \mathcal{D}. \end{aligned}$$

By the Minkowski–Weyl Theorem [5, Proposition 3.2.1], there exist \(d_p^1, d_p^2, \ldots , d_p^{g_p} \in \mathcal{D}_p\), such that

$$\begin{aligned} \mathcal{D}_p={\text{ Cone }}\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} , \end{aligned}$$

and thus

$$\begin{aligned} \mathcal{D}=\displaystyle \bigcup _{p=1}^P{\text{ Cone }}\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} . \end{aligned}$$

Without loss of generality, assume \(\Vert d_p^{j}\Vert =1\) for all \(j=1,2,\ldots ,g_p,~p=1,2,\ldots ,P.\) For any \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \subseteq \mathcal{D}\), define

$$\begin{aligned} \overleftarrow{\mathcal{K}}^{d}=\left\{ m\in \mathcal{K}\,|\,(Ad)_m<0\right\} . \end{aligned}$$
(6.6)

Next, we consider the two cases where \(\overleftarrow{\mathcal{K}}^{d}\) is nonempty and empty, respectively. The former happens when d is not a feasible direction of problem (3.2) at point \(\bar{x};\) while the latter happens when d is a feasible direction of problem (3.2) at point \(\bar{x}.\)

Case 1: \(\overleftarrow{\mathcal{K}}^{d}\ne \emptyset .\) Since \(d\in \mathcal{D},\) there must exist \(\epsilon _0^d\) so that \(\bar{x}+\epsilon d\in \mathcal{X}\) holds for all \(0\le \epsilon \le \epsilon _0^d.\) Define

$$\begin{aligned} \overrightarrow{\mathcal{J}}^{d}=\left\{ m\in \mathcal{J}\,|\,(Ad)_m>0\right\} . \end{aligned}$$
(6.7)

Choose \(\epsilon _1^d\) small enough such that

$$\begin{aligned} \left( b-A\bar{x}-\epsilon _1^d Ad\right) _m\le 0,\quad \forall ~m\in {\mathcal{I}} \end{aligned}$$
(6.8)

and

$$\begin{aligned} \left( b-A\bar{x}-\epsilon _1^d Ad\right) _m\ge \frac{\left( b-A\bar{x}\right) _m}{2}>0,\quad \forall ~m\in \overrightarrow{\mathcal{J}}^{d}. \end{aligned}$$
(6.9)

Therefore, for \(0\le \epsilon \le \min \left\{ \epsilon _0^d,\epsilon _1^d\right\} \), we obtain

$$\begin{aligned}&f(\bar{x}+\epsilon d)- f(\bar{x})\nonumber \\&\quad =\Vert \max \left\{ b-A\left( \bar{x}+\epsilon d\right) ,0\right\} \Vert _q^q-\Vert \max \left\{ b-A\bar{x},0\right\} \Vert _q^q\nonumber \\&\quad =\sum _{m\in \overleftarrow{\mathcal{K}}^{d}\cup \mathcal{J}} \left( b-A\bar{x}-\epsilon Ad\right) _m^q-\sum _{m\in \mathcal{J}}\left( b-A\bar{x}\right) _m^q \end{aligned}$$
(6.10)
$$\begin{aligned}&\quad \ge \sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q + \sum _{m\in {\overrightarrow{\mathcal{J}}^{d}}} \left( \left( b-A\bar{x}-\epsilon Ad\right) _m^q-\left( b-A\bar{x}\right) _m^q\right) \end{aligned}$$
(6.11)
$$\begin{aligned}&\quad \ge \sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q + \sum _{m\in {\overrightarrow{\mathcal{J}}^{d}}} q \left( b-A\bar{x}-\epsilon Ad\right) _m^{q-1}\left( -\epsilon (Ad)_m\right) \end{aligned}$$
(6.12)
$$\begin{aligned}&\quad \ge \sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q + \sum _{m\in {\overrightarrow{\mathcal{J}}^{d}}} q \left( \frac{(b-A\bar{x})_m}{2}\right) ^{q-1}\left( -\epsilon (Ad)_m\right) , \end{aligned}$$
(6.13)

where (6.10) is due to (3.1), (6.6), and (6.8); (6.11) is due to (6.7); (6.12) is due to the concavity of the function \(z^q\) with respect to \(z>0;\) (6.13) is due to (6.9) and the definition of \(\overrightarrow{\mathcal{J}}^{d}\) in (6.7). Moreover, by (1.3) and the Taylor’s expansion, for any \(0\le \epsilon \le 1\), there exists \(\xi \in (0,1)\) such that

$$\begin{aligned} h(\bar{x}+\epsilon d)-h(\bar{x})= & {} \epsilon \nabla h(\bar{x}+\xi \epsilon d)^Td\nonumber \\\ge & {} -\epsilon \left\| \nabla h(\bar{x}+\xi \epsilon d)\right\| \nonumber \\\ge & {} -\epsilon \left( \left\| \nabla h(\bar{x})\right\| +\epsilon L_h\right) \\\ge & {} -\epsilon \left( \left\| \nabla h(\bar{x})\right\| +L_h\right) .\nonumber \end{aligned}$$
(6.14)

Combining (6.13) with (6.14), for any \(0\le \epsilon \le \min \left\{ \epsilon _0^d,\epsilon _1^d,1\right\} ,\) we obtain

$$\begin{aligned} F(\bar{x}+\epsilon d)- F(\bar{x})\ge \lambda _1^d\epsilon ^q-\lambda _2^d\epsilon , \end{aligned}$$

where

$$\begin{aligned}&\lambda _1^d:=\sum _{m\in \overleftarrow{\mathcal{K}}^{d}}(-Ad)_m^q\epsilon ^q>0,\\&\lambda _2^d:=\sum _{m\in \overrightarrow{\mathcal{J}}^{d}} q \left( \frac{(b-A\bar{x})_m}{2}\right) ^{q-1}(Ad)_m+\left\| \nabla h(\bar{x})\right\| +L_h>0. \end{aligned}$$

Define

$$\begin{aligned} \epsilon _2^d:=\left( \frac{\lambda _1^d}{\lambda _2^d}\right) ^{\frac{1}{1-q}} \end{aligned}$$

and

$$\begin{aligned} \bar{\epsilon }^d:= \min \left\{ \epsilon _0^d,\epsilon _1^d,\epsilon _2^d,1\right\} >0. \end{aligned}$$

From the above analysis, we can conclude that, for any \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \) with \(\overleftarrow{\mathcal{K}}^{d}\ne \emptyset ,\) \(F(\bar{x}+\epsilon d)\ge F(\bar{x})\) holds for all \(\epsilon \in [0,\bar{\epsilon }^d].\)

Case 2: \(\overleftarrow{\mathcal{K}}^{d}=\emptyset .\) Recall the definition of \(\overleftarrow{\mathcal{K}}^{d}\) (cf. (6.6)). \(\overleftarrow{\mathcal{K}}^{d}=\emptyset \) implies that d is a feasible direction of problem (3.2) at point \(\bar{x}.\) From the assumption that \(\bar{x}\) is a local minimizer of problem (3.2), we know that there exists an \(\tilde{\epsilon }>0\) such that for all \(d\in \bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} \) with \(\overleftarrow{\mathcal{K}}^{d}=\emptyset ,\) there holds \(F(\bar{x}+\epsilon d)\ge F(\bar{x})\) for all \(\epsilon \in [0,\tilde{\epsilon }].\)

We now combine the above two cases: Case 1 and Case 2. Since there are finitely many directions \(\bigcup _{p=1}^P\left\{ d_p^{1},d_p^{2},\ldots ,d_p^{g_p}\right\} ,\) it follows that

$$\begin{aligned} \bar{\epsilon }:=\min \left\{ \min _{\overleftarrow{\mathcal{K}}^{d}\ne \emptyset ,\,j=1,\ldots ,g_p,\,p=1,\ldots ,P}\left\{ \bar{\epsilon }^{d_p^j}\right\} ,\tilde{\epsilon }\right\} >0 \end{aligned}$$

and

$$\begin{aligned} \bar{x}+\epsilon d_p^j\in \mathcal{X},~F(\bar{x}+\epsilon d_p^j)\ge F(\bar{x}),~\forall ~j=1,2,\ldots ,g_p,\,p=1,2,\ldots ,P\nonumber \\ \end{aligned}$$
(6.15)

hold true for all \(\epsilon \in [0, \bar{\epsilon }].\)

Let \({\text{ Conv }}_p(\bar{x},\bar{\epsilon })\) denote the convex hull spanned by points \(\bar{x}\) and \(\bar{x}+\bar{\epsilon }d_p^{j},~j=1,2,\ldots ,g_p.\) Then, for any \(x\in \bigcup _{p=1}^P{\text{ Conv }}_p(\bar{x},\bar{\epsilon }),\) we have \(F(x)\ge F(\bar{x})\) by (6.15) and the fact that f(x) is concave in \({\text{ Conv }}_p(\bar{x},\bar{\epsilon })\). Furthermore, one can always choose a sufficiently small but fixed \(\epsilon >0\) such that \(B(\bar{x},\epsilon )\bigcap \mathcal{X}\subseteq \bigcup _{p=1}^P{\text{ Conv }}_p(\bar{x},\bar{\epsilon }).\) Therefore, \(\bar{x}\) is a local minimizer of problem (1.1). \(\square \)

Appendix 3: Proof of Lemma 4.1

We show the three items of Lemma 4.1 separately.

  1. (i)

    of Lemma 4.1: it follows directly from the inequality

    $$\begin{aligned} \theta ^q(t)\le \theta ^q(t,\mu )\le \left( \theta (t)+\frac{\mu }{2}\right) ^q\le \theta ^q(t)+\left( \frac{\mu }{2}\right) ^q,\quad \forall ~t\le \mu . \end{aligned}$$
  2. (ii)

    of Lemma 4.1: When \(t\ne 0\) and \(t\ne \mu ,\) \(\theta ^q(t,\mu )\) is twice continuously differentiable with respect to t. Recall \(\theta ^q(t,\mu )\ge \left( \mu /2\right) ^q\) for all t (cf. (4.2)). Then it follows from (4.4) that

    $$\begin{aligned} \left| \left[ \theta ^q(t,\mu )\right] ''\right| \le 4q\mu ^{q-2},\quad \forall ~t\notin \left\{ 0,\mu \right\} . \end{aligned}$$

    This further implies (ii) of Lemma 4.1.

  3. (iii)

    of Lemma 4.1: By the mean-value theorem [27, Theorem 2.3.7], we have

    $$\begin{aligned} \theta ^q(t,\mu )= \theta ^q(\hat{t},\mu )+\left[ \theta ^q(\hat{t},\mu )\right] '\left( t-\hat{t}\right) +\frac{\upsilon }{2}\left( t-\hat{t}\right) ^2,\end{aligned}$$
    (6.16)

    where \(\upsilon \in \partial _{t}\left( \left[ \theta ^q(\xi \hat{t}+(1-\xi )t,\mu )\right] '\right) \) and \(\xi \in [0,1].\) We consider the following three cases.

    • Case \(\hat{t}>2\mu :\) Since \(t-\hat{t}\ge {-\hat{t}}/{2},\) it follows for any \(\xi \in [0,1]\) that

      $$\begin{aligned} \xi t+(1-\xi )\hat{t}=\hat{t}+\xi (t-\hat{t})\ge \hat{t}/2>\mu . \end{aligned}$$

      This, together with (4.4) and (4.5), implies that the \(\upsilon \) in (6.16) satisfies

      $$\begin{aligned} \upsilon \le 0= \kappa (\hat{t},\mu ). \end{aligned}$$

      From this and (6.16), we obtain (4.6).

    • Case \(\hat{t}\in [-\mu , 2\mu ]:\) From (ii) of Lemma 4.1, \(|\upsilon |\) is uniformly bounded by \(\kappa (\hat{t},\mu )={{4q}\mu ^{q-2}}.\) Combining this with (6.16) yields (4.6).

    • Case \(\hat{t}<-\mu :\) Since \(t-\hat{t}\le \mu ,\) for any \(\xi \in [0,1],\) it follows

      $$\begin{aligned} \xi t+(1-\xi )\hat{t}=\hat{t}+\xi (t-\hat{t})< 0. \end{aligned}$$

      From this, (4.4), (4.5), and (6.16), we can obtain (4.6).

This completes the proof of Lemma 4.1. \(\square \)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, YF., Ma, S., Dai, YH. et al. A smoothing SQP framework for a class of composite \(L_q\) minimization over polyhedron. Math. Program. 158, 467–500 (2016). https://doi.org/10.1007/s10107-015-0939-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10107-015-0939-5

Keywords

Mathematics Subject Classification

Navigation