Abstract
We investigate the structure of the local and global minimizers of the Huber loss function regularized with a sparsity-inducing L0-norm term. We characterize local minimizers and establish conditions that are necessary and sufficient for a local minimizer to be strict. A necessary condition is established for global minimizers, and the set of global minimizers is shown to be non-empty. The sparsity of minimizers is also studied by giving bounds on the regularization parameter that controls sparsity. The results are illustrated in numerical examples.
Notes
Although this constant is used by everyone working with the Huber function, we were not able to find a derivation of it, so one is provided.
This set appears in our subsequent results and is shown to be dense in \({{\mathbb {R}}}^N\) in the Appendix.
References
Zoubir, A., Koivunen, V., Ollila, E., Muma, M.: Robust Statistics for Signal Processing. Cambridge University Press, Cambridge (2018)
Huber, P.J.: Robust Statistics. Wiley, New York (1981)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2008)
d'Aspremont, A., El Ghaoui, L.: Testing the nullspace property using semidefinite programming. Math. Program. 127(1), 123–144 (2011)
Bryan, K., Leise, T.: Making do with less: an introduction to compressed sensing. SIAM Rev. 55, 547–566 (2013)
Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 52, 4203–4215 (2006)
Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998)
Chrétien, S.: An alternating \(\ell _1\) approach to the compressed sensing problem. IEEE Signal Process. Lett. 17, 181–184 (2010)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 51, 1289–1306 (2005)
Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59, 797–829 (2006)
Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, New York (2010)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013)
Tropp, J.A.: Recovery of short, complex linear combinations via \(\ell _1\) minimization. IEEE Trans. Inf. Theory 51(4), 1568–1570 (2005)
Fuchs, J.J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341–1344 (2004)
Nikolova, M.: Description of the minimizers of least squares regularized with \(\ell _0\)-norm. Uniqueness of the global minimizer. SIAM J. Imaging Sci. 6(2), 904–937 (2013)
Beck, A., Hallak, N.: Proximal mapping for symmetric penalty and sparsity. SIAM J. Optim. 28(1), 496–527 (2018)
Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52, 1030–1051 (2006)
Chancelier, J.-Ph., De Lara, M.: Hidden convexity in the \(\ell _0\) pseudonorm. HAL (2019)
Chancelier, J.-Ph., De Lara, M.: Lower bound convex programs for exact sparse optimization. HAL (2019)
Chancelier, J.-Ph., De Lara, M.: A suitable conjugacy for the \(\ell _0\) pseudonorm. HAL (2019)
Soubies, E., Blanc-Féraud, L., Aubert, G.: New insights on the \(\ell _2\)-\(\ell _0\) minimization problem. J. Math. Imaging Vis. 62, 808–824 (2020)
Lanza, A., Morigi, S., Selesnick, I.W., Sgallari, F.: Sparsity-inducing non-convex, non-separable regularization for convex image processing. SIAM J. Imaging Sci. 12(2), 1099–1134 (2019)
Selesnick, I.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)
Wang, S., Chen, X., Dai, W., Selesnick, I.W., Cai, G.: Vector minimax concave penalty for sparse representation. Digital Signal Process. 83, 165–179 (2018)
Wang, J., Zhang, F., Huang, J., Wang, W., Yuan, C.: A non-convex penalty function with integral convolution approximation for compressed sensing. Signal Process. 158, 116–128 (2019)
Chen, K., Lv, Q., Lu, Y., Dou, Y.: Robust regularized extreme learning machine for regression using iteratively reweighted least squares. Neurocomputing 230, 345–358 (2017)
Carrillo, R.E., Ramirez, A.B., Arce, G.R., Barner, K.E., Sadler, B.M.: Robust compressive sensing of sparse signals: a review. EURASIP J. Adv. Signal Process. 2016, 108 (2016)
Grechuk, B., Zabarankin, M.: Regression analysis: likelihood, error and entropy. Math. Program. 174, 145–166 (2019)
Li, W., Swetits, J.J.: The linear \(\ell _1\) estimator and the Huber M-estimator. SIAM J. Optim. 8, 457–475 (1998)
Madsen, K., Nielsen, H.B.: A finite smoothing algorithm for linear \(\ell _1\) estimation. SIAM J. Optim. 3, 223–235 (1993)
Madsen, K., Nielsen, H.B., Pınar, M.Ç.: New characterizations of \( \ell _1 \) solutions to overdetermined systems of linear equations. Oper. Res. Lett. 16, 159–166 (1994)
Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, New York (2003)
Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)
Pınar, M.Ç.: Linear Huber M-estimator under ellipsoidal data uncertainty. BIT 42(4), 856–866 (2002)
Additional information
Communicated by Panos M. Pardalos.
Appendix
Definition 8.1
Let \( \Psi :{{\mathbb {R}}}^N\rightarrow {{\mathbb {R}}}\) be a differentiable function over \( {{\mathbb {R}}}^N \) whose gradient is Lipschitz continuous with constant \( L_\Psi >0 \), i.e., \( \left\Vert \nabla \Psi (x)-\nabla \Psi (y)\right\Vert _2\le L_\Psi \left\Vert x-y\right\Vert _2 \) for all \( x,y\in {{\mathbb {R}}}^N \).
Proposition 8.1
\( \Psi \) has the Lipschitz constant \( L_\Psi =\dfrac{\left\Vert A\right\Vert ^2_2}{\gamma } \) in the sense of Definition 8.1, where \( \left\Vert A\right\Vert _2=\sup _{\left\Vert x\right\Vert _2=1}\left\Vert Ax\right\Vert _2 \).
Proof
Let \( x,y\in {{\mathbb {R}}}^N \),
The last inequality follows from the continuity of the gradient. \(\square \)
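For concreteness, the chain of inequalities behind Proposition 8.1 can be sketched as follows, assuming the Huber loss has the form \( \Psi (x)=\sum _{i=1}^{M}\phi _\gamma ((Ax-d)[i]) \) with \( \phi _\gamma (t)=t^2/(2\gamma ) \) for \( \left|t\right|\le \gamma \) and \( \phi _\gamma (t)=\left|t\right|-\gamma /2 \) otherwise, so that \( \phi _\gamma ' \) is \( 1/\gamma \)-Lipschitz (the exact normalization used in the paper may differ):
\[ \left\Vert \nabla \Psi (x)-\nabla \Psi (y)\right\Vert _2 =\left\Vert A^\top \bigl (\phi _\gamma '(Ax-d)-\phi _\gamma '(Ay-d)\bigr )\right\Vert _2 \le \left\Vert A\right\Vert _2\left\Vert \phi _\gamma '(Ax-d)-\phi _\gamma '(Ay-d)\right\Vert _2 \le \frac{\left\Vert A\right\Vert _2}{\gamma }\left\Vert A(x-y)\right\Vert _2 \le \frac{\left\Vert A\right\Vert _2^2}{\gamma }\left\Vert x-y\right\Vert _2 , \]
where \( \phi _\gamma ' \) is applied componentwise.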
Remark 8.1
For easy computation, one may instead use the Frobenius norm, in which case the Lipschitz constant becomes \( \dfrac{\left\Vert A\right\Vert ^2_F}{\gamma } \). This choice is safe since \( \left\Vert A\right\Vert _2\le \left\Vert A\right\Vert _F \).
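As a small numerical illustration (not from the paper; the matrix, \( \gamma \) and variable names below are our own choices), the two constants can be compared directly:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, gamma = 20, 50, 0.5
A = rng.standard_normal((M, N))

L_spectral = np.linalg.norm(A, 2) ** 2 / gamma       # ||A||_2^2 / gamma (tight)
L_frobenius = np.linalg.norm(A, "fro") ** 2 / gamma  # ||A||_F^2 / gamma (cheaper, safe)

print(L_spectral, L_frobenius)  # L_frobenius >= L_spectral always holds
```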
Proposition 8.2
\( {\mathbb {B}}_\gamma \) defined in Theorem 5.2 is a dense subset of \( {{\mathbb {R}}}^N \).
Proof
Let \( c:{{\mathbb {R}}}^N\rightarrow {{\mathbb {R}}}^M \) be the continuous affine operator defined by \( c(x)=Ax-d \). Since A is full rank with \( M<N \), c is a surjection. Let \( T=c({\mathbb {B}}_\gamma ) \) be the image of the set \( {\mathbb {B}}_\gamma \) under c; then \( T=\{v\in {{\mathbb {R}}}^M:\forall i\in {{\mathbb {I}}}_M, \left|v[i]\right|\ne \gamma \} \). Let \( {{\mathcal {O}}}\) be an arbitrary non-empty open set in \( {{\mathbb {R}}}^M \). Let \( {\bar{v}}\in {{\mathcal {O}}}\) and define the index set \( {\bar{v}}_\gamma =\{i\in {{\mathbb {I}}}_M:\left|{\bar{v}}[i]\right|=\gamma \}. \) There is some positive radius \( r_{{\bar{v}}} \) such that \( B_\infty ({\bar{v}},r_{{\bar{v}}}) \) stays in \( {{\mathcal {O}}}\). If we define
we have \( r^*>0 \) and \( B_\infty ({\bar{v}},r^*) \) stays in \( {{\mathcal {O}}}\). Then, for any \( 0<\delta <r^* \), \( v^*={\bar{v}}+\delta \mathbb {1}_M \) belongs to \( {{\mathcal {O}}}\) and to T at the same time. Hence, T is dense in \( {{\mathbb {R}}}^M \). Since \( {\mathbb {B}}_\gamma =c^{-1}(T) \) and c is a continuous open surjection, the image under c of any non-empty open subset of \( {{\mathbb {R}}}^N \) is open and therefore meets T; hence, \( {\mathbb {B}}_\gamma \) is a dense set too. \(\square \)
Remark 8.2
The previous result shows that one can construct a sequence of vectors in \( {\mathbb {B}}_\gamma \) converging to any given vector in \( {{\mathbb {R}}}^N \). This may be useful for deriving algorithms based on second-order methods, since the second derivative exists only at vectors in \( {\mathbb {B}}_\gamma \).
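To make this concrete, the following sketch (our own illustration, echoing the construction in the proof of Proposition 8.2; all function and variable names are hypothetical) nudges a given point by an arbitrarily small step along a preimage of \( \mathbb {1}_M \) until no residual component has magnitude exactly \( \gamma \):

```python
import numpy as np

def in_B_gamma(x, A, d, gamma):
    """True if no component of Ax - d has magnitude exactly gamma."""
    return bool(np.all(np.abs(A @ x - d) != gamma))

def nudge_into_B_gamma(x, A, d, gamma, delta=1e-8):
    """Return a point arbitrarily close to x that lies in B_gamma.

    Shifts the residual along the all-ones direction, as in the density
    argument: w solves A w = 1_M (A has full row rank, M < N)."""
    if in_B_gamma(x, A, d, gamma):
        return x
    w = A.T @ np.linalg.solve(A @ A.T, np.ones(A.shape[0]))
    x_new = x + delta * w
    while not in_B_gamma(x_new, A, d, gamma):
        delta /= 2.0                      # only finitely many bad step sizes exist
        x_new = x + delta * w
    return x_new
```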
For all numerical experiments reported in this paper, we use the following equivalent formulation of (HR):
Proposition 8.3
(Equivalent Characterization for (HR), [35]) Any optimal solution to the quadratic program (QHR2) is a minimizer of \( \Psi \), and conversely.
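The program (QHR2) is not displayed above. Purely for orientation, and under the Huber normalization assumed earlier (\( \phi _\gamma (t)=t^2/(2\gamma ) \) for \( \left|t\right|\le \gamma \), \( \left|t\right|-\gamma /2 \) otherwise), one standard quadratic-programming form of the Huber fit, which may differ in details from (QHR2), is
\[ \min _{x\in {{\mathbb {R}}}^N,\ t\in {{\mathbb {R}}}^M}\ \frac{1}{2\gamma }\left\Vert Ax-d-t\right\Vert _2^2+\left\Vert t\right\Vert _1 , \]
whose optimal x are exactly the minimizers of \( \Psi \), since minimizing over t componentwise recovers \( \phi _\gamma \) (a genuine QP is obtained after splitting t into its positive and negative parts).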
This alternative eliminates the need to work with piecewise-defined functions and provides an easier computational tool. We used the following algorithm for the numerical examples reported in the paper:
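The paper's algorithm itself is not reproduced above. Purely as an illustration of how such experiments might be organized, the following sketch applies iterative hard thresholding in the spirit of [41] to \( \Psi (x)+\lambda \left\Vert x\right\Vert _0 \), using the safe step size \( \gamma /\left\Vert A\right\Vert _F^2 \) from Remark 8.1; the Huber normalization, the function and variable names, and the stopping rule are our assumptions, not the paper's.

```python
import numpy as np

def huber_grad(x, A, d, gamma):
    """Gradient of Psi(x) = sum_i phi_gamma((Ax - d)[i]), with
    phi_gamma(t) = t^2 / (2*gamma) for |t| <= gamma and |t| - gamma/2 otherwise."""
    r = A @ x - d
    dphi = np.clip(r / gamma, -1.0, 1.0)  # derivative of phi_gamma, componentwise
    return A.T @ dphi

def iht_huber_l0(A, d, gamma, lam, iters=500):
    """Iterative hard thresholding sketch for min_x Psi(x) + lam * ||x||_0."""
    _, N = A.shape
    L = np.linalg.norm(A, "fro") ** 2 / gamma  # safe Lipschitz constant (Remark 8.1)
    step = 1.0 / L
    thresh = np.sqrt(2.0 * lam * step)         # hard-threshold level of the L0 prox
    x = np.zeros(N)
    for _ in range(iters):
        z = x - step * huber_grad(x, A, d, gamma)
        x = np.where(np.abs(z) > thresh, z, 0.0)  # prox of step*lam*||.||_0
    return x
```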
About this article
Cite this article
Akkaya, D., Pınar, M.Ç. Minimizers of Sparsity Regularized Huber Loss Function. J Optim Theory Appl 187, 205–233 (2020). https://doi.org/10.1007/s10957-020-01745-3
Keywords
- Sparse solution of linear systems
- Regularization
- Local minimizer
- Global minimizer
- Huber loss function
- L0-norm