
Minimizers of Sparsity Regularized Huber Loss Function


Abstract

We investigate the structure of the local and global minimizers of the Huber loss function regularized with a sparsity-inducing \(\ell _0\)-norm term. We characterize the local minimizers and establish conditions that are necessary and sufficient for a local minimizer to be strict. A necessary condition is established for global minimizers, and the set of global minimizers is shown to be non-empty. The sparsity of minimizers is also studied by giving bounds on a regularization parameter controlling sparsity. The results are illustrated in numerical examples.


Notes

  1. While this constant is used routinely in work involving the Huber function, we were unable to find a derivation in the literature, so one is provided here.

  2. This set appears in our subsequent results and is shown to be dense in \({{\mathbb {R}}}^N\) in “Appendix.”

References

  1. Zoubir, A., Koivunen, V., Ollila, E., Muma, M.: Robust Statistics for Signal Processing. Cambridge University Press, Cambridge (2018)
  2. Huber, P.J.: Robust Statistics. Wiley, New York (1981)
  3. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2008)
  4. d'Aspremont, A., El Ghaoui, L.: Testing the nullspace property using semidefinite programming. Math. Program. 127(1), 123–144 (2011)
  5. Bryan, K., Leise, T.: Making do with less: an introduction to compressed sensing. SIAM Rev. 55, 547–566 (2013)
  6. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)
  7. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 52, 4203–4215 (2006)
  8. Chen, S., Donoho, D., Saunders, M.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998)
  9. Chrétien, S.: An alternating \(\ell _1\) approach to the compressed sensing problem. IEEE Signal Process. Lett. 17, 181–184 (2010)
  10. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 51, 1289–1306 (2005)
  11. Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59, 797–829 (2006)
  12. Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, New York (2010)
  13. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013)
  14. Tropp, J.A.: Recovery of short, complex linear combinations via \(\ell _1\) minimization. IEEE Trans. Inf. Theory 51(4), 1568–1570 (2005)
  15. Fuchs, J.J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341–1344 (2004)
  16. Nikolova, M.: Description of the minimizers of least squares regularized with \(\ell _0\)-norm. Uniqueness of the global minimizer. SIAM J. Imaging Sci. 6(2), 904–937 (2013)
  17. Beck, A., Hallak, N.: Proximal mapping for symmetric penalty and sparsity. SIAM J. Optim. 28(1), 496–527 (2018)
  18. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52, 1030–1051 (2006)
  19. Chancelier, J.-Ph., De Lara, M.: Hidden convexity in the \(\ell _0\) pseudonorm. HAL (2019)
  20. Chancelier, J.-Ph., De Lara, M.: Lower bound convex programs for exact sparse optimization. HAL (2019)
  21. Chancelier, J.-Ph., De Lara, M.: A suitable conjugacy for the \(\ell _0\) pseudonorm. HAL (2019)
  22. Soubies, E., Blanc-Féraud, L., Aubert, G.: New insights on the \(\ell _2\)-\(\ell _0\) minimization problem. J. Math. Imaging Vis. 62, 808–824 (2020)
  23. Lanza, A., Morigi, S., Selesnick, I.W., Sgallari, F.: Sparsity-inducing non-convex, non-separable regularization for convex image processing. SIAM J. Imaging Sci. 12(2), 1099–1134 (2019)
  24. Selesnick, I.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)
  25. Wang, S., Chen, X., Dai, W., Selesnick, I.W., Cai, G.: Vector minimax concave penalty for sparse representation. Digital Signal Process. 83, 165–179 (2018)
  26. Wang, J., Zhang, F., Huang, J., Wang, W., Yuan, C.: A non-convex penalty function with integral convolution approximation for compressed sensing. Signal Process. 158, 116–128 (2019)
  27. Chen, K., Lv, Q., Lu, Y., Dou, Y.: Robust regularized extreme learning machine for regression using iteratively reweighted least squares. Neurocomputing 230, 345–358 (2017)
  28. Carrillo, R.E., Ramirez, A.B., Arce, G.R., Barner, K.E., Sadler, B.M.: Robust compressive sensing of sparse signals: a review. EURASIP J. Adv. Signal Process. 2016, 108 (2016)
  29. Grechuk, B., Zabarankin, M.: Regression analysis: likelihood, error and entropy. Math. Program. 174, 145–166 (2019)
  30. Li, W., Swetits, J.J.: The linear \(\ell _1\) estimator and the Huber M-estimator. SIAM J. Optim. 8, 457–475 (1998)
  31. Madsen, K., Nielsen, H.B.: A finite smoothing algorithm for linear \(\ell _1\) estimation. SIAM J. Optim. 3, 223–235 (1993)
  32. Madsen, K., Nielsen, H.B., Pınar, M.Ç.: New characterizations of \(\ell _1\) solutions to overdetermined systems of linear equations. Oper. Res. Lett. 16, 159–166 (1994)
  33. Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer, New York (2003)
  34. Blumensath, T., Davies, M.E.: Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008)
  35. Pınar, M.Ç.: Linear Huber M-estimator under ellipsoidal data uncertainty. BIT 42(4), 856–866 (2002)

Author information

Corresponding author

Correspondence to Mustafa Ç. Pınar.

Additional information

Communicated by Panos M. Pardalos.

Appendix

Definition 8.1

Let \( \Psi :{{\mathbb {R}}}^N\rightarrow {{\mathbb {R}}}\) be a differentiable function. Its gradient is said to be Lipschitz continuous with constant \( L_\Psi >0 \) if

$$\begin{aligned} \left\Vert \triangledown \Psi (x)-\triangledown \Psi (y)\right\Vert _2\le L_{\Psi }\left\Vert x-y\right\Vert _2 \text { for all } x,y\in {{\mathbb {R}}}^N. \end{aligned}$$

Proposition 8.1

The gradient of \( \Psi \) is Lipschitz continuous with constant \( \dfrac{\left\Vert A\right\Vert ^2_2}{\gamma } \), where \( \left\Vert A\right\Vert _2=\sup _{x\ne 0}\dfrac{\left\Vert Ax\right\Vert _2}{\left\Vert x\right\Vert _2}. \)

Proof

Let \( x,y\in {{\mathbb {R}}}^N \),

$$\begin{aligned} \left\Vert \triangledown \Psi (x)-\triangledown \Psi (y)\right\Vert _2&=\left\Vert A^{ T}[\hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)]\right\Vert _2\\&\le \left\Vert A\right\Vert _2\left\Vert \hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)\right\Vert _2\\&=\left\Vert A\right\Vert _2\dfrac{\left\Vert \hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)\right\Vert _2}{\left\Vert A(x-y)\right\Vert _2}\left\Vert A(x-y)\right\Vert _2\\&\le \left\Vert A\right\Vert _2^2\dfrac{\left\Vert \hbox {clip}(Ax-d)-\hbox {clip}(Ay-d)\right\Vert _2}{\left\Vert (Ax-d)-(Ay-d)\right\Vert _2}\left\Vert x-y\right\Vert _2\\&\le \dfrac{\left\Vert A\right\Vert _2^2}{\gamma }\left\Vert x-y\right\Vert _2. \end{aligned}$$

The last inequality follows from the \( (1/\gamma ) \)-Lipschitz continuity of the clip function, the componentwise derivative of the Huber loss. \(\square \)

Remark 8.1

For ease of computation, one may instead use the Frobenius norm, which gives the Lipschitz constant \( \dfrac{\left\Vert A\right\Vert ^2_F}{\gamma } \). This choice is safe since \( \left\Vert A\right\Vert _2\le \left\Vert A\right\Vert _F \).
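As an illustration of Proposition 8.1 and Remark 8.1, the following sketch (not taken from the paper) checks the two Lipschitz bounds numerically. It assumes the Huber loss \( h_\gamma (t)=t^2/(2\gamma ) \) for \( |t|\le \gamma \) and \( |t|-\gamma /2 \) otherwise, so that the clip function in the proof is the componentwise derivative \( t/\gamma \) on \( [-\gamma ,\gamma ] \) and \( \mathrm{sign}(t) \) outside; the matrix A, the data d, and all sizes below are arbitrary placeholders.

```python
import numpy as np

# Hedged sketch: grad Psi(x) = A^T clip(Ax - d), where clip is the componentwise
# derivative of the Huber loss h_gamma and is (1/gamma)-Lipschitz.
def clip(t, gamma):
    return np.clip(t / gamma, -1.0, 1.0)

def grad_psi(A, d, x, gamma):
    return A.T @ clip(A @ x - d, gamma)

rng = np.random.default_rng(0)
M, N, gamma = 20, 50, 0.5
A = rng.standard_normal((M, N))
d = rng.standard_normal(M)

L_spec = np.linalg.norm(A, 2) ** 2 / gamma       # bound of Proposition 8.1
L_frob = np.linalg.norm(A, "fro") ** 2 / gamma   # easier bound of Remark 8.1

# Empirically, the gradient difference ratios should never exceed L_spec.
ratios = []
for _ in range(1000):
    x, y = rng.standard_normal(N), rng.standard_normal(N)
    diff = grad_psi(A, d, x, gamma) - grad_psi(A, d, y, gamma)
    ratios.append(np.linalg.norm(diff) / np.linalg.norm(x - y))

print(max(ratios), "<=", L_spec, "<=", L_frob)
```

The Frobenius bound is typically much larger than the spectral one, but it is available at negligible cost, which is the point of Remark 8.1.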

Proposition 8.2

\( {\mathbb {B}}_\gamma \) defined in Theorem 5.2 is a dense subset of \( {{\mathbb {R}}}^N \).

Proof

Let \( c:{{\mathbb {R}}}^N\rightarrow {{\mathbb {R}}}^M \) be the continuous affine map defined by \( c(x)=Ax-d \). Since A has full rank with \( M<N \), c is a surjection. Let \( T=c({\mathbb {B}}_\gamma ) \) be the image of the set \( {\mathbb {B}}_\gamma \) under c. Hence, \( T=\{v\in {{\mathbb {R}}}^M:\forall i\in {{\mathbb {I}}}_M, \left|v[i]\right|\ne \gamma \} \). Let \( {{\mathcal {O}}}\) be an arbitrary non-empty open set in \( {{\mathbb {R}}}^M \). Let \( {\bar{v}}\in {{\mathcal {O}}}\) and define the index set \( {\bar{v}}_\gamma =\{i\in {{\mathbb {I}}}_M:\left|{\bar{v}}[i]\right|=\gamma \}. \) There is some positive radius \( r_{{\bar{v}}} \) such that \( B_\infty ({\bar{v}},r_{{\bar{v}}}) \) stays in \( {{\mathcal {O}}}\). If we define

$$\begin{aligned} r^*=\min \Bigg \{\min _{i\notin {\bar{v}}_\gamma }\{\left|{\bar{v}}[i]-\gamma \right|\},\min _{i\notin {\bar{v}}_\gamma }\{\left|{\bar{v}}[i]+\gamma \right|\},r_{{\bar{v}}},\gamma \Bigg \}, \end{aligned}$$

we have \( r^*>0 \) and \( B_\infty ({\bar{v}},r^*)\subseteq {{\mathcal {O}}}\). Then, for any \( 0<\delta <r^* \), \( v^*={\bar{v}}+\delta \mathbb {1}_M \) belongs to \( {{\mathcal {O}}}\) and to T at the same time. Hence, T is a dense subset of \( {{\mathbb {R}}}^M \). Finally, \( {\mathbb {B}}_\gamma =c^{-1}(T) \), and c is a surjective affine, hence open, map; thus the image \( c(U) \) of any non-empty open set \( U\subseteq {{\mathbb {R}}}^N \) is open and meets T, so U meets \( {\mathbb {B}}_\gamma \). Therefore, \( {\mathbb {B}}_\gamma \) is a dense set too. \(\square \)

Remark 8.2

The previous result shows that one can construct a sequence of vectors in \( {\mathbb {B}}_\gamma \) converging to any given vector in \( {{\mathbb {R}}}^N \). This may be useful for deriving algorithms based on second-order methods, since the second derivative exists only at points of \( {\mathbb {B}}_\gamma \).
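To make the perturbation argument of Proposition 8.2 concrete, here is a small hedged sketch (not from the paper). It takes \( {\mathbb {B}}_\gamma =\{x:|(Ax-d)[i]|\ne \gamma \text { for all } i\} \), as the proof suggests, and works directly in the image space \( {{\mathbb {R}}}^M \): shifting a point with coordinates on the threshold \( \pm \gamma \) by a small multiple of the all-ones vector lands it in the set T while moving it by at most \( \delta \).

```python
import numpy as np

# Hedged sketch of the perturbation in the proof of Proposition 8.2 (image space).
# T = {v in R^M : |v[i]| != gamma for all i}; vbar has two coordinates exactly on
# the threshold, and vbar + delta*1 leaves the threshold for every 0 < delta < r*.
gamma = 1.0
vbar = np.array([gamma, -gamma, 0.3, 2.0, -0.7])

off_threshold = np.abs(np.abs(vbar) - gamma)                  # distance of each |v[i]| to gamma
r_star = min(off_threshold[off_threshold > 0].min(), gamma)   # as in the proof (r_vbar omitted)

for delta in (r_star / 2, r_star / 10, r_star / 1000):
    v_new = vbar + delta * np.ones_like(vbar)
    assert np.all(np.abs(np.abs(v_new) - gamma) > 0)          # v_new lies in T
    print(delta, np.max(np.abs(v_new - vbar)))                # and stays within delta of vbar

# Mapping back to R^N: since c(x) = Ax - d is onto (A has full row rank, M < N),
# x_new = x + pinv(A) @ (v_new - vbar) satisfies A x_new - d = v_new, so points of
# B_gamma can be found arbitrarily close to any given x.
```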

For all numerical experiments reported in this paper, we use the following equivalent formulation for (HR)

figure j

Proposition 8.3

(Equivalent Characterization for (HR), [35]) Any optimal solution to the quadratic program (QHR2) is a minimizer of \( \Psi \), and conversely.

This alternative eliminates the need to work with piecewise-defined functions and simplifies computation. We used the following algorithm for the numerical examples reported in the paper:

figure k
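Proposition 8.3 invokes a quadratic-programming characterization from [35]; the exact form of (QHR2) and the algorithm we used appear in the figures above and are not reproduced here. The sketch below is only an illustration (an assumption, not the paper's formulation): it verifies numerically the standard variational identity behind such reformulations, \( h_\gamma (t)=\min _r\{(t-r)^2/(2\gamma )+|r|\} \), attained at the soft-thresholding of t, which is what removes the piecewise definition of the Huber function from the computation.

```python
import numpy as np

# Hedged illustration: the Huber loss equals a quadratic-plus-l1 minimization,
# assuming the scaling h_gamma(t) = t^2/(2 gamma) for |t| <= gamma and
# |t| - gamma/2 otherwise, consistent with the appendix.
def huber(t, gamma):
    return np.where(np.abs(t) <= gamma, t ** 2 / (2 * gamma), np.abs(t) - gamma / 2)

def soft_threshold(t, gamma):
    return np.sign(t) * np.maximum(np.abs(t) - gamma, 0.0)

gamma = 0.7
t = np.linspace(-3.0, 3.0, 601)
r = soft_threshold(t, gamma)                      # minimizer of the inner problem
lhs = huber(t, gamma)
rhs = (t - r) ** 2 / (2 * gamma) + np.abs(r)
print(np.max(np.abs(lhs - rhs)))                  # ~1e-16: the two forms agree
```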

Cite this article

Akkaya, D., Pınar, M.Ç. Minimizers of Sparsity Regularized Huber Loss Function. J Optim Theory Appl 187, 205–233 (2020). https://doi.org/10.1007/s10957-020-01745-3
