Abstract
Sparse learning models are popular in many application areas. Their objective functions are usually non-smooth, which makes them difficult to solve numerically. We develop a fast and convergent two-step iteration scheme for solving a class of non-differentiable optimization models motivated by sparse learning. To overcome the non-differentiability of the models, we first characterize their solutions as fixed-points of mappings involving the proximity operators of the functions appearing in the objective functions. We then introduce a two-step fixed-point algorithm to compute the solutions. We establish convergence results for the proposed two-step iteration scheme and compare it with the alternating direction method of multipliers (ADMM). In particular, we derive specific two-step iteration algorithms for three models in machine learning: \(\ell ^1\)-SVM classification, \(\ell ^1\)-SVM regression, and SVM classification with the group LASSO regularizer. Numerical experiments on synthetic and benchmark datasets show that the proposed algorithm outperforms ADMM and the linear programming method in computational time and memory usage.
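The abstract's central device is the proximity operator: a non-smooth solution is rewritten as a fixed point of a proximity-operator mapping and then computed by iteration. As a minimal illustrative sketch (not the paper's two-step scheme, whose details are not reproduced here), the snippet below shows the proximity operator of the \(\ell^1\) norm, which is soft-thresholding, and a basic one-step fixed-point proximity iteration applied to a simple smooth-plus-\(\ell^1\) problem; all function names and parameters are hypothetical.

```python
import numpy as np

def prox_l1(v, t):
    """Proximity operator of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fixed_point_prox(grad_smooth, x0, step, reg, iters=500):
    """Basic fixed-point proximity iteration (proximal gradient form):
    x_{k+1} = prox_{step*reg*||.||_1}(x_k - step * grad_smooth(x_k)).
    A solution is a fixed point of this mapping."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_l1(x - step * grad_smooth(x), step * reg)
    return x

# Toy problem: min_x 0.5*||A x - b||^2 + reg*||x||_1 with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
b = A @ x_true
grad = lambda x: A.T @ (A @ x - b)          # gradient of the smooth term
step = 1.0 / np.linalg.norm(A.T @ A, 2)     # step <= 1/L ensures convergence
x_hat = fixed_point_prox(grad, np.zeros(5), step=step, reg=0.1)
```

The iterate recovers the sparsity pattern of `x_true`: the zero coordinates are thresholded to (near) zero while the nonzero coordinates are retained with a small shrinkage bias. The paper's models replace the smooth data term with non-smooth SVM losses, which is why a characterization via proximity operators, rather than gradients, is needed there.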
This research was supported in part by the Ministry of Science and Technology of China under Grant 2016YFB0200602, by the Natural Science Foundation of China under Grants 11471013, 11771464, and by the US National Science Foundation under Grants DMS-1521661, DMS-1522332, DMS-1912958 and DMS-1939203.
Cite this article
Li, Z., Song, G. & Xu, Y. A Two-Step Fixed-Point Proximity Algorithm for a Class of Non-differentiable Optimization Models in Machine Learning. J Sci Comput 81, 923–940 (2019). https://doi.org/10.1007/s10915-019-01045-7