Abstract
Lasso-type variable selection has proven effective in handling high-dimensional data. From a biological perspective, traditional Lasso-type models learn which stimuli are valuable while ignoring the many that are not, and thus perform feature selection. However, the traditional Lasso tends to over-emphasize sparsity and to overlook the correlations between features, and these drawbacks critically limit its performance on real-world feature selection problems. Although some work has addressed the correlation problem, the loss of discriminative ability that results from sparsity has been overlooked. To overcome this shortcoming, we propose a discriminative Lasso (dLasso) in which sparsity and correlation are considered jointly. Specifically, the new method selects features (or stimuli) that are more strongly correlated with the response but less correlated with each other. Moreover, we present an efficient alternating direction method of multipliers (ADMM) algorithm to solve the resulting sparse non-convex optimization problem. Extensive experiments on different datasets show that, although the proposed model is non-convex, it outperforms both its approximately convex counterparts and a number of state-of-the-art feature selection methods.
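For readers unfamiliar with ADMM, the following is a minimal sketch of the scaled-form ADMM iteration for the standard convex Lasso. It is not the dLasso algorithm, whose objective is not given in this abstract. The function names, the penalty parameter `rho`, the iteration count, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lasso_admm(A, b, lam, rho=1.0, n_iter=500):
    """Scaled-form ADMM for the standard Lasso (illustrative, not dLasso).

    Solves  min_x 0.5 * ||A x - b||^2 + lam * ||z||_1  subject to  x = z.
    """
    n_features = A.shape[1]
    z = np.zeros(n_features)
    u = np.zeros(n_features)  # scaled dual variable
    # The x-update solves the same linear system every iteration,
    # so the Cholesky factor is computed once up front.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n_features))
    Atb = A.T @ b
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # quadratic step
        z = soft_threshold(x + u, lam / rho)               # sparsity step
        u = u + x - z                                      # dual update
    return z  # z carries the exact zeros produced by soft-thresholding


# Synthetic example (assumed data): only features 0 and 3 drive the response.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 8))
x_true = np.array([2.0, 0.0, 0.0, -3.0, 0.0, 0.0, 0.0, 0.0])
b = A @ x_true + 0.01 * rng.standard_normal(200)

z_hat = lasso_admm(A, b, lam=10.0)
selected = np.flatnonzero(z_hat)  # indices of the selected features
```

The split into a smooth quadratic step (`x`) and a soft-thresholding step (`z`) is what makes ADMM attractive for sparse problems. A non-convex penalty such as the one described above would change the subproblems, and the convergence guarantees of the convex case would no longer apply automatically.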
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61402389, 11271308 and 11401499), the Fundamental Research Funds for the Central Universities (Nos. 20720160073, 20720150001, 20720140524 and 20720150098), and the Fujian Province Soft Sciences Foundation of China (No. 2014R0091).
Ethics declarations
Conflict of Interest
Zhihong Zhang, Jianbing Xiahou, Zheng-Jian Bai, Edwin R. Hancock, Da Zhou, Si-Bao Chen and Liyan Chen declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Declaration of Helsinki 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for whom identifying information is included in this article.
Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Zhang, Z., Xiahou, J., Bai, ZJ. et al. Discriminative Lasso. Cogn Comput 8, 847–855 (2016). https://doi.org/10.1007/s12559-016-9402-z
DOI: https://doi.org/10.1007/s12559-016-9402-z