Self-regularized Lasso for selection of most informative features in microarray cancer classification

Abstract

This article presents a new method for maximizing the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) feature selection model. Specifically, we propose a novel regularization scheme for the Lasso that finds the best regularization parameter automatically, which guarantees the best performance of the Lasso in DNA microarray data classification. In our experiments, four well-known, publicly available microarray datasets (breast cancer, Diffuse Large B-cell Lymphoma (DLBCL), leukemia, and prostate cancer) were used to evaluate the proposed method. The experimental results showed that the proposed Lasso significantly outperformed other widely used feature selection methods in selecting the features that led to the best performance, robustness, and stability in microarray data classification. Accordingly, the proposed method is a powerful algorithm for selecting the most informative features and can be used for cancer diagnosis from gene expression profiles.
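The abstract does not describe how the regularization parameter is chosen, so the following is only a minimal sketch of the general idea of selecting it automatically, not the authors' self-regularization procedure. It assumes a squared-error Lasso fitted by cyclic coordinate descent and swaps in a plain hold-out grid search over candidate penalties. The synthetic data, the candidate grid, and all function names (`soft_threshold`, `lasso_cd`, `select_lambda`) are hypothetical and chosen for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.
    Assumes roughly zero-mean features and target (no intercept is fitted)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def mse(X, y, w):
    """Mean squared prediction error of the linear model w."""
    preds = [sum(xij * wj for xij, wj in zip(xi, w)) for xi in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def select_lambda(X_tr, y_tr, X_val, y_val, lambdas):
    """Hold-out grid search (an assumed stand-in for automatic selection):
    return (lam, validation error, weights) with the lowest validation error."""
    best = None
    for lam in lambdas:
        w = lasso_cd(X_tr, y_tr, lam)
        err = mse(X_val, y_val, w)
        if best is None or err < best[1]:
            best = (lam, err, w)
    return best

# Synthetic stand-in for expression data: 20 features, only the first two informative.
random.seed(0)
n, p = 60, 20
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.1) for row in X]
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]  # hypothetical candidate grid
best_lam, best_err, best_w = select_lambda(X_tr, y_tr, X_val, y_val, lambdas)
selected = [j for j, wj in enumerate(best_w) if wj != 0.0]
print("chosen penalty:", best_lam, "selected feature indices:", selected)
```

In a classification setting such as the one the abstract describes, the squared-error loss and the validation criterion would typically be replaced by a classification loss and a cross-validated accuracy measure; the selection loop itself would keep the same shape.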

Data availability

We used publicly available datasets in our investigation. For ease of access, they have been uploaded to GitHub and can be accessed via the following link:

https://github.com/Mehrdadvatankhah/Microarray-Dataset

Code availability

Our code was developed in MATLAB and can be accessed via the following link:

https://github.com/Mehrdadvatankhah/Self-regularized-Lasso

Author information

Authors and Affiliations

Department of Artificial Intelligence, Smart University of Medical Sciences, Tehran, Iran
Mehrdad Vatankhah & Mohammadreza Momenzadeh

Authors

Mehrdad Vatankhah
Mohammadreza Momenzadeh

Contributions

Mehrdad Vatankhah, as first author, implemented the computer code and supporting algorithms and wrote the initial draft.

Mohammadreza Momenzadeh, as corresponding author, was responsible for project administration, writing (review and editing), data curation, and conceptualization.

Corresponding author

Correspondence to Mohammadreza Momenzadeh.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

We confirm that this paper is the result of original work done by Mehrdad Vatankhah and Mohammadreza Momenzadeh, and that there are no other authors or co-workers.

Consent for publication

We confirm that this paper contains the results of the original work done by us and has never been submitted to other journals or conferences.

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vatankhah, M., Momenzadeh, M. Self-regularized Lasso for selection of most informative features in microarray cancer classification. Multimed Tools Appl 83, 5955–5970 (2024). https://doi.org/10.1007/s11042-023-15207-1

Received: 23 April 2022
Revised: 05 October 2022
Accepted: 30 March 2023
Published: 30 May 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15207-1
