Self-regularized Lasso for selection of most informative features in microarray cancer classification

Multimedia Tools and Applications

Abstract

In this article, we propose a new method for maximizing the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) feature selection model. Specifically, we present a novel regularization scheme for the Lasso that finds the best regularization parameter automatically and thereby guarantees the best performance of the Lasso in DNA microarray data classification. In our experiments, four well-known, publicly available microarray datasets (breast cancer, Diffuse Large B-cell Lymphoma (DLBCL), leukemia, and prostate cancer) were used to evaluate the proposed method. The experimental results demonstrated that the proposed Lasso significantly outperformed other widely used feature selection methods: the features it selected led to the best performance, robustness, and stability in microarray data classification. Accordingly, the proposed method is a powerful algorithm for selecting the most informative features and can be used for cancer diagnosis from gene expression profiles.
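The general idea of choosing the Lasso regularization parameter automatically and then keeping only the features with nonzero coefficients can be illustrated with a minimal sketch. This is not the authors' self-regularization procedure, which this preview does not describe, and it is not their MATLAB code. It is a generic stand-in that assumes k-fold cross-validation over a fixed grid of candidate values, a plain coordinate-descent solver for the objective (1/(2n))·||y − Xw||² + λ·||w||₁ without an intercept, and a small synthetic dataset with a continuous response in place of real microarray data and class labels. All function names, the grid `lambdas`, and the data are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: closed-form solution of the 1-D lasso problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - Xw||^2 + lam*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never be selected
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def cv_error(X, y, lam, k=5):
    """Mean squared prediction error of the lasso under k-fold cross-validation."""
    n = len(X)
    err = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        w = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            err += (y[i] - pred) ** 2
    return err / n


def select_features(X, y, lambdas, k=5):
    """Pick the lambda with the lowest CV error, refit, and keep nonzero weights."""
    best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam, k))
    w = lasso_cd(X, y, best_lam)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    return best_lam, selected


# Synthetic stand-in data (assumption): only features 0 and 3 carry signal.
rng = random.Random(0)
n, p = 40, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 3.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]

lambdas = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]  # hypothetical candidate grid
best_lam, selected = select_features(X, y, lambdas)
print("chosen lambda:", best_lam)
print("selected feature indices:", selected)
```

In practice, microarray data has far more genes than samples, so features are usually standardized first and a library solver would replace this pure-Python loop. The sketch only shows how parameter selection and feature selection fit together.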

Data availability

We used public datasets in our investigation. For easy access, they have been uploaded to GitHub and can be accessed at the following link.

https://github.com/Mehrdadvatankhah/Microarray-Dataset.

Code availability

Our code was developed in MATLAB and can be accessed at the following link.

https://github.com/Mehrdadvatankhah/Self-regularized-Lasso.

Author information

Contributions

Mehrdad Vatankhah, the first author, implemented the computer code and supporting algorithms and wrote the initial draft.

Mohammadreza Momenzadeh, the corresponding author, was responsible for project administration, writing, reviewing and editing, data curation, and conceptualization.

Corresponding author

Correspondence to Mohammadreza Momenzadeh.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

We confirm that this paper is the result of original work done by Mehrdad Vatankhah and Mohammadreza Momenzadeh, and that there are no other authors or co-workers.

Consent for publication

We confirm that this paper contains the results of the original work done by us and has never been submitted to other journals or conferences.

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vatankhah, M., Momenzadeh, M. Self-regularized Lasso for selection of most informative features in microarray cancer classification. Multimed Tools Appl 83, 5955–5970 (2024). https://doi.org/10.1007/s11042-023-15207-1

  • DOI: https://doi.org/10.1007/s11042-023-15207-1
