Abstract
This paper introduces a new combined filter-wrapper gene subset selection approach in which a Genetic Algorithm (GA) is combined with Linear Discriminant Analysis (LDA). The major characteristic of this LDA-based GA is that it uses not only an LDA classifier in its fitness function, but also LDA's discriminant coefficients in its dedicated crossover and mutation operators. This paper studies the effect of these informed operators on the evolutionary process. The proposed algorithm is assessed on several well-known datasets from the literature and compared with recent state-of-the-art algorithms. The results show that our filter-wrapper approach achieves high classification accuracies overall, with a very small number of genes compared with those obtained by other methods.
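The abstract does not spell out how the informed operators work, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes a two-class problem, a Boolean mask per individual, and NumPy; the names `lda_coefficients`, `fitness`, `informed_mutation`, `informed_crossover` and the parameter `keep` are hypothetical. The fitness uses training accuracy for brevity, where a real study would use cross-validation.

```python
import numpy as np

def lda_coefficients(X, y):
    """Two-class Fisher LDA: w = pinv(Sw) @ (mu1 - mu0), plus a midpoint threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter; the pseudo-inverse tolerates a singular matrix.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.pinv(Sw) @ (mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0
    return w, threshold

def fitness(mask, X, y):
    """Accuracy of LDA restricted to the selected genes (training accuracy, an assumption)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    w, t = lda_coefficients(X[:, idx], y)
    pred = (X[:, idx] @ w > t).astype(int)
    return float((pred == y).mean())

def informed_mutation(mask, X, y):
    """Hypothetical operator: drop the selected gene with the smallest |coefficient|."""
    idx = np.flatnonzero(mask)
    if idx.size <= 1:
        return mask.copy()
    w, _ = lda_coefficients(X[:, idx], y)
    child = mask.copy()
    child[idx[np.argmin(np.abs(w))]] = False
    return child

def informed_crossover(a, b, X, y, keep=2):
    """Hypothetical operator: the child inherits each parent's `keep` top-|coefficient| genes."""
    child = np.zeros_like(a)
    for parent in (a, b):
        idx = np.flatnonzero(parent)
        if idx.size == 0:
            continue
        w, _ = lda_coefficients(X[:, idx], y)
        child[idx[np.argsort(-np.abs(w))[:keep]]] = True
    return child

# Synthetic data (not from the paper): 40 samples, 10 genes, gene 0 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 10))
X[y == 1, 0] += 4.0

parent_a = np.zeros(10, dtype=bool)
parent_a[:5] = True            # genes 0..4
parent_b = np.zeros(10, dtype=bool)
parent_b[5:] = True            # genes 5..9

mutated = informed_mutation(parent_a, X, y)
child = informed_crossover(parent_a, parent_b, X, y, keep=2)
score = fitness(child, X, y)
```

In this sketch the discriminant coefficients act as the filter-like signal inside the wrapper: genes with small absolute coefficients are the first to be discarded, which pushes the search toward small subsets.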
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bonilla Huerta, E., Hernández Hernández, J.C., Hernández Montiel, L.A. (2010). A New Combined Filter-Wrapper Framework for Gene Subset Selection with Specialized Genetic Operators. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Kittler, J. (eds) Advances in Pattern Recognition. MCPR 2010. Lecture Notes in Computer Science, vol 6256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15992-3_27
DOI: https://doi.org/10.1007/978-3-642-15992-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15991-6
Online ISBN: 978-3-642-15992-3
eBook Packages: Computer Science, Computer Science (R0)