Abstract
DNA microarray technology can monitor thousands of genes in a single experiment. One important application of this high-throughput gene expression data is the classification of samples into known categories. Because the number of genes often far exceeds the number of samples, classical classification methods perform poorly in this setting. Moreover, many genes are irrelevant or redundant and reduce classification accuracy, so a gene selection step is necessary; classification using only the selected genes is expected to be more accurate. This paper proposes a novel method for informative gene selection and sample classification on gene expression data. The method is based on Linear Discriminant Analysis (LDA) in the regular space and in the null space of the within-class scatter matrix. Genes with smaller coefficients in the optimal projection basis vectors are filtered out recursively, so that the remaining genes become progressively more informative. Experiments on the leukemia dataset and the colon dataset show that the genes in the resulting subset are much less correlated and have more discriminative power than those selected by classical methods.
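The recursive filtering idea can be illustrated with a minimal NumPy sketch. The abstract does not give the algorithmic details, so everything here is an assumption: the function names `lda_directions` and `recursive_gene_selection`, the `drop_fraction` schedule, the numerical tolerance, and the use of the row norm of the projection matrix as a gene score are all hypothetical choices, not the authors' implementation. The sketch uses the null space of the within-class scatter when it is non-empty and falls back to ordinary (whitened) LDA otherwise.

```python
import numpy as np


def lda_directions(X, y, tol=1e-8):
    """Discriminant vectors (genes x d); X is samples x genes, y holds labels.

    Uses the null space of the within-class scatter Sw when it exists,
    otherwise ordinary LDA in the regular space (an assumed fallback).
    """
    classes = np.unique(y)
    Xc = X - X.mean(axis=0)

    # Restrict to the range of the total scatter via an SVD of the data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    P = Vt[:r].T              # genes x r, orthonormal columns
    Z = Xc @ P                # samples x r

    # Zw: within-class centred samples, so Sw = Zw.T @ Zw.
    # M: class means scaled by sqrt(class size), so Sb = M.T @ M.
    Zw = Z.copy()
    M = np.empty((classes.size, r))
    for k, c in enumerate(classes):
        idx = y == c
        mean_c = Z[idx].mean(axis=0)
        Zw[idx] -= mean_c
        M[k] = np.sqrt(idx.sum()) * mean_c

    _, sw, Vwt = np.linalg.svd(Zw, full_matrices=True)
    rank_w = int(np.sum(sw > tol * sw[0]))
    if rank_w < r:
        # Null space of Sw: within-class scatter vanishes along these axes.
        basis = Vwt[rank_w:].T
    else:
        # Sw is nonsingular: whiten it, which gives classical Fisher LDA.
        basis = Vwt.T / sw

    # Maximise between-class scatter inside the chosen basis.
    _, _, Vbt = np.linalg.svd(M @ basis, full_matrices=False)
    d = min(classes.size - 1, basis.shape[1])
    return P @ basis @ Vbt[:d].T


def recursive_gene_selection(X, y, n_keep, drop_fraction=0.5):
    """Repeatedly drop the genes with the smallest projection coefficients."""
    genes = np.arange(X.shape[1])
    while genes.size > n_keep:
        W = lda_directions(X[:, genes], y)
        score = np.linalg.norm(W, axis=1)        # one score per remaining gene
        n_next = max(n_keep, int(genes.size * (1.0 - drop_fraction)))
        keep = np.argsort(score)[::-1][:n_next]  # largest coefficients first
        genes = genes[np.sort(keep)]
    return genes
```

Calling `recursive_gene_selection(X, y, n_keep=50)` on a samples-by-genes matrix `X` with a label vector `y` returns the column indices of the retained genes. Halving the gene set at each round is only a convenient schedule for the sketch; a slower schedule recomputes the projection more often at higher cost.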
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yue, F., Wang, K., Zuo, W. (2007). Informative Gene Selection and Tumor Classification by Null Space LDA for Microarray Data. In: Chen, B., Paterson, M., Zhang, G. (eds) Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. ESCAPE 2007. Lecture Notes in Computer Science, vol 4614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74450-4_39
DOI: https://doi.org/10.1007/978-3-540-74450-4_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74449-8
Online ISBN: 978-3-540-74450-4
eBook Packages: Computer Science, Computer Science (R0)