Redundancy based feature selection for microarray data

Published: 22 August 2004

Abstract

In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem is particularly challenging because of the large number of features (genes) and the small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes from the selected set can better represent the characteristics of the targeted phenotypes and improve classification accuracy. Hence, in this paper we study the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method, compared with representative methods, are demonstrated through an empirical study using public microarray data sets.
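
The general idea of combining relevance ranking with redundancy removal can be illustrated with a minimal sketch. This is not the method proposed in the paper, whose details are not given in the abstract. It assumes absolute Pearson correlation as a stand-in measure for both relevance (feature vs. class label) and redundancy (feature vs. feature). The names `pearson`, `select_features`, and `redundancy_threshold`, the threshold value, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, redundancy_threshold=0.9):
    """Greedy filter: rank features by relevance, then skip any feature that is
    highly correlated with a feature that has already been kept.

    Assumption: |Pearson correlation| stands in for both the relevance and the
    redundancy measure; the threshold is an arbitrary illustrative value.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    ranked = sorted(columns, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(columns[name], columns[k])) >= redundancy_threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy expression data (hypothetical): six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1.0, 1.2, 0.9, 3.1, 3.0, 2.8],
    "g2": [2.0, 2.4, 1.8, 6.2, 6.0, 5.6],  # a scaled copy of g1, so redundant with it
    "g3": [5.0, 4.0, 5.5, 4.1, 1.0, 2.0],  # relevant, but only moderately correlated with g1
}

selected = select_features(genes, labels)
print(selected)
```

A pure top-k ranking on this toy data would pick both `g1` and `g2`, although the second adds no new information. The greedy filter keeps only one of the pair and moves on to `g3`.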

Published In

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2004
874 pages
ISBN:1581138881
DOI:10.1145/1014052

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature redundancy
  2. gene selection
  3. microarray data

Qualifiers

  • Article

Conference

KDD04

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
