discussion

Evolving Feature Selection

Authors: Edward R. Dougherty, Jennifer G. Dy, Michael Berens, George Forman

IEEE Intelligent Systems, Volume 20, Issue 6

Pages 64 - 76

https://doi.org/10.1109/MIS.2005.105

Published: 01 November 2005

Abstract

Feature selection is a preprocessing technique, commonly used on high-dimensional data, that studies how to select a subset or list of attributes or variables that are used to construct models describing data. Wide data sets, which have a huge number of features but relatively few instances, introduce a novel challenge to feature selection. This installment of Trends & Controversies looks at several different ways of meeting this challenge. This department is part of a special issue on Data Mining in Bioinformatics.
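As a rough illustration of the setting the abstract describes, the sketch below ranks features on a synthetic "wide" data set (20 instances, 500 features) and keeps the top few. It is not taken from the article: the scoring rule (difference of class means divided by the sum of per-class standard deviations), the function names `score_feature`, `select_top_k`, and `make_wide_data`, and all data parameters are assumptions chosen only to keep the example small and self-contained.

```python
import random
import statistics


def score_feature(values, labels):
    """Score one feature: |mean gap between classes| / sum of class std devs."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    gap = abs(statistics.fmean(pos) - statistics.fmean(neg))
    # Constant features carry no class information, so score them as zero.
    return gap / spread if spread > 0 else 0.0


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring feature columns."""
    n_features = len(rows[0])
    scores = [
        score_feature([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def make_wide_data(n_samples=20, n_features=500, informative=(3, 42), seed=0):
    """Build a hypothetical wide data set: few rows, many mostly-noise columns."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    rows = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 10.0 * y  # shift only the informative columns
        rows.append(row)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_wide_data()
    print(sorted(select_top_k(rows, labels, k=2)))
```

This is a univariate filter, so it scores each feature in isolation and ignores redundancy between features. With so few instances per class, noise columns can also score well by chance, which is one reason wide data makes the problem hard.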

Index Terms

Evolving Feature Selection
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
    2. Machine learning algorithms
      1. Feature selection

Published In

IEEE Intelligent Systems, Volume 20, Issue 6

November 2005

92 pages

ISSN: 1541-1672

Copyright © 2006.

Publisher

IEEE Educational Activities Department

United States

