discussion

Evolving Feature Selection

Published: 01 November 2005

Abstract

Feature selection is a preprocessing technique, commonly used on high-dimensional data, that studies how to select a subset or list of attributes or variables that are used to construct models describing data. Wide data sets, which have a huge number of features but relatively few instances, introduce a novel challenge to feature selection. This installment of Trends & Controversies looks at several different ways of meeting this challenge. This department is part of a special issue on Data Mining in Bioinformatics.




Published In

IEEE Intelligent Systems, Volume 20, Issue 6
November 2005
92 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. bioinformatics
  2. classification
  3. clustering
  4. data mining
  5. feature selection
  6. text mining

Qualifiers

  • Discussion

Cited By

  • (2022) FootApp: An AI-powered system for football match annotation. Multimedia Tools and Applications 82:4 (5547-5567). DOI: 10.1007/s11042-022-13359-0. Online publication date: 4-Jul-2022
  • (2018) A Tunable Particle Swarm Size Optimization Algorithm for Feature Selection. 2018 IEEE Congress on Evolutionary Computation (CEC) (1-7). DOI: 10.1109/CEC.2018.8477694. Online publication date: 8-Jul-2018
  • (2017) Unsupervised Feature Selection with Joint Clustering Analysis. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (1639-1648). DOI: 10.1145/3132847.3132999. Online publication date: 6-Nov-2017
  • (2017) Improved multiclass feature selection via list combination. Expert Systems with Applications: An International Journal 88:C (205-216). DOI: 10.1016/j.eswa.2017.06.043. Online publication date: 1-Dec-2017
  • (2017) A framework for increasing the value of predictive data-driven models by enriching problem domain characterization with novel features. Neural Computing and Applications 28:6 (1515-1523). DOI: 10.1007/s00521-015-2157-8. Online publication date: 1-Jun-2017
  • (2017) A statistical approach to combining multisource information in one-class classifiers. Statistical Analysis and Data Mining 10:4 (199-210). DOI: 10.1002/sam.11342. Online publication date: 1-Aug-2017
  • (2016) Multi-Step Iterative Algorithm for Feature Selection on Dynamic Documents. International Journal of Information Retrieval Research 6:2 (24-40). DOI: 10.4018/IJIRR.2016040102. Online publication date: 1-Apr-2016
  • (2016) Supervised, Unsupervised, and Semi-Supervised Feature Selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 13:5 (971-989). DOI: 10.1109/TCBB.2015.2478454. Online publication date: 1-Sep-2016
  • (2016) Employing Speeded Scaled Conjugate Gradient Algorithm for Multiple Contiguous Feature Vector Frames. Procedia Computer Science 78:C (740-747). DOI: 10.1016/j.procs.2016.02.047. Online publication date: 1-Mar-2016
  • (2016) Time Series Forecasting on Engineering Systems Using Recurrent Neural Networks. Advanced Data Mining and Applications (459-471). DOI: 10.1007/978-3-319-49586-6_31. Online publication date: 12-Dec-2016
