
On feature selection protocols for very low-sample-size data

Published: 01 September 2018

Highlights

Feature selection for data with very few instances and possibly very high dimensionality.
Widely used protocol: (1) feature selection; (2) cross-validation to test a classifier.
An alternative, proper protocol includes both steps in a single cross-validation loop.
Experiment using 24 datasets, three feature selection methods and five classifier models.
The accuracy returned by the proper protocol is significantly closer to the true accuracy.

Abstract

High-dimensional data with very few instances are typical in many application domains. Selecting a highly discriminative subset of the original features is often the main interest of the end user. The widely used feature selection protocol for this type of data consists of two steps. First, features are selected from the data (possibly through cross-validation), and second, a cross-validation protocol is applied to test a classifier using the selected features. The selected feature set and the testing accuracy are then returned to the user. For lack of a better option, the same low-sample-size dataset is used in both steps. Questioning the validity of this protocol, we carried out an experiment using 24 high-dimensional datasets, three feature selection methods and five classifier models. We found that the accuracy returned by the above protocol is heavily biased, and we therefore propose an alternative protocol which avoids the contamination by including both steps in a single cross-validation loop. Statistical tests verify that the classification accuracy returned by the proper protocol is significantly closer to the true accuracy (estimated from an independent testing set) than that returned by the currently favoured protocol.
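
To make the contrast concrete, below is a minimal sketch of the two protocols in Python with scikit-learn. It is not the paper's own code: the synthetic wide dataset, the 1-NN classifier, the univariate F-test selector and the choice of 20 retained features are illustrative stand-ins for the 24 datasets, three selectors and five classifiers of the actual experiment.

    # Minimal sketch (assumed setup: synthetic wide data, 1-NN, univariate
    # F-test selection with k = 20); not the experimental code from the paper.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    # Wide data: 2000 features, few of them informative. Only 40 instances
    # are available to the protocols; the rest form an independent test set
    # that approximates the true accuracy.
    X, y = make_classification(n_samples=500, n_features=2000,
                               n_informative=10, random_state=0)
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, train_size=40, stratify=y, random_state=0)

    clf = KNeighborsClassifier(n_neighbors=1)

    # Widely used (biased) protocol: select features once on ALL 40
    # instances, then cross-validate the classifier on the selected features.
    X_sel = SelectKBest(f_classif, k=20).fit_transform(X_dev, y_dev)
    biased = cross_val_score(clf, X_sel, y_dev, cv=5).mean()

    # Proper protocol: selection sits inside the pipeline, so it is refit on
    # each training fold and the test fold never influences which features
    # are chosen.
    pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)
    proper = cross_val_score(pipe, X_dev, y_dev, cv=5).mean()

    # "True" accuracy: fit selection + classifier on all 40 instances and
    # score once on the independent test set.
    true_acc = pipe.fit(X_dev, y_dev).score(X_test, y_test)

    print(f"biased protocol : {biased:.3f}")
    print(f"proper protocol : {proper:.3f}")
    print(f"independent test: {true_acc:.3f}")

On data of this shape, the biased estimate typically comes out noticeably more optimistic than both the nested estimate and the independent-test accuracy, which is the contamination effect described above.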



Published In

Pattern Recognition, Volume 81, Issue C, September 2018, 694 pages

Publisher

Elsevier Science Inc., United States


        Author Tags

        1. Feature selection
        2. Wide datasets
        3. Experimental protocol
        4. Training/testing
        5. Cross-validation
