
On feature selection protocols for very low-sample-size data

Published: 01 September 2018

Highlights

Feature selection for data with very few instances and possibly very high dimensionality.
Widely used protocol: (1) feature selection; (2) cross-validation to test a classifier.
An alternative, proper protocol includes both steps in a single cross-validation loop.
Experiment using 24 datasets, three feature selection methods and five classifier models.
The accuracy returned by the proper protocol is significantly closer to the true accuracy.

Abstract

High-dimensional data with very few instances are typical in many application domains. Selecting a highly discriminative subset of the original features is often the main interest of the end user. The widely used feature selection protocol for this type of data consists of two steps. First, features are selected from the data (possibly through cross-validation), and second, a cross-validation protocol is applied to test a classifier using the selected features. The selected feature set and the testing accuracy are then returned to the user. For lack of a better option, the same low-sample-size dataset is used in both steps. Questioning the validity of this protocol, we carried out an experiment using 24 high-dimensional datasets, three feature selection methods and five classifier models. We found that the accuracy returned by the above protocol is heavily biased, and we therefore propose an alternative protocol which avoids the contamination by including both steps in a single cross-validation loop. Statistical tests verify that the classification accuracy returned by the proper protocol is significantly closer to the true accuracy (estimated from an independent testing set) than that returned by the currently favoured protocol.
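
To make the contrast concrete, below is a minimal sketch of the two protocols in Python with scikit-learn. It is not the paper's own code: the synthetic wide dataset, the 1-NN classifier, the univariate F-test selector and the choice of 20 retained features are illustrative stand-ins for the 24 datasets, three selectors and five classifiers of the actual experiment.

    # Minimal sketch (assumed setup: synthetic wide data, 1-NN, univariate
    # F-test selection with k = 20); not the experimental code from the paper.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    # Wide data: 2000 features, few of them informative. Only 40 instances
    # are available to the protocols; the rest form an independent test set
    # that approximates the true accuracy.
    X, y = make_classification(n_samples=500, n_features=2000,
                               n_informative=10, random_state=0)
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, train_size=40, stratify=y, random_state=0)

    clf = KNeighborsClassifier(n_neighbors=1)

    # Widely used (biased) protocol: select features once on ALL 40
    # instances, then cross-validate the classifier on the selected features.
    X_sel = SelectKBest(f_classif, k=20).fit_transform(X_dev, y_dev)
    biased = cross_val_score(clf, X_sel, y_dev, cv=5).mean()

    # Proper protocol: selection sits inside the pipeline, so it is refit on
    # each training fold and the test fold never influences which features
    # are chosen.
    pipe = make_pipeline(SelectKBest(f_classif, k=20), clf)
    proper = cross_val_score(pipe, X_dev, y_dev, cv=5).mean()

    # "True" accuracy: fit selection + classifier on all 40 instances and
    # score once on the independent test set.
    true_acc = pipe.fit(X_dev, y_dev).score(X_test, y_test)

    print(f"biased protocol : {biased:.3f}")
    print(f"proper protocol : {proper:.3f}")
    print(f"independent test: {true_acc:.3f}")

On data of this shape, the biased estimate typically comes out noticeably more optimistic than both the nested estimate and the independent-test accuracy, which is the contamination effect described above.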



Published In

Pattern Recognition, Volume 81, Issue C, September 2018, 694 pages

Publisher

Elsevier Science Inc., United States


        Author Tags

        1. Feature selection
        2. Wide datasets
        3. Experimental protocol
        4. Training/testing
        5. Cross-validation
