Abstract
We address the problem of identifying a non-redundant subset of important variables. All modern feature selection approaches, including filters, wrappers, and embedded methods, run into problems in very general settings: massive mixed-type data and complex relationships between the inputs and the target. We propose an efficient ensemble-based approach that measures statistical independence between the target and a potentially very large number of inputs, including interactions of any meaningful order among them; removes redundancies from the relevant variables; and finally ranks the variables in the identified minimum feature set. Experiments with synthetic data illustrate the sensitivity and selectivity of the method, while its scalability is demonstrated on a real car-sensor database.
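The relevance-testing step can be pictured as ranking real inputs against "artificial contrasts": permuted copies of the inputs that are, by construction, independent of the target, so their ensemble importance scores form a null distribution for separating relevant from irrelevant variables. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' full algorithm: it uses scikit-learn's RandomForestClassifier, the number of replicates and the percentile threshold are hypothetical choices, and the paper's redundancy-elimination and mixed-type handling are omitted.

```python
# Minimal sketch of feature selection against artificial contrasts.
# NOT the paper's exact method; replicate count and percentile are
# illustrative choices.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_against_contrasts(X, y, n_replicates=20, percentile=95):
    """Return indices of features whose forest importance beats the
    chosen percentile of the permuted-contrast importances in a
    majority of replicates."""
    rng = np.random.default_rng(0)
    n_features = X.shape[1]
    wins = np.zeros(n_features, dtype=int)
    for rep in range(n_replicates):
        # Contrast variables: each column permuted independently,
        # breaking any association with the target.
        contrasts = rng.permuted(X, axis=0)
        X_aug = np.hstack([X, contrasts])
        forest = RandomForestClassifier(
            n_estimators=200, max_features="sqrt",
            random_state=rep).fit(X_aug, y)
        imp = forest.feature_importances_
        # Null threshold from the contrasts' importance distribution.
        threshold = np.percentile(imp[n_features:], percentile)
        wins += imp[:n_features] > threshold
    return np.flatnonzero(wins > n_replicates // 2)

# Example usage on synthetic data: a few informative features
# hidden among noise.
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=5, random_state=0)
print(select_against_contrasts(X, y))
```

Comparing real importances against the contrasts' importance distribution yields a data-driven significance threshold, and repeating over several independently seeded forests protects against a single permutation being accidentally informative. The paper's method goes further, removing redundancies among the relevant variables and ranking the resulting minimum feature set, which this sketch omits.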
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Tuv, E., Borisov, A., Torkkola, K. (2006). Best Subset Feature Selection for Massive Mixed-Type Problems. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_125
DOI: https://doi.org/10.1007/11875581_125
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8