Abstract
The amount of data in the world is growing exponentially as the number of applications in a wide variety of contexts increases. This data needs to be analyzed in order to extract the valuable information underlying it. Machine learning is a useful tool for this task, but the high complexity of the data forces us to use additional methods to reduce that complexity. Dimensionality reduction, and feature selection in particular, is one of the most widely used approaches to achieve this goal. Many algorithms have been proposed to reduce the dimensionality of data, each with its own advantages and drawbacks, and this variety usually leads researchers to test several methods and choose the best one. Based on that, this paper proposes combining feature selection algorithms in order to create a single, more stable solution. We tested this approach using real datasets and machine learning algorithms. Results showed that the combined solution can be used with little or no loss in classification accuracy. Our method can therefore serve as a stable choice when little is known about the problem.
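The abstract does not state how the outputs of the individual feature selectors are merged. The following is a minimal sketch of one common ensemble strategy, majority voting over selected feature subsets. It is an illustration only and not necessarily the authors' combination rule. The function `combine_by_vote` and the example subsets are hypothetical, and the sketch assumes each selector returns a set of feature indices.

```python
from collections import Counter


def combine_by_vote(subsets, min_votes):
    """Keep features chosen by at least `min_votes` selectors.

    Hypothetical helper: `subsets` is a list of feature-index collections,
    one per feature selection algorithm. Returns the sorted kept indices.
    """
    votes = Counter()
    for subset in subsets:
        # Each selector casts at most one vote per feature.
        votes.update(set(subset))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Hypothetical outputs of three feature selectors on a 10-feature dataset.
selected = [
    {0, 2, 3, 7},
    {0, 3, 5, 7},
    {0, 1, 3, 9},
]

# Strict majority: a feature must be picked by at least 2 of the 3 selectors.
majority = len(selected) // 2 + 1
print(combine_by_vote(selected, majority))  # prints [0, 3, 7]
```

Raising `min_votes` yields a smaller subset on which the selectors agree more strongly. Lowering it approaches the union of all subsets.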
Acknowledgments
This paper was partially supported by CNPq Universal Grant No. 480997/2013-6 and by the UFRN scholarship program.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Araújo, D., Jesus, J., Neto, A.D., Martins, A. (2016). A Combination Method for Reducing Dimensionality in Large Datasets. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science, vol. 9887. Springer, Cham. https://doi.org/10.1007/978-3-319-44781-0_46
DOI: https://doi.org/10.1007/978-3-319-44781-0_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44780-3
Online ISBN: 978-3-319-44781-0
eBook Packages: Computer Science, Computer Science (R0)