Abstract
A large portion of a document usually consists of irrelevant features. Rather than helping to identify the actual context of the document, such features increase the dimensionality of the representation model and the computational complexity of the underlying algorithm, and hence adversely affect performance. This makes it necessary to select the relevant features from the given feature space. Feature selection plays a key role here by removing irrelevant features from the original feature space. Feature selection methods are broadly categorized into three groups: filter, wrapper, and embedded. Filter methods are widely used in text mining because of their simplicity, low computational complexity, and efficiency. In this article, we provide a brief survey of filter feature selection methods, along with some recent developments in this area.
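To make the idea of a filter method concrete, the following is a minimal sketch rather than a method taken from the survey. It assumes a chi-square criterion (one common filter score, chosen here only for illustration) computed on term presence per document, and a hypothetical toy corpus given as token lists. Each term is scored against the class labels without training any classifier, and the top k terms are kept.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes.

    docs   -- list of token lists (hypothetical toy input format)
    labels -- list of class labels, one per document
    """
    n = len(docs)
    class_counts = Counter(labels)
    term_counts = Counter()        # number of documents containing the term
    term_class_counts = Counter()  # same count, split by class label
    for tokens, label in zip(docs, labels):
        for term in set(tokens):   # presence/absence, not term frequency
            term_counts[term] += 1
            term_class_counts[term, label] += 1

    scores = {}
    for term, n_t in term_counts.items():
        best = 0.0
        for cls, n_c in class_counts.items():
            a = term_class_counts[term, cls]  # term present, class cls
            b = n_t - a                       # term present, other classes
            c = n_c - a                       # term absent, class cls
            d = n - n_t - c                   # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:  # a zero margin means the term carries no information
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical toy corpus: two classes, two documents each.
docs = [
    ["the", "match", "ended", "with", "goal"],
    ["the", "team", "scored", "goal"],
    ["the", "market", "fell", "stocks"],
    ["the", "stocks", "rallied", "market"],
]
labels = ["sport", "sport", "finance", "finance"]
selected = select_top_k(docs, labels, k=3)
```

In this toy corpus, a term such as "the" occurs in every document and therefore scores zero, while terms confined to one class score highest. The ranking depends only on the data, not on any learning algorithm, which is what distinguishes a filter from a wrapper or embedded method.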
Copyright information
© 2014 Springer India
About this paper
Cite this paper
Bharti, K.K., Singh, P.K. (2014). A Survey on Filter Techniques for Feature Selection in Text Mining. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_154
DOI: https://doi.org/10.1007/978-81-322-1602-5_154
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1601-8
Online ISBN: 978-81-322-1602-5
eBook Packages: Engineering, Engineering (R0)