Abstract
Feature selection, an important task in text categorization, is used for dimensionality reduction. It can be performed either locally or globally. In local selection, a distinct feature set is derived for each class, so the number of feature sets depends on the number of classes. In contrast, global feature selection uses only one universal feature set, which is assumed to preserve the characteristics of all classes. Furthermore, feature selection can be carried out on the relevant feature set only (local dictionary) or on both the relevant and irrelevant feature sets (universal dictionary). In this paper, we apply these different feature selection frameworks to the task of text categorization on the Reuters(10) and Reuters(115) datasets (variants of the Reuters-21578 corpus). We then investigate the efficiency of seven feature selection methods, applied locally or globally, in combination with the local and universal dictionaries. Our experiments show that local feature selection with a local dictionary yields the best categorization results.
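The four combinations of framework described in the abstract (local or global selection, with a local or universal dictionary) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the function names, the use of chi-square as the scoring metric (the abstract does not name the seven methods evaluated), and the use of the maximum per-class score to build the single global feature set.

```python
from collections import defaultdict

# Hypothetical toy corpus: (tokens, class label). Not data from the paper.
DOCS = [
    (["wheat", "crop", "export"], "grain"),
    (["wheat", "harvest", "price"], "grain"),
    (["oil", "barrel", "price"], "crude"),
    (["oil", "export", "opec"], "crude"),
]


def chi_square(term, cls, docs):
    """Chi-square association between a term and a class (assumed metric)."""
    n = len(docs)
    a = sum(1 for toks, y in docs if term in toks and y == cls)
    b = sum(1 for toks, y in docs if term in toks and y != cls)
    c = sum(1 for toks, y in docs if term not in toks and y == cls)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def dictionary(cls, docs, universal):
    """Candidate terms for a class.

    Local dictionary: terms from documents of the class only (relevant).
    Universal dictionary: terms from all documents (relevant and irrelevant).
    """
    return {t for toks, y in docs if universal or y == cls for t in toks}


def local_selection(docs, k, universal):
    """One feature set of size k per class."""
    selected = {}
    for cls in sorted({y for _, y in docs}):
        vocab = dictionary(cls, docs, universal)
        # Rank by descending score; break ties alphabetically.
        ranked = sorted(vocab, key=lambda t: (-chi_square(t, cls, docs), t))
        selected[cls] = ranked[:k]
    return selected


def global_selection(docs, k, universal):
    """A single feature set of size k shared by all classes.

    Per-class scores are combined with max, which is an assumption here.
    """
    best = defaultdict(float)
    for cls in sorted({y for _, y in docs}):
        for t in dictionary(cls, docs, universal):
            best[t] = max(best[t], chi_square(t, cls, docs))
    return sorted(best, key=lambda t: (-best[t], t))[:k]


if __name__ == "__main__":
    print(local_selection(DOCS, 2, universal=False))
    print(local_selection(DOCS, 2, universal=True))
    print(global_selection(DOCS, 2, universal=False))
```

On this toy corpus, the local dictionary restricts each class to terms that occur in its own documents, whereas the universal dictionary also admits terms that only signal the absence of a class (for example, "oil" scores highly for "grain" because it never appears there). This is only meant to make the distinction concrete; it says nothing about which combination performs best on the Reuters datasets.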
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
How, B.C., Kiong, W.T. (2005). An Examination of Feature Selection Frameworks in Text Categorization. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_50
DOI: https://doi.org/10.1007/11562382_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)