An Examination of Feature Selection Frameworks in Text Categorization

  • Conference paper
Information Retrieval Technology (AIRS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

Feature selection, an important task in text categorization, is used for dimensionality reduction. It can be performed locally or globally. In local selection, a distinct feature set is derived for each class, so the number of feature sets depends on the number of classes. In contrast, global feature selection uses a single universal feature set, which is assumed to preserve the characteristics of all classes. Furthermore, feature selection can be based on the relevant feature set only (local dictionary) or on both the relevant and irrelevant feature sets (universal dictionary). In this paper, we explore these feature selection frameworks for text categorization on the Reuters(10) and Reuters(115) datasets (variants of the Reuters-21578 corpus). We then investigate the efficiency of 7 different local or global feature selection methods in combination with the local and universal dictionaries. Our experiments show that local feature selection with a local dictionary yields the best categorization results.
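
The distinction between local and global selection, and between local and universal dictionaries, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the 7 selection methods, so the chi-square statistic is assumed here as the scoring function, and taking the maximum score over classes is an assumed way of forming the single global feature set. The toy corpus and the names `chi_square`, `vocabulary`, `local_selection` and `global_selection` are hypothetical.

```python
def chi_square(term, cls, docs):
    """Chi-square score of a (term, class) pair from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in docs:
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document outside class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document outside class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def vocabulary(docs, cls=None):
    """All terms in the corpus, or only those in documents of one class."""
    vocab = set()
    for tokens, label in docs:
        if cls is None or label == cls:
            vocab |= tokens
    return vocab


def top_k(scores, k):
    """The k best-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def local_selection(docs, k, dictionary="local"):
    """One feature set per class.

    dictionary="local": candidates come from the class's own documents only.
    dictionary="universal": candidates come from all documents.
    """
    selected = {}
    for cls in sorted({label for _, label in docs}):
        candidates = vocabulary(docs, cls if dictionary == "local" else None)
        scores = {t: chi_square(t, cls, docs) for t in candidates}
        selected[cls] = top_k(scores, k)
    return selected


def global_selection(docs, k):
    """A single feature set shared by all classes (assumed: max over classes)."""
    classes = sorted({label for _, label in docs})
    scores = {t: max(chi_square(t, c, docs) for c in classes)
              for t in vocabulary(docs)}
    return top_k(scores, k)


# Hypothetical toy corpus: (set of terms, class label).
docs = [
    ({"wheat", "grain", "export"}, "grain"),
    ({"wheat", "harvest", "tonnes"}, "grain"),
    ({"oil", "crude", "barrel"}, "crude"),
    ({"oil", "opec", "export"}, "crude"),
    ({"profit", "dividend", "share"}, "earn"),
    ({"profit", "quarter", "share"}, "earn"),
]

print(local_selection(docs, 5, "local")["grain"])
print(local_selection(docs, 5, "universal")["grain"])
print(global_selection(docs, 4))
```

In this sketch the universal dictionary lets a term that never occurs in the class's own documents enter that class's feature set, because chi-square also rewards terms whose presence signals non-membership. The local dictionary restricts each class to terms drawn from its own documents.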

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

How, B.C., Kiong, W.T. (2005). An Examination of Feature Selection Frameworks in Text Categorization. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_50

  • DOI: https://doi.org/10.1007/11562382_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science, Computer Science (R0)
