Non-negative Matrix Factorization Based Text Mining: Feature Extraction and Classification

P. C. Barman, Nadeem Iqbal & Soo-Young Lee

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4233)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Unlabeled document and text collections keep growing larger, and mining such data sets is a challenging task. When the simple word-document frequency matrix is used as the feature space, the mining process becomes more complex: text documents are typically represented as high-dimensional sparse vectors of a few thousand dimensions with 95 to 99% sparsity, which significantly affects both the efficiency and the results of mining. In this paper, we propose a two-stage Non-negative Matrix Factorization (NMF) approach. In the first stage, we extract uncorrelated, probabilistic basis feature vectors for the documents, reducing the dimension of the word-document frequency feature vectors from a few thousand to a few hundred. In the second stage, NMF is used for clustering or classification. With the proposed approach, we observed clustering or classification accuracy above 98.5%. Dimension reduction and classification performance were evaluated on the Classic3 dataset.
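The abstract does not specify the NMF objective, the update rules, or exactly how the two stages are connected, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the multiplicative update rules of Lee and Seung for the squared Frobenius loss (the paper may use a different objective), reads "probabilistic" feature vectors as columns normalized to sum to 1, and assigns each document to the cluster with the largest second-stage coefficient, a common rule in NMF-based document clustering. All function names (`term_document_matrix`, `sparsity`, `nmf`, `normalize_columns`, `two_stage_nmf`) are hypothetical, and the tiny corpus merely stands in for Classic3, whose real matrix has thousands of terms.

```python
import random


def term_document_matrix(docs):
    """Word-by-document count matrix V: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            V[index[w]][j] += 1.0
    return vocab, V


def sparsity(V):
    """Fraction of entries in V that are exactly zero."""
    total = sum(len(row) for row in V)
    zeros = sum(1 for row in V for x in row if x == 0)
    return zeros / total


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python product of an (n x p) and a (p x m) matrix."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frobenius_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Assumption: Lee-Seung multiplicative updates minimizing ||V - WH||^2."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); entries stay non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def normalize_columns(H, eps=1e-12):
    """Assumption: 'probabilistic' features = each document column sums to 1."""
    sums = [sum(col) + eps for col in zip(*H)]
    return [[x / s for x, s in zip(row, sums)] for row in H]


def two_stage_nmf(V, k_features, k_clusters, iters=200, seed=0):
    """Stage 1 reduces V to k_features per document; stage 2 clusters them."""
    _, H1 = nmf(V, k_features, iters, seed)        # H1: k_features x n_docs
    F = normalize_columns(H1)                       # reduced document features
    _, H2 = nmf(F, k_clusters, iters, seed + 1)    # H2: k_clusters x n_docs
    # Assumption: each document goes to the cluster with the largest coefficient.
    return [max(range(k_clusters), key=lambda a: H2[a][j]) for j in range(len(V[0]))]


docs = [
    "goal team match goal",
    "team coach match",
    "vote party election",
    "election vote law",
]
vocab, V = term_document_matrix(docs)
print(len(vocab), sparsity(V))  # prints "8 0.625"
print(two_stage_nmf(V, k_features=3, k_clusters=2))
```

The toy matrix is only 62.5% sparse; for a real collection in the 95 to 99% range a sparse representation and a vectorized library implementation would be the practical choice, and in that setting `k_features` would be a few hundred rather than 3.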


Similar content being viewed by others

Nonnegative Matrix Factorization for Document Clustering: A Survey

An Application of Non Negative Matrix Factorization in Text Mining

Text mining using nonnegative matrix factorization and latent semantic analysis (Article, 21 April 2021)


Author information

Authors and Affiliations

Department of BioSystems, Korea Advanced Institute of Science and Technology, Brain Science Research Center and Computational NeuroSystems Lab, Daejeon, 305-701, Republic of Korea
P. C. Barman, Nadeem Iqbal & Soo-Young Lee


Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, The Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong
Irwin King
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
Jun Wang
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Lai-Wan Chan
Department of Computer Science and Engineering & Center for Cognitive Science, The Ohio State University, Columbus, OH 43210
DeLiang Wang


About this paper

Cite this paper

Barman, P.C., Iqbal, N., Lee, SY. (2006). Non-negative Matrix Factorization Based Text Mining: Feature Extraction and Classification. In: King, I., Wang, J., Chan, LW., Wang, D. (eds) Neural Information Processing. ICONIP 2006. Lecture Notes in Computer Science, vol 4233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893257_78


DOI: https://doi.org/10.1007/11893257_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46481-5
Online ISBN: 978-3-540-46482-2
eBook Packages: Computer Science, Computer Science (R0)
