Using kNN model for automatic text categorization

Gongde Guo¹,
Hui Wang¹,
David Bell²,
Yaxin Bi² &
…
Kieran Greer¹

Abstract

Two well-known similarity-based learning approaches to text categorization are investigated: the k-nearest neighbors (kNN) classifier and the Rocchio classifier. After identifying the strengths and weaknesses of each technique, a new classifier called the kNN model-based classifier (kNN Model) is proposed, which combines the strengths of both kNN and Rocchio. A text categorization prototype that implements kNN Model along with kNN and Rocchio is described. The methods are evaluated experimentally on two common document corpora: the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the proposed kNN model-based method outperforms the kNN and Rocchio classifiers, and is therefore a good alternative to kNN and Rocchio in some application areas.
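
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of a plain kNN classifier and a Rocchio-style centroid classifier over bag-of-words vectors. It is not the authors' kNN Model, which the abstract does not specify. The toy corpus, the function names, the raw term-count weighting, and the simplification of Rocchio to a per-class mean vector (no negative-example term) are all assumptions made for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    # Assumption: raw term counts on whitespace tokens (no TF-IDF, no stemming).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, query, k=3):
    # kNN: rank all training documents by similarity, majority vote over top k.
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def rocchio_fit(train):
    # Rocchio-style prototype: one mean vector per class
    # (simplified: the negative-example weight is taken to be zero).
    sums, counts = {}, Counter()
    for vec, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for term, w in vec.items():
            acc[term] += w
    return {label: {t: w / counts[label] for t, w in acc.items()}
            for label, acc in sums.items()}

def rocchio_predict(centroids, query):
    # Assign the class whose prototype is most similar to the query.
    return max(centroids, key=lambda label: cosine(query, centroids[label]))

# Hypothetical toy corpus, for illustration only.
DOCS = [
    ("the team won the match", "sport"),
    ("the striker scored a goal", "sport"),
    ("the team lost the final match", "sport"),
    ("shares fell on the stock market", "finance"),
    ("the bank raised interest rates", "finance"),
    ("investors sold shares", "finance"),
]
TRAIN = [(vectorize(text), label) for text, label in DOCS]
CENTROIDS = rocchio_fit(TRAIN)

if __name__ == "__main__":
    for text in ("the team scored in the match", "the bank sold shares"):
        q = vectorize(text)
        print(text, "->", knn_predict(TRAIN, q), rocchio_predict(CENTROIDS, q))
```

The sketch makes the usual trade-off visible: kNN keeps every training document and compares the query against all of them at prediction time, while the centroid approach compresses each class into a single prototype, which is cheaper but can misrepresent classes that are not well summarized by one mean vector.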

Author information

Authors and Affiliations

School of Computing and Mathematics, University of Ulster, (Shore Road, Newtownabbey), Co. Antrim, BT37 0QB, UK
Gongde Guo, Hui Wang & Kieran Greer
School of Computer Science, Queen's University Belfast, Belfast, BT7 1NN, UK
David Bell & Yaxin Bi

Additional information

This work was partly supported by the European Commission project ICONS, project no. IST-2001-32429.

About this article

Cite this article

Guo, G., Wang, H., Bell, D. et al. Using kNN model for automatic text categorization. Soft Comput 10, 423–430 (2006). https://doi.org/10.1007/s00500-005-0503-y

Published: 18 May 2005
Issue Date: March 2006
DOI: https://doi.org/10.1007/s00500-005-0503-y