Cross-Comparison for Two-Dimensional Text Categorization

Giorgio Maria Di Nunzio¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3246))

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

736 Accesses

Abstract

The organization of large text collections is the main goal of automated text categorization. In particular, the final aim is to classify documents into a certain number of pre-defined categories in an efficient way and with as much accuracy as possible. On-line and run-time services, such as personalization services and information filtering services, have increased the importance of effective and efficient document categorization techniques. In the last years, a wide range of supervised learning algorithms have been applied to this problem [1]. Recently, a new approach that exploits a two-dimensional summarization of the data for text classification was presented [2]. This method does not go through a selection of words phase; instead, it uses the whole dictionary to present data in intuitive way on two-dimensional graphs. Although successful in terms of classification effectiveness and efficiency (as recently showed in [3]), this method presents some unsolved key issues: the design of the training algorithm seems to be ad hoc for the Reuters-21578 collection; the evaluation has only been done only on the 10 most frequent classes of the Reuters-21578 dataset; the evaluation lacks measure of significance in most parts; the method adopted lacks a mathematical justification. We focus on the first three aspects, leaving the fourth as the future work.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 35.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 44.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Low-Dimensional Classification of Text Documents

A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification

Article 09 March 2019

Improved Document Categorization Through Feature-Rich Combinations

References

Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34, 1–47 (2002)
Article MathSciNet Google Scholar
Di Nunzio, G.M.: A bidimensional view of documents for text categorisation. In: McDonald, S., Tait, J.I. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 112–126. Springer, Heidelberg (2004)
Chapter Google Scholar
Di Nunzio, G.M., Micarelli, A.: Pushing “underfitting” to the limit: Learning in bidimensional text categorization. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain (2004) (forthcoming)
Google Scholar
Ross, S.: Introduction to Probability and Statistics for Engineers and Scientists. Academic Press, London (2000)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Information Engineering, University of Padua,
Giorgio Maria Di Nunzio

Authors

Giorgio Maria Di Nunzio
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova,
Alberto Apostolico
Department of Information Engineering, University of Padova,
Massimo Melucci

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Di Nunzio, G.M. (2004). Cross-Comparison for Two-Dimensional Text Categorization. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_16

Download citation

DOI: https://doi.org/10.1007/978-3-540-30213-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics

Cross-Comparison for Two-Dimensional Text Categorization

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Low-Dimensional Classification of Text Documents

A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification

Improved Document Categorization Through Feature-Rich Combinations

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Cross-Comparison for Two-Dimensional Text Categorization

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Low-Dimensional Classification of Text Documents

A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification

Improved Document Categorization Through Feature-Rich Combinations

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation