Structure-Based Supervised Term Weighting and Regularization for Text Classification

Niloofer Shanavas¹⁹,
Hui Wang¹⁹,
Zhiwei Lin¹⁹ &
…
Glenn Hawe¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11608))

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

1574 Accesses
1 Citations

Abstract

Text documents have rich information that can be useful for different tasks. How to utilise the rich information in texts effectively and efficiently for tasks such as text classification is still an active research topic. One approach is to weight the terms in a text document based on their relevance to the classification task at hand. Another approach is to utilise structural information in a text document to regularize learning so that the learned model is more accurate. An important question is, can we combine the two approaches to achieve better performance? This paper presents a novel method for utilising the rich information in texts. We use supervised term weighting, which utilises the class information in a set of pre-classified training documents, thus the resulting term weighting is class specific. We also use structured regularization, which incorporates structural information into the learning process. A graph is built for each class from the pre-classified training documents and structural information in the graphs is used to calculate the supervised term weights and to define the groups for structured regularization. Experimental results for six text classification tasks show the increase in text classification accuracy with the utilisation of structural information in text for both weighting and regularization. Using graph-based text representation for supervised term weighting and structured regularization can build a compact model with considerable improvement in the performance of text classification.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 47.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 59.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Model-induced term-weighting schemes for text classification

Article 15 January 2016

Survey on supervised machine learning techniques for automatic text classification

Article 19 January 2019

Learning to Classify Text Using a Few Labeled Examples

Notes

1.
http://disi.unitn.it/moschitti/corpora.htm.

References

Aggarwal, C.C.: Data Classification: Algorithms and Applications. CRC Press, Boca Raton (2014)
Book Google Scholar
Bakin, S., et al.: Adaptive regression and model selection in data mining problems (1999)
Google Scholar
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008)
Article Google Scholar
Feldman, R., Sanger, J.: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, New York (2007)
Google Scholar
Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736 (2010)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970)
Article Google Scholar
Lewis, D.D.: Representation quality in text classification: an introduction and experiment. In: Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, 24–27 June 1990 (1990)
Google Scholar
Martins, A.F., Smith, N.A., Aguiar, P.M., Figueiredo, M.A.: Structured sparsity in structured prediction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1500–1511 (2011)
Google Scholar
Sebastiani, F.: Machine learning in automated text categorization. ACM Comput. Surv. (CSUR) 34, 1–47 (2002)
Article Google Scholar
Shanavas, N., Wang, H., Lin, Z., Hawe, G.: Centrality-based approach for supervised term weighting. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pp. 1261–1268. IEEE (2016)
Google Scholar
Skianis, K., Rousseau, F., Vazirgiannis, M.: Regularizing text categorization with clusters of words. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1827–1837 (2016)
Google Scholar
Skianis, K., Tziortziotis, N., Vazirgiannis, M.: Orthogonal matching pursuit for text classification. In: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pp. 93–103 (2018)
Google Scholar
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
MathSciNet MATH Google Scholar
Yogatama, D., Smith, N.: Making the most of bag of words: sentence regularization with alternating direction method of multipliers. In: International Conference on Machine Learning, pp. 656–664 (2014)
Google Scholar
Yogatama, D., Smith, N.A.: Linguistic structured sparsity in text categorization. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 786–796 (2014)
Google Scholar
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68, 49–67 (2006)
Article MathSciNet Google Scholar
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

School of Computing, Ulster University, Jordanstown, UK
Niloofer Shanavas, Hui Wang, Zhiwei Lin & Glenn Hawe

Authors

Niloofer Shanavas
View author publications
You can also search for this author in PubMed Google Scholar
Hui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zhiwei Lin
View author publications
You can also search for this author in PubMed Google Scholar
Glenn Hawe
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Niloofer Shanavas .

Editor information

Editors and Affiliations

Conservatoire National des Arts et Métiers, Paris, France
Elisabeth Métais
University of Salford, Salford, UK
Farid Meziane
University of Salford, Salford, UK
Sunil Vadera
Oakland University, Rochester, MI, USA
Vijayan Sugumaran
CSE, University of Salford, Salford, UK
Mohamad Saraee

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shanavas, N., Wang, H., Lin, Z., Hawe, G. (2019). Structure-Based Supervised Term Weighting and Regularization for Text Classification. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds) Natural Language Processing and Information Systems. NLDB 2019. Lecture Notes in Computer Science(), vol 11608. Springer, Cham. https://doi.org/10.1007/978-3-030-23281-8_9

Download citation

DOI: https://doi.org/10.1007/978-3-030-23281-8_9
Published: 21 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23280-1
Online ISBN: 978-3-030-23281-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Structure-Based Supervised Term Weighting and Regularization for Text Classification

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Model-induced term-weighting schemes for text classification

Survey on supervised machine learning techniques for automatic text classification

Learning to Classify Text Using a Few Labeled Examples

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Structure-Based Supervised Term Weighting and Regularization for Text Classification

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Model-induced term-weighting schemes for text classification

Survey on supervised machine learning techniques for automatic text classification

Learning to Classify Text Using a Few Labeled Examples

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation