research-article

GEML: a graph-enhanced pre-trained language model framework for text classification via mutual learning

Published: 11 September 2024

Abstract

Large-scale Pre-trained Language Models (PLMs) have become the backbones of text classification due to their exceptional performance. However, they treat input documents as independent and identically distributed, thereby disregarding potential relationships among the documents. This limitation can lead to classification errors. To address this issue, some recent work explores the integration of Graph Neural Networks (GNNs) with PLMs, as GNNs can effectively model document relationships. Yet combining graph-based methods with PLMs is challenging due to the structural incompatibility between graphs and sequences. To tackle this challenge, we propose a graph-enhanced text mutual learning framework that integrates graph-based models with PLMs to boost classification performance. Our approach separates graph-based methods and language models into two independent channels and allows them to approximate each other through mutual learning of probability distributions. This probability-distribution-guided approach simplifies the adaptation of graph-based models to PLMs and enables seamless end-to-end training of the entire architecture. Moreover, we introduce Asymmetrical Learning, a strategy that enhances the learning process, and incorporate Uncertainty Weighting loss to achieve smoother probability distribution learning. These enhancements significantly improve the performance of mutual learning. The practical value of our research lies in its potential applications in areas such as social network analysis, information retrieval, and recommendation systems, where understanding and leveraging document relationships is crucial. Importantly, our method can be easily combined with different PLMs and consistently achieves state-of-the-art results on multiple public datasets.
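
The abstract gives no formulas, so the following is only a minimal sketch of how two channels might "approximate each other through mutual learning of probability distributions". It assumes the common deep-mutual-learning form (each channel minimises its own cross-entropy plus a KL term toward the other channel's predicted distribution) and an uncertainty weighting of the form sum_i exp(-s_i) * L_i + s_i with learnable log-variances s_i. All function names and the exact loss shape are assumptions for illustration, not the authors' implementation. Asymmetrical Learning is not described in enough detail here to sketch.

```python
import math

def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Supervised loss for a single example with an integer class label."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def uncertainty_weighted(losses, log_vars):
    """Assumed weighting: sum_i exp(-s_i) * L_i + s_i (s_i would be learned)."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))

def mutual_learning_losses(plm_logits, gnn_logits, label, log_vars=(0.0, 0.0)):
    """Return (plm_loss, gnn_loss) for one example.

    Each channel is pulled toward the true label and toward its peer's
    distribution. In real training the peer's distribution would be
    treated as a constant (no gradient) when updating a channel.
    """
    p_plm = softmax(plm_logits)
    p_gnn = softmax(gnn_logits)
    plm_loss = uncertainty_weighted(
        [cross_entropy(p_plm, label), kl_divergence(p_gnn, p_plm)], log_vars)
    gnn_loss = uncertainty_weighted(
        [cross_entropy(p_gnn, label), kl_divergence(p_plm, p_gnn)], log_vars)
    return plm_loss, gnn_loss

if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; the true class is 0.
    plm, gnn = mutual_learning_losses([2.0, 0.5, -1.0], [0.1, 1.5, 0.3], label=0)
    print(f"PLM channel loss: {plm:.4f}")
    print(f"GNN channel loss: {gnn:.4f}")
```

Because the two channels exchange only class probabilities, neither needs to consume the other's internal representation, which is one plausible reading of how the framework sidesteps the graph/sequence structural mismatch.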

Information & Contributors

Information

Published In

Applied Intelligence, Volume 54, Issue 23
Dec 2024
576 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 11 September 2024
Accepted: 29 August 2024

Author Tags

  1. Pre-trained language models
  2. Mutual learning
  3. Graph neural networks
  4. Text classification

Qualifiers

  • Research-article
