Few-Shot Text Classification with Global–Local Feature Information
Figure 1. The SumFS framework. The blue arrows denote the global feature processing, the black arrows the training process, and the red arrows the testing process.
Figure 2. CH index under several Sum-H values. The numbers of sentences corresponding to the red points were chosen.
Figure 3. t-SNE visualization of the input representations for Sum-H and the raw data of three datasets: THUCnews, Fudan news, and Marine news. The Sum-H distributions are shown in (a), (c), and (e); the raw-data distributions are shown in (b), (d), and (f). Each color/marker pair corresponds to a specific label. The class numbers are listed in Table 1.
Figure 4. Comparison of IDF-IWF-ATT results with different numbers of sentences.
Figure 5. Accuracy trends of training and testing with the ATT-IDF-IWF weighting strategy. According to the results, Sum-H preserves the original text information well.
Figure 6. Comparison of the standard deviations during the testing steps.
Figure 7. Comparison of results with the optimal solution strategy.
Figure 8. Comparison of time consumption between Raw and Sum-H, estimated from the average time consumption per epoch.
Abstract
1. Introduction
1. The SumFS algorithm is proposed to implement text pre-processing and few-shot classification tasks in a pipelined manner;
2. We propose a global-to-local strategy that utilizes the maximum dataset distribution distance to minimize the category distances within each learning task;
3. Extensive experiments on three datasets, together with detailed comparative analysis, demonstrate that the proposed method is superior to its alternatives.
2. Related Works
2.1. Deep Learning for Text Classification
2.2. Meta-Learning for Text Classification
2.3. Feature Selection for Text Classification
3. The Proposed Method
Algorithm 1: Pseudo-code for SumFS.
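As a hedged illustration of Algorithm 1's flow (summarize each document globally, then weight, embed, and classify within each N-way K-shot episode), the skeleton below uses purely illustrative names rather than the authors' released code; the component functions are passed in as callables, with minimal versions sketched in the following subsections.

```python
# High-level skeleton of the SumFS flow (illustrative, not the released code).
# Concrete components are injected as callables; minimal versions of each are
# sketched in Sections 3.1-3.4 below.
def sumfs_episode(support, query, extract, weigh, embed, classify, sum_h):
    """Run one N-way K-shot episode on globally summarized documents.

    support: list of (document, label) pairs; query: list of documents.
    """
    # Global feature step: compress every document to its top sum_h sentences.
    sup_docs = [(extract(doc, sum_h), y) for doc, y in support]
    qry_docs = [extract(doc, sum_h) for doc in query]

    # Local step: term weights estimated from this episode's support set only.
    weights = weigh([doc for doc, _ in sup_docs])

    sup_vecs = [(embed(doc, weights), y) for doc, y in sup_docs]
    qry_vecs = [embed(doc, weights) for doc in qry_docs]

    # Few-shot classification of each query against the support vectors.
    return [classify(v, sup_vecs) for v in qry_vecs]
```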
3.1. Sentence Extraction
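With TextRank [32] and PageRank [33] among the cited methods, a graph-based extractor is a natural reading of this step. The sketch below assumes plain word overlap as the sentence-similarity edge weight, which may differ from the paper's exact measure:

```python
import re
import networkx as nx

def textrank_extract(doc: str, k: int) -> list[str]:
    """Rank sentences with PageRank over a word-overlap similarity graph
    and keep the top k, returned in their original order."""
    sents = [s.strip() for s in re.split(r"[.!?]\s*", doc) if s.strip()]
    words = [set(s.lower().split()) for s in sents]
    g = nx.Graph()
    g.add_nodes_from(range(len(sents)))
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            overlap = len(words[i] & words[j])
            if overlap:
                g.add_edge(i, j, weight=overlap)
    scores = nx.pagerank(g, weight="weight")
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:k])
    return [sents[i] for i in top]
```

Keeping the selected sentences in their original order preserves the local coherence of the summary.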
3.2. Weight Generator
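Table 4 compares IDF-ATT, IWF-ATT, IDF-IWF-ATT, TF-IDF, and IDF-IWF generators. The sketch below covers only the corpus-statistics part, assuming the textbook definitions of inverse document frequency (IDF) and inverse word frequency (IWF); the attention (ATT) component depends on the trained model and is omitted:

```python
import math
from collections import Counter

def idf_iwf_weights(corpus: list[list[str]]) -> dict[str, float]:
    """Combine IDF (rarity across documents) with IWF (rarity across all
    tokens) into one per-word weight; smoothing choices are illustrative."""
    n_docs = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))   # document frequency
    tf = Counter(w for doc in corpus for w in doc)        # corpus frequency
    n_tokens = sum(tf.values())
    return {w: math.log(1 + n_docs / df[w]) * math.log(n_tokens / tf[w])
            for w in tf}
```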
3.3. Sentence Embedding
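Given GloVe [1] among the references, one plausible form of this step is a weight-averaged bag of pre-trained word vectors. A sketch under that assumption, where `word_vectors` maps each word to its pre-trained vector and `weights` comes from the weight generator above:

```python
import numpy as np

def sentence_embedding(tokens: list[str],
                       word_vectors: dict[str, np.ndarray],
                       weights: dict[str, float]) -> np.ndarray:
    """Weighted average of pre-trained word vectors; OOV words are skipped."""
    dim = len(next(iter(word_vectors.values())))
    vecs = [word_vectors[w] for w in tokens if w in word_vectors]
    ws = [weights.get(w, 1.0) for w in tokens if w in word_vectors]
    if not vecs or sum(ws) == 0:
        return np.zeros(dim)
    return np.average(np.stack(vecs), axis=0, weights=ws)
```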
3.4. Classifier
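The results in Table 3 track the R2D2 baseline most closely, whose head is a differentiable closed-form ridge classifier [38]. As one illustrative classifier choice (not necessarily the paper's exact head), the ridge solution in its kernel (Woodbury) form:

```python
import numpy as np

def ridge_head(x_support: np.ndarray, y_onehot: np.ndarray,
               x_query: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge classifier, as in differentiable closed-form
    solvers [38]: W = X^T (X X^T + lam * I)^{-1} Y.
    Returns class scores for the queries."""
    m = len(x_support)
    w = x_support.T @ np.linalg.solve(
        x_support @ x_support.T + lam * np.eye(m), y_onehot)   # (d, n_way)
    return x_query @ w                                         # (q, n_way)
```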
4. Experiments and Results
4.1. Experimental Setup
4.2. Clustering Experiment Results
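As Figures 2 and 3 describe, the number of retained sentences is chosen by sweeping the Caliński–Harabasz (CH) index [34], and the resulting representations are inspected with t-SNE. A scikit-learn sketch, where `embed_for(doc, k)` is an assumed callable that embeds a document summarized to k sentences:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import calinski_harabasz_score

def pick_sum_h(embed_for, docs, labels, candidates):
    """Return the sentence count with the highest CH index, plus all scores."""
    scores = {k: calinski_harabasz_score(
                  np.stack([embed_for(d, k) for d in docs]), labels)
              for k in candidates}
    return max(scores, key=scores.get), scores

def tsne_projection(x: np.ndarray, seed: int = 0) -> np.ndarray:
    """2-D t-SNE projection for visualizations such as Figure 3."""
    return TSNE(n_components=2, random_state=seed).fit_transform(x)
```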
4.3. Classification Experiment Results
1. MAML [13] is compatible with any model trained with gradient descent and is applicable to a variety of learning problems (https://github.com/dragen1860/MAML-Pytorch (accessed on 8 April 2022)).
2. Proto [39] performs classification by computing distances to the prototype representation of each class in a metric space (https://github.com/jakesnell/prototypical-networks (accessed on 8 April 2022)); a minimal sketch of its scoring rule appears after this list.
3. R2D2 [3] aims to improve classification performance by learning high-quality attention from the source pool's distribution features (https://github.com/YujiaBao/Distributional-Signatures (accessed on 8 April 2022)).
4. MLADA [4] focuses on improving the model's adaptability: text representations are obtained by introducing an adversarial domain adaptation network (https://github.com/hccngu/MLADA (accessed on 8 April 2022)).
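As a minimal PyTorch sketch of the Proto baseline's scoring rule (the function and argument names here are illustrative), a query is scored by its negative squared Euclidean distance to each class prototype, i.e., the mean of that class's support embeddings [39]:

```python
import torch

def proto_logits(support: torch.Tensor, support_labels: torch.Tensor,
                 query: torch.Tensor, n_way: int) -> torch.Tensor:
    """Score queries against class prototypes: per-class means of the
    support embeddings, compared by squared Euclidean distance [39]."""
    protos = torch.stack([support[support_labels == c].mean(dim=0)
                          for c in range(n_way)])   # (n_way, d)
    return -torch.cdist(query, protos) ** 2         # (n_query, n_way)
```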
4.4. Comparison of Computing Time
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pennington, J.; Socher, R.; Manning, C. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar, 25–29 October 2014. [Google Scholar] [CrossRef]
- Chen, Z.; Zhang, X.; Huang, W.; Gao, J.; Zhang, S. Cross Modal Few-Shot Contextual Transfer for Heterogenous Image Classification. Front. Neurorobot. 2021, 15, 654519. [Google Scholar] [CrossRef]
- Bao, Y.; Wu, M.; Chang, S.; Barzilay, R. Few-shot Text Classification with Distributional Signatures. arXiv 2020, arXiv:1908.06039. [Google Scholar]
- Han, C.; Fan, Z.; Zhang, D.; Qiu, M.; Gao, M.; Zhou, A. Meta-Learning Adversarial Domain Adaptation Network for Few-Shot Text Classification. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, ACL-Findings 2021, Online, 1–6 August 2021. [Google Scholar] [CrossRef]
- Pintas, J.T.; Fernandes, L.A.F.; Garcia, A.C.B. Feature selection methods for text classification: A systematic literature review. Artif. Intell. Rev. 2021, 54, 6149–6200. [Google Scholar] [CrossRef]
- Rashid, J.; Adnan, S.; Syed, M.; Irtaza, A. A Novel Fuzzy K-means Latent Semantic Analysis (FKLSA) Approach for Topic Modeling over Medical and Health Text Corpora. J. Intell. Fuzzy Syst. 2019, 37, 6573–6588. [Google Scholar] [CrossRef]
- Lee, J.Y.; Dernoncourt, F. Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2016, San Diego, CA, USA, 12–17 June 2016. [Google Scholar] [CrossRef]
- Wang, Y.; Sun, A.; Han, J.; Liu, Y.; Zhu, X. Sentiment Analysis by Capsules. In Proceedings of the International World Wide Web Conferences Steering Committee, Lyon, France, 23–27 April 2018. [Google Scholar] [CrossRef]
- Lai, T.; Cheng, L.; Wang, D.; Ye, H.; Zhang, W. RMAN: Relational multi-head attention neural network for joint extraction of entities and relations. Appl. Intell. 2022, 52, 3132–3142. [Google Scholar] [CrossRef]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite Bert for Self-Supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems 32; Curran Associates Inc.: Red Hook, NY, USA, 2019. [Google Scholar] [CrossRef]
- Zhang, Y.; Yu, X.; Cui, Z.; Wu, S.; Wen, Z.; Wang, L. Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar] [CrossRef]
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, PMLR 70, Sydney, Australia, 6–11 August 2017. [Google Scholar] [CrossRef]
- Pan, C.; Huang, J.; Gong, J.; Yuan, X. Few-Shot Transfer Learning for Text Classification with Lightweight Word Embedding Based Models. IEEE Access 2019, 7, 53296–53304. [Google Scholar] [CrossRef]
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Advances in Neural Information Processing Systems 29; Curran Associates Inc.: Red Hook, NY, USA, 2016. [Google Scholar] [CrossRef]
- Nichol, A.; Schulman, J. Reptile: A Scalable Metalearning Algorithm. arXiv 2018, arXiv:1803.02999. [Google Scholar]
- Rusu, A.A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; Hadsell, R. Meta-Learning with Latent Embedding Optimization. arXiv 2018, arXiv:1807.05960. [Google Scholar]
- Gu, J.; Wang, Y.; Chen, Y.; Li, V.O.K.; Cho, K. Meta-Learning for Low-Resource Neural Machine Translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018. [Google Scholar] [CrossRef]
- Hui, B.; Liu, L.; Chen, J.; Zhou, X.; Nian, Y. Few-shot relation classification by context attention-based prototypical networks with BERT. Eurasip J. Wirel. Commun. Netw. 2020, 2020, 118. [Google Scholar] [CrossRef]
- Pang, N.; Zhao, X.; Wang, W.; Xiao, W.; Guo, D. Few-shot text classification by leveraging bi-directional attention and cross-class knowledge. Sci. China Inf. Sci. 2021, 64, 130103. [Google Scholar] [CrossRef]
- Deng, X.; Li, Y.; Weng, J.; Zhang, J. Feature selection for text classification: A review. Multimedia Tools Appl. 2019, 78, 3797–3816. [Google Scholar] [CrossRef]
- Mirończuk, M.M.; Protasiewicz, J. A recent overview of the state-of-the-art elements of text classification. Expert Syst. Appl. 2018, 106, 36–54. [Google Scholar] [CrossRef]
- Awan, M.N.; Beg, M.O. TOP-Rank: A TopicalPositionRank for Extraction and Classification of Keyphrases in Text. Comput. Speech Lang. 2021, 65, 101116. [Google Scholar] [CrossRef]
- Tang, J.; Wang, X.; Gao, H.; Hu, X.; Liu, H. Enriching short text representation in microblog for clustering. Front. Comput. Sci. 2012, 6, 88–101. [Google Scholar] [CrossRef]
- Khurana, A.; Verma, O.P. Optimal Feature Selection for Imbalanced Text Classification. IEEE Trans. Artif. Intell. 2022; early access. [Google Scholar] [CrossRef]
- Rashid, J.; Adnan, S.; Syed, M.; Irtaza, A. An efficient topic modeling approach for text mining and information retrieval through K-means clustering. Mehran Univ. Res. J. Eng. Technol. 2020, 39, 213–222. [Google Scholar] [CrossRef]
- Behera, B.; Kumaravelan, G. Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN). Soft Comput. 2021, 25, 9915–9923. [Google Scholar] [CrossRef]
- Watanabe, W.M.; Felizardo, K.R.; Candido, A., Jr.; de Souza, E.F.; de Campos Neto, J.E.; Vijaykumar, N.L. Reducing efforts of software engineering systematic literature reviews updates using text classification. Inf. Softw. Technol. 2020, 128, 106395. [Google Scholar] [CrossRef]
- Tang, Z.; Li, W.; Li, Y.; Zhao, W.; Li, S. Several alternative term weighting methods for text representation and classification. Knowl. Based Syst. 2020, 207, 106399. [Google Scholar] [CrossRef]
- Liu, W.; Pang, J.; Du, Q.; Li, N.; Yang, S. A Method of Short Text Representation Fusion with Weighted Word Embeddings and Extended Topic Information. Sensors 2022, 22, 1066. [Google Scholar] [CrossRef]
- Roul, R.K. Topic modeling combined with classification technique for extractive multi-document text summarization. Soft Comput. 2021, 25, 1113–1127. [Google Scholar] [CrossRef]
- Mihalcea, R.; Tarau, P. TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004. [Google Scholar]
- Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web; Stanford InfoLab.: Stanford, CA, USA, 1999; Available online: http://ilpubs.stanford.edu:8090/422/ (accessed on 8 April 2022).
- Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. 1974, 3, 1–27. [Google Scholar] [CrossRef]
- Nafis, N.S.M.; Awang, S. An Enhanced Hybrid Feature Selection Technique Using Term Frequency-Inverse Document Frequency and Support Vector Machine-Recursive Feature Elimination for Sentiment Classification. IEEE Access 2021, 9, 52177–52192. [Google Scholar] [CrossRef]
- Liwicki, M.; Graves, A.; Bunke, H.; Schmidhuber, J. A Novel Approach to On-Line Handwriting Recognition Based on Bidirectional Long Short-Term Memory Networks. In Proceedings of the 9th International Conference on Document Analysis and Recognition, ICDAR 2007, Curitiba, Brazil, 23–26 September 2007. [Google Scholar]
- Chen, D.; Chen, Y.; Li, Y.; Mao, F.; He, Y.; Xue, H. Self-Supervised Learning for Few-Shot Image Classification. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (Icassp 2021), Toronto, ON, Canada, 6–11 June 2021; IEEE: New York, NY, USA, 2021. [Google Scholar] [CrossRef]
- Bertinetto, L.; Henriques, J.F.; Torr, P.H.S.; Vedaldi, A. Meta-learning with differentiable closed-form solvers. arXiv 2019, arXiv:1805.08136. [Google Scholar]
- Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-shot Learning. arXiv 2017, arXiv:1703.05175. [Google Scholar]
Dataset | Training Category | Validation Category | Target Category |
---|---|---|---|
THUCnews | 0, 1, 7, 8, 9, 10 | 4, 5, 6, 12 | 2, 3, 11, 13 |
Fudan news | 0, 2, 4, 6, 7, 9, 11, 17 | 1, 3, 10, 13, 18 | 5, 8, 12, 14, 15, 16, 19 |
Marine news | 0, 1, 7, 8, 9 | 4, 5, 10 | 2, 3, 6 |
Dataset | Category | Average Number of Sentences |
---|---|---|
THUCnews | Sports, Entertainment, Home, Lottery ticket, Estate, Education, Fashion, Politics, Constellation, Game, Society, Science and Technology, Stock, Economics | 50.36
Fudan news | Agriculture, Art, Communication, Computer science, Economy, Education, Electronics, Energy, Environment, History, Law, Literature, Pharmacy, Military, Mining industry, Philosophy, Politics, Space, Sports, Transport | 92.97 |
Marine news | Marine Equipment, The Blue Economy, Historical culture, Education, Marine Engineering Universities, Marine Military, Marine communication, Internet + Ocean, Biotechnology, Travel, High technology | 40.78
Method | Marine Sum13 (3-1) | Marine Sum13 (3-3) | THUCnews Sum8 (4-1) | THUCnews Sum8 (4-4) | Fudan Sum15 (5-1) | Fudan Sum15 (5-5)
---|---|---|---|---|---|---
MAML | 36.02 | 39.06 | 28.01 | 35.68 | 21.45 | 21.14 |
Proto | 62.02 | 78.03 | 47.90 | 64.90 | 51.64 | 65.33 |
R2D2 | 69.08 | 83.39 | 59.26 | 80.50 | 56.43 | 83.00 |
MLADA | 69.07 | 82.44 | 54.16 | 76.91 | 53.08 | 80.76 |
Ours | 69.22 | 83.35 | 59.51 | 80.51 | 56.68 | 83.17 |
Dataset | N-K | IDF-ATT | IWF-ATT | IDF-IWF-ATT | TF-IDF | IDF-IWF
---|---|---|---|---|---|---
THU-Sum8 | 2-1 | 73.73 | 73.85 | 73.83 | 74.27 | 73.80
THU-Sum8 | 2-2 | 82.65 | 83.17 | 82.92 | 83.29 | 82.51
THU-Sum8 | 4-1 | 59.09 | 59.51 | 58.96 | 59.09 | 58.70
THU-Sum8 | 4-4 | 80.37 | 80.51 | 80.27 | 80.31 | 79.96
Fudan-Sum15 | 2-1 | 78.73 | 78.35 | 79.07 | 78.34 | 78.77
Fudan-Sum15 | 2-2 | 86.24 | 86.21 | 86.28 | 86.09 | 86.25
Fudan-Sum15 | 5-1 | 56.67 | 56.42 | 56.68 | 56.21 | 56.53
Fudan-Sum15 | 5-5 | 83.13 | 83.06 | 83.17 | 83.14 | 83.16
Marine-Sum13 | 2-1 | 79.48 | 79.31 | 79.20 | 79.20 | 79.35
Marine-Sum13 | 2-2 | 87.04 | 87.00 | 87.03 | 86.73 | 87.18
Marine-Sum13 | 3-1 | 69.22 | 69.44 | 69.17 | 68.81 | 69.33
Marine-Sum13 | 3-3 | 83.35 | 83.23 | 83.33 | 83.16 | 83.33
Dataset | Number of Sentences (Sum-H) | Number of Sentences (Raw) | Average Text Length (Sum-H) | Average Text Length (Raw)
---|---|---|---|---
THUCnews | 8 | 50.36 | 240 | 382
Fudan news | 15 | 92.97 | 390 | 941
Marine news | 13 | 40.78 | 151 | 360
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).