DOI: 10.1145/3617695.3617707

Heterogeneous-training: A Semi-supervised Text Classification Method

Published: 02 November 2023

Abstract

With the advent of the information age, the volume of text data on the Internet keeps growing. Because text is the most widely distributed information carrier and accounts for the largest share of this data, text classification technology is essential for organizing and managing such massive data scientifically. In this paper, we propose a semi-supervised ensemble learning algorithm, Heterogeneous-training, and apply it to text classification. Building on the Tri-training algorithm, Heterogeneous-training improves on it by using different (heterogeneous) classifiers, dynamically updating the probability threshold, and adaptively editing the training data. Extensive experiments show that our method consistently outperforms the Tri-training algorithm on benchmark text classification datasets.
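
The abstract names three modifications to Tri-training but gives no implementation details, so the sketch below is only a plausible reading, not the authors' method. It assumes scikit-learn base learners (Naive Bayes, logistic regression, and a random forest are illustrative picks), accepts a pseudo-label when the two peer classifiers agree and their mean confidence clears a threshold, and "dynamically updates" that threshold with a simple geometric decay; the adaptive data-editing step is omitted.

```python
# Sketch of a Heterogeneous-training-style loop on top of Tri-training.
# ASSUMPTIONS (not from the paper): the choice of base learners, the
# peer-agreement acceptance rule, the geometric threshold decay, and
# the majority-vote combination; the adaptive editing step is omitted.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def heterogeneous_training(texts_l, y_l, texts_u,
                           rounds=5, threshold=0.95, decay=0.98):
    vec = TfidfVectorizer()
    X_l, X_u = vec.fit_transform(texts_l), vec.transform(texts_u)
    # Three *different* base classifiers; classic Tri-training instead
    # trains three copies of one learner on bootstrap samples.
    clfs = [MultinomialNB(),
            LogisticRegression(max_iter=1000),
            RandomForestClassifier(n_estimators=100)]
    pools = [(X_l, np.asarray(y_l)) for _ in clfs]  # per-learner pools
    for clf, (X, y) in zip(clfs, pools):
        clf.fit(X, y)
    for _ in range(rounds):
        for i, clf in enumerate(clfs):
            peers = [c for j, c in enumerate(clfs) if j != i]
            probas = [p.predict_proba(X_u) for p in peers]
            labels = [p.classes_[pr.argmax(axis=1)]
                      for p, pr in zip(peers, probas)]
            conf = np.mean([pr.max(axis=1) for pr in probas], axis=0)
            # Accept a pseudo-label only when both peers agree and
            # their mean confidence clears the current threshold.
            keep = (labels[0] == labels[1]) & (conf >= threshold)
            if keep.any():
                X_i, y_i = pools[i]
                pools[i] = (vstack([X_i, X_u[keep]]),
                            np.concatenate([y_i, labels[0][keep]]))
                clf.fit(*pools[i])
        threshold *= decay  # "dynamic" threshold: relax it each round
    return vec, clfs

def predict(vec, clfs, texts):
    # Majority vote of the three learners; with three voters, the
    # majority label is votes[1] whenever the two peers of clfs[0]
    # agree, otherwise votes[0] (which also serves as the tie-break).
    votes = np.stack([c.predict(vec.transform(texts)) for c in clfs])
    out = votes[0].copy()
    agree = votes[1] == votes[2]
    out[agree] = votes[1][agree]
    return out
```

Note that this is deliberately simpler than both the paper's method and Zhou and Li's original Tri-training, which only adds pseudo-labels when doing so provably reduces an error bound; the point here is just the shape of the loop.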


Published In

BDIOT '23: Proceedings of the 2023 7th International Conference on Big Data and Internet of Things
August 2023
232 pages
ISBN: 9798400708015
DOI: 10.1145/3617695

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Ensemble Learning
  2. Machine Learning
  3. Semi-supervised Learning
  4. Text Classification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the Science and Technology Program of Sichuan Province
  • the Opening Project of Intelligent Policing Key Laboratory of Sichuan Province

Conference

BDIOT 2023

Acceptance Rates

Overall Acceptance Rate 75 of 136 submissions, 55%
