DUKMSVM: A Framework of Deep Uniform Kernel Mapping Support Vector Machine for Short Text Classification
Figure 1. The architecture of deep uniform kernel mapping support vector machine (DUKMSVM).
Figure 2. The architecture of multi-layer perceptron kernel mapping (DMKMSVM).
Figure 3. Schematic diagram of the DRKMSVM model.
Figure 4. Word vector similarity diagram. (a) The offset of two pairs of words. (b) The offset of two pairs of words under different projections.
Figure 5. The structure of bidirectional recurrent neural network (BRNN).
Figure 6. GRU cell.
Figure 7. Performance of DRKMSVM using different recurrent structures.
Abstract
1. Introduction
2. The Framework of Deep Uniform Kernel Mapping Support Vector Machine (DUKMSVM)
Algorithm 1 Joint training of DMKMSVM
Input: Training set, network architecture, maximum number of epochs
Output: Trained weights and biases of all layers
Stage One: Pre-training
1. Randomly initialize the weights and biases of all layers;
2. Learn the weights and biases of every pair of adjacent layers, layer by layer;
Stage Two: Fine-tuning
3. For each epoch do
4. For l = 1 to N do
5. Compute the activations of each hidden layer (forward pass);
6. Compute the output of the SVM layer;
7. Compute the loss;
8. Compute the error term of the output layer;
9. Compute the error terms of the hidden layers (backpropagation);
10. Compute the gradients with respect to the weights;
11. Compute the gradients with respect to the biases;
12. Update the weights and biases;
13. End For
14. End For
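To make the fine-tuning stage concrete, the following is a minimal NumPy sketch, assuming a single sigmoid hidden layer as the explicit kernel mapping and a linear SVM output trained on the L2-regularized hinge loss; the per-sample SGD updates, shapes, and variable names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(X, y, W1, b1, w2, b2, C=64.0, lr=1e-3, epochs=10):
    """Fine-tune a one-hidden-layer mapping plus a linear SVM output.

    X: (N, d) inputs; y: (N,) labels in {-1, +1}.
    Per-sample objective: 0.5 * ||w2||^2 + C * max(0, 1 - y * f(x)).
    """
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = sigmoid(W1 @ x + b1)        # step 5: hidden activations
            f = w2 @ h + b2                 # step 6: SVM decision value
            g_w2, g_b2 = w2.copy(), 0.0     # gradient of the margin term
            g_W1 = np.zeros_like(W1)
            g_b1 = np.zeros_like(b1)
            if t * f < 1.0:                 # step 7: hinge loss is active
                g_w2 += -C * t * h          # steps 8-11: backpropagated gradients
                g_b2 = -C * t
                dh = -C * t * w2 * h * (1.0 - h)
                g_W1 = np.outer(dh, x)
                g_b1 = dh
            W1 -= lr * g_W1; b1 -= lr * g_b1   # step 12: update
            w2 -= lr * g_w2; b2 -= lr * g_b2
    return W1, b1, w2, b2
```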
3. Deep Recurrent Kernel Mapping SVM (DRKMSVM) for Short Text Classification
3.1. Representing the Short Text with Word Vector
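As an illustration of this representation step, the sketch below maps a short text to the sequence of word vectors consumed by the recurrent mapping. It assumes gensim and a pretrained 300-dimensional word2vec file; the file name, maximum length, and zero-padding scheme are our assumptions.

```python
import numpy as np
from gensim.models import KeyedVectors

# Load pretrained 300-dimensional word2vec embeddings (file name assumed).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embed(text, max_len=60, dim=300):
    """Map a short text to a (max_len, dim) matrix of word vectors, zero-padded."""
    out = np.zeros((max_len, dim), dtype=np.float32)
    for i, word in enumerate(text.lower().split()[:max_len]):
        if word in vectors:            # out-of-vocabulary words stay zero
            out[i] = vectors[word]
    return out

x = embed("a thoughtful and very funny movie")   # shape (60, 300)
```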
3.2. Representing Kernel Mapping with BRNN
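A minimal Keras sketch of such a mapping follows, assuming a single bidirectional GRU layer whose concatenated forward and backward final states form the explicit feature map; the hidden size, dropout rate, and input shape are illustrative.

```python
import tensorflow as tf

# Bidirectional GRU over the (max_len, dim) word-vector sequence; the
# concatenated final states of both directions form the feature vector phi(x).
brnn_mapping = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(60, 300)),  # skip padding
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(100)),         # -> (batch, 200)
    tf.keras.layers.Dropout(0.5),
])
```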
3.3. Classifying with SVM
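Continuing the sketch above, the SVM output can be realized as a single linear unit trained with the hinge loss on {-1, +1} labels (binary case shown). Using an L2 weight penalty on the output layer to stand in for the SVM margin term, and Adam with the learning rate and batch size from Section 4.2, are our assumptions.

```python
import tensorflow as tf

C = 64.0  # SVM penalty constant, as in the hyper-parameter tables
svm_head = tf.keras.layers.Dense(
    1, kernel_regularizer=tf.keras.regularizers.l2(1.0 / (2.0 * C)))

# End-to-end model: BRNN kernel mapping (defined above) + linear SVM output.
model = tf.keras.Sequential([brnn_mapping, svm_head])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss=tf.keras.losses.Hinge())   # expects labels in {-1, +1}
# model.fit(X_train, y_train, batch_size=64, epochs=...)
```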
4. Experimental Results
4.1. Datasets
- MR (Movie Review) (http://www.cs.cornell.edu/people/pabo/movie-review-data/): Often referred to as the polarity dataset; the task is to classify movie reviews as positive or negative.
- CR (Customer Review) (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#datasets/): The task is to predict whether a customer review is good or bad.
- Subj (Subjectivity) (http://www.cs.cornell.edu/people/pabo/movie-review-data/): Known as the subjectivity dataset; the task is to predict whether a sentence is subjective or objective.
- MPQA (Multiple Perspective QA) (http://mpqa.cs.pitt.edu/): This dataset contains news articles from a wide variety of sources. In this paper, we use the opinion polarity detection subtask of MPQA, which aims to determine whether an opinion sentence is positive or negative.
- TREC (Text Retrieval Conference) (https://cogcomp.seas.upenn.edu/Data/QA/QC/): This dataset consists of questions of 6 types; the task is to predict which category a question belongs to.
4.2. Parameter Settings of DRKMSVM, DMKMSVM, CNN and SVM
4.3. Comparison with DMKMSVM-2, CNN, RBF-SVM and NB
4.4. Influence of the Structure of Recurrent Network on DRKMSVM
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Word | Before Fine-Tuning | After Fine-Tuning
---|---|---
good | great, nice, terrific, decent | great, bad, terrific, decent
bad | lousy, horrible, crummy, lousy | good, terrible, horrible, lousy
funny | hilarious, witty, comical, sarcastic | hilarious, humorous, hilariously_funny, amusing
boring | dull, uninteresting, monotonous, pointless | dull, uninteresting, monotonous, bored
but | so, yet, although, too | although, though, because, so
not | do, neither, however, either | do, did, anymore, necessarily
Name | Abbreviation | TSN | CN | ASL | DL | NVSS
---|---|---|---|---|---|---
Movie Review | MR | 9596 | 2 | 111 | 18,765 | 1000
Customer Review | CR | 3397 | 2 | 93 | 5046 | 378
Subjectivity | Subj | 9000 | 2 | 125 | 17,913 | 1000
Opinion polarity dataset | MPQA | 9546 | 2 | 18 | 5046 | 1060
TREC QA | TREC | 5452 | 6 | 60 | 6083 | 500
Hyper-Parameter | Value |
---|---|
Number of hidden layers | 1 |
Number of hidden nodes | 300 |
Learning rate | 0.001 |
Mini-batch | 64 |
Dropout | 0.5 |
C | 64 |
Hyper-Parameter | Value |
---|---|
Number of hidden nodes | 100 |
Learning rate | 0.001 |
Mini-batch | 64 |
Dropout | 0.5 |
C | 64 |
Hyper-Parameter | Value
---|---
Size of convolutional kernels | 3 × 300, 4 × 300, 5 × 300
Number of convolutional kernels | 128, 128, 128
Learning rate | 0.001
Mini-batch | 64
Dropout | 0.5
Dataset | C | γ
---|---|---
MR | 0.1 | 8
CR | 0.1 | 8
MPQA | 0.1 | 8
Subj | 0.01 | 64
TREC | 0.01 | 64
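For reference, the RBF-SVM baseline can be reproduced along these lines. This is a scikit-learn sketch that reads the table's third column as the kernel width γ (an assumption) and uses the MR settings; the feature construction and the random data are placeholders for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder features standing in for, e.g., averaged word vectors per sentence.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 300)), rng.integers(0, 2, 100)

rbf_svm = SVC(kernel="rbf", C=0.1, gamma=8)   # MR settings from the table above
rbf_svm.fit(X_train, y_train)
print(rbf_svm.predict(X_train[:5]))
```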
Methods | Datasets | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
---|---|---|---|---|---
DRKMSVM | MR | 78.91 | 78.91 | 78.96 | 78.94
DRKMSVM | CR | 79.37 | 77.96 | 77.79 | 77.89
DRKMSVM | Subj | 92.10 | 92.24 | 91.90 | 92.03
DRKMSVM | MPQA | 89.16 | 88.57 | 86.36 | 87.32
DRKMSVM | TREC | 96.61 | 97.26 | 95.64 | 96.36
DRKMSVM | Average | 87.23 | 86.99 | 86.13 | 86.51
DMKMSVM-2 | MR | 78.83 | 78.90 | 78.77 | 78.78
DMKMSVM-2 | CR | 80.77 | 79.93 | 77.79 | 78.49
DMKMSVM-2 | Subj | 92.44 | 92.43 | 92.45 | 92.43
DMKMSVM-2 | MPQA | 86.92 | 85.95 | 82.82 | 84.08
DMKMSVM-2 | TREC | 87.20 | 89.40 | 83.77 | 85.88
DMKMSVM-2 | Average | 85.23 | 85.32 | 83.12 | 83.93
CNN | MR | 77.24 | 72.38 | 72.27 | 72.19
CNN | CR | 77.72 | 77.58 | 73.70 | 74.65
CNN | Subj | 90.83 | 90.78 | 90.83 | 90.79
CNN | MPQA | 87.66 | 86.14 | 85.68 | 85.93
CNN | TREC | 89.42 | 89.29 | 90.26 | 89.72
CNN | Average | 84.57 | 83.23 | 82.54 | 82.66
SVM | MR | 77.21 | 77.22 | 77.20 | 77.28
SVM | CR | 79.47 | 78.11 | 76.77 | 77.27
SVM | Subj | 90.90 | 90.90 | 90.90 | 90.89
SVM | MPQA | 86.63 | 85.67 | 82.41 | 83.72
SVM | TREC | 81.60 | 83.95 | 81.89 | 82.64
SVM | Average | 83.16 | 83.17 | 81.83 | 82.36
NB | MR | 73.66 | 73.65 | 73.66 | 73.65
NB | CR | 83.60 | 81.38 | 76.57 | 78.36
NB | Subj | 91.30 | 91.32 | 91.36 | 91.30
NB | MPQA | 86.15 | 87.28 | 79.28 | 81.85
NB | TREC | 77.80 | 67.07 | 66.12 | 66.10
NB | Average | 82.50 | 80.14 | 77.39 | 78.25
Methods | Datasets | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
---|---|---|---|---|---
DRKMSVM | MR | 78.91 | 78.91 | 78.96 | 78.94
DRKMSVM | CR | 79.37 | 77.96 | 77.79 | 77.89
DRKMSVM | Subj | 92.10 | 92.24 | 91.90 | 92.03
DRKMSVM | MPQA | 89.16 | 88.57 | 86.36 | 87.32
DRKMSVM | TREC | 96.61 | 97.26 | 95.64 | 96.36
DRKMSVM | Average | 87.23 | 86.99 | 86.13 | 86.51
DRKMSVM-LSTM | MR | 78.63 | 78.58 | 78.26 | 78.57
DRKMSVM-LSTM | CR | 77.51 | 75.82 | 75.75 | 75.78
DRKMSVM-LSTM | Subj | 91.70 | 91.69 | 91.67 | 91.68
DRKMSVM-LSTM | MPQA | 86.33 | 84.39 | 82.97 | 83.62
DRKMSVM-LSTM | TREC | 92.61 | 85.53 | 92.46 | 86.72
DRKMSVM-LSTM | Average | 85.36 | 83.20 | 84.22 | 83.27
Uni-DRKMSVM | MR | 75.07 | 75.06 | 75.06 | 75.06
Uni-DRKMSVM | CR | 78.30 | 77.20 | 77.79 | 77.44
Uni-DRKMSVM | Subj | 88.90 | 88.92 | 88.81 | 88.85
Uni-DRKMSVM | MPQA | 86.99 | 85.42 | 84.39 | 84.87
Uni-DRKMSVM | TREC | 88.78 | 83.21 | 87.88 | 85.86
Uni-DRKMSVM | Average | 83.61 | 81.96 | 82.79 | 82.42
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).