Computer Science (计算机科学), 2022, Vol. 49, Issue 6A: 144-149. doi: 10.11896/jsjkx.210500205
林夕, 陈孜卓, 王中卿
LIN Xi, CHEN Zi-zhuo, WANG Zhong-qing
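Abstract: Sentiment classification has long been an important research topic in natural language processing. The task generally assigns samples that carry sentiment to one of two classes, positive or negative. Many theoretical models assume that positive and negative samples are balanced, whereas in practice they are usually imbalanced. This paper proposes an aspect-level LSTM ensemble learning method for aspect-level sentiment classification on imbalanced sample data. First, the data set is undersampled and divided into multiple groups; second, each group is assigned a classification algorithm for training; finally, the resulting models are fused to obtain the final classification result. A series of experiments shows that the aspect-level LSTM ensemble learning method clearly improves classification accuracy and outperforms the conventional LSTM classification method.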
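The three steps in the abstract (undersample into groups, train one model per group, fuse the models) can be illustrated with a minimal sketch. This is not the authors' implementation. The base learner here is a hypothetical nearest-centroid classifier standing in for the aspect-level LSTM, the fusion rule (averaging per-model scores) is an assumption because the abstract does not name one, and all function names and toy data are invented for illustration.

```python
import random
from statistics import mean


def undersample_groups(majority, minority, seed=0):
    """Split the majority class into disjoint chunks the size of the
    minority class and pair each chunk with the full minority class."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    size = len(minority)
    n_groups = max(1, len(shuffled) // size)
    return [(shuffled[i * size:(i + 1) * size], minority)
            for i in range(n_groups)]


def train_centroid(neg, pos):
    """Stand-in base learner (NOT an LSTM): store one centroid per class."""
    def centroid(rows):
        return [mean(col) for col in zip(*rows)]
    return centroid(neg), centroid(pos)


def score_positive(model, x):
    """Score in [0, 1]; higher means x is closer to the positive centroid."""
    c_neg, c_pos = model
    d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
    d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
    if d_neg + d_pos == 0:
        return 0.5
    return d_neg / (d_neg + d_pos)


def ensemble_predict(models, x, threshold=0.5):
    """Fuse the models by averaging their scores (assumed fusion rule)."""
    avg = mean(score_positive(m, x) for m in models)
    return 1 if avg >= threshold else 0


# Toy imbalanced data: 9 negative samples versus 3 positive samples.
majority_neg = [[-1.0 - 0.1 * i, -1.0 + 0.1 * i] for i in range(9)]
minority_pos = [[1.0, 1.2], [1.1, 0.9], [0.9, 1.0]]

groups = undersample_groups(majority_neg, minority_pos)
models = [train_centroid(neg, pos) for neg, pos in groups]

print(len(groups), "balanced groups")
print(ensemble_predict(models, [1.0, 1.0]))
print(ensemble_predict(models, [-1.2, -0.8]))
```

Each group sees every minority sample but only a slice of the majority class, so each base model trains on balanced data while the ensemble as a whole still uses all majority samples.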