Computer Science (计算机科学), 2022, Vol. 49, Issue 6: 313-318. doi: 10.11896/jsjkx.210400101
GUO Yu-xin1, CHEN Xiu-hong2
Abstract: Automatic text summarization helps people quickly filter and discern information, grasp the key content of news, and alleviate information overload. Mainstream abstractive summarization models are largely built on the encoder-decoder architecture. To address the facts that the decoder does not fully exploit the topic information of the text when predicting target words, and that traditional static Word2Vec embeddings cannot resolve polysemy, this paper proposes an abstractive summarization model for short Chinese news that fuses BERT word-embedding representations with topic-information enhancement. On the encoder side, an unsupervised algorithm extracts the topic information of the text and injects it into the attention mechanism, improving decoding quality. On the decoder side, BERT sentence vectors extracted by the pretrained BERT language model serve as supplementary features to capture richer semantics; a pointer mechanism is introduced to handle out-of-vocabulary words, and a coverage mechanism effectively suppresses repetition. During training, to avoid the exposure-bias problem, a reinforcement learning method is used to optimize the model directly on the non-differentiable ROUGE metric. Multiple comparative experiments on two Chinese short-news summarization datasets show that the model achieves significant gains on the ROUGE metrics, effectively fuses topic information, and generates fluent, concise summaries.
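The pointer and coverage mechanisms mentioned in the abstract can be illustrated with a minimal NumPy sketch of one decoding step. This is a toy illustration, not the paper's implementation: the dot-product attention scoring, the coverage penalty weight, the sigmoid form of the generation probability, and the placeholder uniform vocabulary distribution are all assumptions made for self-containedness.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_generator_step(enc_states, dec_state, coverage, src_ids, vocab_size):
    """One toy decoding step of a pointer-generator with coverage.

    enc_states: (T, H) encoder hidden states
    dec_state:  (H,)   current decoder hidden state
    coverage:   (T,)   running sum of past attention distributions
    src_ids:    (T,)   source token ids in the extended vocabulary
                       (ids >= vocab_size are out-of-vocabulary words)
    """
    # Attention scores, penalized by coverage so already-attended
    # source positions are discouraged (suppresses repetition).
    scores = enc_states @ dec_state - 1.0 * coverage
    attn = softmax(scores)

    # Coverage loss term sum_t min(attn_t, coverage_t), computed
    # against the coverage *before* this step's update.
    cov_loss = np.minimum(attn, coverage).sum()
    coverage = coverage + attn

    context = attn @ enc_states                     # (H,) context vector
    p_gen = 1.0 / (1.0 + np.exp(-(context @ dec_state)))  # generate vs. copy

    # Placeholder vocabulary distribution (a real model predicts this).
    p_vocab = softmax(np.ones(vocab_size))

    # Final distribution over the extended vocabulary: generation mass on
    # the fixed vocabulary plus copy mass scattered onto source token ids,
    # so OOV source words become predictable targets.
    ext_size = vocab_size + max(0, int(src_ids.max()) + 1 - vocab_size)
    final = np.zeros(ext_size)
    final[:vocab_size] = p_gen * p_vocab
    np.add.at(final, src_ids, (1.0 - p_gen) * attn)
    return final, coverage, cov_loss
```

Because the copy distribution is scattered with `np.add.at`, repeated source tokens accumulate probability correctly, and an OOV id (e.g. `vocab_size` itself) receives non-zero mass even though it is outside the fixed vocabulary.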
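The reinforcement-learning step the abstract describes, optimizing directly on the non-differentiable ROUGE metric, is commonly realized as self-critical sequence training (Rennie et al.): the reward of a sampled summary is baselined by the reward of the model's own greedy decode. A hedged sketch with a simplified ROUGE-1 F1 scorer (an assumption; the paper's exact reward and loss form are not specified here):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Toy ROUGE-1 F1: unigram multiset overlap between token lists."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    p = overlap / len(candidate)
    r = overlap / len(reference)
    return 2 * p * r / (p + r)

def self_critical_loss(sample_tokens, sample_logprob, greedy_tokens, reference):
    """Self-critical policy-gradient loss for one example.

    advantage = R(sample) - R(greedy); minimizing -advantage * logprob
    raises the log-probability of samples that beat the model's own
    greedy decode on the metric, with no label-forcing at each step,
    which is how exposure bias is avoided.
    """
    advantage = rouge1_f(sample_tokens, reference) - rouge1_f(greedy_tokens, reference)
    return -advantage * sample_logprob
```

When the sampled summary and the greedy summary score identically, the advantage is zero and the example contributes no gradient; only samples that outperform (or underperform) the greedy baseline move the model.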