CN116150334A - Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism
- Publication number
- CN116150334A (application CN202211591710.7A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- model
- training
- unilm
- replies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3329—Natural language query formulation
- G06F16/3346—Query execution using probabilistic model
- G06F16/353—Clustering; Classification into predefined classes
- G06N3/08—Learning methods
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Description
Technical Field
The invention belongs to the technical field of Chinese-oriented natural language generation, and in particular relates to a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism.
Background Art
With the application of deep learning in various fields, intelligent conversational systems have also developed rapidly. Users hope to communicate emotionally with such systems, and empathy makes this possible, so empathetic reply generation emerged. Carl Ransom Rogers defined empathy as, in interpersonal communication, imagining another person's experiences and logic from that person's standpoint, understanding their thoughts and feelings, and viewing and solving problems from their perspective. Empathetic reply generation means that the conversational system judges the user's emotional state from the conversation history and thereby generates an emotional reply that reflects an understanding of the user's feelings. Existing studies have shown that a conversational system with empathy can not only improve user satisfaction but also elicit more positive feedback from users.
In mental health counseling sessions, an intelligent conversational system, as an auxiliary tool, can help the counselor with some tasks and is considered key to service applications such as mental health intervention and counseling-assisted diagnosis. Intelligent conversational systems endowed with empathy have therefore gradually become a research hotspot. A good conversation model must exhibit strong contextual relevance between its input and output, i.e., the relationship between the user's input and the model's output. Currently, mainstream reply generation methods are either sequence-to-sequence methods based on deep learning or methods based on pre-trained models.
The encoder in traditional sequence-to-sequence models is mainly an RNN or LSTM. Compared with the Transformer, RNNs and LSTMs are weaker at extracting semantic features and at modeling long-distance dependencies. Although the replies generated by Transformer-based language models are more readable than those of RNNs and LSTMs, inaccurate generation of details still causes contextually irrelevant replies.
Summary of the Invention
In view of the problems in the prior art, the present invention proposes a Chinese empathetic sentence training method based on the UniLM model and the Copy mechanism.
The present invention is realized as a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism. The purpose of integrating the Copy mechanism is to copy the emotional keywords and complex event details of the source sequence into the output. The generated empathetic replies are then evaluated with criteria such as perplexity; replies that meet expectations, together with the corresponding user statements, are placed back into the original training corpus for compound automatic iterative training, yielding a further updated and optimized empathetic reply generation model.
The technical solution adopted by the present invention is a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism, which specifically includes the following steps:
Step 1: use web-crawler technology to crawl corpora with empathetic replies from the psychological dialogue domain, and preprocess them to obtain the input representation;
Step 2: pre-train the UniLM model, using three types of language models simultaneously, each with a different self-attention mask mechanism;
Step 3: compute the loss with the cross-entropy loss function to complete the UniLM-based pre-training and obtain the empathetic reply generation model;
Step 4: perform the empathetic reply generation task with the UniLM model, decoding through the self-attention mechanism of the sequence-to-sequence language model to obtain the vocabulary probability distribution;
Step 5: on the basis of Step 4, build a decoder that incorporates the Copy mechanism, introducing a generation probability and a copy probability to refine the vocabulary probability distribution of Step 4;
Step 6: use the cross-entropy loss function as the model's loss function, and obtain the generated empathetic replies with the Beam Search algorithm;
Step 7: place the generated high-quality empathetic replies and the corresponding user statements into the corpus of Step 1, and continue compound automatic iterative training of the UniLM model to obtain the updated and optimized empathetic reply generation model.
Further, two text sequences are input each time: Segment1, denoted S1, and Segment2, denoted S2, for example: "[CLS] I keep thinking about people and things I really hate [SEP] I understand that you feel confused and puzzled because you dwell on the negative events in your life and forget the positive ones [SEP]". [CLS] marks the beginning of the sequence and [SEP] marks the end of each segment; the text sequence pair is turned into the input representation through three kinds of Embeddings.
Further, the UniLM model is a stack of 12 Transformer layers; each layer has a hidden size of 768 and 12 attention heads, the same structure as BERT-BASE, so its parameters can be initialized from a trained BERT-BASE model. The UniLM model completes three pre-training objectives simultaneously, covering the prediction tasks of a unidirectional language model, a bidirectional language model, and a sequence-to-sequence language model, which enables the model to be applied to natural language generation tasks. Different MASK mechanisms are adopted for the different language models. Masking scheme: 15% of tokens are selected overall; of these, 80% are replaced directly with [MASK], 10% are replaced with a word chosen at random from the dictionary, and the remaining 10% keep their true value without any processing. In addition, 80% of the time a single word is masked, and the other 20% of the time a two-word bigram or three-word trigram is masked. For the MASK to be predicted, the unidirectional language model uses the context on one side only: to predict the mask in the sequence "X1X2[MASK]X4", for example, only X1, X2, and the mask position itself are available, while X4 is not. The bidirectional language model encodes context from both directions: taking "X1X2[MASK]X4" as an example, X1, X2, X4, and the mask position are all available. In the sequence-to-sequence language model, if the MASK is in S1 it can only encode the context of S1; if the MASK is in S2, it can access everything to its left, including the context of S1.
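The three masking regimes above can be made concrete as attention masks. The following is a minimal sketch (PyTorch assumed; the function name and the additive -inf convention are illustrative choices, not taken from the patent) of how the unidirectional, bidirectional, and sequence-to-sequence masks could be constructed:

```python
import torch

def unilm_attention_mask(len_s1: int, len_s2: int, mode: str) -> torch.Tensor:
    """Return an (L, L) additive mask M: 0 where attention is allowed, -inf where blocked."""
    L = len_s1 + len_s2
    allow = torch.zeros(L, L, dtype=torch.bool)
    if mode == "unidirectional":
        # each position may attend only to itself and to positions on its left
        allow = torch.ones(L, L).tril().bool()
    elif mode == "bidirectional":
        allow[:, :] = True  # every position may attend to every position
    elif mode == "seq2seq":
        allow[:, :len_s1] = True  # all positions may attend to all of S1
        # S2 positions may additionally attend to the S2 tokens on their left
        allow[len_s1:, len_s1:] = torch.ones(len_s2, len_s2).tril().bool()
    mask = torch.zeros(L, L)
    mask[~allow] = float("-inf")  # added to the attention scores before Softmax
    return mask
```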
Further, the text representations output by the Transformer network are fed into a Softmax classifier to predict the masked words; the cross-entropy loss between the predicted tokens and the original tokens is used to optimize the model parameters and complete the pre-training.
Further, a certain proportion of the tokens in the target sequence are masked at random, and the sequence-to-sequence language model learns to recover the masked words; the training objective is to maximize the probability of the masked tokens given the context information. The [SEP] at the end of the target sequence can also be masked, so that the model learns when to stop generating the target sequence. The model uses the MASK mechanism together with the attention mechanism to obtain text feature vectors, which are fed into a fully connected layer to obtain the vocabulary probability distribution.
Further, the vocabulary probability distribution is fed into a fully connected layer and a Sigmoid layer to obtain the generation probability. A copy probability is then introduced, and combining the generation probability with the copy probability yields an updated and improved vocabulary probability distribution.
Further, the cross-entropy loss function is used to complete the fine-tuning of the model, and the Beam Search algorithm is used to generate empathetic replies.
Further, four evaluation metrics (perplexity, BLEU-4, F1, and expert evaluation) are used to comprehensively evaluate the empathetic replies generated in Step 6. Replies that meet the expected standard, together with the corresponding user inputs, are automatically placed back into the original corpus of Step 1 for compound automatic iterative training, augmenting the training data and yielding the updated and optimized Chinese empathetic reply generation model.
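A hedged sketch of this filtering step follows; the metric functions and the numeric thresholds are placeholders for illustration, since the patent does not specify the acceptance criteria quantitatively:

```python
def filter_for_retraining(pairs, perplexity, bleu4, f1, expert_ok,
                          max_ppl=50.0, min_bleu=0.25, min_f1=0.5):
    """Keep only (user statement, generated reply) pairs that pass every metric.

    perplexity/bleu4/f1/expert_ok are assumed callables; thresholds are illustrative.
    """
    accepted = []
    for statement, reply in pairs:
        if (perplexity(reply) <= max_ppl and bleu4(reply) >= min_bleu
                and f1(reply) >= min_f1 and expert_ok(reply)):
            accepted.append((statement, reply))  # fed back into the training corpus
    return accepted
```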
The purpose of the present invention is to address the problem that empathetic replies generated by Transformer-based networks fail to generate emotional keywords and complex event details: the Copy mechanism is integrated into the decoder so that emotional keywords and complex event details are copied into the output.
Another purpose of the present invention is to address the scarcity of empathetic corpora for Chinese psychological dialogue: the present invention adopts compound automatic iterative training to augment the training data.
Combining the above technical solutions and the technical problems they solve, the advantages and positive effects of the technical solution to be protected by the present invention are analyzed from the following aspects:
First, in view of the technical problems in the prior art and the difficulty of solving them, and in close connection with the claimed technical solution and the results and data obtained during research and development, the following analyzes in detail how the technical solution of the present invention solves those technical problems and the creative technical effects achieved by solving them. The specific description is as follows:
In interpersonal communication, people increasingly hope to imagine others' experiences and logic from their standpoint, to understand their thoughts and feelings, and to view and solve problems from their perspective. Against this background, intelligent conversational systems endowed with empathy have gradually become a research hotspot. The empathetic reply generation addressed by the present invention means that the conversational system judges the user's emotional state from the conversation history and thereby generates an emotional reply that reflects an understanding of the user's feelings. A conversational system with empathy can not only improve user satisfaction but also elicit more positive feedback from users.
Second, viewing the technical solution as a whole or from the product perspective, the technical effects and advantages of the technical solution to be protected by the present invention are as follows:
The invention proposes a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism. The present invention uses the UniLM model as the basic architecture and, to address the inability of Transformer-based networks to generate emotional keywords and complex event details in empathetic replies, integrates the Copy mechanism into the decoder so that emotional keywords and complex event details are copied into the output. To address the scarcity of empathetic corpora for Chinese psychological dialogue, the present invention adopts compound automatic iterative training to augment the training data.
The present invention copies the emotional keywords and complex event details of the source sequence into the output; the generated empathetic replies are then evaluated with criteria such as perplexity, and the replies that meet expectations, together with the corresponding user statements, are placed back into the original training corpus for compound automatic iterative training, yielding a further updated and optimized empathetic reply generation model.
Brief Description of the Drawings
Fig. 1 is a framework diagram of the Chinese empathetic reply generation model based on the UniLM model and the Copy mechanism provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the UniLM model architecture used in an embodiment of the present invention;
Fig. 3 is a detailed flow chart of the Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
To enable those skilled in the art to fully understand how the present invention is concretely implemented, this part is an explanatory embodiment that expands on the technical solution of the claims.
The Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism is further described in detail below in conjunction with the drawings and specific embodiments.
As shown in Fig. 1, the present invention is based mainly on the UniLM model, with the Copy mechanism integrated at the decoding end, so that conversational empathy makes full use of the contextual relevance of complex event details. The method comprises four stages: input processing, pre-training, empathetic reply generation, and compound training. The specific implementation is as follows:
The pre-training corpus consists of counseling clients' statements about their psychological problems and counselors' empathetic replies. The client's statement, Segment1, is denoted S1, and the counselor's reply, Segment2, is denoted S2; the special tokens [CLS] and [SEP] are added, giving the form "[CLS]S1[SEP]S2[SEP]". As shown in Fig. 2, the input representation of the model is the sum of three parts: Segment Embedding, Position Embedding, and Token Embedding.
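As an illustration of this input representation, the following minimal sketch (PyTorch assumed; the class name is illustrative, while the default sizes 21128, 768, and 512 are the values quoted later in this description) sums the three embeddings:

```python
import torch
import torch.nn as nn

class UniLMInput(nn.Module):
    """Input representation = Token Embedding + Segment Embedding + Position Embedding."""
    def __init__(self, vocab_size=21128, hidden=768, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)  # Token Embedding
        self.seg = nn.Embedding(2, hidden)           # Segment Embedding: S1 -> 0, S2 -> 1
        self.pos = nn.Embedding(max_len, hidden)     # Position Embedding

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)
```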
During model pre-training, the Embedding vectors are input, and each Transformer layer encodes the input vectors: the multi-head attention mechanism aggregates the output of the previous layer, the mask matrix controls the range each word or position may attend to, the attention distribution of the current position over the other positions is obtained, and the feature vector for the current position of the decoder is computed.
The attention distribution At of the generated word vector over the text feature vector XInput at time t is as follows:

At = Softmax((Wq*Xt)*(Wk*XInput)^T / sqrt(dk) + M)

The feature vector XOutput output by the decoder at time t is as follows:

XOutput = At*Wv*XInput
where Xt is the target vector at time t; XInput is the text feature vector at time t; M is the mask matrix, which controls the attention range of each word; dk is the dimension of the word vectors; and Wq, Wk, and Wv are learnable parameters.
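A single-head sketch of this computation follows (PyTorch assumed; the projection shapes and the (1, L) mask row are assumptions, since the text only names Xt, XInput, M, dk, Wq, Wk, and Wv):

```python
import math
import torch

def masked_attention(x_t, x_input, Wq, Wk, Wv, M):
    """x_t: (1, d) target vector at step t; x_input: (L, d) text features;
    Wq/Wk/Wv: (d, dk) learnable projections; M: (1, L) additive mask row."""
    d_k = Wk.size(1)
    scores = (x_t @ Wq) @ (x_input @ Wk).T / math.sqrt(d_k) + M  # masked scores
    A_t = torch.softmax(scores, dim=-1)   # attention distribution At, shape (1, L)
    x_output = A_t @ (x_input @ Wv)       # XOutput = At*Wv*XInput
    return A_t, x_output
```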
The Softmax function maps a vector of scores s to a probability distribution and is defined as follows:

Softmax(s)i = exp(si) / Σj exp(sj), j = 1, ..., n
where i is the index of an output node; si is the output value of the i-th node; and n is the number of output nodes, i.e., the number of classification categories.
Further, the cross-entropy loss between the model prediction XOutput, denoted s, and the masked original token st is computed to optimize the parameters of the model. The cross-entropy function is defined as follows:

Loss = -Σi st,i * log(si)
Training process: the preprocessed data are fed into the model for training, for a total of 20 epochs, with Dropout 0.1, hidden vector dimension 768, learning rate Learning_rate 2e-5, batch size Batch_size 32, 12 attention heads, 12 hidden layers, 12 embedding layers, 768 hidden-layer units, and a vocabulary size of 21128. The maximum input length is set to 512, the maximum length of a generated empathetic reply is set to 40, and the loss is computed with the cross-entropy function.
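For reference, the hyperparameters listed above can be gathered in one configuration object; this is only a restatement of the values in the preceding paragraph, and the key names are illustrative:

```python
train_config = {
    "epochs": 20,
    "dropout": 0.1,
    "hidden_size": 768,
    "learning_rate": 2e-5,
    "batch_size": 32,
    "attention_heads": 12,
    "hidden_layers": 12,
    "embedding_layers": 12,
    "vocab_size": 21128,
    "max_input_length": 512,
    "max_reply_length": 40,
    "loss": "cross_entropy",
}
```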
After pre-training is complete, UniLM's sequence-to-sequence language model is fine-tuned for the empathetic reply generation task. During decoding, for example, the user inputs a statement of an inner psychological problem, "X1". At time t=1 the input sequence is "[CLS]X1[SEP]Y1[MASK]", with "[MASK]" appended at the end of the sequence; its corresponding feature representation predicts the next word. "[CLS]X1[SEP]" is the known source sequence, whose tokens can see each other's in-sentence context during encoding. "Y1[MASK]" is the predicted target sequence; during decoding it can see the information of the source sequence and of the part of the target sequence to its left. The model fuses the encoder and the decoder together through the mask matrix.
After a corpus sample is encoded by the UniLM model, a sequence_length x hidden_size matrix is obtained: the first row is the feature representation of [CLS], the second row is the feature representation of X1, and so on. In the decoding stage, the feature representation of [MASK] is passed through a linear layer, the Softmax function is used to obtain the probability distribution over the words in the vocabulary, and the word with the highest probability is selected as the decoded word. These steps are repeated, stopping when [SEP] is generated, to obtain the feature vector XOutput produced by the decoder at time t. The specific calculation is as follows:
XOutput passes through two linear transformations and the Softmax function to give the vocabulary probability distribution Pv:

Pv = Softmax(W′(W*XOutput + b) + b′)

where W′, W, b, and b′ are learnable parameters.
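The decoding loop described above (append [MASK], predict it, emit the most probable word, stop at [SEP]) might look as follows. This is a sketch under stated assumptions: `model`, `lm_head` (standing in for the two linear transformations), and the special-token ids are assumed helpers, not the patent's actual interface:

```python
import torch

def generate_reply(model, lm_head, x1_ids, mask_id, sep_id, max_len=40):
    """Greedy decoding: x1_ids already encodes "[CLS] X1 [SEP]" as token ids."""
    generated = []
    for _ in range(max_len):
        # current input: "[CLS] X1 [SEP] Y1..Yt [MASK]"
        input_ids = x1_ids + generated + [mask_id]
        hidden = model(torch.tensor([input_ids]))  # (1, L, hidden_size)
        logits = lm_head(hidden[0, -1])            # project the [MASK] feature
        p_v = torch.softmax(logits, dim=-1)        # vocabulary distribution Pv
        next_id = int(p_v.argmax())                # word with the highest probability
        if next_id == sep_id:                      # generating [SEP] terminates decoding
            break
        generated.append(next_id)
    return generated
```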
A generation probability Pg is introduced, denoting the probability of generating a word from the vocabulary, and a copy probability Pc is introduced, denoting the probability of copying a word from the source text, where Pg + Pc = 1. Pg is computed from XOutput, At, and Xt through a fully connected layer and the Sigmoid function.
Pg = Sigmoid(W[Xt, XOutput, At] + b)

where W and b are learnable parameters.
The updated and improved vocabulary probability distribution is then computed:
P(w) = Pg*Pv(w) + Pc*At
where Pv(w) = 0 when w is not a word in the vocabulary, in which case the predicted word is generated from the source sequence; and At = 0 when w is not a word in the source sequence, in which case the predicted word is generated from the vocabulary. The Copy mechanism copies emotional keywords and complex event details (high-probability words) from the source sequence into the generated empathetic reply, which controls the accuracy of empathetic reply generation to a certain extent. The Copy mechanism also acts, to some degree, as a dynamic expansion of the vocabulary, reducing the probability of generating out-of-vocabulary words.
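A minimal sketch of this copy-enhanced distribution (PyTorch assumed; `src_ids`, which maps each source position to its vocabulary id, and the tensor shapes are illustrative assumptions):

```python
import torch

def copy_distribution(p_v, A_t, src_ids, x_t, x_output, W, b):
    """p_v: (V,) vocabulary distribution Pv; A_t: (1, L) attention over the source;
    src_ids: (L,) vocabulary id of each source token; W, b: learnable parameters."""
    feats = torch.cat([x_t, x_output, A_t], dim=-1)  # [Xt, XOutput, At]
    p_g = torch.sigmoid(feats @ W + b).squeeze()     # generation probability Pg
    p_copy = torch.zeros_like(p_v)
    # scatter the attention mass of each source position onto its vocabulary id
    p_copy.scatter_add_(0, src_ids, A_t.squeeze(0))
    return p_g * p_v + (1 - p_g) * p_copy            # P(w) = Pg*Pv(w) + Pc*At
```

In practice, source words outside the vocabulary would need an extended vocabulary; the sketch assumes every source token has an id in the vocabulary.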
The beam size is set to 1, and the Beam Search algorithm is used to search for a near-optimal target sequence and generate the empathetic reply. The generated empathetic replies are evaluated; replies that meet the standard, together with the users' statements, are placed into the original corpus for compound automatic iterative training, augmenting the training data and yielding the updated and optimized Chinese empathetic reply generation model.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any modification, equivalent replacement, or improvement made by any person skilled in the art within the technical scope disclosed by the present invention, and within the spirit and principles of the present invention, shall fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591710.7A CN116150334A (en) | 2022-12-12 | 2022-12-12 | Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591710.7A CN116150334A (en) | 2022-12-12 | 2022-12-12 | Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116150334A (en) | 2023-05-23 |
Family
ID=86357427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211591710.7A Pending CN116150334A (en) | 2022-12-12 | 2022-12-12 | Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116150334A (en) |
- 2022-12-12: CN CN202211591710.7A patent/CN116150334A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117591866A (en) * | 2024-01-16 | 2024-02-23 | 中国传媒大学 | Multimodal false information detection method guided by empathy theory |
CN117591866B (en) * | 2024-01-16 | 2024-05-07 | 中国传媒大学 | Multimodal false information detection method guided by empathy theory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |