CN116150334A - Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism
- Publication number
- CN116150334A (application CN202211591710.7A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- model
- training
- unilm
- replies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3329—Natural language query formulation
- G06F16/3346—Query execution using probabilistic model
- G06F16/353—Clustering; Classification into predefined classes
- G06N3/08—Learning methods
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Description
Technical Field
The invention belongs to the technical field of Chinese-oriented natural language generation, and in particular relates to a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism.
Background Art
With the application of deep learning in various fields, intelligent conversational systems have also developed rapidly. Users hope to communicate emotionally with such systems, and empathy makes this possible, so empathetic reply generation emerged. Carl Ransom Rogers defined empathy as, in interpersonal communication, imagining another person's experiences and logic from that person's standpoint, understanding their thoughts and feelings, and viewing and solving problems from their perspective. Empathetic reply generation means that the conversational system judges the user's emotional state from the conversation history and thereby generates an emotional reply that reflects an understanding of the user's feelings. Existing studies have shown that a conversational system with empathy can not only improve user satisfaction but also elicit more positive feedback from users.
In mental health counseling sessions, an intelligent conversational system, as an auxiliary tool, can help the counselor with some tasks and is considered key to service applications such as mental health intervention and counseling-assisted diagnosis. Intelligent conversational systems endowed with empathy have therefore gradually become a research hotspot. A good conversation model must exhibit strong contextual relevance between its input and output, i.e., the relationship between the user's input and the model's output. Currently, mainstream reply generation methods are either sequence-to-sequence methods based on deep learning or methods based on pre-trained models.
The encoder in traditional sequence-to-sequence models is mainly an RNN or LSTM. Compared with the Transformer, RNNs and LSTMs are weaker at extracting semantic features and at modeling long-distance dependencies. Although the replies generated by Transformer-based language models are more readable than those of RNNs and LSTMs, inaccurate generation of details still causes contextually irrelevant replies.
Summary of the Invention
In view of the problems in the prior art, the present invention proposes a Chinese empathetic sentence training method based on the UniLM model and the Copy mechanism.
The present invention is realized as a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism. The purpose of integrating the Copy mechanism is to copy the emotional keywords and complex event details of the source sequence into the output. The generated empathetic replies are then evaluated with criteria such as perplexity; replies that meet expectations, together with the corresponding user statements, are placed back into the original training corpus for compound automatic iterative training, yielding a further updated and optimized empathetic reply generation model.
The technical solution adopted by the present invention is a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism, which specifically includes the following steps:
Step 1: use web-crawler technology to crawl corpora with empathetic replies from the psychological dialogue domain, and preprocess them to obtain the input representation;
Step 2: pre-train the UniLM model, using three types of language models simultaneously, each with a different self-attention mask mechanism;
Step 3: compute the loss with the cross-entropy loss function to complete the UniLM-based pre-training and obtain the empathetic reply generation model;
Step 4: perform the empathetic reply generation task with the UniLM model, decoding through the self-attention mechanism of the sequence-to-sequence language model to obtain the vocabulary probability distribution;
Step 5: on the basis of Step 4, build a decoder that incorporates the Copy mechanism, introducing a generation probability and a copy probability to refine the vocabulary probability distribution of Step 4;
Step 6: use the cross-entropy loss function as the model's loss function, and obtain the generated empathetic replies with the Beam Search algorithm;
Step 7: place the generated high-quality empathetic replies and the corresponding user statements into the corpus of Step 1, and continue compound automatic iterative training of the UniLM model to obtain the updated and optimized empathetic reply generation model.
Further, two text sequences are input each time: Segment1, denoted S1, and Segment2, denoted S2, for example: "[CLS] I keep thinking about people and things I really hate [SEP] I understand that you feel confused and puzzled because you dwell on the negative events in your life and forget the positive ones [SEP]". [CLS] marks the beginning of the sequence and [SEP] marks the end of each segment; the text sequence pair is turned into the input representation through three kinds of Embeddings.
Further, the UniLM model is a stack of 12 Transformer layers; each layer has a hidden size of 768 and 12 attention heads, the same structure as BERT-BASE, so its parameters can be initialized from a trained BERT-BASE model. The UniLM model completes three pre-training objectives simultaneously, covering the prediction tasks of a unidirectional language model, a bidirectional language model, and a sequence-to-sequence language model, which enables the model to be applied to natural language generation tasks. Different MASK mechanisms are adopted for the different language models. Masking scheme: 15% of tokens are selected overall; of these, 80% are replaced directly with [MASK], 10% are replaced with a word chosen at random from the dictionary, and the remaining 10% keep their true value without any processing. In addition, 80% of the time a single word is masked, and the other 20% of the time a two-word bigram or three-word trigram is masked. For the MASK to be predicted, the unidirectional language model uses the context on one side only: to predict the mask in the sequence "X1X2[MASK]X4", for example, only X1, X2, and the mask position itself are available, while X4 is not. The bidirectional language model encodes context from both directions: taking "X1X2[MASK]X4" as an example, X1, X2, X4, and the mask position are all available. In the sequence-to-sequence language model, if the MASK is in S1 it can only encode the context of S1; if the MASK is in S2, it can access everything to its left, including the context of S1.
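The three masking regimes above can be made concrete as attention masks. The following is a minimal sketch (PyTorch assumed; the function name and the additive -inf convention are illustrative choices, not taken from the patent) of how the unidirectional, bidirectional, and sequence-to-sequence masks could be constructed:

```python
import torch

def unilm_attention_mask(len_s1: int, len_s2: int, mode: str) -> torch.Tensor:
    """Return an (L, L) additive mask M: 0 where attention is allowed, -inf where blocked."""
    L = len_s1 + len_s2
    allow = torch.zeros(L, L, dtype=torch.bool)
    if mode == "unidirectional":
        # each position may attend only to itself and to positions on its left
        allow = torch.ones(L, L).tril().bool()
    elif mode == "bidirectional":
        allow[:, :] = True  # every position may attend to every position
    elif mode == "seq2seq":
        allow[:, :len_s1] = True  # all positions may attend to all of S1
        # S2 positions may additionally attend to the S2 tokens on their left
        allow[len_s1:, len_s1:] = torch.ones(len_s2, len_s2).tril().bool()
    mask = torch.zeros(L, L)
    mask[~allow] = float("-inf")  # added to the attention scores before Softmax
    return mask
```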
Further, the text representations output by the Transformer network are fed into a Softmax classifier to predict the masked words; the cross-entropy loss between the predicted tokens and the original tokens is used to optimize the model parameters and complete the pre-training.
Further, a certain proportion of the tokens in the target sequence are masked at random, and the sequence-to-sequence language model learns to recover the masked words; the training objective is to maximize the probability of the masked tokens given the context information. The [SEP] at the end of the target sequence can also be masked, so that the model learns when to stop generating the target sequence. The model uses the MASK mechanism together with the attention mechanism to obtain text feature vectors, which are fed into a fully connected layer to obtain the vocabulary probability distribution.
Further, the vocabulary probability distribution is fed into a fully connected layer and a Sigmoid layer to obtain the generation probability. A copy probability is then introduced, and combining the generation probability with the copy probability yields an updated and improved vocabulary probability distribution.
Further, the cross-entropy loss function is used to complete the fine-tuning of the model, and the Beam Search algorithm is used to generate empathetic replies.
Further, four evaluation metrics (perplexity, BLEU-4, F1, and expert evaluation) are used to comprehensively evaluate the empathetic replies generated in Step 6. Replies that meet the expected standard, together with the corresponding user inputs, are automatically placed back into the original corpus of Step 1 for compound automatic iterative training, augmenting the training data and yielding the updated and optimized Chinese empathetic reply generation model.
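A hedged sketch of this filtering step follows; the metric functions and the numeric thresholds are placeholders for illustration, since the patent does not specify the acceptance criteria quantitatively:

```python
def filter_for_retraining(pairs, perplexity, bleu4, f1, expert_ok,
                          max_ppl=50.0, min_bleu=0.25, min_f1=0.5):
    """Keep only (user statement, generated reply) pairs that pass every metric.

    perplexity/bleu4/f1/expert_ok are assumed callables; thresholds are illustrative.
    """
    accepted = []
    for statement, reply in pairs:
        if (perplexity(reply) <= max_ppl and bleu4(reply) >= min_bleu
                and f1(reply) >= min_f1 and expert_ok(reply)):
            accepted.append((statement, reply))  # fed back into the training corpus
    return accepted
```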
The purpose of the present invention is to address the problem that empathetic replies generated by Transformer-based networks fail to generate emotional keywords and complex event details: the Copy mechanism is integrated into the decoder so that emotional keywords and complex event details are copied into the output.
Another purpose of the present invention is to address the scarcity of empathetic corpora for Chinese psychological dialogue: the present invention adopts compound automatic iterative training to augment the training data.
Combining the above technical solutions and the technical problems they solve, the advantages and positive effects of the technical solution to be protected by the present invention are analyzed from the following aspects:
First, in view of the technical problems in the prior art and the difficulty of solving them, and in close connection with the claimed technical solution and the results and data obtained during research and development, the following analyzes in detail how the technical solution of the present invention solves those technical problems and the creative technical effects achieved by solving them. The specific description is as follows:
In interpersonal communication, people increasingly hope to imagine others' experiences and logic from their standpoint, to understand their thoughts and feelings, and to view and solve problems from their perspective. Against this background, intelligent conversational systems endowed with empathy have gradually become a research hotspot. The empathetic reply generation addressed by the present invention means that the conversational system judges the user's emotional state from the conversation history and thereby generates an emotional reply that reflects an understanding of the user's feelings. A conversational system with empathy can not only improve user satisfaction but also elicit more positive feedback from users.
Second, viewing the technical solution as a whole or from the product perspective, the technical effects and advantages of the technical solution to be protected by the present invention are as follows:
The invention proposes a Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism. The present invention uses the UniLM model as the basic architecture and, to address the inability of Transformer-based networks to generate emotional keywords and complex event details in empathetic replies, integrates the Copy mechanism into the decoder so that emotional keywords and complex event details are copied into the output. To address the scarcity of empathetic corpora for Chinese psychological dialogue, the present invention adopts compound automatic iterative training to augment the training data.
The present invention copies the emotional keywords and complex event details of the source sequence into the output; the generated empathetic replies are then evaluated with criteria such as perplexity, and the replies that meet expectations, together with the corresponding user statements, are placed back into the original training corpus for compound automatic iterative training, yielding a further updated and optimized empathetic reply generation model.
Brief Description of the Drawings
Fig. 1 is a framework diagram of the Chinese empathetic reply generation model based on the UniLM model and the Copy mechanism provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the UniLM model architecture used in an embodiment of the present invention;
Fig. 3 is a detailed flow chart of the Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
To enable those skilled in the art to fully understand how the present invention is concretely implemented, this part is an explanatory embodiment that expands on the technical solution of the claims.
The Chinese empathetic reply generation method based on the UniLM model and the Copy mechanism is further described in detail below in conjunction with the drawings and specific embodiments.
As shown in Fig. 1, the present invention is based mainly on the UniLM model, with the Copy mechanism integrated at the decoding end, so that conversational empathy makes full use of the contextual relevance of complex event details. The method comprises four stages: input processing, pre-training, empathetic reply generation, and compound training. The specific implementation is as follows:
The pre-training corpus consists of counseling clients' statements about their psychological problems and counselors' empathetic replies. The client's statement, Segment1, is denoted S1, and the counselor's reply, Segment2, is denoted S2; the special tokens [CLS] and [SEP] are added, giving the form "[CLS]S1[SEP]S2[SEP]". As shown in Fig. 2, the input representation of the model is the sum of three parts: Segment Embedding, Position Embedding, and Token Embedding.
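As an illustration of this input representation, the following minimal sketch (PyTorch assumed; the class name is illustrative, while the default sizes 21128, 768, and 512 are the values quoted later in this description) sums the three embeddings:

```python
import torch
import torch.nn as nn

class UniLMInput(nn.Module):
    """Input representation = Token Embedding + Segment Embedding + Position Embedding."""
    def __init__(self, vocab_size=21128, hidden=768, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)  # Token Embedding
        self.seg = nn.Embedding(2, hidden)           # Segment Embedding: S1 -> 0, S2 -> 1
        self.pos = nn.Embedding(max_len, hidden)     # Position Embedding

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)
```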
During model pre-training, the Embedding vectors are input, and each Transformer layer encodes the input vectors: the multi-head attention mechanism aggregates the output of the previous layer, the mask matrix controls the range each word or position may attend to, the attention distribution of the current position over the other positions is obtained, and the feature vector for the current position of the decoder is computed.
The attention distribution At of the generated word vector over the text feature vector XInput at time t is as follows:

At = Softmax((Wq*Xt)*(Wk*XInput)^T / sqrt(dk) + M)

The feature vector XOutput output by the decoder at time t is as follows:

XOutput = At*Wv*XInput
where Xt is the target vector at time t; XInput is the text feature vector at time t; M is the mask matrix, which controls the attention range of each word; dk is the dimension of the word vectors; and Wq, Wk, and Wv are learnable parameters.
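A single-head sketch of this computation follows (PyTorch assumed; the projection shapes and the (1, L) mask row are assumptions, since the text only names Xt, XInput, M, dk, Wq, Wk, and Wv):

```python
import math
import torch

def masked_attention(x_t, x_input, Wq, Wk, Wv, M):
    """x_t: (1, d) target vector at step t; x_input: (L, d) text features;
    Wq/Wk/Wv: (d, dk) learnable projections; M: (1, L) additive mask row."""
    d_k = Wk.size(1)
    scores = (x_t @ Wq) @ (x_input @ Wk).T / math.sqrt(d_k) + M  # masked scores
    A_t = torch.softmax(scores, dim=-1)   # attention distribution At, shape (1, L)
    x_output = A_t @ (x_input @ Wv)       # XOutput = At*Wv*XInput
    return A_t, x_output
```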
The Softmax function maps a vector of scores s to a probability distribution and is defined as follows:

Softmax(s)i = exp(si) / Σj exp(sj), j = 1, ..., n
where i is the index of an output node; si is the output value of the i-th node; and n is the number of output nodes, i.e., the number of classification categories.
Further, the cross-entropy loss between the model prediction XOutput, denoted s, and the masked original token st is computed to optimize the parameters of the model. The cross-entropy function is defined as follows:

Loss = -Σi st,i * log(si)
Training process: the preprocessed data are fed into the model for training, for a total of 20 epochs, with Dropout 0.1, hidden vector dimension 768, learning rate Learning_rate 2e-5, batch size Batch_size 32, 12 attention heads, 12 hidden layers, 12 embedding layers, 768 hidden-layer units, and a vocabulary size of 21128. The maximum input length is set to 512, the maximum length of a generated empathetic reply is set to 40, and the loss is computed with the cross-entropy function.
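For reference, the hyperparameters listed above can be gathered in one configuration object; this is only a restatement of the values in the preceding paragraph, and the key names are illustrative:

```python
train_config = {
    "epochs": 20,
    "dropout": 0.1,
    "hidden_size": 768,
    "learning_rate": 2e-5,
    "batch_size": 32,
    "attention_heads": 12,
    "hidden_layers": 12,
    "embedding_layers": 12,
    "vocab_size": 21128,
    "max_input_length": 512,
    "max_reply_length": 40,
    "loss": "cross_entropy",
}
```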
After pre-training is complete, UniLM's sequence-to-sequence language model is fine-tuned for the empathetic reply generation task. During decoding, for example, the user inputs a statement of an inner psychological problem, "X1". At time t=1 the input sequence is "[CLS]X1[SEP]Y1[MASK]", with "[MASK]" appended at the end of the sequence; its corresponding feature representation predicts the next word. "[CLS]X1[SEP]" is the known source sequence, whose tokens can see each other's in-sentence context during encoding. "Y1[MASK]" is the predicted target sequence; during decoding it can see the information of the source sequence and of the part of the target sequence to its left. The model fuses the encoder and the decoder together through the mask matrix.
After a corpus sample is encoded by the UniLM model, a sequence_length x hidden_size matrix is obtained: the first row is the feature representation of [CLS], the second row is the feature representation of X1, and so on. In the decoding stage, the feature representation of [MASK] is passed through a linear layer, the Softmax function is used to obtain the probability distribution over the words in the vocabulary, and the word with the highest probability is selected as the decoded word. These steps are repeated, stopping when [SEP] is generated, to obtain the feature vector XOutput produced by the decoder at time t. The specific calculation is as follows:
XOutput passes through two linear transformations and the Softmax function to give the vocabulary probability distribution Pv:

Pv = Softmax(W′(W*XOutput + b) + b′)

where W′, W, b, and b′ are learnable parameters.
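The decoding loop described above (append [MASK], predict it, emit the most probable word, stop at [SEP]) might look as follows. This is a sketch under stated assumptions: `model`, `lm_head` (standing in for the two linear transformations), and the special-token ids are assumed helpers, not the patent's actual interface:

```python
import torch

def generate_reply(model, lm_head, x1_ids, mask_id, sep_id, max_len=40):
    """Greedy decoding: x1_ids already encodes "[CLS] X1 [SEP]" as token ids."""
    generated = []
    for _ in range(max_len):
        # current input: "[CLS] X1 [SEP] Y1..Yt [MASK]"
        input_ids = x1_ids + generated + [mask_id]
        hidden = model(torch.tensor([input_ids]))  # (1, L, hidden_size)
        logits = lm_head(hidden[0, -1])            # project the [MASK] feature
        p_v = torch.softmax(logits, dim=-1)        # vocabulary distribution Pv
        next_id = int(p_v.argmax())                # word with the highest probability
        if next_id == sep_id:                      # generating [SEP] terminates decoding
            break
        generated.append(next_id)
    return generated
```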
A generation probability Pg is introduced, denoting the probability of generating a word from the vocabulary, and a copy probability Pc is introduced, denoting the probability of copying a word from the source text, where Pg + Pc = 1. Pg is computed from XOutput, At, and Xt through a fully connected layer and the Sigmoid function.
Pg = Sigmoid(W[Xt, XOutput, At] + b)

where W and b are learnable parameters.
The updated and improved vocabulary probability distribution is then computed:
P(w) = Pg*Pv(w) + Pc*At
where Pv(w) = 0 when w is not a word in the vocabulary, in which case the predicted word is generated from the source sequence; and At = 0 when w is not a word in the source sequence, in which case the predicted word is generated from the vocabulary. The Copy mechanism copies emotional keywords and complex event details (high-probability words) from the source sequence into the generated empathetic reply, which controls the accuracy of empathetic reply generation to a certain extent. The Copy mechanism also acts, to some degree, as a dynamic expansion of the vocabulary, reducing the probability of generating out-of-vocabulary words.
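A minimal sketch of this copy-enhanced distribution (PyTorch assumed; `src_ids`, which maps each source position to its vocabulary id, and the tensor shapes are illustrative assumptions):

```python
import torch

def copy_distribution(p_v, A_t, src_ids, x_t, x_output, W, b):
    """p_v: (V,) vocabulary distribution Pv; A_t: (1, L) attention over the source;
    src_ids: (L,) vocabulary id of each source token; W, b: learnable parameters."""
    feats = torch.cat([x_t, x_output, A_t], dim=-1)  # [Xt, XOutput, At]
    p_g = torch.sigmoid(feats @ W + b).squeeze()     # generation probability Pg
    p_copy = torch.zeros_like(p_v)
    # scatter the attention mass of each source position onto its vocabulary id
    p_copy.scatter_add_(0, src_ids, A_t.squeeze(0))
    return p_g * p_v + (1 - p_g) * p_copy            # P(w) = Pg*Pv(w) + Pc*At
```

In practice, source words outside the vocabulary would need an extended vocabulary; the sketch assumes every source token has an id in the vocabulary.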
The beam size is set to 1, and the Beam Search algorithm is used to search for a near-optimal target sequence and generate the empathetic reply. The generated empathetic replies are evaluated; replies that meet the standard, together with the users' statements, are placed into the original corpus for compound automatic iterative training, augmenting the training data and yielding the updated and optimized Chinese empathetic reply generation model.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any modification, equivalent replacement, or improvement made by any person skilled in the art within the technical scope disclosed by the present invention, and within the spirit and principles of the present invention, shall fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591710.7A CN116150334A (en) | 2022-12-12 | 2022-12-12 | Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591710.7A CN116150334A (en) | 2022-12-12 | 2022-12-12 | Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116150334A (en) | 2023-05-23 |
Family
ID=86357427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211591710.7A Pending CN116150334A (en) | 2022-12-12 | 2022-12-12 | Chinese Empathy Sentence Training Method and System Based on UniLM Model and Copy Mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116150334A (en) |
- 2022-12-12: CN CN202211591710.7A patent/CN116150334A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117591866A (en) * | 2024-01-16 | 2024-02-23 | 中国传媒大学 | Multimodal false information detection method guided by empathy theory |
CN117591866B (en) * | 2024-01-16 | 2024-05-07 | 中国传媒大学 | Multimodal false information detection method guided by empathy theory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |