
CN110826334B - Chinese named entity recognition model based on reinforcement learning and training method thereof - Google Patents

Chinese named entity recognition model based on reinforcement learning and training method thereof Download PDF

Info

Publication number
CN110826334B
CN110826334B (application CN201911089295.3A)
Authority
CN
China
Prior art keywords
word
sentence
network
named entity
entity recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201911089295.3A
Other languages
Chinese (zh)
Other versions
CN110826334A (en)
Inventor
叶梅
卓汉逵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911089295.3A priority Critical patent/CN110826334B/en
Publication of CN110826334A publication Critical patent/CN110826334A/en
Application granted granted Critical
Publication of CN110826334B publication Critical patent/CN110826334B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese named entity recognition model based on reinforcement learning and a training method thereof. The model comprises a policy network module, a word segmentation and recombination network, and a named entity recognition network module. First, the policy network specifies an action sequence; the word segmentation and recombination network then executes the actions in that sequence one by one, obtaining a phrase at each 'terminate' action. Each phrase serves as auxiliary input information for lattice-LSTM modeling, which yields a hidden state sequence; the hidden states are fed into the named entity recognition network to obtain the sentence's label sequence, and the recognition result serves as a delayed reward that guides updates of the policy network module. By using reinforcement learning to segment sentences effectively, the invention avoids modeling the redundant, interfering words matched in a sentence, effectively mitigates dependence on external dictionaries and the impact of long texts, makes better use of correct word information, and thereby helps the Chinese named entity recognition model improve its recognition performance.

Description

A Chinese named entity recognition model based on reinforcement learning and a training method thereof

Technical Field

The present invention relates to the field of machine learning, and more specifically, to a Chinese named entity recognition model based on reinforcement learning and a training method thereof.

Background Art

Named entity recognition (NER) is a fundamental task in natural language processing. It refers to identifying named mentions in text, laying the groundwork for tasks such as relation extraction, question answering, syntactic analysis, and machine translation, and it plays an important role in making natural language processing technology practical. Generally speaking, the NER task is to identify named entities of three major categories (entities, times, and numbers) and seven subcategories (person names, organization names, place names, times, dates, currencies, and percentages) in the text to be processed.

An existing Chinese named entity recognition model is lattice-LSTM. In addition to each character of the sentence, this model also takes as input the cell vectors of all potential words ending at that character; the selection of these potential words depends on an external dictionary. A supplementary gate is added to control the selection between character-granularity and word-granularity information, so the input changes from (character information, previous hidden state vector, previous cell state vector) to (character information, previous hidden state vector, and the information of all words ending at that character). The advantage of this model is that it can exploit explicit word information in a character-sequence tagging model without suffering from word segmentation errors.

However, precisely because the lattice-LSTM model uses the information of all matched words in the sentence, any word formed by adjacent characters that appears in the external dictionary is fed into the model as in-vocabulary word-granularity information, even though that word is not necessarily a correct segmentation of the sentence. For example, for "南京市长江大桥" (Nanjing Yangtze River Bridge), the model takes as input, in order, every in-vocabulary word formed by its characters, an in-vocabulary word being one already included in the external dictionary: "南京" (Nanjing), "南京市" (Nanjing City), "市长" (mayor), "长江" (Yangtze River), "大桥" (bridge), and "长江大桥" (Yangtze River Bridge). Clearly, "市长" (mayor) is an interfering word in this sentence, and using its word information has a negative impact on entity recognition. In addition, the model usually requires an external dictionary constructed specifically for the experimental dataset, so it depends heavily on the external dictionary. Moreover, as text length increases, the number of potential words in a sentence also increases, which greatly raises the model's complexity.
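To make the interference concrete, the following sketch (plain Python; the toy lexicon stands in for the external dictionary) enumerates every dictionary word formed by consecutive characters, reproducing how lattice-LSTM collects candidate words for "南京市长江大桥":

```python
# Sketch: enumerate dictionary-matched substrings the way lattice-LSTM
# collects candidate words. LEXICON is a toy stand-in for the external
# dictionary the model depends on.
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}

def matched_words(sentence, lexicon):
    """Return (start, end, word) for every lexicon word of two or more
    consecutive characters in the sentence."""
    hits = []
    n = len(sentence)
    for b in range(n):
        for e in range(b + 2, n + 1):
            word = sentence[b:e]
            if word in lexicon:
                hits.append((b, e - 1, word))
    return hits

print(matched_words("南京市长江大桥", LEXICON))
# [(0, 1, '南京'), (0, 2, '南京市'), (2, 3, '市长'),
#  (3, 4, '长江'), (3, 6, '长江大桥'), (5, 6, '大桥')]
# '市长' (mayor) is matched even though it is a wrong segmentation here.
```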

Summary of the Invention

To overcome the above problems of the prior art, namely modeling the redundant interfering words matched in a sentence, depending on an external dictionary, and being affected by long texts, the present invention provides a Chinese named entity recognition model based on reinforcement learning and a training method thereof. By building a reinforcement learning model that learns the internal structure of a sentence, the invention effectively learns a sentence segmentation scheme relevant to the named entity recognition task, so that sentences can be cut into an effective segmentation. This avoids feeding in interfering words and using an external dictionary, and reduces the number of words per sentence as text length increases; using this correct word information helps the Chinese named entity recognition model improve its recognition accuracy.

To solve the above technical problems, the present invention adopts the following technical solution: a Chinese named entity recognition model based on reinforcement learning is provided, comprising a policy network module, a word segmentation and recombination network, and a named entity recognition network module;

the policy network module adopts a stochastic policy that samples one action for each character of the sentence under each state space, thereby obtaining an action sequence for the whole sentence, and receives a delayed reward derived from the recognition result of the Chinese named entity recognition network to guide the update of the policy network module;

the word segmentation and recombination network divides the sentence according to the action sequence output by the policy network module, cuts the sentence into phrases, and combines the encoding of each phrase with the encoding vector of the phrase's last character, thereby obtaining the lattice-LSTM representation of the sentence;

the named entity recognition network module inputs the hidden states of the sentence's lattice-LSTM representation into a CRF (conditional random field) layer and finally obtains the named entity recognition result; a loss value calculated from the recognition result is used to train the named entity recognition model, and the same loss value serves as a delayed reward guiding the update of the policy network module.

Preferably, the actions comprise 'inside' and 'terminate'.

Preferably, the stochastic policy is:

π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the policy network; and s_t is the state of the policy network at time t.

Preferably, the word segmentation and recombination network cuts the sentence into phrases according to the action sequence output by the policy network module, and encodes each phrase as an input to the cell state at the last character of the corresponding phrase, obtaining the lattice-LSTM representation of the sentence.
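As a minimal sketch of this segmentation step (plain Python; encoding the two actions as 'I' for 'inside' and 'T' for 'terminate' is an illustrative choice), each 'terminate' action closes the current phrase at that character:

```python
def actions_to_phrases(sentence, actions):
    """Turn a per-character action sequence into phrases: 'I' keeps the
    character inside the growing phrase, 'T' terminates the phrase there."""
    assert len(sentence) == len(actions)
    phrases, start = [], 0
    for i, action in enumerate(actions):
        if action == "T":                  # close phrase [start, i]
            phrases.append(sentence[start:i + 1])
            start = i + 1
    if start < len(sentence):              # flush a trailing open phrase
        phrases.append(sentence[start:])
    return phrases

# "美国的华盛顿" with actions I T T I I T -> ['美国', '的', '华盛顿']
print(actions_to_phrases("美国的华盛顿", list("ITTIIT")))
```

Each resulting phrase is then encoded and fed into the cell state at its last character, as described above.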

Preferably, the named entity recognition network module inputs the lattice-LSTM output of the word segmentation and recombination network into the CRF layer, uses the CRF layer's set of feature functions to score each label sequence of the sentence, exponentiates and normalizes the scores, computes over all possible label sequences with the first-order Viterbi algorithm, and takes the highest-scoring sequence as the final output. The value of the loss function is back-propagated for parameter training, and the same loss value serves as the delayed reward that updates the policy network module. The loss function is defined as the sentence-level negative log-likelihood with an L2 regularization term:

L(θ) = -Σ_{i=1}^{N} log P(y_i | s_i) + (λ/2)·‖θ‖²

where λ is the L2 regularization coefficient; θ denotes the parameter set; and s_i and y_i denote the i-th training sentence and its corresponding label sequence.

A training method for the above Chinese named entity recognition model based on reinforcement learning is also provided, comprising the following steps:

Step 1: input the sentence data used for training into the policy network module; the policy network module samples one action for each character of the sentence under each state space and outputs the action sequence of the whole sentence;

Step 2: the word segmentation and recombination network divides the sentence according to the action sequence output by the policy network module, cuts the sentence into phrases, and combines the encoding of each phrase with the encoding vector of the phrase's last character, thereby obtaining the lattice-LSTM representation of the characters;

Step 3: the named entity recognition network inputs the hidden states obtained from the word segmentation and recombination network into the CRF layer and finally obtains the named entity recognition result; a loss value calculated from the recognition result is used to train the named entity recognition model, and the same loss value serves as a delayed reward guiding the update of the policy network module.

Once the sentence has been represented by the lattice-LSTM model, the hidden state vector h_i of each character in the sentence is obtained, and the state vector sequence H = {h_1, h_2, …, h_n} is input into the CRF layer. Letting y = l_1, l_2, …, l_n denote the output labels of the CRF layer, the probability of an output label sequence is calculated by:

P(y | s) = exp( Σ_{i=1}^{n} ( W^{CRF}_{l_i}·h_i + b^{CRF}_{l_{i-1},l_i} ) ) / Σ_{y'} exp( Σ_{i=1}^{n} ( W^{CRF}_{l'_i}·h_i + b^{CRF}_{l'_{i-1},l'_i} ) )

where s denotes the sentence; W^{CRF}_{l_i} is the model parameter specific to label l_i; b^{CRF}_{l_{i-1},l_i} is the bias parameter specific to the label pair (l_{i-1}, l_i); and y' ranges over all possible output label sequences.

The loss value function is calculated as:

L(θ) = -Σ_{i=1}^{N} log P(y_i | s_i) + (λ/2)·‖θ‖²

where λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and its correct label sequence, respectively; and P(y|s) denotes the probability that sentence s is labeled with sequence y, i.e., the probability of a correct labeling.
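As a concrete illustration of the CRF layer just described, the sketch below (numpy; dimensions and random values are placeholders for trained parameters) computes emission scores W^{CRF}_l·h_i plus transition scores b^{CRF}, and decodes the highest-scoring label sequence with first-order Viterbi:

```python
import numpy as np

rng = np.random.default_rng(0)
n, hidden, num_labels = 6, 8, 4                # illustrative sizes
H = rng.normal(size=(n, hidden))               # hidden states from lattice-LSTM
W = rng.normal(size=(num_labels, hidden))      # per-label emission parameters
b = rng.normal(size=(num_labels, num_labels))  # transition (bias) parameters

emit = H @ W.T                                 # emit[i, l] = W[l] . h_i

def viterbi(emit, trans):
    """First-order Viterbi decoding over emission and transition scores."""
    n, L = emit.shape
    score = emit[0].copy()                     # best score ending in label l at i=0
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        cand = score[:, None] + trans + emit[i][None, :]  # prev x cur
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):              # follow back-pointers
        path.append(int(back[i, path[-1]]))
    return path[::-1], float(score.max())

print(viterbi(emit, b))                        # highest-scoring label sequence
```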

Preferably, in Step 1 the actions comprise 'inside' and 'terminate', and the stochastic policy is given by:

π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the policy network; and s_t is the state of the policy network at time t.
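A minimal sketch of this stochastic policy (numpy; the dimensions, and building s_t by concatenating the current character embedding with a context vector, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
state_dim = 16                              # illustrative state dimension
W = rng.normal(scale=0.1, size=state_dim)   # theta = {W, b}
b = 0.0

def sample_action(s_t):
    """Sample 'inside'/'terminate' from pi(a_t | s_t; theta) = sigmoid(W.s_t + b)."""
    p_terminate = 1.0 / (1.0 + np.exp(-(W @ s_t + b)))
    action = "T" if rng.random() < p_terminate else "I"
    return action, p_terminate

s_t = rng.normal(size=state_dim)            # stand-in for char embedding + context
print(sample_action(s_t))
```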

Preferably, in Step 2, characters are represented at the character level by an LSTM, with the update formula:

c_t, h_t = f_LSTM(x_t, c_{t-1}, h_{t-1})

where f_LSTM denotes the LSTM transition function; x_t denotes the encoding vector of the character input at time t of the sentence; and c_t and h_t denote the cell state and hidden state at time t, respectively.

After the division of the sentence is completed, the phrase information is integrated into the character-granularity LSTM model, which is the basic recurrent LSTM function:

[ i^c_j ; f^c_j ; o^c_j ; c̃^c_j ] = [ σ ; σ ; σ ; tanh ]( W^{cT}·[ x^c_j ; h^c_{j-1} ] + b^c )

c^c_j = f^c_j ⊙ c^c_{j-1} + i^c_j ⊙ c̃^c_j

h^c_j = o^c_j ⊙ tanh(c^c_j)

where x^c_j denotes the encoding vector of the j-th character of the sentence; h^c_{j-1} denotes the hidden state at character j-1; W^{cT} and b^c are model parameters; i^c_j, f^c_j, and o^c_j denote the input, forget, and output gates, respectively; c̃^c_j denotes the new candidate state; c^c_{j-1} denotes the cell state at character j-1 of the sentence; c^c_j denotes the updated cell state; h^c_j denotes the hidden state at character j of the sentence, determined by the output gate o^c_j and the current cell state c^c_j; σ(·) denotes the sigmoid function; and tanh(·) denotes the hyperbolic tangent activation function.

Phrase information is represented by an LSTM model without an output gate:

[ i^w_{b,e} ; f^w_{b,e} ; c̃^w_{b,e} ] = [ σ ; σ ; tanh ]( W^{wT}·[ x^w_{b,e} ; h^c_b ] + b^w )

c^w_{b,e} = f^w_{b,e} ⊙ c^c_b + i^w_{b,e} ⊙ c̃^w_{b,e}

where x^w_{b,e} denotes the encoding vector of the phrase that starts at the b-th character and ends at the e-th character of the sentence; h^c_b denotes the hidden state at the b-th character of the sentence, i.e., the hidden state of the phrase's first character; W^{wT} and b^w are model parameters; i^w_{b,e} and f^w_{b,e} denote the input and forget gates, respectively; c̃^w_{b,e} denotes the new candidate state; c^c_b denotes the cell state at the phrase's first character; c^w_{b,e} denotes the updated cell state; σ(·) denotes the sigmoid function; and tanh(·) denotes the hyperbolic tangent activation function.

An additional gate is further introduced to select between character-granularity and word-granularity information, taking as input the character's encoding vector and the cell state of the phrase ending at that character:

i^l_{b,e} = σ( W^{lT}·[ x^c_e ; c^w_{b,e} ] + b^l )

where x^c_e denotes the encoding vector of the e-th character of the sentence; c^w_{b,e} denotes the cell state of the phrase from the b-th to the e-th character, i.e., the cell state of the phrase ending at the e-th character of the sentence; W^{lT} and b^l are model parameters; i^l_{b,e} denotes the additional gate; and σ(·) denotes the sigmoid function.

The update of the cell state c^c_j thus changes, while the hidden-state update remains the same; the final lattice-LSTM representation is:

c^c_j = Σ_b α^w_{b,j} ⊙ c^w_{b,j} + α^c_j ⊙ c̃^c_j

where i^c_j is the input gate vector of the j-th character; i^l_{b,j} is the input gate vector of the phrase starting at b and ending at j; c^w_{b,j} is the phrase cell state; c̃^c_j is the character's new candidate cell state; α^w_{b,j} is the phrase information vector and α^c_j is the character information vector, obtained by normalizing the gates i^l_{b,j} and i^c_j over all phrases ending at j.
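The following sketch (numpy; sizes and random values are illustrative) shows this fusion at one character position j: the character's input gate logits and the additional-gate logits of the phrases ending at j are normalized elementwise into the weights α, which mix the phrase cell states with the character's candidate state:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                        # illustrative hidden size
i_c = rng.normal(size=d)                     # input gate logits of character j
c_tilde = np.tanh(rng.normal(size=d))        # candidate cell state of character j
gate_l = [rng.normal(size=d) for _ in range(2)]           # i^l_{b,j} per phrase
cell_w = [np.tanh(rng.normal(size=d)) for _ in range(2)]  # c^w_{b,j} per phrase

logits = np.stack([i_c] + gate_l)            # (1 + num_phrases, d)
alpha = np.exp(logits) / np.exp(logits).sum(axis=0)   # elementwise normalization

c_j = alpha[0] * c_tilde                     # character contribution
for a, c_w in zip(alpha[1:], cell_w):        # phrase contributions
    c_j = c_j + a * c_w                      # updated cell state c^c_j
print(c_j)
```

With the segmentation produced by the policy network, at most one phrase ends at each character, but the same fusion applies unchanged.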

Preferably, before Step 1 is performed, the named entity recognition network and its network parameters are pre-trained; at this stage, the words used by the named entity recognition network are obtained by segmenting the original sentences with a simple heuristic algorithm.

The pre-trained partial network parameters of the entity recognition network are temporarily fixed as the network parameters of the named entity recognition network, the policy network is then pre-trained, and finally the entire set of network parameters is trained jointly.

Compared with the prior art, the beneficial effects of the present invention are as follows: by using reinforcement learning to segment sentences effectively, the Chinese named entity recognition model and method based on reinforcement learning avoid modeling the redundant interfering words matched in sentences, and effectively avoid dependence on external dictionaries and the influence of long texts. The present invention can make better use of this correct word information and thus better help the Chinese named entity recognition model improve its recognition performance.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the architecture of a Chinese named entity recognition model based on reinforcement learning according to the present invention;

FIG. 2 is a schematic diagram of the architecture of the policy network module of the Chinese named entity recognition model based on reinforcement learning according to the present invention;

FIG. 3 is a schematic diagram of the architecture of the named entity recognition network module of the Chinese named entity recognition model based on reinforcement learning according to the present invention;

FIG. 4 is a flow chart of the training method of the Chinese named entity recognition model based on reinforcement learning according to the present invention;

FIG. 5 is an example of sentence segmentation in the training method of the Chinese named entity recognition model based on reinforcement learning according to the present invention.

DETAILED DESCRIPTION

The drawings are for illustrative purposes only and shall not be construed as limiting this patent. To better illustrate the embodiments, some parts of the drawings may be omitted, enlarged, or reduced and do not represent the dimensions of the actual product; for those skilled in the art, it is understandable that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships described in the drawings are for illustrative purposes only and shall not be construed as limiting this patent.

The same or similar reference numbers in the drawings of the embodiments of the present invention correspond to the same or similar parts. In the description of the present invention, it should be understood that terms such as "upper", "lower", "left", "right", "long", and "short" indicate orientations or positional relationships based on those shown in the drawings, are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. Therefore, the terms describing positional relationships in the drawings are for illustrative purposes only and shall not be construed as limiting this patent; those of ordinary skill in the art can understand the specific meanings of the above terms according to the specific circumstances.

The technical solution of the present invention is further described in detail below through specific embodiments and in conjunction with the accompanying drawings:

Example 1

As shown in FIGS. 1-3, an embodiment of a Chinese named entity recognition model based on reinforcement learning comprises a policy network module, a word segmentation and recombination network, and a named entity recognition network module.

The policy network module adopts a stochastic policy that samples one action (either 'inside' or 'terminate') for each character of the sentence under each state space, thereby obtaining an action sequence for the whole sentence, and receives a delayed reward derived from the recognition result of the Chinese named entity recognition network to guide the update of the policy network module. The stochastic policy is:

π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the policy network; and s_t is the state of the policy network at time t.

The word segmentation and recombination network divides the sentence according to the action sequence output by the policy network module, cuts the sentence into phrases, and combines the encoding of each phrase with the encoding vector of the phrase's last character, thereby obtaining the lattice-LSTM representation of the sentence.

Specifically, the word segmentation and recombination network cuts the sentence into phrases according to the action sequence output by the policy network module, and encodes each phrase as an input to the cell state at the last character of the corresponding phrase, obtaining the lattice-LSTM representation of the sentence.

The named entity recognition network module inputs the hidden states of the sentence's lattice-LSTM representation into the conditional random field and finally obtains the named entity recognition result; a loss value calculated from the recognition result is used to train the named entity recognition model, and the same loss value serves as a delayed reward guiding the update of the policy network module. The loss value is calculated as:

L(θ) = -Σ_{i=1}^{N} log P(y_i | s_i) + (λ/2)·‖θ‖²

where λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and its correct label sequence, respectively; and P denotes the probability that sentence s is labeled with sequence y, i.e., the probability of a correct labeling.

Working principle of this embodiment: the policy network first specifies an action sequence; the word segmentation and recombination network then executes the actions in that sequence one by one, obtaining a phrase at each 'terminate' action. Each phrase is used as input information at its last character, lattice-LSTM modeling is performed to obtain a hidden state sequence, and the hidden states are input into the named entity recognition network to obtain the sentence's label sequence; the recognition result serves as a delayed reward guiding the update of the policy network module.

Beneficial effects of this embodiment: this embodiment strengthens the neural LSTM-CRF model by combining it with a reinforcement learning framework to learn the internal structure of sentences and segment them efficiently; the resulting phrase information is integrated into the character-granularity lattice-LSTM model, so that character-granularity information and the word-granularity information associated with it are fully learned, achieving a better recognition effect.

Example 2

As shown in FIG. 4, an embodiment of a training method for the Chinese named entity recognition model based on reinforcement learning, used for training the model described in Example 1, comprises the following steps:

Preprocessing: pre-train the named entity recognition network and its network parameters; at this stage, the words used by the named entity recognition network are obtained by segmenting the original sentences with a simple heuristic algorithm.

The pre-trained partial network parameters of the entity recognition network are temporarily fixed as the network parameters of the named entity recognition network, the policy network is then pre-trained, and finally the entire set of network parameters is trained jointly.
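A structural sketch of this three-stage schedule (plain Python; `heuristic_segment` and the `model` methods are placeholder hooks for illustration, not an existing API):

```python
def train(model, data, epochs=(5, 5, 10)):
    # Stage 1: pre-train the NER network on phrases from a simple
    # heuristic segmentation of each sentence.
    for _ in range(epochs[0]):
        for sentence, labels in data:
            model.ner_step(sentence, heuristic_segment(sentence), labels)
    # Stage 2: hold the pre-trained NER parameters fixed and pre-train
    # the policy network against the delayed reward they provide.
    for _ in range(epochs[1]):
        for sentence, labels in data:
            actions = model.sample_actions(sentence)
            reward = -model.ner_loss(sentence, actions, labels)
            model.policy_step(sentence, actions, reward)
    # Stage 3: jointly train all network parameters.
    for _ in range(epochs[2]):
        for sentence, labels in data:
            model.joint_step(sentence, model.sample_actions(sentence), labels)
```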

Step 1: input the sentence data used for training into the policy network module; the policy network module samples one action for each character of the sentence under each state space and outputs the action sequence of the whole sentence.

In Step 1, the state, actions, and policy are defined as follows:

1. State: the encoding vector of the currently input character and the context vector of the characters before it;

2. Actions: two different operations are defined, 'inside' and 'terminate';

3. Policy: the stochastic policy is defined as:

π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the policy network; and s_t is the state of the policy network at time t.

Step 2: the word segmentation and recombination network divides the sentence according to the action sequence output by the policy network module, cuts the sentence into phrases, and combines the encoding of each phrase with the encoding vector of the phrase's last character, thereby obtaining the lattice-LSTM representation of the characters.

As shown in FIG. 5, "美国的华盛顿" (Washington, USA) is divided into "美国" (USA), "的" (of), and "华盛顿" (Washington). Characters are represented at the character level by an LSTM, with the update formula:

c_t, h_t = f_LSTM(x_t, c_{t-1}, h_{t-1})

where f_LSTM denotes the LSTM transition function; x_t denotes the encoding vector of the character input at time t of the sentence; and c_t and h_t denote the cell state and hidden state at time t, respectively.

After the division of the sentence is completed, the phrase information is integrated into the character-granularity LSTM model, which is the basic recurrent LSTM function:

[ i^c_j ; f^c_j ; o^c_j ; c̃^c_j ] = [ σ ; σ ; σ ; tanh ]( W^{cT}·[ x^c_j ; h^c_{j-1} ] + b^c )

c^c_j = f^c_j ⊙ c^c_{j-1} + i^c_j ⊙ c̃^c_j

h^c_j = o^c_j ⊙ tanh(c^c_j)

where x^c_j denotes the encoding vector of the j-th character of the sentence; h^c_{j-1} denotes the hidden state at character j-1; W^{cT} and b^c are model parameters; i^c_j, f^c_j, and o^c_j denote the input, forget, and output gates, respectively; c̃^c_j denotes the new candidate state; c^c_{j-1} denotes the cell state at character j-1 of the sentence; c^c_j denotes the updated cell state; h^c_j denotes the hidden state at character j of the sentence, determined by the output gate o^c_j and the current cell state c^c_j; σ(·) denotes the sigmoid function and tanh(·) denotes the hyperbolic tangent activation function.

Phrase information is represented by an LSTM model without an output gate:

[ i^w_{b,e} ; f^w_{b,e} ; c̃^w_{b,e} ] = [ σ ; σ ; tanh ]( W^{wT}·[ x^w_{b,e} ; h^c_b ] + b^w )

c^w_{b,e} = f^w_{b,e} ⊙ c^c_b + i^w_{b,e} ⊙ c̃^w_{b,e}

where x^w_{b,e} denotes the encoding vector of the phrase that starts at the b-th character and ends at the e-th character of the sentence; h^c_b denotes the hidden state at the b-th character of the sentence, i.e., the hidden state of the phrase's first character; W^{wT} and b^w are model parameters; i^w_{b,e} and f^w_{b,e} denote the input and forget gates, respectively; c̃^w_{b,e} denotes the new candidate state; c^c_b denotes the cell state at the phrase's first character; c^w_{b,e} denotes the updated cell state; σ(·) denotes the sigmoid function and tanh(·) denotes the hyperbolic tangent activation function.

An additional gate is further introduced to select between character-granularity and word-granularity information, taking as input the character's encoding vector and the cell state of the phrase ending at that character:

i^l_{b,e} = σ( W^{lT}·[ x^c_e ; c^w_{b,e} ] + b^l )

where x^c_e denotes the encoding vector of the e-th character of the sentence; c^w_{b,e} denotes the cell state of the phrase from the b-th to the e-th character, i.e., the cell state of the phrase ending at the e-th character of the sentence; W^{lT} and b^l are model parameters; i^l_{b,e} denotes the additional gate; and σ(·) denotes the sigmoid function.

The update of the cell state c^c_j thus changes, while the hidden-state update remains the same; the final lattice-LSTM representation is:

c^c_j = Σ_b α^w_{b,j} ⊙ c^w_{b,j} + α^c_j ⊙ c̃^c_j

where i^c_j is the input gate vector of the j-th character; i^l_{b,j} is the input gate vector of the phrase starting at b and ending at j; c^w_{b,j} is the phrase cell state; c̃^c_j is the character's new candidate cell state; α^w_{b,j} is the phrase information vector and α^c_j is the character information vector, obtained by normalizing the gates i^l_{b,j} and i^c_j over all phrases ending at j.

Step 3: the named entity recognition network inputs the hidden states obtained from the word segmentation and recombination network into the CRF layer and finally obtains the named entity recognition result; a loss value calculated from the recognition result is used to train the named entity recognition model, and the same loss value serves as a delayed reward guiding the update of the policy network module.

Once the sentence has been represented by the lattice-LSTM model, the hidden state vector h_i of each character in the sentence is obtained, and the state vector sequence H = {h_1, h_2, …, h_n} is input into the CRF layer. Letting y = l_1, l_2, …, l_n denote the output labels of the CRF layer, the probability of an output label sequence is calculated by:

P(y | s) = exp( Σ_{i=1}^{n} ( W^{CRF}_{l_i}·h_i + b^{CRF}_{l_{i-1},l_i} ) ) / Σ_{y'} exp( Σ_{i=1}^{n} ( W^{CRF}_{l'_i}·h_i + b^{CRF}_{l'_{i-1},l'_i} ) )

where s denotes the sentence; W^{CRF}_{l_i} is the model parameter specific to label l_i; b^{CRF}_{l_{i-1},l_i} is the bias parameter specific to the label pair (l_{i-1}, l_i); and y' ranges over all possible output label sequences.

The loss value function is calculated as:

L(θ) = -Σ_{i=1}^{N} log P(y_i | s_i) + (λ/2)·‖θ‖²

where λ is the L2 regularization coefficient, θ denotes the parameter set, and s and y denote a sentence and its correct label sequence, respectively.

The reward is defined as follows: once an action sequence has been sampled from the policy network, the segmentation of the sentence is obtained; the resulting phrases are added as word-granularity information to the character-granularity LSTM model, giving the lattice-LSTM representation, which is input into the named entity recognition network module; the entity label of each character is obtained through the CRF layer, the entity labels are decoded, and the reward value is calculated from the recognition result. Because the reward value can only be computed once the final recognition result is available, it is a delayed reward, and this delayed reward is used to guide the update of the policy network module.
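The patent states only that the loss value serves as the delayed reward; one standard way to turn such a delayed reward into a policy update (an assumption here) is a REINFORCE-style gradient, sketched below for the sigmoid policy defined earlier:

```python
import numpy as np

def policy_gradient_update(W, b, states, actions, probs, reward, lr=0.01):
    """One episode = one sentence. actions[t] is 1 for 'terminate', 0 for
    'inside'; probs[t] = pi(terminate | s_t). For a sigmoid policy the
    gradient of log pi at step t is (a_t - p_t) * s_t."""
    grad_W = np.zeros_like(W)
    grad_b = 0.0
    for s_t, a_t, p_t in zip(states, actions, probs):
        grad_W += (a_t - p_t) * s_t      # d log pi / dW
        grad_b += (a_t - p_t)            # d log pi / db
    W = W + lr * reward * grad_W         # ascend expected delayed reward
    b = b + lr * reward * grad_b
    return W, b
```

Using the negative NER loss as `reward` makes segmentations that lower the recognition loss more probable.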

Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the present invention and are not intended to limit its embodiments. For those skilled in the art, other changes or modifications in different forms can be made on the basis of the above description. It is neither necessary nor possible to list all embodiments here. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (8)

1. A training method of a Chinese named entity recognition model based on reinforcement learning, characterized by comprising the following steps:
step one: inputting sentence data for training into a policy network module, wherein the policy network module samples one action for each character of a sentence under each state space and outputs an action sequence for the whole sentence;
step two: dividing the sentence by the word segmentation and recombination network according to the action sequence output by the policy network module, cutting the sentence into phrases, and combining the encoding of each phrase with the encoding vector of the phrase's last character, thereby obtaining the lattice-LSTM representation of the characters; the characters are represented at the character level by an LSTM, each phrase being obtained at a 'terminate' action, with the update formula:

c_t, h_t = f_LSTM(x_t, c_{t-1}, h_{t-1})

wherein f_LSTM denotes the LSTM transition function; x_t denotes the encoding vector of the character input at time t of the sentence; and c_t and h_t denote the cell state and hidden state at time t, respectively;
after the division of the sentence is completed, the phrase information is integrated into the character-granularity LSTM model, which is the basic recurrent LSTM function:

[ i^c_j ; f^c_j ; o^c_j ; c̃^c_j ] = [ σ ; σ ; σ ; tanh ]( W^{cT}·[ x^c_j ; h^c_{j-1} ] + b^c )

c^c_j = f^c_j ⊙ c^c_{j-1} + i^c_j ⊙ c̃^c_j

h^c_j = o^c_j ⊙ tanh(c^c_j)

wherein x^c_j denotes the encoding vector of the j-th character of the sentence; h^c_{j-1} denotes the hidden state at character j-1; W^{cT} and b^c are model parameters; i^c_j, f^c_j, and o^c_j denote the input, forget, and output gates, respectively; c̃^c_j denotes the new candidate state; c^c_{j-1} denotes the cell state at character j-1 of the sentence; c^c_j denotes the updated cell state; h^c_j denotes the hidden state at character j of the sentence, determined by the output gate o^c_j and the current cell state c^c_j; σ(·) denotes the sigmoid function and tanh(·) denotes the hyperbolic tangent activation function;
the phrase information is characterized by an LSTM model without an output gate:

[ i^w_{b,e} ; f^w_{b,e} ; c̃^w_{b,e} ] = [ σ ; σ ; tanh ]( W^{wT}·[ x^w_{b,e} ; h^c_b ] + b^w )

c^w_{b,e} = f^w_{b,e} ⊙ c^c_b + i^w_{b,e} ⊙ c̃^w_{b,e}

wherein x^w_{b,e} denotes the encoding vector of the phrase starting at the b-th character and ending at the e-th character of the sentence; h^c_b denotes the hidden state at the b-th character of the sentence, i.e., the hidden state of the phrase's first character; W^{wT} and b^w are model parameters; i^w_{b,e} and f^w_{b,e} denote the input and forget gates, respectively; c̃^w_{b,e} denotes the new candidate state; c^c_b denotes the cell state of the phrase's first character; c^w_{b,e} denotes the updated cell state; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function;
additionally, an additional gate is added to select between character-granularity and word-granularity information, taking as input the encoding vector of a character and the cell state of the phrase ending at that character:

i^l_{b,e} = σ( W^{lT}·[ x^c_e ; c^w_{b,e} ] + b^l )

wherein x^c_e denotes the encoding vector of the e-th character of the sentence; c^w_{b,e} denotes the cell state of the phrase starting at the b-th character and ending at the e-th character, i.e., the cell state of the phrase ending at the e-th character of the sentence; W^{lT} and b^l are model parameters; i^l_{b,e} denotes the additional gate; σ(·) denotes the sigmoid function;
the update of the cell state c^c_j thus changes while the update of the hidden state is unchanged, and the final lattice-LSTM representation is:

c^c_j = Σ_b α^w_{b,j} ⊙ c^w_{b,j} + α^c_j ⊙ c̃^c_j

wherein i^c_j is the input gate vector of the j-th character; i^l_{b,j} is the input gate vector of a phrase starting at b and ending at j; c^w_{b,j} is the phrase cell state; c̃^c_j is the character's new candidate cell state; α^w_{b,j} is the phrase information vector and α^c_j is the character information vector, obtained by normalizing the gates i^l_{b,j} and i^c_j over all phrases ending at j;
step three: inputting the hidden states obtained by the named entity recognition network from the word segmentation and recombination network into a conditional random field layer, finally obtaining a named entity recognition result, calculating a loss value from the recognition result to train the named entity recognition model, and simultaneously using the loss value as a delayed reward to guide the update of the policy network module;
after the sentence is represented by the lattice-LSTM model, the hidden state vector h_i of each character of the sentence is obtained, and the state vector sequence H = {h_1, h_2, …, h_n} is input into the conditional random field layer; letting y = l_1, l_2, …, l_n denote the output labels of the conditional random field layer, the output label sequence probability is calculated by:

P(y | s) = exp( Σ_{i=1}^{n} ( W^{CRF}_{l_i}·h_i + b^{CRF}_{l_{i-1},l_i} ) ) / Σ_{y'} exp( Σ_{i=1}^{n} ( W^{CRF}_{l'_i}·h_i + b^{CRF}_{l'_{i-1},l'_i} ) )

wherein s denotes the sentence; W^{CRF}_{l_i} is the model parameter specific to l_i; b^{CRF}_{l_{i-1},l_i} is the bias parameter specific to l_{i-1} and l_i; and y' ranges over all possible output label sequences;
the loss value function is calculated as:

L(θ) = -Σ_{i=1}^{N} log P(y_i | s_i) + (λ/2)·‖θ‖²

wherein λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and the correct label sequence corresponding to that sentence, respectively; and P denotes the probability that sentence s is labeled with sequence y, i.e., the probability of a correct labeling.
2. The training method of the Chinese named entity recognition model based on reinforcement learning according to claim 1, wherein in said step one the actions comprise 'inside' and 'terminate', and the stochastic policy is given by:

π(a_t | s_t; θ) = σ(W·s_t + b)

wherein π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the policy network; s_t is the state of the policy network at time t; σ(·) denotes the sigmoid function; and W and b denote network parameters.
3. The training method of the Chinese named entity recognition model based on reinforcement learning according to claim 1, wherein before said step one, the named entity recognition network and its network parameters are pre-trained, the words used by the named entity recognition network being obtained by segmenting the original sentences with a simple heuristic algorithm;
the pre-trained partial network parameters of the entity recognition network are temporarily fixed as the network parameters of the named entity recognition network, the policy network is then pre-trained, and finally the entire set of network parameters is trained jointly.
4. A Chinese named entity recognition model based on reinforcement learning, characterized by comprising a policy network module, a word segmentation and recombination network, and a named entity recognition network module, trained with the training method of any one of claims 1 to 3;
the policy network module is used for adopting a stochastic policy to sample one action for each character of a sentence under each state space, thereby obtaining an action sequence for the whole sentence;
the word segmentation and recombination network is used for dividing the sentence according to the action sequence output by the policy network module, cutting the sentence into phrases, and combining the encoding of each phrase with the encoding vector of the phrase's last character, thereby obtaining the lattice-LSTM representation of the sentence;
the named entity recognition network module is used for inputting the hidden states of the sentence's lattice-LSTM representation into the conditional random field, finally obtaining a named entity recognition result, calculating a loss value from the recognition result to train the named entity recognition model, and simultaneously using the loss value as a delayed reward to guide the update of the policy network module.
5. The Chinese named entity recognition model based on reinforcement learning according to claim 4, wherein said actions comprise 'inside' and 'terminate'.
6. The Chinese named entity recognition model based on reinforcement learning according to claim 4, wherein said stochastic policy is:

π(a_t | s_t; θ) = σ(W·s_t + b)

wherein π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the policy network; s_t is the state of the policy network at time t; σ(·) denotes the sigmoid function; and W and b denote network parameters.
7. The Chinese named entity recognition model based on reinforcement learning according to claim 6, wherein the word segmentation and recombination network cuts sentences into phrases according to the action sequence output by the policy network module, and encodes each phrase as the input to the cell state at the last character of the corresponding phrase, obtaining the lattice-LSTM representation of the sentence.
8. The Chinese named entity recognition model based on reinforcement learning according to claim 7, wherein the named entity recognition network module inputs the lattice-LSTM output of the word segmentation and recombination network into the conditional random field layer, scores each label sequence of the sentence using the feature function set of the conditional random field layer, exponentiates and normalizes the scores, computes over all possible label sequences with the first-order Viterbi algorithm, and takes the highest-scoring label sequence as the final output; a loss function is defined, its value is back-propagated for parameter training, and the loss value serves as the delayed reward updating the policy network module; the loss function is defined as the sentence-level negative log-likelihood with an L2 regularization term:

L(θ) = -Σ_{i=1}^{N} log P(y_i | s_i) + (λ/2)·‖θ‖²

wherein λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and its correct label sequence, respectively; and P denotes the probability that sentence s is labeled with sequence y, i.e., the probability of a correct labeling.
CN201911089295.3A 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof Expired - Fee Related CN110826334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089295.3A CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911089295.3A CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Publications (2)

Publication Number Publication Date
CN110826334A CN110826334A (en) 2020-02-21
CN110826334B true CN110826334B (en) 2023-04-21

Family

ID=69553722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089295.3A Expired - Fee Related CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Country Status (1)

Country Link
CN (1) CN110826334B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476031A (en) * 2020-03-11 2020-07-31 重庆邮电大学 An Improved Chinese Named Entity Recognition Method Based on Lattice-LSTM
CN111539195B (en) * 2020-03-26 2025-04-11 中国平安人寿保险股份有限公司 Text matching training method and related equipment based on reinforcement learning
CN111666734B (en) * 2020-04-24 2021-08-10 北京大学 Sequence labeling method and device
CN111951959A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Dialogue guidance method, device and storage medium based on reinforcement learning
CN112151183B (en) * 2020-09-23 2024-11-22 上海海事大学 An entity recognition method for Chinese electronic medical records based on Lattice LSTM model
CN112163089B (en) * 2020-09-24 2023-06-23 中国电子科技集团公司第十五研究所 High-technology text classification method and system integrating named entity recognition
CN112699682B (en) * 2020-12-11 2022-05-17 山东大学 Named entity identification method and device based on combinable weak authenticator
CN113051921B (en) * 2021-03-17 2024-02-20 北京智慧星光信息技术有限公司 Internet text entity identification method, system, electronic equipment and storage medium
CN112966517B (en) * 2021-04-30 2022-02-18 平安科技(深圳)有限公司 Training method, device, equipment and medium for named entity recognition model
CN114386046A (en) * 2021-12-28 2022-04-22 绿盟科技集团股份有限公司 An unknown vulnerability detection method, device, electronic device and storage medium
CN114004233B (en) * 2021-12-30 2022-05-06 之江实验室 Remote supervision named entity recognition method based on semi-training and sentence selection
CN114692634B (en) * 2022-01-27 2024-12-31 清华大学 Chinese named entity recognition and classification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117472A (en) * 2018-11-12 2019-01-01 新疆大学 A kind of Uighur name entity recognition method based on deep learning
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109597876A (en) * 2018-11-07 2019-04-09 中山大学 A kind of more wheels dialogue answer preference pattern and its method based on intensified learning
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 The Chinese name entity recognition method learnt based on attention mechanism and language model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109597876A (en) * 2018-11-07 2019-04-09 中山大学 A kind of more wheels dialogue answer preference pattern and its method based on intensified learning
CN109117472A (en) * 2018-11-12 2019-01-01 新疆大学 A kind of Uighur name entity recognition method based on deep learning
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 The Chinese name entity recognition method learnt based on attention mechanism and language model

Also Published As

Publication number Publication date
CN110826334A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110826334B (en) Chinese named entity recognition model based on reinforcement learning and training method thereof
Lin et al. ASRNN: A recurrent neural network with an attention model for sequence labeling
Yao et al. An improved LSTM structure for natural language processing
CN110489555B (en) Language model pre-training method combined with similar word information
Kim et al. Two-stage multi-intent detection for spoken language understanding
CN111062217B (en) Language information processing method and device, storage medium and electronic equipment
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN112712804A (en) Speech recognition method, system, medium, computer device, terminal and application
CN109871541B (en) A Named Entity Recognition Applicable to Multilingual and Multi-Domain
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN112151183A (en) An entity recognition method for Chinese electronic medical records based on Lattice LSTM model
CN111428490A (en) A Weakly Supervised Learning Method for Referential Resolution Using Language Models
CN112420191A (en) A TCM auxiliary decision-making system and method
CN118277573B (en) Pre-hospital emergency text classification labeling method based on ChatGLM model, electronic equipment, storage medium and computer program product
CN114386409B (en) Self-distillation Chinese word segmentation method based on attention mechanism, terminal and storage medium
CN110442860A (en) Name entity recognition method based on time convolutional network
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN112818698A (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN114781651A (en) Small sample learning robustness improving method based on contrast learning
CN117057350B (en) Chinese electronic medical record named entity recognition method and system
Han et al. MAF‐CNER: A Chinese Named Entity Recognition Model Based on Multifeature Adaptive Fusion
CN114021549A (en) Chinese Named Entity Recognition Method and Device Based on Vocabulary Enhancement and Multi-feature
CN115510242A (en) Chinese medicine text entity relation combined extraction method
CN113012685B (en) Audio recognition method, device, electronic device and storage medium
Qiu Construction of english speech recognition model by fusing cnn and random deep factorization tdnn

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20230421)