CN114417874B - Chinese named entity recognition method and system based on graph attention network - Google Patents
- Publication number: CN114417874B (application CN202210083152.7A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F40/295—Named entity recognition (G—PHYSICS; G06—COMPUTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F40/00—Handling natural language data; G06F40/20—Natural language analysis; G06F40/279—Recognition of textual entities; G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking)
- G06F40/216—Parsing using statistical methods (G06F40/20—Natural language analysis; G06F40/205—Parsing)
- G06N3/044—Recurrent networks, e.g. Hopfield networks (G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/045—Combinations of networks (G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
Abstract
The invention discloses a Chinese named entity recognition method based on a graph attention network, comprising the following steps: obtaining a Chinese sentence in which named entities are to be recognized; constructing the word-vector set X of the Chinese sentence; and inputting the word-vector set X into a trained graph-attention-network-based Chinese named entity recognition model to obtain the named-entity labels of the sentence. The invention solves two technical problems: in the existing BiLSTM-CRF model, word boundaries and entity boundaries are inconsistent and the model input features are of a single type; and in the existing collaborative graph network model, the conventional graph attention calculation impairs the expressive power of graph attention.
Description
Technical Field
The invention belongs to the technical field of entity recognition, and particularly relates to a Chinese named entity recognition method and system based on a graph attention network.
Background
Named Entity Recognition (NER) is a fundamental problem in natural language processing and the first step of a series of downstream tasks such as relation extraction, knowledge-graph construction, and intent detection. The main goal of NER is to identify entities with specific meanings in unstructured text, chiefly names of people, places, and organizations, proper nouns, and expressions of time, quantity, currency, and proportion.
Early named entity recognition revolved around rule-based and dictionary-based methods, but these are inefficient, costly, and demand extensive expert knowledge; the better-performing NER models today are based on deep learning or statistical learning. Among them, BiLSTM-CRF is a widely used architecture for English NER: it uses word-level representations and takes the word as the basic unit for predicting labels. Chinese named entities are harder to recognize than English ones, which led to the practice of first applying a word-segmentation tool and then running a word-sequence labeling model as in English. In addition, the collaborative graph network model was the first to introduce the graph attention mechanism into NER, integrating lexical knowledge such as self-matched words and nearest-context words into the encoding layer and further improving recognition.
Even after being augmented with a Chinese word-segmentation tool, the traditional BiLSTM-CRF model still suffers from the following problems, which degrade its performance. First, word boundaries are not necessarily entity boundaries: for example, "Beijing Palace Museum" should be a single entity of type location, but a segmentation tool may split it into three words, "Beijing", "Palace", and "Museum". Second, although the introduction of neural networks has greatly advanced Chinese word segmentation, existing segmenters are far from perfect, and the single type of feature the model considers inevitably causes error propagation.
When the collaborative graph network model computes attention with a conventional graph attention network, the consecutive linear operations yield only static attention, which impairs the expressive power of graph attention.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a Chinese named entity recognition method and system based on a graph attention network, aiming to solve the technical problems that, in the existing BiLSTM-CRF model, word boundaries and entity boundaries are inconsistent and the model input features are of a single type, and that the conventional graph attention calculation in the existing collaborative graph network model impairs the expressive power of graph attention.
To achieve the above object, according to one aspect of the present invention, there is provided a Chinese named entity recognition method based on a graph attention network, comprising the following steps:
(1) Obtain a Chinese sentence in which Chinese named entities are to be recognized.
(2) Construct the word-vector set X of the Chinese sentence based on the sentence obtained in step (1).
(3) Input the word-vector set X obtained in step (2) into a trained graph-attention-network-based Chinese named entity recognition model to obtain the Chinese named-entity labels of the sentence.
Preferably, step (2) first represents the Chinese sentence as a character sequence s = {s_1, s_2, …, s_M}, where s_m denotes the m-th character of the sentence and m ∈ [1, M], with M the total number of characters; then, each character of the sequence is represented as a word vector x_m = f(s_m) by looking up a character-embedding matrix, and all word vectors form the word-vector set X of the sentence, where f is a character-embedding lookup table trained with the continuous bag-of-words model (CBOW).
Preferably, the graph-attention-network-based Chinese named entity recognition model in step (3) is obtained through training with the following steps:
(3-1) Acquire a Chinese named entity recognition data set annotated with the BIOES tagging scheme, and map the text of each Chinese sentence in the data set to word vectors to obtain the word-vector set of each sentence.
(3-2) Input the word-vector set of each Chinese sentence obtained in step (3-1) into a bidirectional long short-term memory (BiLSTM) model to obtain preliminary feature vectors, and input the preliminary feature vectors into an improved graph attention network (GAT) model to obtain the final feature vector of the sentence.
(3-3) Input the final feature vector obtained in step (3-2) into a conditional random field model for decoding to obtain the Chinese named-entity labels of the sentence, calculate the loss function of the graph-attention-network-based Chinese named entity recognition model from the labeling result, and train the parameters of the BiLSTM and GAT models to obtain the trained model, which comprises the BiLSTM model and GAT model of step (3-2) and the conditional random field model of step (3-3).
Preferably, step (3-1) comprises the following steps:
(3-1-1) Obtain Chinese named entity recognition data sets from multiple domains, and annotate them with the BIOES tagging scheme to obtain annotated Chinese named entity recognition data sets;
(3-1-2) Construct the word-vector set X of each Chinese sentence in the data set based on the annotated Chinese named entity recognition data set obtained in step (3-1-1).
Preferably, step (3-2) specifically comprises the following sub-steps:
(3-2-1) For each word vector in the word-vector set X of each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1), model the word vector initially with the BiLSTM model to obtain two different feature representations, forward and backward, and splice the two representations into a context-aware Chinese-sentence feature vector; the feature vectors of all word vectors of the sentence form the Chinese-sentence feature-vector set H = {h_1, h_2, …, h_M}, where m ∈ [1, M] and M is the total number of characters in the sentence;
(3-2-2) Construct the word-character interaction graph G = (V, E) of each Chinese sentence from the word-vector set of each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1), where V is the node set, comprising all characters of the sentence together with its self-matched words, and E is the edge set, whose edges comprise the connections between characters, the containment relations between characters and self-matched words, and the connections between self-matched words;
(3-2-3) Obtain the word-information fusion correlation-coefficient matrix e of the Chinese sentence from the feature-vector set H obtained in step (3-2-1) and the word-character interaction graph G constructed in step (3-2-2);
(3-2-4) Normalize each element e(h_i, h_j) of the matrix e obtained in step (3-2-3) to obtain the attention coefficient α_ij between every pair of nodes in the word-character interaction graph G;
(3-2-5) With a cardinality-preserving graph attention network calculation method, obtain the feature vector k_i of each node of the word-character interaction graph G from the attention coefficients α_ij obtained in step (3-2-4); the feature vectors k_i of all nodes form the feature-vector set K of the word-character interaction graph G;
(3-2-6) Compute the weighted sum of the feature-vector set K obtained in step (3-2-5) and the Chinese-sentence feature-vector set H obtained in step (3-2-1) to obtain the final feature vector R = W_1 H + W_2 K of the Chinese sentence, where W_1 and W_2 are trainable matrices.
Preferably, the Chinese-sentence feature vector h_m corresponding to the m-th word vector in step (3-2-1) is given by:

h_m = [h_m(fwd) ; h_m(bwd)]

where h_m(fwd) denotes the hidden-layer output of the forward LSTM at time step m, h_m(bwd) denotes the hidden-layer output of the backward LSTM at time step m, h_m is the concatenation of h_m(fwd) and h_m(bwd), and x_m is the m-th word vector of the Chinese sentence.

The element in row i, column j of the word-information fusion correlation-coefficient matrix e in step (3-2-3), i.e. the word-information fusion correlation coefficient e(h_i, h_j) between node i and node j of the word-character interaction graph G, is given by:

e(h_i, h_j) = a^T LeakyReLU(W h_i || W h_j)

where LeakyReLU is the activation function, || denotes vector concatenation, a and W are learnable parameter matrices, and i, j ∈ [1, N], with N the total number of nodes in the word-character interaction graph G.

The attention coefficient α_ij between node i and node j of the graph G in step (3-2-4) is obtained with the softmax normalization function:

α_ij = softmax(e(h_i, h_j))
Preferably, the feature vector k_i of the i-th node of the word-character interaction graph G in step (3-2-5) is calculated with the following formula:

k_i = σ( Σ_{j=1..N} α_ij W h_j + w ⊙ Σ_{j=1..N} W h_j )

where N is the total number of nodes in the word-character interaction graph G, W and w are learnable parameter matrices, ⊙ denotes element-wise multiplication, σ is a nonlinear activation function, and h_j is the feature vector of the j-th node of the word-character interaction graph G.
Preferably, step (3-3) specifically comprises the following sub-steps:
(3-3-1) Decode the final feature vector R of each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-2) with a CRF to obtain the labeling result of the sentence;
(3-3-2) Calculate the loss function of the graph-attention-network-based Chinese named entity recognition model from the labeling result Y of each Chinese sentence obtained in step (3-3-1), and train the model iteratively to obtain the trained model.

Preferably, for the final feature vector R of each Chinese sentence, the entity labeling result obtained after decoding is Y = {y_1, y_2, …, y_M}, with P(y_m | s_m) the probability that the m-th character s_m receives label y_m, where y_m denotes the label of the m-th character of the sentence.

The training process in step (3-3-2) optimizes the model with L2 regularization so as to minimize the negative log-likelihood loss, defined as:

Loss = − Σ_m log P(y_m | s_m) + (γ / 2) ||θ||²

where γ is the L2 regularization parameter, preferably 0.5, and θ denotes the set of all trainable parameters.
According to another aspect of the present invention, there is provided a Chinese named entity recognition system based on a graph attention network, comprising:
a first module for acquiring a Chinese sentence in which Chinese named entities are to be recognized;
a second module for constructing the word-vector set X of the Chinese sentence based on the sentence obtained by the first module; and
a third module for inputting the word-vector set X obtained by the second module into a trained graph-attention-network-based Chinese named entity recognition model to obtain the Chinese named-entity labels of the sentence.
In general, compared with the prior art, the above technical solutions conceived by the present invention achieve the following beneficial effects:
1. By adopting step (3-1), which uses an annotated Chinese named entity recognition data set, the invention solves the technical problem of the existing BiLSTM-CRF model that word boundaries and entity boundaries are inconsistent.
2. By adopting steps (3-1) and (3-2), which combine the word-segmentation features and the character features of Chinese sentences, the invention solves the technical problem of the existing BiLSTM-CRF model that the input features are of a single type.
3. By adopting step (3-2), which computes graph attention with a cardinality-preserving graph attention network calculation method, the invention solves the problem that the conventional graph attention calculation in the existing collaborative graph network model impairs the expressive power of graph attention.
Drawings
FIG. 1 is a flow chart of a method for identifying Chinese named entities based on a graph attention network according to the present invention;
FIG. 2 is a schematic diagram of the operation of the Chinese named entity recognition model based on the graph attention network of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions, and advantages more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
As shown in fig. 1, the invention provides a Chinese named entity recognition method based on a graph attention network, comprising the following steps:
(1) Obtain a Chinese sentence in which Chinese named entities are to be recognized.
(2) Construct the word-vector set X of the Chinese sentence based on the sentence obtained in step (1).
Specifically, this step first represents the Chinese sentence as a character sequence s = {s_1, s_2, …, s_M}, where s_m denotes the m-th character of the sentence and m ∈ [1, M], with M the total number of characters; then, each character of the sequence is represented as a word vector x_m = f(s_m) by looking up a character-embedding matrix, and all word vectors form the word-vector set X of the sentence, where f is a character-embedding lookup table trained with the Continuous Bag-of-Words model (CBOW).
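The character-to-vector mapping x_m = f(s_m) can be sketched as follows; a small random table stands in for the CBOW-pretrained lookup table f, and the embedding dimension and table contents are purely illustrative:

```python
import numpy as np

# Hypothetical stand-in for the character-embedding lookup table f.
# In the patent f is pretrained with CBOW; here it is a random table.
rng = np.random.default_rng(0)
EMB_DIM = 4
char_table = {ch: rng.standard_normal(EMB_DIM) for ch in "北京故宫博物院"}
UNK = np.zeros(EMB_DIM)  # fallback vector for characters outside the table

def embed_sentence(sentence):
    """Represent a sentence s = {s_1, ..., s_M} as the word-vector set X,
    one vector x_m = f(s_m) per character."""
    return np.stack([char_table.get(ch, UNK) for ch in sentence])

X = embed_sentence("北京故宫")  # shape (M, EMB_DIM) = (4, 4)
```

The set X is then the row-stacked matrix of character vectors, ready to feed into the BiLSTM encoder described below.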
(3) Input the word-vector set X obtained in step (2) into a trained graph-attention-network-based Chinese named entity recognition model (shown in fig. 2) to obtain the Chinese named-entity labels of the sentence.
In step (3), the graph-attention-network-based Chinese named entity recognition model is obtained through training with the following steps:
(3-1) Acquire a Chinese named entity recognition data set annotated with the BIOES (B-begin, I-inside, O-outside, E-end, S-single) tagging scheme, and map the text of each Chinese sentence in the data set to word vectors to obtain the word-vector set of each sentence.
(3-2) Input the word-vector set of each Chinese sentence obtained in step (3-1) into a Bi-directional Long Short-Term Memory (BiLSTM) model to obtain preliminary feature vectors, and input the preliminary feature vectors into an improved Graph Attention Network (GAT) model to obtain the final feature vector of the sentence, which contains richer semantic information.
(3-3) Input the final feature vector obtained in step (3-2) into a conditional random field model for decoding to obtain the Chinese named-entity labels of the sentence, calculate the loss function of the graph-attention-network-based model from the labeling result, and train the parameters of the BiLSTM and GAT models to obtain the trained model, which comprises the BiLSTM model and GAT model of step (3-2) and the conditional random field model of step (3-3).
Preferably, the preprocessing of the Chinese named entity recognition data set described in step (3-1) comprises the following steps:
(3-1-1) Obtain Chinese named entity recognition data sets from multiple domains, and annotate them with the BIOES tagging scheme to obtain annotated Chinese named entity recognition data sets.
The Chinese named entity recognition data sets cover news, social media, and Chinese resumes, and their entity types include GPE (geopolitical entity), LOC (location), PER (person), ORG (organization), CONT (country), and EDU (educational background).
In the BIOES scheme, the first character of an entity is labeled B-X, where X is the entity type; similarly, the last character and the inner characters of the entity are labeled E-X and I-X respectively; S-X indicates that a single character is itself an entity of type X; and the remaining non-entity characters are labeled O.
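The BIOES convention just described can be sketched as a small tagging helper; the function name and the span format (start, inclusive end, type) are illustrative, not part of the patent:

```python
def bioes_tags(length, entities):
    """Convert character-level entity spans (start, end_inclusive, type)
    into a BIOES tag sequence; non-entity characters stay 'O'."""
    tags = ["O"] * length
    for start, end, etype in entities:
        if start == end:                    # single-character entity -> S-X
            tags[start] = f"S-{etype}"
        else:
            tags[start] = f"B-{etype}"      # first character -> B-X
            tags[end] = f"E-{etype}"        # last character  -> E-X
            for i in range(start + 1, end): # inner characters -> I-X
                tags[i] = f"I-{etype}"
    return tags

# a 7-character sentence tagged as one LOC entity spanning all characters
tags = bioes_tags(7, [(0, 6, "LOC")])
```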
(3-1-2) Construct the word-vector set X of each Chinese sentence in the data set based on the annotated Chinese named entity recognition data set obtained in step (3-1-1).
Specifically, the word-vector set of each Chinese sentence is constructed by representing the sentence as a character sequence s = {s_1, s_2, …, s_M}, where s_m denotes the m-th character, m ∈ [1, M], and M is the total number of characters in the sentence; then each character of the sequence is represented as a word vector x_m = f(s_m) by looking up a character-embedding matrix, and the word vectors of all characters of the sentence form its word-vector set X, where f is a character-embedding lookup table trained with the Continuous Bag-of-Words model (CBOW).
Preferably, step (3-2) specifically comprises the following sub-steps:
(3-2-1) For each word vector in the word-vector set X of each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1), model the word vector initially with the BiLSTM model to obtain two different feature representations, forward and backward, and splice the two representations into a context-aware Chinese-sentence feature vector; the feature vectors of all word vectors of the sentence form the Chinese-sentence feature-vector set H = {h_1, h_2, …, h_M}, where m ∈ [1, M] and M is the total number of characters in the sentence.
The Chinese-sentence feature vector h_m corresponding to the m-th word vector of the sentence is given by:

h_m = [h_m(fwd) ; h_m(bwd)]

where h_m(fwd) denotes the hidden-layer output of the forward LSTM at time step m, h_m(bwd) denotes the hidden-layer output of the backward LSTM at time step m, h_m is the concatenation of h_m(fwd) and h_m(bwd), and x_m is the m-th word vector of the Chinese sentence.
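The splicing of the forward and backward hidden states into h_m can be sketched as follows; toy constant arrays stand in for the real LSTM hidden-state sequences:

```python
import numpy as np

def bilstm_concat(forward_states, backward_states):
    """h_m = [h_m(fwd) ; h_m(bwd)]: splice the forward and backward LSTM
    hidden states of each position into one context-aware feature vector."""
    return np.concatenate([forward_states, backward_states], axis=-1)

# toy sentence of M = 3 characters with hidden size 2 per direction
fwd = np.ones((3, 2))    # stand-in for forward LSTM outputs
bwd = np.zeros((3, 2))   # stand-in for backward LSTM outputs
H = bilstm_concat(fwd, bwd)  # feature-vector set H, shape (3, 4)
```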
(3-2-2) Construct the word-character interaction graph G = (V, E) of each Chinese sentence from the word-vector set of each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1).
Here V is the node set, comprising all characters of the sentence together with its self-matched words (i.e. the segmented words of the Chinese sentence; the segmentation of a sentence can be obtained directly from the BIOES-annotated Chinese named entity recognition data set); E is the edge set, whose edges comprise the connections between characters, the containment relations between characters and self-matched words, and the connections between self-matched words.
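A minimal sketch of constructing G = (V, E), assuming a toy lexicon for self-matching; the node-naming scheme is illustrative, and the word-to-word connection edges are omitted here for brevity:

```python
def build_interaction_graph(sentence, lexicon):
    """Build a word-character interaction graph G = (V, E).
    Nodes: every character plus every self-matched lexicon word.
    Edges: adjacent-character links and word->contained-character links
    (word-to-word links, also described in the patent, are omitted here)."""
    words = [(i, j) for i in range(len(sentence))
             for j in range(i + 1, len(sentence) + 1)
             if sentence[i:j] in lexicon]          # self-matched words
    V = [f"c{i}" for i in range(len(sentence))] + \
        [f"w{i}_{j}" for i, j in words]
    E = set()
    for i in range(len(sentence) - 1):             # character chain
        E.add((f"c{i}", f"c{i+1}"))
    for i, j in words:                             # word contains characters
        for k in range(i, j):
            E.add((f"w{i}_{j}", f"c{k}"))
    return V, E

V, E = build_interaction_graph("北京故宫", {"北京", "故宫"})
```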
(3-2-3) Obtain the word-information fusion correlation-coefficient matrix e of the Chinese sentence from the feature-vector set H obtained in step (3-2-1) and the word-character interaction graph G constructed in step (3-2-2).
Specifically, the element in row i, column j of the matrix e, i.e. the word-information fusion correlation coefficient e(h_i, h_j) between node i and node j of the word-character interaction graph G, is given by:

e(h_i, h_j) = a^T LeakyReLU(W h_i || W h_j)

where LeakyReLU is the activation function, || denotes vector concatenation, a and W are learnable parameter matrices, and i, j ∈ [1, N], with N the total number of nodes in the word-character interaction graph G.
(3-2-4) Normalize each element e(h_i, h_j) of the matrix e obtained in step (3-2-3) to obtain the attention coefficient α_ij between every pair of nodes in the word-character interaction graph G.
The attention coefficient α_ij between node i and node j of the graph G is obtained with the softmax normalization function:

α_ij = softmax(e(h_i, h_j))
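Steps (3-2-3) and (3-2-4) together can be sketched as follows, computing e(h_i, h_j) = a^T LeakyReLU(W h_i || W h_j) for every node pair and then normalizing each row with softmax; the random parameters and the fully connected toy graph are assumptions for illustration:

```python
import numpy as np

def attention_coeffs(H, a, W):
    """Correlation coefficients e(h_i, h_j) = a^T LeakyReLU(W h_i || W h_j),
    followed by a row-wise softmax giving the attention coefficients alpha_ij."""
    Wh = H @ W.T                                    # project node features
    N = Wh.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            z = np.concatenate([Wh[i], Wh[j]])      # W h_i || W h_j
            z = np.where(z > 0, z, 0.2 * z)         # LeakyReLU, slope 0.2
            e[i, j] = a @ z
    exp = np.exp(e - e.max(axis=1, keepdims=True))  # numerically stable softmax
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
H = rng.standard_normal((3, 4))                     # 3 nodes, feature dim 4
alpha = attention_coeffs(H, rng.standard_normal(8), rng.standard_normal((4, 4)))
```

Each row of `alpha` sums to one, so α_ij can be read as node i's normalized attention over node j.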
(3-2-5) With a cardinality-preserving graph attention network calculation method, obtain the feature vector k_i of each node of the word-character interaction graph G from the attention coefficients α_ij obtained in step (3-2-4); the feature vectors k_i of all nodes form the feature-vector set K of the word-character interaction graph G.
Specifically, the cardinality-preserving graph attention calculation used in this step is described on page 4 of the paper "Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation" by Shuo Zhang et al.
The feature vector k_i of the i-th node of the word-character interaction graph G is calculated with the following formula:

k_i = σ( Σ_{j=1..N} α_ij W h_j + w ⊙ Σ_{j=1..N} W h_j )

where N is the total number of nodes in the word-character interaction graph G, W and w are learnable parameter matrices, ⊙ denotes element-wise multiplication, σ is a nonlinear activation function, and h_j is the feature vector of the j-th node of the word-character interaction graph G.
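The cardinality-preserving aggregation of step (3-2-5) can be sketched as follows. The added cardinality term w ⊙ Σ_j W h_j follows the cardinality-preservation idea of the cited paper by Shuo Zhang et al.; the exact formula, the fully connected toy graph, and the uniform attention weights used here are assumptions:

```python
import numpy as np

def cardinality_preserving_aggregate(H, alpha, W, w):
    """Attention-weighted sum plus a cardinality term (hedged sketch):
    nodes with different neighbourhood sizes no longer collapse to the
    same aggregated representation."""
    Wh = H @ W.T                    # project node features: W h_j
    attn_sum = alpha @ Wh           # sum_j alpha_ij * W h_j
    card_sum = Wh.sum(axis=0)       # sum_j W h_j (fully connected toy graph)
    return attn_sum + w * card_sum  # element-wise weight w on the cardinality term

rng = np.random.default_rng(3)
H = rng.standard_normal((3, 4))     # 3 nodes, feature dim 4
alpha = np.full((3, 3), 1.0 / 3)    # uniform attention for the toy example
K = cardinality_preserving_aggregate(
    H, alpha, rng.standard_normal((4, 4)), rng.standard_normal(4))
```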
(3-2-6) Compute the weighted sum of the feature-vector set K obtained in step (3-2-5) and the Chinese-sentence feature-vector set H obtained in step (3-2-1) to obtain the final feature vector R = W_1 H + W_2 K of the Chinese sentence, where W_1 and W_2 are trainable matrices.
Specifically, the final feature vector R is a feature vector of the Chinese sentence that contains richer semantic information and serves as the input vector of the conditional random field model.
Preferably, step (3-3) comprises in particular the following sub-steps:
And (3-3-1) decoding the final feature vector R corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-2) by adopting a conditional random field model (Conditional random field, abbreviated as CRF) so as to obtain a labeling result corresponding to the Chinese sentence.
For the final feature vector R corresponding to each Chinese sentence, the decoded entity labeling result of the Chinese sentence is Y = {y_1, y_2, …, y_M}, with probability P(y_m | s_m) that the labeling result is y_m, where y_m denotes the labeling result of the m-th character in the Chinese sentence, s_m denotes the m-th character, and m ∈ [1, M], M being the total number of characters in the Chinese sentence.
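As a hedged sketch, the most-likely labeling Y in step (3-3-1) can be recovered with Viterbi decoding over per-character tag scores; the (M, T) emission matrix projected from R and the tag-transition matrix are assumptions, since the patent does not spell out the CRF internals.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Most-likely tag sequence Y = {y_1, ..., y_M} for one sentence
    under a linear-chain CRF; a minimal sketch of the step-(3-3-1)
    decoder.  emissions: (M, T) per-character tag scores (assumed to be
    projected from R); transitions: (T, T) tag-to-tag scores.
    """
    M, T = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((M, T), dtype=int)   # backpointers
    for m in range(1, M):
        # cand[s, t]: best path with tag s at position m-1, tag t at m
        cand = score[:, None] + transitions + emissions[m][None, :]
        back[m] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]
    for m in range(M - 1, 0, -1):        # follow backpointers
        tags.append(int(back[m, tags[-1]]))
    return tags[::-1]
```

With all transition scores zero the decoder reduces to a per-character argmax; the transition matrix is what lets the CRF forbid invalid BIOES sequences such as an I- tag following O.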
(3-3-2) Calculating the loss function of the graph-attention-network-based Chinese named entity recognition model according to the entity labeling result Y of each Chinese sentence obtained in the step (3-3-1), and iteratively training the model to obtain a trained Chinese named entity recognition model based on the graph attention network.
During training, the model is optimized by minimizing the negative log-likelihood loss with L2 regularization; the loss function is defined as L = −Σ log P(Y | S) + (γ/2)·‖θ‖², where γ is the L2 regularization parameter, preferably set to 0.5, and θ denotes the set of all trainable parameters and matrices mentioned in the process above.
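A minimal numeric sketch of the step-(3-3-2) objective, assuming a standard linear-chain CRF whose per-character emission scores are projected from R (a detail the patent leaves open): the gold-path score is compared against the log partition function computed with the forward algorithm, and the L2 penalty is added on top.

```python
import numpy as np

def logsumexp(v):
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def crf_nll_l2(emissions, transitions, tags, params, gamma=0.5):
    """Negative log-likelihood of one tag sequence under a linear-chain
    CRF, plus the L2 penalty (gamma/2)*||theta||^2.  A sketch of the
    loss in step (3-3-2); the (M, T) emission matrix is assumed to be
    derived from the fused feature vector R.
    """
    M, T = emissions.shape
    # score of the gold path: emissions plus tag-transition scores
    gold = emissions[0, tags[0]] + sum(
        transitions[tags[m - 1], tags[m]] + emissions[m, tags[m]]
        for m in range(1, M))
    # log partition function log Z via the forward algorithm
    alpha = emissions[0].copy()
    for m in range(1, M):
        alpha = np.array([
            logsumexp(alpha + transitions[:, t]) + emissions[m, t]
            for t in range(T)])
    log_Z = logsumexp(alpha)
    nll = log_Z - gold                      # -log P(Y | S)
    l2 = 0.5 * gamma * sum((p ** 2).sum() for p in params)
    return nll + l2
```

With gamma = 0 and zero transitions the CRF factorizes into independent per-position softmaxes, a handy sanity check that the forward recursion is correct.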
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (7)
1. The Chinese named entity recognition method based on the graph attention network is characterized by comprising the following steps of:
(1) Acquiring a Chinese sentence on which Chinese named entity recognition is to be performed;
(2) Constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained in the step (1);
(3) Inputting the word vector set X corresponding to the Chinese sentence obtained in the step (2) into a trained Chinese named entity recognition model based on a graph attention network to obtain a Chinese named entity label corresponding to the Chinese sentence; the Chinese named entity recognition model based on the graph attention network in the step (3) is obtained through training the following steps:
(3-1) acquiring a Chinese named entity recognition data set marked by adopting a BIOES marking scheme, and mapping the text of each Chinese sentence in the Chinese named entity recognition data set into a word vector so as to obtain a word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set; step (3-1) comprises the steps of:
(3-1-1) obtaining Chinese named entity recognition data sets of a plurality of fields, and labeling the Chinese named entity recognition data sets by using BIOES labeling schemes to obtain labeled Chinese named entity recognition data sets;
(3-1-2) constructing a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set based on the labeled Chinese named entity recognition data set obtained in the step (3-1-1);
(3-2) inputting the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) into a bidirectional long short-term memory BiLSTM model to obtain a preliminary feature vector of the word vector, and inputting the preliminary feature vector of the word vector into an improved graph attention network GAT model to obtain a final feature vector corresponding to the Chinese sentence; the step (3-2) specifically comprises the following substeps:
(3-2-1) initially modeling each word vector in the word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) by using a BiLSTM model to obtain two different forward and backward feature representations, and splicing the two feature representations to obtain a Chinese sentence feature vector, corresponding to the word vector, containing context features, wherein the Chinese sentence feature vectors corresponding to the word vector set of the Chinese sentence form a Chinese sentence feature vector set H = {h_1, h_2, …, h_M} corresponding to the Chinese sentence, where m ∈ [1, M], M being the total number of characters in the Chinese sentence;
(3-2-2) constructing a word-character interaction diagram G= (V, E) corresponding to each Chinese sentence by utilizing the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1);
V is a node set, wherein the node set comprises all characters and self-matching words in a word vector set corresponding to a Chinese sentence; e is an edge set, wherein the edge comprises a connection relation between characters in a character vector set, a containing relation between the characters and self-matching words and a connection relation between the self-matching words;
(3-2-3) obtaining a word information fusion correlation coefficient matrix e corresponding to the Chinese sentence according to the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) and the word-character interaction graph G corresponding to the Chinese sentence constructed in the step (3-2-2);
(3-2-4) carrying out normalization processing on each element e(h_i, h_j) in the word information fusion correlation coefficient matrix e corresponding to the Chinese sentence obtained in the step (3-2-3) so as to obtain the attention coefficient α_ij between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-5) obtaining a feature vector k_i of each node in the word-character interaction graph G corresponding to the Chinese sentence, based on the attention coefficients α_ij between every two nodes in the word-character interaction graph G obtained in the step (3-2-4), by adopting a cardinality-preserving graph attention network calculation method, wherein the feature vectors k_i of all nodes form the feature vector set K corresponding to the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-6) carrying out weighted summation of the feature vector set K corresponding to the word-character interaction graph G obtained in the step (3-2-5) and the Chinese sentence feature vector set H obtained in the step (3-2-1), to obtain the final feature vector R = W_1 H + W_2 K corresponding to the Chinese sentence, wherein W_1 and W_2 are trainable matrices;
And (3-3) inputting the final feature vector corresponding to the Chinese sentence obtained in the step (3-2) into a conditional random field model for decoding to obtain a Chinese named entity label corresponding to the Chinese sentence, calculating a loss function of a Chinese named entity recognition model based on a graph attention network by using a labeling result, and training parameters of a BiLSTM model and a GAT model to obtain a trained Chinese named entity recognition model based on the graph attention network, wherein the trained Chinese named entity recognition model comprises the BiLSTM model in the step (3-2), the GAT model and the conditional random field model in the step (3-3).
2. The method of claim 1, wherein step (2) first represents a Chinese sentence as a character sequence S = {s_1, s_2, …, s_M}, where s_m denotes the m-th character in the Chinese sentence and m ∈ [1, M], M being the total number of characters in the Chinese sentence; then, each character in the character sequence is represented as a word vector x_m = f(s_m) by looking up a character embedding matrix, and all word vectors form the word vector set X corresponding to the Chinese sentence, where f is a character embedding lookup table trained with the continuous bag-of-words model CBOW.
3. The method for identifying Chinese named entities based on graph attention network of claim 2, wherein,
The Chinese sentence feature vector h_m corresponding to the m-th word vector in the Chinese sentence in the step (3-2-1) is given by h_m = [→h_m ; ←h_m], wherein →h_m denotes the hidden-layer output of the forward LSTM at step m, ←h_m denotes the hidden-layer output of the backward LSTM at step m, h_m denotes the concatenation of →h_m and ←h_m, and x_m denotes the m-th word vector in the Chinese sentence;
The element in the i-th row and j-th column of the word information fusion correlation coefficient matrix e in the step (3-2-3), namely the word information fusion correlation coefficient e(h_i, h_j) between node i and node j in the word-character interaction graph G, is given by the following formula:
e(h_i, h_j) = aᵀ LeakyReLU(W h_i ‖ W h_j)
wherein LeakyReLU is an activation function, a and W are both learnable parameter matrices, and i, j ∈ [1, N], N being the total number of nodes in the word-character interaction graph G;
The attention coefficient α_ij between node i and node j in the graph G in the step (3-2-4) is obtained with the softmax normalization function:
α_ij = softmax(e(h_i, h_j)).
4. A graph-attention-network-based Chinese named entity recognition method according to claim 3, wherein the feature vector k_i of the i-th node in the word-character interaction graph G in step (3-2-5) is obtained by the cardinality-preserving attention aggregation, where N denotes the total number of nodes in the word-character interaction graph G, w is a learnable parameter matrix, and k_j denotes the feature vector of the j-th node, weighted by the attention coefficient α_ij.
5. The method for identifying chinese named entities based on graph attention network of claim 4 wherein step (3-3) comprises the sub-steps of:
(3-3-1) decoding the final feature vector R corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-2) by adopting CRF so as to obtain a labeling result corresponding to the Chinese sentence;
(3-3-2) calculating the loss function of the graph-attention-network-based Chinese named entity recognition model according to the entity labeling result Y of each Chinese sentence obtained in the step (3-3-1), and iteratively training the model to obtain a trained Chinese named entity recognition model based on the graph attention network.
6. The method for identifying Chinese named entities based on graph attention network of claim 5, wherein,
For the final feature vector R corresponding to each Chinese sentence, the decoded entity labeling result of the Chinese sentence is Y = {y_1, y_2, …, y_M}, with probability P(y_m | s_m) that the labeling result is y_m, where y_m denotes the labeling result of the m-th character in the Chinese sentence;
The training process of the model in step (3-3-2) optimizes the model by minimizing the negative log-likelihood loss with L2 regularization, the loss function being defined as L = −Σ log P(Y | S) + (γ/2)·‖θ‖², where γ is the L2 regularization parameter, preferably 0.5, and θ denotes the set of all trainable parameters.
7. A graph attention network-based chinese named entity recognition system, comprising:
The first module is used for acquiring a Chinese sentence on which Chinese named entity recognition is to be performed;
the second module is used for constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained by the first module;
The third module is used for inputting the word vector set X corresponding to the Chinese sentence obtained by the second module into a trained Chinese named entity recognition model based on the graph attention network so as to obtain a Chinese named entity label corresponding to the Chinese sentence; the Chinese named entity recognition model based on the graph attention network in the third module is obtained through training the following steps:
(3-1) acquiring a Chinese named entity recognition data set marked by adopting a BIOES marking scheme, and mapping the text of each Chinese sentence in the Chinese named entity recognition data set into a word vector so as to obtain a word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set; step (3-1) comprises the steps of:
(3-1-1) obtaining Chinese named entity recognition data sets of a plurality of fields, and labeling the Chinese named entity recognition data sets by using BIOES labeling schemes to obtain labeled Chinese named entity recognition data sets;
(3-1-2) constructing a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set based on the labeled Chinese named entity recognition data set obtained in the step (3-1-1);
(3-2) inputting the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) into a bidirectional long short-term memory BiLSTM model to obtain a preliminary feature vector of the word vector, and inputting the preliminary feature vector of the word vector into an improved graph attention network GAT model to obtain a final feature vector corresponding to the Chinese sentence; the step (3-2) specifically comprises the following substeps:
(3-2-1) initially modeling each word vector in the word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) by using a BiLSTM model to obtain two different forward and backward feature representations, and splicing the two feature representations to obtain a Chinese sentence feature vector, corresponding to the word vector, containing context features, wherein the Chinese sentence feature vectors corresponding to the word vector set of the Chinese sentence form a Chinese sentence feature vector set H = {h_1, h_2, …, h_M} corresponding to the Chinese sentence, where m ∈ [1, M], M being the total number of characters in the Chinese sentence;
(3-2-2) constructing a word-character interaction diagram G= (V, E) corresponding to each Chinese sentence by utilizing the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1);
V is a node set, wherein the node set comprises all characters and self-matching words in a word vector set corresponding to a Chinese sentence; e is an edge set, wherein the edge comprises a connection relation between characters in a character vector set, a containing relation between the characters and self-matching words and a connection relation between the self-matching words;
(3-2-3) obtaining a word information fusion correlation coefficient matrix e corresponding to the Chinese sentence according to the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) and the word-character interaction graph G corresponding to the Chinese sentence constructed in the step (3-2-2);
(3-2-4) carrying out normalization processing on each element e(h_i, h_j) in the word information fusion correlation coefficient matrix e corresponding to the Chinese sentence obtained in the step (3-2-3) so as to obtain the attention coefficient α_ij between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-5) obtaining a feature vector k_i of each node in the word-character interaction graph G corresponding to the Chinese sentence, based on the attention coefficients α_ij between every two nodes in the word-character interaction graph G obtained in the step (3-2-4), by adopting a cardinality-preserving graph attention network calculation method, wherein the feature vectors k_i of all nodes form the feature vector set K corresponding to the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-6) carrying out weighted summation of the feature vector set K corresponding to the word-character interaction graph G obtained in the step (3-2-5) and the Chinese sentence feature vector set H obtained in the step (3-2-1), to obtain the final feature vector R = W_1 H + W_2 K corresponding to the Chinese sentence, wherein W_1 and W_2 are trainable matrices;
And (3-3) inputting the final feature vector corresponding to the Chinese sentence obtained in the step (3-2) into a conditional random field model for decoding to obtain a Chinese named entity label corresponding to the Chinese sentence, calculating a loss function of a Chinese named entity recognition model based on a graph attention network by using a labeling result, and training parameters of a BiLSTM model and a GAT model to obtain a trained Chinese named entity recognition model based on the graph attention network, wherein the trained Chinese named entity recognition model comprises the BiLSTM model in the step (3-2), the GAT model and the conditional random field model in the step (3-3).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210083152.7A CN114417874B (en) | 2022-01-25 | 2022-01-25 | Chinese named entity recognition method and system based on graph attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114417874A CN114417874A (en) | 2022-04-29 |
CN114417874B true CN114417874B (en) | 2024-10-15 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115130468B (en) * | 2022-05-06 | 2023-04-07 | 北京安智因生物技术有限公司 | Myocardial infarction entity recognition method based on word fusion representation and graph attention network |
CN117057350B (en) * | 2023-08-07 | 2024-05-10 | 内蒙古大学 | Chinese electronic medical record named entity recognition method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183102A (en) * | 2020-10-15 | 2021-01-05 | 上海明略人工智能(集团)有限公司 | Named entity identification method based on attention mechanism and graph attention network |
CN112711948A (en) * | 2020-12-22 | 2021-04-27 | 北京邮电大学 | Named entity recognition method and device for Chinese sentences |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109871545B (en) * | 2019-04-22 | 2022-08-05 | 京东方科技集团股份有限公司 | Named entity identification method and device |
CN113010683B (en) * | 2020-08-26 | 2022-11-29 | 齐鲁工业大学 | Entity relationship identification method and system based on improved graph attention network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |