
CN114417874B - Chinese named entity recognition method and system based on graph attention network - Google Patents

Chinese named entity recognition method and system based on graph attention network Download PDF

Info

Publication number
CN114417874B
Authority
CN
China
Prior art keywords
chinese
chinese sentence
word
named entity
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210083152.7A
Other languages
Chinese (zh)
Other versions
CN114417874A (en)
Inventor
唐卓
王啸
李肯立
伍祚瑶
李虹宇
向婷
罗文明
程欣威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202210083152.7A
Publication of CN114417874A
Application granted
Publication of CN114417874B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese named entity recognition method based on a graph attention network, which comprises the following steps: acquiring a Chinese sentence on which Chinese named entity recognition is to be performed, constructing a word vector set X corresponding to the Chinese sentence, and inputting the word vector set X into a trained Chinese named entity recognition model based on a graph attention network to obtain the Chinese named entity labels corresponding to the Chinese sentence. The invention addresses three technical problems: in the existing BiLSTM-CRF model, word boundaries and entity boundaries are inconsistent and the model input features are limited to a single type; and in the existing collaborative graph network model based on the graph attention network, the traditional graph attention calculation method impairs the expressive power of graph attention.

Description

Chinese named entity recognition method and system based on graph attention network
Technical Field
The invention belongs to the technical field of entity identification, and particularly relates to a Chinese named entity identification method and system based on a graph attention network.
Background
Named Entity Recognition (NER) is a fundamental problem of natural language processing, and is the first step of a series of downstream tasks, such as relation extraction, knowledge graph construction, intent detection, and the like. The main goal of NER is to identify entities in unstructured text that have a specific meaning, mainly including names of people, places, institutions, proper nouns, etc., as well as words such as time, quantity, currency, scale values, etc.
Early named entity recognition centered on rule-based and dictionary-based methods, but these methods are inefficient, costly, and require a great deal of expertise; the models that currently perform better on NER are based on deep learning or statistical learning. Among these, BiLSTM-CRF is a widely used architecture for English NER, which uses word-level representations and takes words as the basic unit for predicting labels. Chinese named entities are more difficult to recognize than English ones, which led to the approach of first applying a word segmentation tool and then running a word-sequence-based labeling model as in English. In addition, the collaborative graph network model based on the graph attention network introduced the graph attention mechanism into NER for the first time and integrates lexical knowledge, such as self-matching words and nearest contextual words, into the encoding layer, which further improves named entity recognition.
Even after being improved with a Chinese word segmentation tool, the traditional BiLSTM-CRF model still has the following problems, which lead to poor performance. First, a word boundary is not necessarily an entity boundary: for example, "Beijing hometown museum" should be recognized as a whole as a location-type entity, but the word segmentation tool would divide it into three words, namely "Beijing", "hometown" and "museum". Second, although Chinese word segmentation has advanced greatly with the introduction of neural networks, existing segmentation models are far from perfect, and the model considers only a single type of feature, which inevitably results in error propagation.
When the collaborative graph network model based on the graph attention network uses the traditional graph attention network to calculate attention, the consecutive linear computations yield only static attention, which impairs the expressive power of graph attention.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a Chinese named entity recognition method and system based on a graph attention network, which aim to solve the following technical problems: in the existing BiLSTM-CRF model, word boundaries and entity boundaries are inconsistent and the model input features are limited to a single type; and in the existing collaborative graph network model based on the graph attention network, the traditional graph attention calculation method impairs the expressive power of graph attention.
To achieve the above object, according to one aspect of the present invention, there is provided a Chinese named entity recognition method based on a graph attention network, comprising the steps of:
(1) Acquiring a Chinese sentence on which Chinese named entity recognition is to be performed.
(2) Constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained in step (1).
(3) Inputting the word vector set X corresponding to the Chinese sentence obtained in the step (2) into a trained Chinese named entity recognition model based on a graph attention network to obtain a Chinese named entity label corresponding to the Chinese sentence.
Preferably, step (2) first represents the Chinese sentence as a character sequence $s = \{s_1, s_2, \ldots, s_M\}$, where $s_m$ denotes the m-th character in the Chinese sentence, $m \in [1, M]$, and M is the total number of characters in the Chinese sentence; then each character in the character sequence is represented as a word vector $x_m = f(s_m)$ by looking up a character embedding matrix, and all word vectors form the word vector set X corresponding to the Chinese sentence, where f is a character embedding lookup table trained with the continuous bag-of-words (CBOW) model.
Preferably, the Chinese named entity recognition model based on the graph attention network in the step (3) is obtained through training by the following steps:
(3-1) Acquiring a Chinese named entity recognition data set labeled with the BIOES labeling scheme, and mapping the text of each Chinese sentence in the data set into word vectors to obtain the word vector set corresponding to each Chinese sentence in the data set.
(3-2) Inputting the word vector set corresponding to each Chinese sentence obtained in step (3-1) into a bidirectional long short-term memory (BiLSTM) model to obtain preliminary feature vectors of the word vectors, and inputting the preliminary feature vectors into an improved graph attention network (GAT) model to obtain the final feature vector corresponding to the Chinese sentence.
(3-3) Inputting the final feature vector corresponding to the Chinese sentence obtained in step (3-2) into a conditional random field model for decoding to obtain the Chinese named entity labels corresponding to the Chinese sentence, calculating the loss function of the Chinese named entity recognition model based on the graph attention network from the labeling result, and training the parameters of the BiLSTM model and the GAT model to obtain the trained Chinese named entity recognition model based on the graph attention network, which comprises the BiLSTM model and the GAT model of step (3-2) and the conditional random field model of step (3-3).
Preferably, step (3-1) comprises the steps of:
(3-1-1) obtaining Chinese named entity recognition data sets of a plurality of fields, and labeling the Chinese named entity recognition data sets by using BIOES labeling schemes to obtain labeled Chinese named entity recognition data sets;
(3-1-2) constructing a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set based on the labeled Chinese named entity recognition data set obtained in the step (3-1-1).
Preferably, step (3-2) comprises in particular the following sub-steps:
(3-2-1) For each word vector in the word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1), using the BiLSTM model to model the word vector preliminarily, yielding a forward and a backward feature representation, and concatenating the two representations to obtain the Chinese sentence feature vector for that word vector, which contains contextual features; the Chinese sentence feature vectors of all word vectors in the set form the Chinese sentence feature vector set $H = \{h_1, h_2, \ldots, h_M\}$ corresponding to the Chinese sentence, where $m \in [1, M]$ and M is the total number of characters in the Chinese sentence;
(3-2-2) Constructing a word-character interaction graph G = (V, E) corresponding to each Chinese sentence by using the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1).
V is the node set, which comprises all characters in the word vector set corresponding to the Chinese sentence and the self-matching words; E is the edge set, whose edges comprise the connections between characters in the word vector set, the containment relations between characters and self-matching words, and the connections between self-matching words;
(3-2-3) obtaining a word information fusion correlation coefficient matrix e corresponding to the Chinese sentence according to the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) and the word-character interaction graph G corresponding to the Chinese sentence constructed in the step (3-2-2).
(3-2-4) Normalizing each element $e(h_i, h_j)$ of the word information fusion correlation coefficient matrix e corresponding to the Chinese sentence obtained in step (3-2-3), so as to obtain the attention coefficient $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-5) Based on the attention coefficients $\alpha_{ij}$ obtained in step (3-2-4), obtaining the feature vector $k_i$ of each node in the word-character interaction graph G corresponding to the Chinese sentence by using the cardinality-preserved graph attention network calculation method, where the feature vectors $k_i$ of all nodes form the feature vector set K corresponding to the word-character interaction graph G;
(3-2-6) Computing a weighted sum of the feature vector set K obtained in step (3-2-5) and the Chinese sentence feature vector set H obtained in step (3-2-1) to obtain the final feature vector $R = W_1 H + W_2 K$ corresponding to the Chinese sentence, where $W_1$ and $W_2$ are trainable matrices.
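Preferably, the Chinese sentence feature vector $h_m$ corresponding to the m-th word vector in the Chinese sentence in step (3-2-1) is given by:
$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1}), \qquad \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1}), \qquad h_m = \overrightarrow{h}_m \,\|\, \overleftarrow{h}_m$$
where $\overrightarrow{h}_t$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h}_t$ denotes the hidden-layer output of the backward LSTM at time t, $h_m$ denotes the concatenation of $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ at t = m, and $x_m$ denotes the m-th word vector in the Chinese sentence;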
The element in the i-th row and j-th column of the word information fusion correlation coefficient matrix e in step (3-2-3), namely the word information fusion correlation coefficient $e(h_i, h_j)$ between node i and node j in the word-character interaction graph G, is given by:
$$e(h_i, h_j) = a^{T}\,\mathrm{LeakyReLU}(W h_i \,\|\, W h_j)$$
where LeakyReLU is the activation function, a and W are both learnable parameter matrices, $i, j \in [1, N]$, and N is the total number of nodes in the word-character interaction graph G.
The attention coefficient $\alpha_{ij}$ between node i and node j in graph G in step (3-2-4) is obtained by normalizing with the softmax function:
$$\alpha_{ij} = \mathrm{softmax}(e(h_i, h_j))$$
Preferably, the feature vector $k_i$ of the i-th node in the word-character interaction graph G in step (3-2-5) is calculated using the following formula:
where N represents the total number of nodes in the word-character interaction graph G, w is a learnable parameter matrix, $\odot$ denotes element-wise multiplication, and $k_j$ represents the feature vector of the j-th node in the word-character interaction graph G.
Preferably, step (3-3) comprises in particular the following sub-steps:
(3-3-1) decoding the final feature vector R corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-2) by adopting CRF so as to obtain a labeling result corresponding to the Chinese sentence;
(3-3-2) calculating a loss function of a Chinese named entity recognition model based on the graph attention network according to the labeling result Y of each entity of the Chinese sentence obtained in the step (3-3-1), and iterating the training model to obtain a trained Chinese named entity recognition model based on the graph attention network.
Preferably, for the final feature vector R corresponding to each Chinese sentence, the entity labeling result obtained after decoding is $Y = \{y_1, y_2, \ldots, y_M\}$, and the probability that the labeling result of the m-th character is $y_m$ is $P(y_m \mid s_m)$, where $y_m$ denotes the labeling result of the m-th character in the Chinese sentence;
The training process of the model in step (3-3-2) optimizes the model by minimizing the log-likelihood loss with L2 regularization, the loss function being defined as:
where $\gamma$ is the L2 regularization parameter, preferably 0.5, and $\theta$ is the set of all trainable parameters.
According to another aspect of the present invention, there is provided a chinese named entity recognition system based on a graph attention network, comprising:
a first module for acquiring a Chinese sentence on which Chinese named entity recognition is to be performed;
a second module for constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained by the first module; and
a third module for inputting the word vector set X corresponding to the Chinese sentence obtained by the second module into a trained Chinese named entity recognition model based on the graph attention network, so as to obtain the Chinese named entity labels corresponding to the Chinese sentence.
In general, the above technical solutions conceived by the present invention, compared with the prior art, enable the following beneficial effects to be obtained:
1. Because step (3-1) uses a labeled Chinese named entity recognition data set, the invention solves the technical problem that word boundaries and entity boundaries are inconsistent in the existing BiLSTM-CRF model.
2. Because steps (3-1) and (3-2) combine the word segmentation features and the character features of Chinese sentences, the invention solves the technical problem that the model input features are limited to a single type in the existing BiLSTM-CRF model.
3. Because step (3-2) computes graph attention with the cardinality-preserved graph attention network calculation method, the invention solves the problem that the traditional graph attention calculation method impairs the expressive power of graph attention in the existing collaborative graph network model based on the graph attention network.
Drawings
FIG. 1 is a flow chart of a method for identifying Chinese named entities based on a graph attention network according to the present invention;
FIG. 2 is a schematic diagram of the operation of the Chinese named entity recognition model based on the graph attention network of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, the invention provides a Chinese named entity recognition method based on a graph attention network, which comprises the following steps:
(1) Acquiring a Chinese sentence on which Chinese named entity recognition is to be performed.
(2) Constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained in step (1).
Specifically, this step first represents the Chinese sentence as a character sequence $s = \{s_1, s_2, \ldots, s_M\}$, where $s_m$ denotes the m-th character in the Chinese sentence, $m \in [1, M]$, and M is the total number of characters in the Chinese sentence; then each character in the character sequence is represented as a word vector $x_m = f(s_m)$ by looking up a character embedding matrix, and all word vectors form the word vector set X corresponding to the Chinese sentence, where f is a character embedding lookup table trained with the Continuous Bag-of-Words (CBOW) model.
(3) Inputting the word vector set X corresponding to the Chinese sentence obtained in the step (2) into a trained Chinese named entity recognition model (shown in figure 2) based on a graph attention network to obtain a Chinese named entity label corresponding to the Chinese sentence.
In the step (3), the Chinese named entity recognition model based on the graph attention network is obtained through training the following steps:
(3-1) Acquiring a Chinese named entity recognition data set labeled with the BIOES labeling scheme (B-begin, I-inside, O-outside, E-end, S-single), and mapping the text of each Chinese sentence in the data set into word vectors so as to obtain the word vector set corresponding to each Chinese sentence in the data set.
(3-2) Inputting the word vector set corresponding to each Chinese sentence obtained in step (3-1) into a Bi-directional Long Short-Term Memory (BiLSTM) model to obtain preliminary feature vectors of the word vectors, and inputting the preliminary feature vectors into an improved Graph Attention Network (GAT) model to obtain the final feature vector corresponding to the Chinese sentence, which contains richer semantic information.
(3-3) Inputting the final feature vector corresponding to the Chinese sentence obtained in step (3-2) into a conditional random field model for decoding to obtain the Chinese named entity labels corresponding to the Chinese sentence, calculating the loss function of the Chinese named entity recognition model based on the graph attention network from the labeling result, and training the parameters of the BiLSTM model and the GAT model to obtain the trained Chinese named entity recognition model based on the graph attention network, which comprises the BiLSTM model and the GAT model of step (3-2) and the conditional random field model of step (3-3).
Preferably, the preprocessing of the Chinese named entity recognition data set in step (3-1) comprises the following sub-steps:
(3-1-1) obtaining Chinese named entity recognition data sets of a plurality of fields, and labeling the Chinese named entity recognition data sets by using BIOES labeling schemes to obtain labeled Chinese named entity recognition data sets.
The Chinese named entity recognition data sets cover news, social media, and Chinese resumes, and their entity types include GPE (geopolitical entity), LOC (location), PER (person), ORG (organization), CONT (country), and EDU (educational background).
In the BIOES notation, the first character in an entity is labeled B-X, where X is the entity type. Similarly, the last character and the inner characters in the entity are labeled E-X and I-X, respectively, S-X indicating that the word itself is an entity X, and the remaining non-entity characters are labeled O.
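The BIOES labeling rule described above can be illustrated with a minimal sketch. The span format `(start, end, type)` with inclusive character indices, the function name `bioes_tags`, and the example sentence are assumptions made for illustration only; they are not taken from the patent.

```python
def bioes_tags(length, entities):
    """Return one BIOES tag per character of a sentence with `length` characters.

    `entities` is a list of (start, end, type) spans with inclusive indices
    (a hypothetical input format chosen for this sketch).
    """
    tags = ["O"] * length  # non-entity characters stay O
    for start, end, etype in entities:
        if start == end:
            tags[start] = f"S-{etype}"      # single-character entity
        else:
            tags[start] = f"B-{etype}"      # first character
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"      # inner characters
            tags[end] = f"E-{etype}"        # last character
    return tags


sentence = "张三在湖南大学"
tags = bioes_tags(len(sentence), [(0, 1, "PER"), (3, 6, "ORG")])
```

Here the two-character name receives B-PER and E-PER, and the four-character organization receives B-ORG, I-ORG, I-ORG, E-ORG.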
(3-1-2) Constructing a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set based on the labeled Chinese named entity recognition data set obtained in the step (3-1-1).
Specifically, the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set is constructed by first representing the Chinese sentence as a character sequence $s = \{s_1, s_2, \ldots, s_M\}$, where $s_m$ denotes the m-th character in the Chinese sentence, $m \in [1, M]$, and M is the total number of characters in the Chinese sentence; then each character in the character sequence is represented as a word vector $x_m = f(s_m)$ by looking up a character embedding matrix, and the word vectors corresponding to all characters in the Chinese sentence form the word vector set X corresponding to the Chinese sentence, where f is a character embedding lookup table trained with the Continuous Bag-of-Words (CBOW) model.
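The lookup $x_m = f(s_m)$ can be sketched as a plain dictionary lookup. In the sketch below the table is filled with random values purely as a stand-in; the patent states that f is trained with CBOW, and that training is not shown. The names `build_lookup` and `embed`, the dimension, and the zero vector for unknown characters are all assumptions.

```python
import random


def build_lookup(vocab, dim, seed=0):
    """Stand-in for a CBOW-trained table: one random vector per character."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for ch in vocab}


def embed(sentence, table, dim):
    """Map each character s_m to its vector x_m; unknown characters get zeros."""
    unk = [0.0] * dim
    return [table.get(ch, unk) for ch in sentence]


table = build_lookup("湖南大学在长沙", dim=4)
X = embed("湖南大学", table, dim=4)  # word vector set X, one row per character
```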
Preferably, step (3-2) comprises in particular the following sub-steps:
(3-2-1) For each word vector in the word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1), using the BiLSTM model to model the word vector preliminarily, yielding a forward and a backward feature representation, and concatenating the two representations to obtain the Chinese sentence feature vector for that word vector, which contains contextual features; the Chinese sentence feature vectors of all word vectors in the set form the Chinese sentence feature vector set $H = \{h_1, h_2, \ldots, h_M\}$ corresponding to the Chinese sentence, where $m \in [1, M]$ and M is the total number of characters in the Chinese sentence.
The Chinese sentence feature vector $h_m$ corresponding to the m-th word vector in the Chinese sentence is given by:
$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1}), \qquad \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1}), \qquad h_m = \overrightarrow{h}_m \,\|\, \overleftarrow{h}_m$$
where $\overrightarrow{h}_t$ denotes the hidden-layer output of the forward LSTM at time t, $\overleftarrow{h}_t$ denotes the hidden-layer output of the backward LSTM at time t, $h_m$ denotes the concatenation of $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ at t = m, and $x_m$ denotes the m-th word vector in the Chinese sentence.
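The bidirectional pass and concatenation of step (3-2-1) can be sketched as follows. To keep the example short, a toy recurrent cell (element-wise tanh) is swapped in for the LSTM cell; it is not an LSTM and only illustrates how the forward and backward hidden states are produced and joined. The names `toy_cell` and `bidirectional` are hypothetical.

```python
import math


def toy_cell(x, h_prev):
    """Toy stand-in for an LSTM cell: tanh of input plus previous state."""
    return [math.tanh(xi + hi) for xi, hi in zip(x, h_prev)]


def bidirectional(X, cell=toy_cell):
    """Return h_m = forward_m || backward_m for every position m."""
    dim = len(X[0])
    fwd, h = [], [0.0] * dim
    for x in X:                          # left-to-right pass
        h = cell(x, h)
        fwd.append(h)
    bwd, h = [None] * len(X), [0.0] * dim
    for m in range(len(X) - 1, -1, -1):  # right-to-left pass
        h = cell(X[m], h)
        bwd[m] = h
    return [f + b for f, b in zip(fwd, bwd)]  # list concatenation


X = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
H = bidirectional(X)
```

Each row of `H` has twice the input dimension: the first half depends only on the left context, the second half only on the right context.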
(3-2-2) Constructing a word-character interaction graph G = (V, E) corresponding to each Chinese sentence by using the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-1).
Here V is the node set, which comprises all characters in the word vector set corresponding to the Chinese sentence and the self-matching words (that is, the segmented words of the Chinese sentence; the segmentation result can be obtained directly from the Chinese named entity recognition data set labeled with the BIOES labeling scheme); E is the edge set, whose edges comprise the connections between characters in the word vector set, the containment relations between characters and self-matching words, and the connections between self-matching words.
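A minimal sketch of building such a graph is shown below. The patent does not spell out which characters or words count as "connected"; the sketch assumes that adjacent characters are connected and that two self-matching words are connected when one ends immediately before the other begins. The node encoding and the function name `build_graph` are likewise hypothetical.

```python
def build_graph(sentence, words):
    """Build G = (V, E) from a sentence and matched word spans.

    `words` is a list of (start, end) inclusive character spans.
    """
    n = len(sentence)
    nodes = [("char", i) for i in range(n)] + [("word", s, e) for s, e in words]
    edges = set()
    # Connection between neighbouring characters (assumed: adjacency).
    for i in range(n - 1):
        edges.add((("char", i), ("char", i + 1)))
    # Containment between a self-matching word and each character it covers.
    for s, e in words:
        for i in range(s, e + 1):
            edges.add((("word", s, e), ("char", i)))
    # Connection between self-matching words (assumed: directly consecutive).
    for s1, e1 in words:
        for s2, e2 in words:
            if e1 + 1 == s2:
                edges.add((("word", s1, e1), ("word", s2, e2)))
    return nodes, edges


nodes, edges = build_graph("湖南大学在长沙", [(0, 1), (2, 3), (5, 6)])
```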
(3-2-3) Obtaining a word information fusion correlation coefficient matrix e corresponding to the Chinese sentence according to the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) and the word-character interaction graph G corresponding to the Chinese sentence constructed in the step (3-2-2).
Specifically, the element in the i-th row and j-th column of the word information fusion correlation coefficient matrix e, namely the word information fusion correlation coefficient $e(h_i, h_j)$ between node i and node j in the word-character interaction graph G, is given by:
$$e(h_i, h_j) = a^{T}\,\mathrm{LeakyReLU}(W h_i \,\|\, W h_j)$$
where LeakyReLU is the activation function, a and W are both learnable parameter matrices, $i, j \in [1, N]$, and N is the total number of nodes in the word-character interaction graph G.
(3-2-4) Normalizing each element $e(h_i, h_j)$ of the word information fusion correlation coefficient matrix e corresponding to the Chinese sentence obtained in step (3-2-3), so as to obtain the attention coefficient $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence;
The attention coefficient $\alpha_{ij}$ between node i and node j in graph G is obtained by normalizing with the softmax function:
$$\alpha_{ij} = \mathrm{softmax}(e(h_i, h_j))$$
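The score of step (3-2-3) and the normalization of step (3-2-4) can be sketched in plain Python. Note that the activation is applied before the product with a, as in the formula above. The sketch assumes that the softmax for node i runs over its neighbours in G and that the LeakyReLU negative slope is 0.2; neither detail is stated in the text, and the helper names are hypothetical.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_relu(v, slope=0.2):  # slope value is an assumption
    return [x if x > 0 else slope * x for x in v]


def score(a, W, h_i, h_j):
    """e(h_i, h_j) = a^T LeakyReLU(W h_i || W h_j)."""
    z = matvec(W, h_i) + matvec(W, h_j)  # list concatenation = ||
    return sum(ak * zk for ak, zk in zip(a, leaky_relu(z)))


def attention(a, W, H, neighbors):
    """Softmax of the scores over each node's neighbours (assumed scope)."""
    alpha = {}
    for i, nbrs in neighbors.items():
        scores = [score(a, W, H[i], H[j]) for j in nbrs]
        mx = max(scores)                       # subtract max for stability
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        for j, ex in zip(nbrs, exps):
            alpha[(i, j)] = ex / total
    return alpha


W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 1.0, 1.0, 1.0]
H = [[1.0, -1.0], [2.0, 0.0], [0.5, 0.5]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
alpha = attention(a, W, H, neighbors)
```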
(3-2-5) Based on the attention coefficients $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence obtained in step (3-2-4), obtaining the feature vector $k_i$ of each node in the word-character interaction graph G by using the cardinality-preserved graph attention network calculation method, where the feature vectors $k_i$ of all nodes form the feature vector set K corresponding to the word-character interaction graph G;
Specifically, the cardinality-preserved graph attention network calculation method used in this step is described on page 4 of "Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation" by Shuo Zhang.
The feature vector $k_i$ of the i-th node in the word-character interaction graph G is calculated using the following formula:
where N represents the total number of nodes in the word-character interaction graph G, w is a learnable parameter matrix, $\odot$ denotes element-wise multiplication, and $k_j$ represents the feature vector of the j-th node in the word-character interaction graph G.
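As a rough illustration of cardinality-preserved aggregation, the sketch below assumes the additive form from the cited paper, in which an unweighted neighbour sum scaled element-wise by w is added to the attention-weighted sum. Whether the patent uses exactly this form cannot be confirmed from the text, so the formula in the docstring, the treatment of w as a vector, and the name `cpa_aggregate` should all be read as assumptions.

```python
def cpa_aggregate(alpha, H, w, neighbors):
    """Assumed additive form: k_i = sum_j alpha_ij * h_j + w (*) sum_j h_j."""
    dim = len(w)
    K = []
    for i in range(len(H)):
        nbrs = neighbors[i]
        weighted = [sum(alpha[(i, j)] * H[j][d] for j in nbrs) for d in range(dim)]
        plain = [sum(H[j][d] for j in nbrs) for d in range(dim)]
        K.append([weighted[d] + w[d] * plain[d] for d in range(dim)])
    return K


# All nodes share the same feature, so attention alone cannot tell a
# neighbourhood of size 2 from one of size 3; the extra term can.
H = [[1.0, 1.0]] * 4
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [2], 3: [3]}
alpha = {(i, j): 1.0 / len(n) for i, n in neighbors.items() for j in n}
K = cpa_aggregate(alpha, H, w=[0.1, 0.1], neighbors=neighbors)
```

With uniform attention the weighted sums for nodes 0 and 1 are identical, while the added term differs with the number of neighbours, which is the property the cardinality-preservation idea targets.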
(3-2-6) Computing a weighted sum of the feature vector set K corresponding to the word-character interaction graph G obtained in step (3-2-5) and the Chinese sentence feature vector set H obtained in step (3-2-1) to obtain the final feature vector $R = W_1 H + W_2 K$ corresponding to the Chinese sentence, where $W_1$ and $W_2$ are trainable matrices;
Specifically, the final feature vector R is a feature vector of the Chinese sentence that contains richer semantic information, and it serves as the input vector of the conditional random field model.
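The fusion $R = W_1 H + W_2 K$ can be sketched per character as $r_m = W_1 h_m + W_2 k_m$. The sketch assumes that the character nodes come first in K and that only those rows are fused with H; the text does not state how the word-node rows of K are handled. The helper names are hypothetical.

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def fuse(W1, W2, H, K):
    """r_m = W1 h_m + W2 k_m for each character m.

    zip() stops at len(H), so any trailing word-node rows of K are ignored
    (an assumption of this sketch).
    """
    return [
        [p + q for p, q in zip(matvec(W1, h), matvec(W2, k))]
        for h, k in zip(H, K)
    ]


W1 = [[1, 0], [0, 1]]
W2 = [[2, 0], [0, 2]]
R = fuse(W1, W2, H=[[1, 2]], K=[[3, 4], [9, 9]])
```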
Preferably, step (3-3) comprises in particular the following sub-steps:
(3-3-1) Decoding the final feature vector R corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in step (3-2) with a conditional random field (CRF) model, so as to obtain the labeling result corresponding to the Chinese sentence.
For the final feature vector R corresponding to each Chinese sentence, the entity labeling result of the decoded Chinese sentence is $Y = \{y_1, y_2, \ldots, y_M\}$, and the probability that the labeling result of the m-th character is $y_m$ is $P(y_m \mid s_m)$, where $y_m$ denotes the labeling result of the m-th character in the Chinese sentence, $s_m$ denotes the m-th character in the Chinese sentence, $m \in [1, M]$, and M is the total number of characters in the Chinese sentence.
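CRF decoding is commonly done with the Viterbi algorithm; the text does not name the decoding algorithm, so the sketch below is an assumption in that respect. It takes per-position tag scores (here called `emissions`, which would be derived from R) and tag-to-tag scores (`transitions`) and returns the highest-scoring tag sequence. All names and the toy numbers are hypothetical.

```python
def viterbi(emissions, transitions):
    """Return the best tag index sequence.

    emissions[m][t]   : score of tag t at position m
    transitions[p][t] : score of moving from tag p to tag t
    """
    n_tags = len(emissions[0])
    best = list(emissions[0])   # best score of any path ending in each tag
    back = []                   # back-pointers, one list per later position
    for em in emissions[1:]:
        prev_idx, new_best = [], []
        for t in range(n_tags):
            p = max(range(n_tags), key=lambda q: best[q] + transitions[q][t])
            prev_idx.append(p)
            new_best.append(best[p] + transitions[p][t] + em[t])
        back.append(prev_idx)
        best = new_best
    tag = max(range(n_tags), key=lambda q: best[q])
    path = [tag]
    for prev_idx in reversed(back):  # follow back-pointers to the start
        tag = prev_idx[tag]
        path.append(tag)
    return path[::-1]


emissions = [[2, 0], [0, 1], [3, 0]]
free = viterbi(emissions, [[0, 0], [0, 0]])        # no transition preference
sticky = viterbi(emissions, [[0, -5], [-5, 0]])    # switching tags is penalized
```

The second call shows why a CRF layer helps with BIOES labels: transition scores can suppress tag sequences that are locally attractive but globally inconsistent.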
(3-3-2) Calculating a loss function of a Chinese named entity recognition model based on the graph attention network according to the labeling result Y of each entity of the Chinese sentence obtained in the step (3-3-1), and iterating the training model to obtain a trained Chinese named entity recognition model based on the graph attention network.
The training process optimizes the model by minimizing the log-likelihood loss with L2 regularization, and the loss function is defined as:
where γ is the L2 regularization parameter, preferably 0.5, and θ is the set of all trainable parameters, corresponding to all the trainable parameters and matrices mentioned in the above process.
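The decoding and training objective of step (3-3) can be sketched in NumPy as follows. This is a minimal sketch of a standard linear-chain CRF under stated assumptions, not the patented implementation: the emission scores are assumed to come from a projection of R (not shown), and the `transitions` matrix, the function names, and the penalty form (γ/2)·‖θ‖² are illustrative assumptions. Only the use of a CRF, of a log-likelihood loss with L2 regularization, and of γ = 0.5 comes from the text.

```python
import itertools
import numpy as np

def log_sum_exp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    peak = x.max(axis=axis, keepdims=True)
    return (peak + np.log(np.exp(x - peak).sum(axis=axis, keepdims=True))).squeeze(axis)

def crf_log_likelihood(emissions, transitions, tags):
    """log P(tags | sentence). emissions: (m, T); transitions[p, q]: score of tag p -> tag q."""
    m = emissions.shape[0]
    score = emissions[0, tags[0]]
    for t in range(1, m):
        score = score + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Forward algorithm: log of the summed scores of all tag sequences.
    alpha = emissions[0].copy()
    for t in range(1, m):
        alpha = log_sum_exp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return score - log_sum_exp(alpha, axis=0)

def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence as a list of tag indices."""
    m, n_tags = emissions.shape
    delta = emissions[0].copy()
    back = np.zeros((m, n_tags), dtype=int)
    for t in range(1, m):
        cand = delta[:, None] + transitions      # cand[p, q]: best path ending in p, then p -> q
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + emissions[t]
    path = [int(delta.argmax())]
    for t in range(m - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def loss(batch, transitions, params, gamma=0.5):
    # Negative log-likelihood plus an assumed (gamma / 2) * ||theta||^2 penalty.
    nll = -sum(crf_log_likelihood(e, transitions, y) for e, y in batch)
    l2 = sum(float((p ** 2).sum()) for p in params)
    return nll + 0.5 * gamma * l2

# Toy check with 3 characters and 3 tags: enumerate every tag sequence.
rng = np.random.default_rng(0)
emissions = rng.normal(size=(3, 3))
transitions = rng.normal(size=(3, 3))
all_paths = list(itertools.product(range(3), repeat=3))
log_probs = [crf_log_likelihood(emissions, transitions, p) for p in all_paths]
best = viterbi(emissions, transitions)
```

On this toy input the probabilities of all 27 tag sequences sum to 1, and the Viterbi path coincides with the most probable sequence found by enumeration.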
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

1. The Chinese named entity recognition method based on the graph attention network is characterized by comprising the following steps of:
(1) Acquiring a Chinese sentence to be identified by a Chinese named entity;
(2) Constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained in the step (1);
(3) Inputting the word vector set X corresponding to the Chinese sentence obtained in the step (2) into a trained Chinese named entity recognition model based on a graph attention network to obtain a Chinese named entity label corresponding to the Chinese sentence; the Chinese named entity recognition model based on the graph attention network in the step (3) is obtained through training the following steps:
(3-1) acquiring a Chinese named entity recognition data set marked by adopting a BIOES marking scheme, and mapping the text of each Chinese sentence in the Chinese named entity recognition data set into a word vector so as to obtain a word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set; step (3-1) comprises the steps of:
(3-1-1) obtaining Chinese named entity recognition data sets of a plurality of fields, and labeling the Chinese named entity recognition data sets by using BIOES labeling schemes to obtain labeled Chinese named entity recognition data sets;
(3-1-2) constructing a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set based on the labeled Chinese named entity recognition data set obtained in the step (3-1-1);
(3-2) inputting the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) into a bidirectional long short-term memory (BiLSTM) model to obtain a preliminary feature vector of the word vector, and inputting the preliminary feature vector of the word vector into an improved graph attention network (GAT) model to obtain a final feature vector corresponding to the Chinese sentence; the step (3-2) specifically comprises the following substeps:
(3-2-1) initially modeling each word vector in a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) by using a BiLSTM model to obtain two different feature representations, forward and backward, and splicing the two feature representations to obtain a Chinese sentence feature vector, containing context features, corresponding to the word vector, wherein the Chinese sentence feature vectors corresponding to the word vector set corresponding to the Chinese sentence form a Chinese sentence feature vector set $H = \{h_1, h_2, \ldots, h_m\}$ corresponding to the Chinese sentence, wherein $m \in [1, M]$ and M is the total number of characters in the Chinese sentence;
(3-2-2) constructing a word-character interaction graph G = (V, E) corresponding to each Chinese sentence by utilizing the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1);
V is a node set, wherein the node set comprises all characters and self-matching words in a word vector set corresponding to a Chinese sentence; e is an edge set, wherein the edge comprises a connection relation between characters in a character vector set, a containing relation between the characters and self-matching words and a connection relation between the self-matching words;
(3-2-3) obtaining a word information fusion correlation coefficient matrix e corresponding to the Chinese sentence according to the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) and the word-character interaction graph G corresponding to the Chinese sentence constructed in the step (3-2-2);
(3-2-4) carrying out normalization processing on each element $e(h_i, h_j)$ in the word information fusion correlation coefficient matrix e corresponding to the Chinese sentence obtained in the step (3-2-3) so as to obtain an attention coefficient $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-5) obtaining a feature vector $k_i$ of each node in the word-character interaction graph G corresponding to the Chinese sentence based on the attention coefficient $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence obtained in the step (3-2-4) by adopting a cardinality-preserving graph attention network calculation method, wherein the feature vectors $k_i$ of all nodes in the word-character interaction graph G corresponding to the Chinese sentence form a feature vector set K corresponding to the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-6) carrying out weighted summation on the feature vector set K corresponding to the word-character interaction graph G corresponding to the Chinese sentence obtained in the step (3-2-5) and the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) to obtain a final feature vector $R = W_1 H + W_2 K$ corresponding to the Chinese sentence, wherein $W_1$ and $W_2$ are trainable matrices;
(3-3) inputting the final feature vector corresponding to the Chinese sentence obtained in the step (3-2) into a conditional random field model for decoding to obtain a Chinese named entity label corresponding to the Chinese sentence, calculating a loss function of the Chinese named entity recognition model based on the graph attention network by using the labeling result, and training parameters of the BiLSTM model and the GAT model to obtain a trained Chinese named entity recognition model based on the graph attention network, wherein the trained Chinese named entity recognition model comprises the BiLSTM model and the GAT model in the step (3-2) and the conditional random field model in the step (3-3).
2. The method of claim 1, wherein step (2) first represents the Chinese sentence as a character sequence $s = \{s_1, s_2, \ldots, s_m\}$, where $s_m$ represents the m-th character in the Chinese sentence, $m \in [1, M]$, and M is the total number of characters in the Chinese sentence; then each character in the character sequence is represented as a word vector $x_m = f(s_m)$ by looking up a character embedding matrix, and all word vectors form the word vector set X corresponding to the Chinese sentence, where f is a character embedding lookup table trained by the continuous bag-of-words (CBOW) model.
3. The method for identifying Chinese named entities based on graph attention network of claim 2, wherein,
The feature vector $h_m$ of the Chinese sentence corresponding to the m-th word vector in the Chinese sentence in the step (3-2-1) is given by the following formula:
Wherein the two hidden-layer outputs at time t are those of the forward LSTM and of the reverse LSTM respectively, $h_m$ represents their concatenation, and $x_m$ represents the m-th word vector in the Chinese sentence;
The element in the i-th row and j-th column of the word information fusion correlation coefficient matrix e in the step (3-2-3), namely the word information fusion correlation coefficient $e(h_i, h_j)$ between node i and node j in the word-character interaction graph G, is given by the following formula:
$$e(h_i, h_j) = a^{T}\,\mathrm{LeakyReLU}(W h_i \,\|\, W h_j)$$
Wherein LeakyReLU is an activation function, a and W are both learnable parameter matrices, and $i, j \in [1, N]$, where N is the total number of nodes in the word-character interaction graph G;
The attention coefficient $\alpha_{ij}$ between node i and node j in the graph G in the step (3-2-4) is obtained by normalization with the softmax function:
$$\alpha_{ij} = \mathrm{softmax}(e(h_i, h_j)).$$
4. The Chinese named entity recognition method based on the graph attention network of claim 3, wherein the feature vector $k_i$ of the i-th node in the word-character interaction graph G in step (3-2-5) is calculated using the following formula:
where N represents the total number of nodes in the word-character interaction graph G, w is a learnable parameter matrix, and $k_j$ represents the feature vector of the j-th node in the word-character interaction graph G.
5. The method for identifying Chinese named entities based on graph attention network of claim 4, wherein step (3-3) comprises the sub-steps of:
(3-3-1) decoding the final feature vector R corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-2) by adopting CRF so as to obtain a labeling result corresponding to the Chinese sentence;
(3-3-2) calculating a loss function of the Chinese named entity recognition model based on the graph attention network according to the labeling result Y of each Chinese sentence obtained in the step (3-3-1), and training the model iteratively to obtain a trained Chinese named entity recognition model based on the graph attention network.
6. The method for identifying Chinese named entities based on graph attention network of claim 5, wherein,
For the final feature vector R corresponding to each Chinese sentence, decoding yields the entity labeling result $Y = \{y_1, y_2, \ldots, y_m\}$ of the Chinese sentence and the probability $P(y_m \mid s_m)$ that the labeling result is $y_m$, wherein $y_m$ represents the labeling result of the m-th character in the Chinese sentence;
The training process of the model in step (3-3-2) optimizes the model by minimizing the log-likelihood loss with L2 regularization, the loss function being defined as:
where γ is the L2 regularization parameter, preferably 0.5, and θ is the set of all trainable parameters.
7. A Chinese named entity recognition system based on a graph attention network, comprising:
The first module is used for acquiring a Chinese sentence to be identified by a Chinese named entity;
the second module is used for constructing a word vector set X corresponding to the Chinese sentence based on the Chinese sentence obtained by the first module;
The third module is used for inputting the word vector set X corresponding to the Chinese sentence obtained by the second module into a trained Chinese named entity recognition model based on the graph attention network so as to obtain a Chinese named entity label corresponding to the Chinese sentence; the Chinese named entity recognition model based on the graph attention network in the third module is obtained through training the following steps:
(3-1) acquiring a Chinese named entity recognition data set marked by adopting a BIOES marking scheme, and mapping the text of each Chinese sentence in the Chinese named entity recognition data set into a word vector so as to obtain a word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set; step (3-1) comprises the steps of:
(3-1-1) obtaining Chinese named entity recognition data sets of a plurality of fields, and labeling the Chinese named entity recognition data sets by using BIOES labeling schemes to obtain labeled Chinese named entity recognition data sets;
(3-1-2) constructing a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set based on the labeled Chinese named entity recognition data set obtained in the step (3-1-1);
(3-2) inputting the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) into a bidirectional long short-term memory (BiLSTM) model to obtain a preliminary feature vector of the word vector, and inputting the preliminary feature vector of the word vector into an improved graph attention network (GAT) model to obtain a final feature vector corresponding to the Chinese sentence; the step (3-2) specifically comprises the following substeps:
(3-2-1) initially modeling each word vector in a word vector set X corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1) by using a BiLSTM model to obtain two different feature representations, forward and backward, and splicing the two feature representations to obtain a Chinese sentence feature vector, containing context features, corresponding to the word vector, wherein the Chinese sentence feature vectors corresponding to the word vector set corresponding to the Chinese sentence form a Chinese sentence feature vector set $H = \{h_1, h_2, \ldots, h_m\}$ corresponding to the Chinese sentence, wherein $m \in [1, M]$ and M is the total number of characters in the Chinese sentence;
(3-2-2) constructing a word-character interaction graph G = (V, E) corresponding to each Chinese sentence by utilizing the word vector set corresponding to each Chinese sentence in the Chinese named entity recognition data set obtained in the step (3-1);
V is a node set, wherein the node set comprises all characters and self-matching words in a word vector set corresponding to a Chinese sentence; e is an edge set, wherein the edge comprises a connection relation between characters in a character vector set, a containing relation between the characters and self-matching words and a connection relation between the self-matching words;
(3-2-3) obtaining a word information fusion correlation coefficient matrix e corresponding to the Chinese sentence according to the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) and the word-character interaction graph G corresponding to the Chinese sentence constructed in the step (3-2-2);
(3-2-4) carrying out normalization processing on each element $e(h_i, h_j)$ in the word information fusion correlation coefficient matrix e corresponding to the Chinese sentence obtained in the step (3-2-3) so as to obtain an attention coefficient $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-5) obtaining a feature vector $k_i$ of each node in the word-character interaction graph G corresponding to the Chinese sentence based on the attention coefficient $\alpha_{ij}$ between every two nodes in the word-character interaction graph G corresponding to the Chinese sentence obtained in the step (3-2-4) by adopting a cardinality-preserving graph attention network calculation method, wherein the feature vectors $k_i$ of all nodes in the word-character interaction graph G corresponding to the Chinese sentence form a feature vector set K corresponding to the word-character interaction graph G corresponding to the Chinese sentence;
(3-2-6) carrying out weighted summation on the feature vector set K corresponding to the word-character interaction graph G corresponding to the Chinese sentence obtained in the step (3-2-5) and the Chinese sentence feature vector set H corresponding to the Chinese sentence obtained in the step (3-2-1) to obtain a final feature vector $R = W_1 H + W_2 K$ corresponding to the Chinese sentence, wherein $W_1$ and $W_2$ are trainable matrices;
(3-3) inputting the final feature vector corresponding to the Chinese sentence obtained in the step (3-2) into a conditional random field model for decoding to obtain a Chinese named entity label corresponding to the Chinese sentence, calculating a loss function of the Chinese named entity recognition model based on the graph attention network by using the labeling result, and training parameters of the BiLSTM model and the GAT model to obtain a trained Chinese named entity recognition model based on the graph attention network, wherein the trained Chinese named entity recognition model comprises the BiLSTM model and the GAT model in the step (3-2) and the conditional random field model in the step (3-3).
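The word-character interaction graph G = (V, E) of step (3-2-2) can be sketched in plain Python as follows. This is a hedged illustration rather than the claimed construction: the function name, the lexicon, the maximum word length, the minimum word length of two characters, and the rule that two self-matching words are connected when one ends exactly where the other begins are all assumptions, since the claims only name the three kinds of edges. Edges are treated as undirected.

```python
def build_word_character_graph(sentence, lexicon, max_word_len=4):
    """Return (nodes, edges) for a sentence and a set of known words.

    Nodes 0..m-1 are the characters; the remaining nodes are the
    self-matching words, i.e. substrings of the sentence found in the lexicon.
    """
    m = len(sentence)
    # Spans (start, end) of every substring of length >= 2 that is in the lexicon.
    spans = [
        (start, end)
        for start in range(m)
        for end in range(start + 2, min(m, start + max_word_len) + 1)
        if sentence[start:end] in lexicon
    ]
    nodes = list(sentence) + [sentence[s:e] for s, e in spans]

    edges = set()
    # 1. Connection between adjacent characters.
    for i in range(m - 1):
        edges.add((i, i + 1))
    # 2. Containment between a word and each character it covers.
    for w, (start, end) in enumerate(spans):
        for c in range(start, end):
            edges.add((c, m + w))
    # 3. Connection between words (assumed: the first ends where the second starts).
    for w1, (_, end1) in enumerate(spans):
        for w2, (start2, _) in enumerate(spans):
            if end1 == start2:
                edges.add((m + w1, m + w2))
    return nodes, sorted(edges)


# Hypothetical lexicon for a classic ambiguous sentence.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
nodes, edges = build_word_character_graph("南京市长江大桥", lexicon)
```

With this toy lexicon the graph has 7 character nodes and 6 word nodes; the overlapping candidates 南京市 and 市长 are both kept as nodes, leaving the attention layer to weigh them.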

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083152.7A CN114417874B (en) 2022-01-25 2022-01-25 Chinese named entity recognition method and system based on graph attention network

Publications (2)

Publication Number Publication Date
CN114417874A (en) 2022-04-29
CN114417874B (en) 2024-10-15

Family

ID=81277360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083152.7A Active CN114417874B (en) 2022-01-25 2022-01-25 Chinese named entity recognition method and system based on graph attention network

Country Status (1)

Country Link
CN (1) CN114417874B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115130468B (en) * 2022-05-06 2023-04-07 北京安智因生物技术有限公司 Myocardial infarction entity recognition method based on word fusion representation and graph attention network
CN117057350B (en) * 2023-08-07 2024-05-10 内蒙古大学 Chinese electronic medical record named entity recognition method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183102A (en) * 2020-10-15 2021-01-05 上海明略人工智能(集团)有限公司 Named entity identification method based on attention mechanism and graph attention network
CN112711948A (en) * 2020-12-22 2021-04-27 北京邮电大学 Named entity recognition method and device for Chinese sentences

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871545B (en) * 2019-04-22 2022-08-05 京东方科技集团股份有限公司 Named entity identification method and device
CN113010683B (en) * 2020-08-26 2022-11-29 齐鲁工业大学 Entity relationship identification method and system based on improved graph attention network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183102A (en) * 2020-10-15 2021-01-05 上海明略人工智能(集团)有限公司 Named entity identification method based on attention mechanism and graph attention network
CN112711948A (en) * 2020-12-22 2021-04-27 北京邮电大学 Named entity recognition method and device for Chinese sentences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant