CN115238693A - Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory - Google Patents
Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory
- Publication number
- CN115238693A (application CN202210809038.8A)
- Authority
- CN
- China
- Prior art keywords
- layer
- model
- output
- bilstm
- named entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory, which improves named entity recognition accuracy by modifying the BERT-BILSTM-CRF model. The input and output of the named entity recognition model are determined as follows: taking medical text as the research object, a medical text data set with entity labels serves as the model input, and the model output is the entity label result predicted for the medical entities in the data set. The invention further strengthens the model's ability to extract contextual features from text: on the one hand, a multi-word segmentation method adds local context features; on the other hand, a multi-layer bidirectional long-short term memory method adds global context features by setting BILSTM models of different depths. External knowledge from a medical dictionary is introduced, and the accuracy of the named entity recognition task is further improved by enriching the semantic feature information available during model learning.
Description
Technical Field
The invention relates to the technical field of Chinese named entity recognition, and in particular to a Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory.
Background
Named entity recognition is a challenging fundamental task in natural language processing. As a basic task, it plays a key role in information extraction, knowledge graph construction, and related applications, and in some specific domains it has already found widespread and mature application. Current named entity recognition methods fall mainly into three groups: dictionary- and rule-based methods, statistical machine learning methods, and deep learning methods.
Dictionary- and rule-based methods recognize entities through string matching and manually constructed extraction rules; they can achieve good accuracy on small data sets but do not scale as data sets grow. Statistical machine learning methods include hidden Markov models, support vector machines, and conditional random fields. While these approaches reduce the vocabulary- and rule-building workload to some extent, they still inevitably require manually designed features and external knowledge. As a result, such methods generally suit only the domain they were built for and cannot be directly applied to named entity recognition in a brand-new domain. Deep learning methods, including the BERT, CNN, and BILSTM models, have gained widespread use and achieved breakthroughs in recent years. Compared with machine learning models, deep learning methods can learn high-dimensional, deep feature representations, which helps improve the generalization ability of entity recognition.
Although existing deep learning methods achieve good results in medical named entity recognition, the task still faces several difficulties and challenges:
(1) The single-granularity text representations used by existing methods capture only the global context features of the text and lack local context information, which hinders further improvement of model performance;
(2) The commonly adopted single BILSTM captures context features at only one specific dimension and ignores the contribution that context features at other dimensions could make to model performance;
therefore, it is necessary to design a Chinese named entity recognition method based on multi-segmented words and multi-layer bidirectional long-short term memory.
Disclosure of Invention
The invention provides a Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory, which modifies the BERT-BILSTM-CRF model widely used in named entity recognition. A multi-word segmentation module is introduced: in this module every sentence in the data set is segmented into multiple words, and local features are then extracted by a Word-Level BILSTM. A multi-layer BILSTM module is also introduced, consisting of BILSTM and Attention: by setting different hidden layer sizes for the BILSTM, text context features of different dimensions can be learned, after which Attention captures the important information. Together, the two modules enrich the information available during model learning and thereby improve the accuracy of named entity recognition.
In order to achieve the purpose, the invention adopts the following technical scheme:
a Chinese named entity recognition method based on multi-word segmentation and multilayer bidirectional long-short term memory improves the recognition precision of named entities by modifying a BERT-BILSTM-CRF model; the method comprises the following steps:
step S1: determining input and output of the named entity recognition model: taking a medical text as a research object, taking a medical text data set with entity labels as the input of a named entity recognition model, wherein the output of the model is an entity label result given after medical entity prediction is carried out on the data set;
step S2: designing a medical named entity recognition model with multiple words and multi-layer bidirectional long and short term memory, wherein the model consists of an input layer, a word embedding layer, a semantic feature extraction layer, a CRF layer and an output layer; the model comprises a BERT pre-training language model, a bidirectional long-short term memory model BILSTM, an attention mechanism and a conditional random field CRF; the main method of the medical named entity recognition model comprises the following steps:
(1) an input layer: this layer is used to input the data set;
(2) word embedding layer: this layer encodes the characters in a text into vector representations through the BERT pre-training language model; the output after the BERT model is represented as V = (V_1, V_2, ..., V_n), where n is the total number of characters in the current sentence;
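As an illustration, the character-to-vector encoding of this layer might look like the following minimal sketch; it assumes the HuggingFace transformers library and the bert-base-chinese checkpoint, neither of which the patent names explicitly.

```python
# Hedged sketch of the word embedding layer: characters -> contextual vectors.
# Library and checkpoint are assumptions, not named in the patent.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

sentence = "产后诊断为糖尿病"  # the example medical text used in the description
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# V = (V_1, ..., V_n): one contextual vector per character (plus [CLS]/[SEP])
V = outputs.last_hidden_state  # shape: (1, n + 2, 768)
```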
(3) semantic feature extraction layer: this layer combines a multi-word segmentation module with a multi-layer BILSTM module. The multi-word segmentation module extracts features mainly through a Word-Level BILSTM module, while the multi-layer BILSTM module obtains feature information of different dimensions by setting hidden layers of different sizes and captures the important information with an attention mechanism; the specific process is as follows:
1) Multi-word segmentation module
The Word-Level BILSTM module is formed based on a BILSTM model; BILSTM is formed by combining forward LSTM and backward LSTM; LSTM is expressed by mathematical expressions as shown in equations 1-6:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    #(1)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    #(2)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    #(3)
C_t = f_t * C_{t-1} + i_t * C̃_t    #(4)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    #(5)
h_t = o_t * tanh(C_t)    #(6)
where t and t-1 denote the current and previous time steps, h denotes the hidden state, C̃_t the candidate cell state, and C_t the cell state; σ and tanh denote the sigmoid and tanh activation functions respectively; W denotes a weight matrix and b a bias vector; * denotes element-wise multiplication;
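A minimal NumPy sketch of a single LSTM step implementing equations (1)-(6) is given below; it is for illustration only, and in practice the BILSTM would be built from a framework layer such as torch.nn.LSTM with bidirectional=True.

```python
# Illustrative sketch of one LSTM step, equations (1)-(6).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)          # (1) forget gate
    i_t = sigmoid(W_i @ z + b_i)          # (2) input gate
    c_tilde = np.tanh(W_c @ z + b_c)      # (3) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # (4) new cell state
    o_t = sigmoid(W_o @ z + b_o)          # (5) output gate
    h_t = o_t * np.tanh(c_t)              # (6) new hidden state
    return h_t, c_t
```

A BILSTM runs this recurrence once left-to-right and once right-to-left and concatenates the two hidden states per position.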
2) Multilayer BILSTM module
The module integrates BILSTM with an Attention mechanism; hidden layers of different sizes are set for the BILSTM to extract context features of different dimensions, and the Attention mechanism distinguishes the differing importance of different features.
The attention mechanism layer assigns weights to the feature vectors h_t output by the BILSTM layer, and the joint output feature vector W_{t(k)} of the t-th word through the BILSTM layer and the attention layer is computed as in equations 7-9:
a_{t,i} = exp(score(s_t, h_i)) / Σ_j exp(score(s_t, h_j))    #(7)
W_{t(k)} = Σ_i a_{t,i} · h_i    #(8)
score(s_t, h_i) = v · tanh(W[s_t, h_i])    #(9)
where a_{t,i} is the attention weight; the score function is an alignment model that assigns a score according to how well the input and output match at time i, defining how much weight each output gives to each input hidden state; W_{t(k)} denotes the output of the t-th word through the k-th MBA model, where k takes the values 1 and 2;
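For illustration, the attention computation of equations (7)-(9) could be sketched as follows in PyTorch; the parameter names v and W follow the description above, but the function itself is an assumed, simplified form rather than the patent's exact implementation.

```python
# Hedged sketch of additive attention, equations (7)-(9).
import torch
import torch.nn.functional as F

def additive_attention(s_t, H, v, W):
    # s_t: (d,) query state for the t-th word; H: (n, d) BILSTM outputs
    # W: (m, 2d) and v: (m,) are learned parameters (names assumed)
    queries = s_t.expand(H.size(0), -1)                             # (n, d)
    scores = torch.tanh(torch.cat([queries, H], dim=-1) @ W.T) @ v  # (9): (n,)
    a_t = F.softmax(scores, dim=0)                                  # (7): weights
    W_t = (a_t.unsqueeze(-1) * H).sum(dim=0)                        # (8): weighted feature
    return W_t, a_t
```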
The final output O of the semantic feature extraction layer is obtained by fusing the output of the multi-word segmentation module with the output of the multi-layer BILSTM module, as expressed in formula 10; the final output sequence of this layer is [O_1, O_2, ..., O_n];
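A hedged PyTorch sketch of the multi-layer BILSTM module follows. The hidden sizes (128, 256), the simplified stand-in for the attention step, and fusion by concatenation are all illustrative assumptions; formula 10 does not specify the fusion operator.

```python
# Hedged sketch: two BILSTM branches with different hidden sizes, a simple
# attention stand-in per branch, concatenated into one feature per character.
import torch
import torch.nn as nn

class MultiLayerBiLSTM(nn.Module):
    def __init__(self, in_dim: int = 768, hidden_sizes=(128, 256)):  # sizes assumed
        super().__init__()
        self.branches = nn.ModuleList(
            nn.LSTM(in_dim, h, bidirectional=True, batch_first=True)
            for h in hidden_sizes
        )

    def forward(self, V: torch.Tensor) -> torch.Tensor:
        # V: (batch, n, in_dim) character vectors from the BERT layer
        outs = []
        for lstm in self.branches:
            H, _ = lstm(V)  # (batch, n, 2 * hidden)
            # stand-in for equations (7)-(9): per-position softmax weights
            a = torch.softmax(H.mean(dim=-1), dim=1).unsqueeze(-1)
            outs.append(a * H)
        return torch.cat(outs, dim=-1)  # features of both dimensions, fused

# usage sketch: fuse with the multi-word-segmentation output by concatenation
# O = torch.cat([W_word_level, MultiLayerBiLSTM()(V)], dim=-1)
```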
(4) CRF layer:
the main role of this layer is to predict labels; while training on the data, the layer automatically learns the constraints between labels and ensures that predicted label sequences are legal; the matrix P is the score matrix, where P_{i,j} is the probability of classifying the i-th character as the j-th tag, and A_{i,j} is the state transition score from the i-th tag to the j-th tag; for an input sentence x = (x_1, x_2, ..., x_n) with tag sequence y = (y_1, y_2, ..., y_n), the score is:
Score(x, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}    #(11)
Score(x, y) is normalized using the Softmax function:
p(y|x) = exp(Score(x, y)) / Σ_{ỹ∈Y_x} exp(Score(x, ỹ))    #(12)
in training, for a training sample (x, y_x), the log probability of the tag sequence is maximized:
log p(y_x|x) = Score(x, y_x) − log Σ_{ỹ∈Y_x} exp(Score(x, ỹ))    #(13)
the Viterbi algorithm is adopted to solve the maximum-probability path by dynamic programming:
Y* = argmax_{ỹ∈Y_x} Score(x, ỹ)    #(14)
Y* is the sequence with the highest score under the scoring function, i.e. the expected output of the model;
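The CRF sentence score and Viterbi decoding described above could be sketched as follows; the emission matrix P and transition matrix A follow the definitions above, the sketch omits explicit start/stop tags, and a production system would more likely use an existing CRF layer (e.g. a library such as torchcrf).

```python
# Simplified sketch of the CRF score (11) and Viterbi decoding (14).
import numpy as np

def sentence_score(P, A, y):
    # P: (n, k) emissions; A: (k, k) transitions; y: (n,) tag indices
    # emissions for the chosen tags plus transitions between consecutive tags
    y = np.asarray(y)
    return P[np.arange(len(y)), y].sum() + A[y[:-1], y[1:]].sum()

def viterbi_decode(P, A):
    n, k = P.shape
    score = P[0].copy()                 # best score ending in each tag at t=0
    back = np.zeros((n, k), dtype=int)  # backpointers
    for t in range(1, n):
        cand = score[:, None] + A + P[t][None, :]  # (prev tag, current tag)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    y_star = [int(score.argmax())]      # Y*: highest-scoring tag path
    for t in range(n - 1, 0, -1):
        y_star.append(int(back[t, y_star[-1]]))
    return y_star[::-1]
```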
(5) an output layer: this layer outputs the labeling results for all texts in the data set; the evaluation indices are the precision P, recall R and F1 value, as shown in equations 15, 16 and 17:
P = T_p / (T_p + F_p)    #(15)
R = T_p / (T_p + F_N)    #(16)
F1 = 2PR / (P + R)    #(17)
where T_p is the number of medical entities the model identifies correctly, F_p is the number of unrelated medical entities the model identifies, and F_N is the number of relevant medical entities the model fails to identify; F1 is the weighted harmonic mean of P and R.
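A small sketch of the evaluation in equations (15)-(17), with a worked example:

```python
# Precision, recall and F1 from entity-level counts T_p, F_p, F_N.
def evaluate(t_p: int, f_p: int, f_n: int):
    p = t_p / (t_p + f_p)         # (15) precision
    r = t_p / (t_p + f_n)         # (16) recall
    f1 = 2 * p * r / (p + r)      # (17) harmonic mean of P and R
    return p, r, f1

# Example: 90 correct entities, 10 spurious, 20 missed
# -> P = 0.90, R ≈ 0.818, F1 ≈ 0.857
print(evaluate(90, 10, 20))
```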
The invention has the following beneficial effects: it further strengthens the model's ability to extract contextual features from text. On the one hand, a multi-word segmentation method adds local context features; on the other hand, a multi-layer bidirectional long-short term memory method is introduced, which adds global context features by setting BILSTM models of different depths, obtains feature information of different dimensions, and uses an attention mechanism to capture important information. External knowledge from a medical dictionary is introduced, and the accuracy of the named entity recognition task is further improved by enriching the semantic feature information available during model learning.
Drawings
FIG. 1 is an overall process flow of the medical named entity recognition model of the present invention;
FIG. 2 is the overall processing procedure of the multi-segmentation module of the present invention;
FIG. 3 is an overall process of the multi-layer BILSTM module of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specific examples are given below.
Referring to FIGS. 1-3, a Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory improves named entity recognition accuracy by modifying the BERT-BILSTM-CRF model; with reference to FIG. 1, the method comprises the following steps:
step S1: determining input and output of the named entity recognition model: taking a medical text as a research object, taking a medical text data set with entity labels as the input of a named entity recognition model, wherein the output of the model is an entity label result given after medical entity prediction is carried out on the data set;
step S2: designing a medical named entity recognition model with multiple word segments and multi-layer bidirectional long-term and short-term memory, wherein the model consists of an input layer, a word embedding layer, a semantic feature extraction layer, a CRF layer and an output layer; the model comprises a BERT pre-training language model, a bidirectional long-short term memory model BILSTM, an attention mechanism and a conditional random field CRF; the main method of the medical named entity recognition model sequentially comprises the following steps:
(1) an input layer: this layer is used to input the data set;
(2) word embedding layer: this layer encodes the characters in a text into vector representations through the BERT pre-training language model; the output after the BERT model is represented as V = (V_1, V_2, ..., V_n), where n is the total number of characters in the current sentence;
(3) semantic feature extraction layer: this layer combines a multi-word segmentation module with a multi-layer BILSTM module. The multi-word segmentation module extracts features mainly through a Word-Level BILSTM module, while the multi-layer BILSTM module obtains feature information of different dimensions by setting hidden layers of different sizes and captures the important information with an attention mechanism; the specific process is as follows:
1) Multi-word segmentation module
The Word-Level BILSTM module is formed based on a BILSTM model; the BILSTM is formed by combining a forward LSTM and a backward LSTM; LSTM is expressed by mathematical expressions as shown in equations 1-6:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    #(1)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    #(2)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    #(3)
C_t = f_t * C_{t-1} + i_t * C̃_t    #(4)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    #(5)
h_t = o_t * tanh(C_t)    #(6)
where t and t-1 denote the current and previous time steps, h denotes the hidden state, C̃_t the candidate cell state, and C_t the cell state; σ and tanh denote the sigmoid and tanh activation functions respectively; W denotes a weight matrix and b a bias vector; * denotes element-wise multiplication;
The multi-word segmentation module integrates a plurality of BILSTMs, as shown in FIG. 2. Taking the text "postpartum diagnosis as diabetes" as an example, the sequence after full word segmentation is expressed as ['postpartum', 'diagnosis as', 'diabetes'], and the local context features of each word are captured by a BILSTM model.
In FIG. 2, V_i is the character vector generated by the word embedding layer for the i-th character; Word_i is the i-th word after multi-word segmentation; the word-level local feature analysis produces an output representation for the i-th character as it appears within the j-th word; and W_i is the output of the i-th word after passing through the Word-Level BILSTM module;
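As an illustration of the full-segmentation step, the following sketch uses the jieba segmenter in full mode; jieba is an assumption here, since the patent does not name a segmentation tool.

```python
# Hedged sketch of the multi-word segmentation step for the example sentence
# "产后诊断为糖尿病" ("postpartum diagnosis as diabetes"); jieba is assumed.
import jieba

sentence = "产后诊断为糖尿病"
words = list(jieba.cut(sentence, cut_all=True))  # full segmentation mode
print(words)  # e.g. ['产后', '诊断', '为', '糖尿', '糖尿病']
# Each word's character vectors V_i would then be fed to a word-level BILSTM
# to produce the per-word local-context outputs W_i (see FIG. 2).
```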
2) Multilayer BILSTM module
The module integrates the BILSTM and the Attention mechanism, see FIG. 3; hidden layers with different sizes are arranged on the BILSTM to extract context features with different dimensions; the Attention mechanism is used to distinguish different degrees of importance of different features;
feature vector h output by attention mechanism layer to BILSTM layer t Carrying out weight distribution, and calculating to obtain a common output feature vector W of the t-th word in a BILSTM layer and an attention layer t(k) Expressed mathematically as in equations 7-9:
score(s t ,h i )=vtanh(w[s t ,h i ])#(9)
wherein a is t,i For the attention function, the score function is an alignment model, which assigns a score based on the degree of matching of the input and output at time i, defining how much weight each output gives to each input hidden state; w is a group of t(k) Representing the output of the tth word passing through the kth MBA model, wherein the value of k is 1,2;
the final output 0 of the semantic feature extraction layer is obtained by fusing the multi-word segmentation module output and the multi-layer BILSTM output, and is expressed by a mathematical expression as formula 10:
the final output sequence of the layer model is [ O ] 1 ,O 2 ...,O n ];
(4) CRF layer:
the main role of this layer is to predict labels; while training on the data, the layer automatically learns the constraints between labels and ensures that predicted label sequences are legal; the matrix P is the score matrix, where P_{i,j} is the probability of classifying the i-th character as the j-th tag, and A_{i,j} is the state transition score from the i-th tag to the j-th tag; for an input sentence x = (x_1, x_2, ..., x_n) with tag sequence y = (y_1, y_2, ..., y_n), the score is:
Score(x, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}    #(11)
Score(x, y) is normalized using the Softmax function:
p(y|x) = exp(Score(x, y)) / Σ_{ỹ∈Y_x} exp(Score(x, ỹ))    #(12)
in training, for a training sample (x, y_x), the log probability of the tag sequence is maximized:
log p(y_x|x) = Score(x, y_x) − log Σ_{ỹ∈Y_x} exp(Score(x, ỹ))    #(13)
the Viterbi algorithm is adopted to solve the maximum-probability path by dynamic programming:
Y* = argmax_{ỹ∈Y_x} Score(x, ỹ)    #(14)
Y* is the sequence with the highest score under the scoring function, i.e. the expected output of the model;
(5) an output layer: this layer outputs the labeling results for all texts in the data set; the evaluation indices are the precision P, recall R and F1 value, as shown in equations 15, 16 and 17:
P = T_p / (T_p + F_p)    #(15)
R = T_p / (T_p + F_N)    #(16)
F1 = 2PR / (P + R)    #(17)
where T_p is the number of medical entities the model identifies correctly, F_p is the number of unrelated medical entities the model identifies, and F_N is the number of relevant medical entities the model fails to identify; F1 is the weighted harmonic mean of P and R.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any changes or substitutions that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention, in accordance with its technical solutions and inventive concept, shall fall within the protection scope of the present invention.
Claims (1)
1. A Chinese named entity recognition method based on multi-word segmentation and multilayer bidirectional long-short term memory is characterized in that the recognition precision of a named entity is improved by modifying a BERT-BILSTM-CRF model; the method comprises the following steps:
step S1: determining input and output of the named entity recognition model: taking a medical text as a research object, taking a medical text data set with entity labels as the input of a named entity recognition model, wherein the output of the model is an entity label result given after medical entity prediction is carried out on the data set;
step S2: designing a medical named entity recognition model with multiple words and multi-layer bidirectional long and short term memory, wherein the model consists of an input layer, a word embedding layer, a semantic feature extraction layer, a CRF layer and an output layer; the model comprises a BERT pre-training language model, a bidirectional long-short term memory model BILSTM, an attention mechanism and a conditional random field CRF; the main method of the medical named entity recognition model sequentially comprises the following steps:
(1) an input layer: this layer is used to input the data set;
(2) word embedding layer: this layer encodes the characters in a text into vector representations through the BERT pre-training language model; the output after the BERT model is represented as V = (V_1, V_2, ..., V_n), where n is the total number of characters in the current sentence;
(3) semantic feature extraction layer: this layer combines a multi-word segmentation module with a multi-layer BILSTM module. The multi-word segmentation module extracts features mainly through a Word-Level BILSTM module, while the multi-layer BILSTM module obtains feature information of different dimensions by setting hidden layers of different sizes and captures the important information with an attention mechanism; the specific process is as follows:
1) Multi-word segmentation module
The Word-Level BILSTM module is formed based on a BILSTM model; the BILSTM is formed by combining a forward LSTM and a backward LSTM; LSTM is expressed by mathematical expressions as shown in equations 1-6:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    #(1)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    #(2)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    #(3)
C_t = f_t * C_{t-1} + i_t * C̃_t    #(4)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    #(5)
h_t = o_t * tanh(C_t)    #(6)
where t and t-1 denote the current and previous time steps, h denotes the hidden state, C̃_t the candidate cell state, and C_t the cell state; σ and tanh denote the sigmoid and tanh activation functions respectively; W denotes a weight matrix and b a bias vector; * denotes element-wise multiplication;
2) Multilayer BILSTM module
The module integrates a BILSTM and an Attention mechanism; hidden layers of different sizes are set for the BILSTM to extract context features of different dimensions; the Attention mechanism is used to distinguish the differing importance of different features;
feature vector h output by attention mechanism layer to BILSTM layer t Weight distribution is carried out, and common output feature vectors of the t-th word in a BILSTM layer and an attention layer are obtained through calculationW t(k) Expressed by mathematical formulas such as 7-9:
score(s t ,h i )=vtanh(w[s t ,h i ])#(9)
wherein a is t,j For the attention function, the score function is an alignment model, which assigns a score based on the degree of matching of the input and output at time i, defining how much weight each output gives to each input hidden state; w t(k) Representing the output of the tth word passing through the kth MBA model, wherein the value of k is 1,2;
the final output 0 of the semantic feature extraction layer is obtained by fusing the output of the multi-word segmentation module and the output of the multi-layer BILSTM, and is expressed by a mathematical expression as formula 10:
the final output sequence of the layer model is [ O ] 1 ,O 2 ...,O n ];
(4) CRF layer:
the main role of this layer is to predict labels; while training on the data, the layer automatically learns the constraints between labels and ensures that predicted label sequences are legal; the matrix P is the score matrix, where P_{i,j} is the probability of classifying the i-th character as the j-th tag, and A_{i,j} is the state transition score from the i-th tag to the j-th tag; for an input sentence x = (x_1, x_2, ..., x_n) with tag sequence y = (y_1, y_2, ..., y_n), the score is:
Score(x, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}    #(11)
Score(x, y) is normalized using the Softmax function:
p(y|x) = exp(Score(x, y)) / Σ_{ỹ∈Y_x} exp(Score(x, ỹ))    #(12)
in training, for a training sample (x, y_x), the log probability of the tag sequence is maximized:
log p(y_x|x) = Score(x, y_x) − log Σ_{ỹ∈Y_x} exp(Score(x, ỹ))    #(13)
the Viterbi algorithm is adopted to solve the maximum-probability path by dynamic programming:
Y* = argmax_{ỹ∈Y_x} Score(x, ỹ)    #(14)
Y* is the sequence with the highest score under the scoring function, i.e. the expected output of the model;
(5) an output layer: this layer outputs the labeling results for all texts in the data set; the evaluation indices are the precision P, recall R and F1 value, as shown in equations 15, 16 and 17:
P = T_p / (T_p + F_p)    #(15)
R = T_p / (T_p + F_N)    #(16)
F1 = 2PR / (P + R)    #(17)
where T_p is the number of medical entities the model identifies correctly, F_p is the number of unrelated medical entities the model identifies, and F_N is the number of relevant medical entities the model fails to identify; F1 is the weighted harmonic mean of P and R.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210809038.8A CN115238693A (en) | 2022-07-11 | 2022-07-11 | Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115238693A true CN115238693A (en) | 2022-10-25 |
Family
ID=83671477
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116501884A (en) * | 2023-03-31 | 2023-07-28 | 重庆大学 | Medical entity identification method based on BERT-BiLSTM-CRF |
CN117933259A (en) * | 2024-03-25 | 2024-04-26 | 成都中医药大学 | Named entity recognition method based on local text information |
CN118278507A (en) * | 2024-06-04 | 2024-07-02 | 南京大学 | Method for constructing knowledge graph of biological medicine industry |
CN118278507B (en) * | 2024-06-04 | 2024-10-01 | 南京大学 | Method for constructing knowledge graph of biological medicine industry |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |