CN114490950B - Method and storage medium for training encoder model, and method and system for predicting similarity - Google Patents
Method and storage medium for training encoder model, and method and system for predicting similarity
- Publication number
- CN114490950B (application CN202210360834.8A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- text
- encoder model
- text sequence
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Acoustics & Sound (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a training method and storage medium for an encoder model, together with a similarity prediction method and system. The training method comprises the following steps: inputting two text sequences into an embedding layer to obtain text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so as to determine their hidden states based on the same shared neural network parameters; constructing a self-supervised loss function from the neural network parameters; inputting the hidden states into a pooling layer for pooling, determining the similarity of the two text sequences from the pooled text sequence vectors, and constructing a supervised loss function from that similarity; determining an overall loss function from the self-supervised and supervised loss functions and using it to update the neural network parameters; and continuing to input new text sequences until the value of the loss function reaches its minimum. The method greatly increases the inference throughput of the model when calculating text-sequence similarity, and accurate calculation of the similarity of two text sequences can be achieved with the trained neural network encoder model.
Description
Technical Field
The invention relates to the field of text similarity, in particular to a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system.
Background
Text similarity refers to the degree to which two texts resemble each other; its application scenarios include text classification, clustering, text topic detection, topic tracking, machine translation and the like. More specifically, monitoring call lines in voice communication scenarios also requires determining the similarity between texts, but the conversation content captured in such scenarios is noisy, mixed with accents and incomplete, and in the prior art whether conversation contents are similar must be checked manually, which consumes a great deal of manpower and time.
Disclosure of Invention
The invention aims to overcome at least one shortcoming of the prior art, and provides a training method and storage medium for an encoder model, together with a similarity prediction method and system, to solve the problems that the prior art relies on manual spot checks when determining text similarity, resulting in small detection coverage and high subjectivity.
The technical solution adopted by the invention is as follows:
In a first aspect, the present invention provides a method for training a deep neural network encoder model, comprising: performing a training operation on two different text sequences, the training operation being: inputting the two text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters; at the same time, constructing a self-supervised loss function of the neural network encoder model from the neural network parameters; inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states, and determining the similarity of the two text sequences from the two pooled text sequence vectors; constructing a supervised loss function of the neural network encoder model from the similarity of the two text sequences; determining a loss function of the neural network encoder model from the self-supervised and supervised loss functions, so that the neural network encoder model updates the neural network parameters according to that loss function; and continuing to perform the training operation on two new different text sequences until the value of the loss function reaches its minimum, thereby obtaining the trained neural network encoder model.
In a second aspect, the present invention provides a method for predicting the similarity of text sequences, comprising: inputting two different text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model obtained with the above training method for a deep neural network encoder model, so that the neural network encoder model outputs the hidden states of the two text sequence vectors; inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states; and determining the similarity of the two text sequences from the two pooled text sequence vectors.
In a third aspect, the present invention provides a system for predicting the similarity of text sequences, comprising: a word input module, a word embedding module, a twin neural network encoder model obtained with the above training method for a deep neural network encoder model, a hidden state pooling module and a vector similarity calculation module; the word input module is used to serialize two pieces of externally input text data into two different text sequences and output them to the word embedding module; the word embedding module is used to vectorize the two text sequences into two text sequence vectors and output them to the neural network encoder model; the neural network encoder model is used to determine the hidden states of the two text sequence vectors based on the neural network parameters and output them to the hidden state pooling module; the hidden state pooling module is used to pool the two text sequence vectors according to their hidden states and output the pooled text sequence vectors to the vector similarity calculation module; and the vector similarity calculation module is used to determine the similarity of the two text sequences from the two pooled text sequence vectors.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the above-mentioned method for training a deep neural network encoder model, and/or the above-mentioned method for predicting similarity of text sequences.
Compared with the prior art, the invention has the following beneficial effects:
the training method of the encoder model provided by the embodiment is used for training to obtain a trained twin neural network encoder model, and the twin neural network encoder model shares the same neural network parameter, so that the inference bandwidth of the model in calculating the semantic similarity between text sequences is greatly increased, and the trained neural network encoder model can be used for realizing the accurate calculation of the similarity between two text sequences. Meanwhile, in the training process, the neural network encoder model is trained in a combined manner of self-supervision and supervision, so that the finally updated neural network parameters are beneficial to improving the accuracy of semantic similarity calculation of the neural network encoder model at the semantic level.
Drawings
FIG. 1 is a schematic flow chart of steps S110-S180 of the method of embodiment 1.
Fig. 2 is a schematic diagram of a training process of the neural network encoder model according to embodiment 1.
Fig. 3 is a schematic diagram of a hidden state calculation process of the neural network encoder model of embodiment 1.
FIG. 4 is a flowchart illustrating steps S210-S240 of the method of embodiment 2.
Fig. 5 is a schematic diagram of a prediction process of the prediction method of embodiment 2.
Fig. 6 is a schematic diagram of a prediction process of the prediction system of embodiment 3.
Detailed Description
The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
This embodiment provides a training method for a deep neural network encoder model, which is used to train a twin neural network encoder model. Broadly speaking, the twin neural network may consist of two sub-networks or of a single network; the key point is that the twin branches share the same neural network parameters.
As shown in fig. 1 and 2, the method includes the following steps:
s110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
In this step, a text sequence refers to text data that has been preprocessed so as to satisfy the input format expected by the embedding layer. In a specific embodiment, the preprocessing comprises:
cleaning the original text data; reading a preset list of special symbols, stop words and a user dictionary; removing the special symbols from the text data; segmenting the text into words with the help of the loaded user dictionary; and removing the stop words from the text data. The text data is then converted into a plurality of sub-text sequences, which are sorted and concatenated by length and cut according to the preset training-batch size to obtain a plurality of text sequences as training data.
Because the training method provided by this embodiment trains a neural network encoder model for calculating the similarity of text sequences, the label is the true similarity between the two different text sequences in each group. The groups of text sequences selected as input are converted into integer data before entering the embedding layer. In a preferred embodiment, a Tokenizer may be employed to convert the text data into integer data.
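As a minimal sketch of this preprocessing and integer conversion, assuming jieba for word segmentation and the Hugging Face BertTokenizer as the Tokenizer (neither is named in the patent, and the symbol list, stop-word list and dictionary file below are hypothetical):

```python
import jieba
from transformers import BertTokenizer

SPECIAL_SYMBOLS = set("★☆【】")          # hypothetical preset special symbols
STOP_WORDS = {"的", "了", "呢"}            # hypothetical stop-word list
jieba.load_userdict("user_dict.txt")       # hypothetical user dictionary

def preprocess(raw_text: str) -> str:
    """Clean one piece of raw text: strip special symbols, segment, drop stop words."""
    cleaned = "".join(ch for ch in raw_text if ch not in SPECIAL_SYMBOLS)
    tokens = [w for w in jieba.cut(cleaned) if w.strip() and w not in STOP_WORDS]
    return " ".join(tokens)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint

def to_integer_sequence(text: str, max_len: int = 128) -> list[int]:
    """Convert a cleaned text sequence into integer ids for the embedding layer."""
    return tokenizer.encode(preprocess(text), truncation=True, max_length=max_len)
```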
The embedding layer converts an input text sequence into a vector of fixed size, specifically by mapping the text sequence into a vector space, thereby yielding the text sequence vectors of the two text sequences.
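For illustration, such a fixed-size lookup can be written as follows; the vocabulary size, embedding dimension and token ids are assumed values:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=21128, embedding_dim=768)  # assumed sizes

input_ids = torch.tensor([[101, 791, 1921, 1921, 3698, 102]])  # hypothetical token ids
seq_vector = embedding(input_ids)   # shape (1, 6, 768): the text sequence vector
```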
S120, inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors;
In this step, after receiving the two text sequence vectors, the neural network encoder model determines their hidden states based on the same neural network parameters. The neural network parameters are the parameters of the encoder model's backbone network. A hidden state is a high-dimensional vector obtained through a series of matrix operations and nonlinear transformations inside the neural network.
When the neural network encoder model is initialized, GPU memory is allocated for each internal module, pre-trained parameters are loaded, and the neural network parameters are read. In a specific embodiment, the neural network encoder model may be implemented with a BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model; when the model is initialized, the pre-trained BERT parameters are loaded and the neural network parameters are then read.
As shown in fig. 3, in a specific implementation the neural network encoder model is composed of N neural network encoder sub-modules and iteratively calculates the hidden state of a text sequence vector.
A single encoder sub-module in the neural network encoder model receives the two text sequence vectors x1 and x2, first determines the hidden state of each text sequence vector, and applies layer normalization to the resulting hidden states to alleviate gradient explosion during model training. The layer-normalized hidden states are then fed into the residual module inside the sub-module, so as to avoid vanishing gradients caused by the large number of layers in the neural network encoder model. The hidden states output by the residual module are processed by the fully connected layer inside the sub-module, yielding the hidden state u1 corresponding to text sequence vector x1 and the hidden state u2 corresponding to text sequence vector x2 output by this encoder sub-module.
The N encoder sub-modules are connected in series. Each encoder sub-module calculates its hidden state of the text sequence vector based on its own internal neural network parameters, and the hidden state it outputs serves as the input of the next encoder sub-module, until the last encoder sub-module outputs the hidden state of the text sequence vector as the hidden state output by the model as a whole.
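A minimal PyTorch sketch of one such encoder sub-module is given below for illustration; the layer sizes, the use of nn.MultiheadAttention, and the exact placement of the layer normalizations and residual connections are assumptions, since the patent only names the components (attention transform, layer normalization, residual module, fully connected layer):

```python
import torch
import torch.nn as nn

class EncoderSubModule(nn.Module):
    """One of the N stacked encoder sub-modules (illustrative sketch)."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)   # layer normalization against gradient explosion
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(            # fully connected layer of the sub-module
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)     # attention-mechanism transformation
        h = self.norm1(x + attn_out)         # residual connection + layer normalization
        u = self.norm2(h + self.ffn(h))      # feed-forward with a second residual
        return u
```

Stacking N such sub-modules, with the output of one fed to the next, reproduces the iterative hidden-state calculation described above.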
Specifically, each encoder sub-module in the neural network encoder model may determine the hidden state of a text sequence vector according to the equation

$$u = f\big(\mathrm{Attention}(x;\, W)\big)$$

where $u$ is the hidden state of the text sequence vector, $f$ is a non-linear activation function, $\mathrm{Attention}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
S130, constructing a self-supervised loss function of the neural network encoder model from the neural network parameters;
The variables of the self-supervised loss function are the neural network parameters of the neural network encoder model; the function is used to update those parameters by gradient descent so that the loss function reaches its minimum value.
In a specific embodiment, the self-supervised loss function is

$$\mathcal{L}_{\mathrm{self}} = -\sum_{(x,\,y)\in D_{\mathrm{MLM}}} \log p\big(\hat{y}=y \mid x;\ \theta,\ \theta_{\mathrm{MLM}}\big) \;-\; \sum_{(x,\,c)\in D_{\mathrm{NSP}}} \log p\big(\hat{c}=c \mid x;\ \theta,\ \theta_{\mathrm{NSP}}\big)$$

where $p$ denotes the probability density function, $\theta$ denotes the neural network parameters, $\theta_{\mathrm{MLM}}$ and $\theta_{\mathrm{NSP}}$ are the output-layer parameters of the masked language model and of the next-sentence prediction model respectively, $D_{\mathrm{MLM}}$ and $D_{\mathrm{NSP}}$ are the training data sets of the masked language model and of the next-sentence prediction model respectively, $\hat{y}$ and $y$ are the word predicted at a masked position and the true word at that position for the masked language model, and $\hat{c}$ and $c$ are respectively the connection relation between the two surrounding text sequences output by the next-sentence prediction model and the true connection relation between them. The masked language model (MLM) is a model that randomly masks some positions in an input text sequence and then predicts the words at the masked positions. The next-sentence prediction model (NSP) is a model that predicts whether two sentences are consecutive.
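A minimal sketch of how this joint MLM + NSP objective could be computed in PyTorch, using cross-entropy over the masked positions and the next-sentence labels (the function and argument names are hypothetical):

```python
import torch
import torch.nn.functional as F

def self_supervised_loss(mlm_logits: torch.Tensor,   # (batch, seq_len, vocab)
                         mlm_labels: torch.Tensor,   # (batch, seq_len), -100 = not masked
                         nsp_logits: torch.Tensor,   # (batch, 2)
                         nsp_labels: torch.Tensor    # (batch,)
                         ) -> torch.Tensor:
    """Joint masked-language-model and next-sentence-prediction loss (sketch)."""
    # negative log-likelihood over the masked positions only
    mlm_loss = F.cross_entropy(mlm_logits.reshape(-1, mlm_logits.size(-1)),
                               mlm_labels.reshape(-1), ignore_index=-100)
    # negative log-likelihood of the predicted connection relation
    nsp_loss = F.cross_entropy(nsp_logits, nsp_labels)
    return mlm_loss + nsp_loss
```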
S140, inputting the hidden states of the two text sequence vectors output by the neural network encoder model into a pooling layer, so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
In this step, after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states into a semantic vector space of a fixed, preset size, thereby obtaining semantic vectors of uniform size for the text sequences, i.e. the pooled text sequence vectors.
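As an illustration, mean pooling over the token-level hidden states is one common way to realize this fixed-size mapping; the choice of mean pooling is an assumption, since the patent does not name the pooling operator:

```python
import torch

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Pool token-level hidden states (batch, seq_len, d_model) into one
    fixed-size semantic vector (batch, d_model) per text sequence."""
    mask = attention_mask.unsqueeze(-1).float()          # ignore padding positions
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```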
S150, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment;
In this step, the similarity of the two text sequences can be determined with any method commonly used in the prior art for calculating the similarity between two vectors. In a particular embodiment, the similarity of the two text sequences may be determined with the formula

$$\mathrm{sim}(a,\,b) = \frac{u_a \cdot u_b}{\lVert u_a \rVert\,\lVert u_b \rVert}$$

where $\mathrm{sim}(a, b)$ is the similarity of the two text sequences, $a$ and $b$ denote the two text sequences, $u_a \cdot u_b$ is the vector product of the two pooled text sequence vectors, and $\lVert u_a \rVert\,\lVert u_b \rVert$ is the product of the moduli of the two pooled text sequence vectors.
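In PyTorch this cosine similarity between the two pooled vectors is a one-liner; the placeholder tensors below stand in for the pooled text sequence vectors:

```python
import torch
import torch.nn.functional as F

u_a = torch.randn(4, 768)   # placeholder pooled vectors for illustration
u_b = torch.randn(4, 768)
similarity = F.cosine_similarity(u_a, u_b, dim=-1)   # values in [-1, 1]
```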
S160, constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
The supervised loss function is constructed from the similarity of the two text sequences determined by the neural network encoder model and their true similarity. Since the similarity of the two text sequences is calculated from the pooled text sequence vectors, the pooled vectors are obtained from the hidden states output by the neural network encoder model, and the hidden states are obtained from the neural network parameters, the neural network parameters necessarily influence the similarity calculation of the two text sequences.
In a specific embodiment, the supervised loss function is

$$\mathcal{L}_{\mathrm{sup}} = \frac{1}{m}\sum_{(a,\,b)}\big(\mathrm{sim}(a,\,b) - y_{a,b}\big)^{2}$$

where $y_{a,b}$ is the true text similarity of the text sequences $a$ and $b$, and $m$ is the number of text sequences grabbed each time a training operation is performed.
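A compact PyTorch sketch of the supervised loss in the mean-squared-error form above (the function name is hypothetical):

```python
import torch

def supervised_loss(pred_sim: torch.Tensor, true_sim: torch.Tensor) -> torch.Tensor:
    """MSE between the predicted cosine similarity and the labelled true similarity
    over the sequence pairs grabbed for one training operation (sketch)."""
    return torch.mean((pred_sim - true_sim) ** 2)
```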
S170, determining a loss function of the neural network encoder model from the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function;
In this step, the loss function of the neural network encoder model is constructed by combining the self-supervised and supervised loss functions; that is, the neural network encoder is trained jointly in a combined self-supervised and supervised manner, which helps obtain an optimal solution for the neural network parameters. The self-supervised and supervised loss functions may be combined by adding the two, or by performing any other suitable operation on the two.
In a specific embodiment, the loss function is

$$\mathcal{L} = \mathcal{L}_{\mathrm{sup}} + \lambda\,\mathcal{L}_{\mathrm{self}}$$

where $\mathcal{L}_{\mathrm{self}}$ is the self-supervised loss function, $\mathcal{L}_{\mathrm{sup}}$ is the supervised loss function, and $\lambda$ is a hyperparameter for adjusting the weights, i.e. by adjusting $\lambda$ the weights that the supervised and self-supervised loss functions take in the overall loss function can be adjusted, with $\lambda$ satisfying $\lambda < 1$.
S180, judging whether the value of the loss function has reached its minimum; if not, updating the neural network parameters and re-executing step S110 on two new different text sequences; if so, obtaining the trained neural network encoder model.
Because only one group of two different text sequences is input into the neural network encoder model in the above steps, step S110 needs to be executed again: new text sequences are continuously input into the neural network encoder model to train it, and its neural network parameters are continuously updated by gradient descent during training, until the value of the loss function reaches its minimum, at which point training of the neural network encoder model is complete and the trained model is obtained.
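Putting the pieces together, the following is a minimal sketch of one training operation. It relies on the hypothetical components sketched earlier: an `encoder` wrapping the embedding layer and the stacked EncoderSubModules, `mean_pool`, `self_supervised_loss` and `supervised_loss`, plus hypothetical `mlm_head` and `nsp_head` output layers; all names and hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F

lam = 0.1   # hypothetical weight lambda < 1 for the self-supervised term
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)

def training_operation(batch: dict) -> float:
    """One training operation on a batch of labelled text-sequence pairs (sketch)."""
    # twin encoder: both sequences pass through the same shared parameters
    h_a, h_b = encoder(batch["ids_a"]), encoder(batch["ids_b"])

    # supervised branch: pooled vectors -> cosine similarity -> supervised loss
    u_a, u_b = mean_pool(h_a, batch["mask_a"]), mean_pool(h_b, batch["mask_b"])
    pred_sim = F.cosine_similarity(u_a, u_b, dim=-1)
    loss_sup = supervised_loss(pred_sim, batch["true_sim"])

    # self-supervised branch: MLM and NSP heads on the masked inputs
    h_masked = encoder(batch["masked_ids"])
    loss_self = self_supervised_loss(mlm_head(h_masked), batch["mlm_labels"],
                                     nsp_head(h_masked[:, 0]), batch["nsp_labels"])

    loss = loss_sup + lam * loss_self        # combined loss function
    optimizer.zero_grad()
    loss.backward()                          # gradient-descent update of the
    optimizer.step()                         # shared neural network parameters
    return loss.item()
```

The training operation is repeated on new batches of text-sequence pairs until the loss stops decreasing.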
The training method for a deep neural network encoder model provided by this embodiment trains a twin neural network encoder model. The trained encoder model greatly increases the inference throughput when calculating semantic similarity between text sequences, and accurate calculation of the similarity between two text sequences can be achieved on its basis. At the same time, during training the loss function of the neural network encoder model is constructed by combining self-supervised and supervised terms so as to train the model jointly, and the finally updated neural network parameters help improve the accuracy of the model's similarity calculation at the semantic level. Because the neural network encoder model captures contextual semantic information well, when it is applied to multi-turn dialogue scenarios such as communication hotlines it can distinguish different dialogue scenarios more intelligently and automatically, discover abnormal communication behaviors in time, and raise the degree of intelligence of voice-service management.
Example 2
Based on the same concept as embodiment 1, this embodiment provides a method for predicting the similarity of text sequences, which mainly uses the neural network encoder model obtained with the training method provided in embodiment 1 to predict the similarity of two different text sequences.
As shown in figs. 4 and 5, the method includes:
s210, inputting two different text sequences into an embedding layer for vectorization to obtain two text sequence vectors;
Before this step is performed, the two pieces of text data whose similarity is to be predicted may be determined and preprocessed, for example by serialization, so that they become two text sequences compatible with the embedding layer, the neural network encoder model and the pooling layer.
S220, inputting the two text sequence vectors into the trained neural network encoder model so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
After the trained neural network encoder model receives the two text sequence vectors, each encoder sub-module of the neural network encoder model determines the hidden state of a text sequence vector according to the formula $u = f(\mathrm{Attention}(x;\, W))$, where $u$ is the hidden state of the text sequence vector, $f$ is a non-linear activation function, $\mathrm{Attention}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder sub-modules connected in series, the output of one sub-module serving as the input of the next so that the hidden state of the text sequence vector is calculated iteratively; the hidden state output by the last encoder sub-module is the hidden state of the text sequence vector output by the model as a whole.
S230, inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
After receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states of the two text sequences into a semantic vector space of fixed size, obtaining semantic vectors of uniform size.
And S240, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
The similarity may be determined with the formula $\mathrm{sim}(a,\,b) = \dfrac{u_a \cdot u_b}{\lVert u_a \rVert\,\lVert u_b \rVert}$, where $\mathrm{sim}(a, b)$ is the similarity of the two text sequences, $a$ and $b$ denote the two text sequences, $u_a \cdot u_b$ is the vector product of the two pooled text sequence vectors, and $\lVert u_a \rVert\,\lVert u_b \rVert$ is the product of the moduli of the two pooled text sequence vectors.
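A compact sketch of this prediction flow, reusing the hypothetical components sketched in embodiment 1 (`to_integer_sequence`, a trained `encoder` that maps token ids to hidden states, and `mean_pool`):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_similarity(text_a: str, text_b: str) -> float:
    """Predict the similarity of two text sequences with the trained encoder (sketch)."""
    ids_a = torch.tensor([to_integer_sequence(text_a)])
    ids_b = torch.tensor([to_integer_sequence(text_b)])
    u_a = mean_pool(encoder(ids_a), torch.ones_like(ids_a))   # no padding, mask of ones
    u_b = mean_pool(encoder(ids_b), torch.ones_like(ids_b))
    return F.cosine_similarity(u_a, u_b, dim=-1).item()
```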
The twin neural network encoder model obtained with the training method provided in embodiment 1 achieves high accuracy of semantic-level similarity calculation based on the determined neural network parameters; when the input text sequences are the dialogue contents supervised on a communication line, the neural network encoder model can distinguish different dialogue scenarios more intelligently and automatically, discover abnormal communication behaviors in time, and raise the degree of intelligence of voice-service management.
The method for predicting similarity of text sequences provided in this embodiment is based on the same concept as that of embodiment 1, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and beneficial effects thereof as those in embodiment 1 can be referred to the description in embodiment 1, and are not repeated in this embodiment.
Example 3
Based on the same concept as embodiments 1 and 2, this embodiment provides a text-sequence similarity prediction system, which mainly uses the neural network encoder model obtained with the training method provided in embodiment 1 to predict the similarity of two different text sequences.
As shown in fig. 6, the system includes: a word input module 310, a word embedding module 320, the neural network encoder model trained with the training method provided in embodiment 1, a hidden state pooling module 330, and a vector similarity calculation module 340.
The word input module 310 is configured to receive two types of text data input from the outside, serialize the two types of text data to obtain two different text sequences, and output the two different text sequences to the word embedding module 320.
The word embedding module 320 is configured to vectorize the two text sequences, specifically to map the text sequences into a vector space, thereby obtaining the text sequence vectors of the two text sequences, and to output them to the neural network encoder model. The neural network encoder model is used to determine the hidden states of the two text sequence vectors based on the neural network parameters and output them to the hidden state pooling module 330.
After the trained neural network encoder model receives the two text sequence vectors, each encoder sub-module of the neural network encoder model determines the hidden state of a text sequence vector according to the formula $u = f(\mathrm{Attention}(x;\, W))$, where $u$ is the hidden state of the text sequence vector, $f$ is a non-linear activation function, $\mathrm{Attention}(\cdot)$ is the attention-mechanism transformation, $W$ denotes the neural network parameters, and $x$ is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder sub-modules connected in series, the output of one sub-module serving as the input of the next so that the hidden state of the text sequence vector is calculated iteratively; the hidden state output by the last encoder sub-module is the hidden state of the text sequence vector output by the model as a whole.
The hidden state pooling module 330 is configured to pool the two text sequence vectors according to their hidden states, specifically to map the hidden states of the two text sequences into a semantic vector space of fixed size to obtain semantic vectors of uniform size, and to output these, as the pooled text sequence vectors, to the vector similarity calculation module 340.
The vector similarity calculation module 340 is configured to determine the similarity between two text sequences according to the two text sequence vectors after the pooling process.
The vector similarity calculation module 340 is specifically configured to determine the similarity of the two text sequences with the formula $\mathrm{sim}(a,\,b) = \dfrac{u_a \cdot u_b}{\lVert u_a \rVert\,\lVert u_b \rVert}$, where $\mathrm{sim}(a, b)$ is the similarity of the two text sequences, $a$ and $b$ denote the two text sequences, $u_a \cdot u_b$ is the vector product of the two pooled text sequence vectors, and $\lVert u_a \rVert\,\lVert u_b \rVert$ is the product of the moduli of the two pooled text sequence vectors.
The similarity prediction system for text sequences provided in this embodiment is based on the same concept as that of embodiments 1 and 2, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and the beneficial effects thereof as those of embodiments 1 and 2 can be referred to the descriptions in embodiments 1 and 2, and are not repeated in this embodiment.
It should be understood that the above-described embodiments of the present invention are only examples for clearly illustrating the technical solutions of the present invention and are not intended to limit its specific embodiments. Any modification, equivalent replacement or improvement made within the spirit and principles of the claims of the present invention should be included in the protection scope of the claims of the present invention.
Claims (8)
1. A training method of a deep neural network encoder model is characterized by comprising the following steps:
performing training operations on two different text sequences;
the training operation is as follows:
inputting the two text sequences into an embedded layer for vectorization to obtain two text sequence vectors;
inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters;
simultaneously constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters;
the self-supervised loss function being

$$\mathcal{L}_{\mathrm{self}} = -\sum_{(x,\,y)\in D_{\mathrm{MLM}}} \log p\big(\hat{y}=y \mid x;\ \theta,\ \theta_{\mathrm{MLM}}\big) \;-\; \sum_{(x,\,c)\in D_{\mathrm{NSP}}} \log p\big(\hat{c}=c \mid x;\ \theta,\ \theta_{\mathrm{NSP}}\big)$$

wherein $p$ denotes the probability density function, $\theta$ denotes the neural network parameters, $\theta_{\mathrm{MLM}}$ and $\theta_{\mathrm{NSP}}$ respectively represent the output-layer parameters of the masking language model and of the next-sentence prediction model, $D_{\mathrm{MLM}}$ and $D_{\mathrm{NSP}}$ are respectively the training data sets of the masking language model and of the next-sentence prediction model, $\hat{y}$ and $y$ are respectively the predicted word and the real word of the masked language model, and $\hat{c}$ and $c$ respectively denote the connection relation between the two surrounding text sequences output by the next-sentence prediction model and the real connection relation between them;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors, and determining similarity of the two text sequences according to the two text sequence vectors after the pooling processing;
constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
the supervised loss function being

$$\mathcal{L}_{\mathrm{sup}} = \frac{1}{m}\sum_{(a,\,b)}\big(\mathrm{sim}(a,\,b) - y_{a,b}\big)^{2}$$

wherein $y_{a,b}$ is the real text similarity of the text sequences $a$ and $b$, and $m$ is the number of text sequences captured each time a training operation is performed;
determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function;
and continuing to execute the training operation on the new two different text sequences until the numerical value of the loss function is the minimum value, so as to obtain a trained neural network encoder model.
2. The method of claim 1, wherein the deep neural network encoder model is a deep neural network encoder model,
determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function specifically comprises: taking the sum of the self-supervised loss function and the supervised loss function as the loss function of the neural network encoder model.
3. The method of claim 1, wherein the deep neural network encoder model is a deep neural network encoder model,
determining the similarity of the two text sequences according to the two pooled text sequence vectors specifically comprises: determining the similarity of the two text sequences using the formula $\mathrm{sim}(a,\,b) = \dfrac{u_a \cdot u_b}{\lVert u_a \rVert\,\lVert u_b \rVert}$, wherein $u_a$ and $u_b$ are the two pooled text sequence vectors;
5. The method for training the deep neural network encoder model according to claim 1, wherein the neural network encoder model determines hidden states of two text sequence vectors based on the same neural network parameters, and specifically comprises:
the neural network encoder model utilizesDetermining the hidden states of the two text sequence vectors;
6. A method for predicting similarity of text sequences, characterized by comprising:
inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
inputting the two text sequence vectors into a twin neural network encoder model obtained with the method for training a deep neural network encoder model according to any one of claims 1-5, so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors;
and determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
7. A system for predicting similarity of text sequences, comprising: a word input module, a word embedding module, a twin neural network encoder model obtained with the method for training a deep neural network encoder model according to any one of claims 1-5, a hidden state pooling module, and a vector similarity calculation module;
the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module;
the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model;
the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module;
the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module;
and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for training a deep neural network encoder model according to any one of claims 1 to 5 and/or the method for predicting similarity of text sequences according to claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210360834.8A CN114490950B (en) | 2022-04-07 | 2022-04-07 | Method and storage medium for training encoder model, and method and system for predicting similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210360834.8A CN114490950B (en) | 2022-04-07 | 2022-04-07 | Method and storage medium for training encoder model, and method and system for predicting similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114490950A (en) | 2022-05-13
CN114490950B (en) | 2022-07-12
Family
ID=81487384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210360834.8A Active CN114490950B (en) | 2022-04-07 | 2022-04-07 | Method and storage medium for training encoder model, and method and system for predicting similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114490950B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743545B (en) * | 2022-06-14 | 2022-09-02 | 联通(广东)产业互联网有限公司 | Dialect type prediction model training method and device and storage medium |
CN115618950A (en) * | 2022-09-30 | 2023-01-17 | 华为技术有限公司 | Data processing method and related device |
CN115357690B (en) * | 2022-10-19 | 2023-04-07 | 有米科技股份有限公司 | Text repetition removing method and device based on text mode self-supervision |
CN115660871B (en) * | 2022-11-08 | 2023-06-06 | 上海栈略数据技术有限公司 | Unsupervised modeling method for medical clinical process, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347839A (en) * | 2019-07-18 | 2019-10-18 | 湖南数定智能科技有限公司 | A kind of file classification method based on production multi-task learning model |
CN113159945A (en) * | 2021-03-12 | 2021-07-23 | 华东师范大学 | Stock fluctuation prediction method based on multitask self-supervision learning |
CN113705772A (en) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | Model training method, device and equipment and readable storage medium |
CN114003698A (en) * | 2021-12-27 | 2022-02-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11080587B2 (en) * | 2015-02-06 | 2021-08-03 | Deepmind Technologies Limited | Recurrent neural networks for data item generation |
CN108388888B (en) * | 2018-03-23 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Vehicle identification method and device and storage medium |
CN109614471B (en) * | 2018-12-07 | 2021-07-02 | 北京大学 | Open type problem automatic generation method based on generation type countermeasure network |
CN110009013B (en) * | 2019-03-21 | 2021-04-27 | 腾讯科技(深圳)有限公司 | Encoder training and representation information extraction method and device |
US11227179B2 (en) * | 2019-09-27 | 2022-01-18 | Intel Corporation | Video tracking with deep Siamese networks and Bayesian optimization |
CN111144565B (en) * | 2019-12-27 | 2020-10-27 | 中国人民解放军军事科学院国防科技创新研究院 | Self-supervision field self-adaptive deep learning method based on consistency training |
JP7505025B2 (en) * | 2020-04-21 | 2024-06-24 | グーグル エルエルシー | Supervised Contrastive Learning Using Multiple Positive Examples |
CN112149689B (en) * | 2020-09-28 | 2022-12-09 | 上海交通大学 | Unsupervised domain adaptation method and system based on target domain self-supervised learning |
CN112396479B (en) * | 2021-01-20 | 2021-05-25 | 成都晓多科技有限公司 | Clothing matching recommendation method and system based on knowledge graph |
CN113553906B (en) * | 2021-06-16 | 2024-10-29 | 之江实验室 | Discrimination non-supervision cross-domain pedestrian re-identification method based on class center domain alignment |
CN113936647B (en) * | 2021-12-17 | 2022-04-01 | 中国科学院自动化研究所 | Training method of voice recognition model, voice recognition method and system |
-
2022
- 2022-04-07 CN CN202210360834.8A patent/CN114490950B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347839A (en) * | 2019-07-18 | 2019-10-18 | 湖南数定智能科技有限公司 | A kind of file classification method based on production multi-task learning model |
CN113159945A (en) * | 2021-03-12 | 2021-07-23 | 华东师范大学 | Stock fluctuation prediction method based on multitask self-supervision learning |
CN113705772A (en) * | 2021-07-21 | 2021-11-26 | 浪潮(北京)电子信息产业有限公司 | Model training method, device and equipment and readable storage medium |
CN114003698A (en) * | 2021-12-27 | 2022-02-01 | 成都晓多科技有限公司 | Text retrieval method, system, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114490950A (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114490950B (en) | Method and storage medium for training encoder model, and method and system for predicting similarity | |
Gu et al. | Stack-captioning: Coarse-to-fine learning for image captioning | |
CN110648659B (en) | Voice recognition and keyword detection device and method based on multitask model | |
CN111930914B (en) | Problem generation method and device, electronic equipment and computer readable storage medium | |
CN111813954B (en) | Method and device for determining relationship between two entities in text statement and electronic equipment | |
CN112632996A (en) | Entity relation triple extraction method based on comparative learning | |
CN110162628A (en) | A kind of content identification method and device | |
CN112101042A (en) | Text emotion recognition method and device, terminal device and storage medium | |
CN113254615A (en) | Text processing method, device, equipment and medium | |
CN115064154A (en) | Method and device for generating mixed language voice recognition model | |
CN115204143A (en) | Method and system for calculating text similarity based on prompt | |
CN110942774A (en) | Man-machine interaction system, and dialogue method, medium and equipment thereof | |
CN115757695A (en) | Log language model training method and system | |
CN113177113B (en) | Task type dialogue model pre-training method, device, equipment and storage medium | |
CN114880991A (en) | Knowledge map question-answer entity linking method, device, equipment and medium | |
CN114925681B (en) | Knowledge graph question-answering question-sentence entity linking method, device, equipment and medium | |
CN114003708B (en) | Automatic question-answering method and device based on artificial intelligence, storage medium and server | |
CN116432660A (en) | Pre-training method and device for emotion analysis model and electronic equipment | |
CN115600584A (en) | Mongolian emotion analysis method combining DRCNN-BiGRU dual channels with GAP | |
CN114333790A (en) | Data processing method, device, equipment, storage medium and program product | |
CN115495579A (en) | Method and device for classifying text of 5G communication assistant, electronic equipment and storage medium | |
CN113761874A (en) | Event reality prediction method and device, electronic equipment and storage medium | |
CN113327581A (en) | Recognition model optimization method and system for improving speech recognition accuracy | |
CN112949313A (en) | Information processing model training method, device, equipment and storage medium | |
CN112015894A (en) | Text single classification method and system based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |