
CN114490950B - Method and storage medium for training encoder model, and method and system for predicting similarity - Google Patents


Info

Publication number
CN114490950B
Authority
CN
China
Prior art keywords
neural network
text
encoder model
text sequence
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210360834.8A
Other languages
Chinese (zh)
Other versions
CN114490950A (en)
Inventor
肖清
赵文博
李剑锋
许程冲
周丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Guangdong Industrial Internet Co Ltd
Original Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Guangdong Industrial Internet Co Ltd filed Critical China Unicom Guangdong Industrial Internet Co Ltd
Priority to CN202210360834.8A priority Critical patent/CN114490950B/en
Publication of CN114490950A publication Critical patent/CN114490950A/en
Application granted granted Critical
Publication of CN114490950B publication Critical patent/CN114490950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a training method and a storage medium for an encoder model, and a similarity prediction method and system, comprising the following steps: inputting two text sequences into the embedding layer to obtain text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so as to determine hidden states based on the same neural network parameters; constructing a self-supervised loss function according to the neural network parameters; inputting the hidden states into a pooling layer for pooling, determining the similarity of the two text sequences from the pooled text sequence vectors, and constructing a supervised loss function from that similarity; determining a loss function from the self-supervised and supervised loss functions to update the neural network parameters; and continuing to input new text sequences until the value of the loss function reaches its minimum. The method greatly improves the inference throughput of the model when the similarity of text sequences is calculated, and accurate calculation of the similarity of two text sequences can be achieved with the trained neural network encoder model.

Description

Method and storage medium for training encoder model, and method and system for predicting similarity
Technical Field
The invention relates to the field of text similarity, in particular to a training method and a storage medium of an encoder model, and a similarity prediction method and a similarity prediction system.
Background
Text similarity refers to the degree of similarity between two texts; its application scenarios include text classification, clustering, topic detection, topic tracking, machine translation, and the like. More specifically, monitoring call lines in voice communication scenarios also requires determining the similarity between texts. However, the conversation content acquired in such scenarios is noisy, mixed with accents, and incomplete, and in the prior art whether conversation contents are similar has to be checked manually, which consumes a great deal of manpower and time.
Disclosure of Invention
The invention aims to overcome at least one defect of the prior art, and provides a training method and a storage medium for an encoder model, as well as a similarity prediction method and a similarity prediction system, which are used to solve the problems that, in the prior art, determining text similarity relies on manual spot checks, so the detection coverage is small and the results are highly subjective.
The technical scheme adopted by the invention comprises the following steps:
In a first aspect, the present invention provides a method for training a deep neural network encoder model, including: performing a training operation on two different text sequences; the training operation is as follows: inputting the two text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters; meanwhile, constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters; inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer pools the two text sequence vectors according to their hidden states, and determining the similarity of the two text sequences according to the two pooled text sequence vectors; constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences; determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function; and continuing to perform the training operation on two new different text sequences until the value of the loss function reaches its minimum, thereby obtaining the trained neural network encoder model.
In a second aspect, the present invention provides a method for predicting the similarity of text sequences, which includes: inputting two different text sequences into an embedding layer for vectorization to obtain two text sequence vectors; inputting the two text sequence vectors into a twin neural network encoder model trained by the above training method for a deep neural network encoder model, so that the neural network encoder model outputs the hidden states of the two text sequence vectors; inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer pools the two text sequence vectors according to their hidden states; and determining the similarity of the two text sequences according to the two pooled text sequence vectors.
In a third aspect, the present invention provides a system for predicting similarity of text sequences, including: the system comprises a word input module, a word embedding module, a twin neural network encoder model obtained by training the deep neural network encoder model by the training method, a hidden state pooling module and a vector similarity calculation module; the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module; the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model; the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module; the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module; and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the above-mentioned method for training a deep neural network encoder model, and/or the above-mentioned method for predicting similarity of text sequences.
Compared with the prior art, the invention has the beneficial effects that:
The training method of the encoder model provided by the embodiments yields a trained twin neural network encoder model whose two branches share the same neural network parameters, which greatly increases the inference throughput of the model when calculating the semantic similarity between text sequences, and the trained neural network encoder model can be used to accurately calculate the similarity between two text sequences. Meanwhile, during training, the neural network encoder model is trained jointly in a combined self-supervised and supervised manner, so that the finally updated neural network parameters help improve the accuracy of the model's semantic-level similarity calculation.
Drawings
FIG. 1 is a schematic flow chart of the method steps S110-S180 in example 1.
Fig. 2 is a schematic diagram of a training process of the neural network encoder model according to embodiment 1.
Fig. 3 is a schematic diagram of a hidden state calculation process of the neural network encoder model of embodiment 1.
FIG. 4 is a flowchart illustrating steps S210-S240 of the method of embodiment 2.
Fig. 5 is a schematic diagram of a prediction process of the prediction method of embodiment 2.
Fig. 6 is a schematic diagram of a prediction process of the prediction system of embodiment 3.
Detailed Description
The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
This embodiment provides a method for training a deep neural network encoder model, which is used to train a twin (Siamese) neural network encoder model. In a broad sense, the twin neural network can be composed of two sub-networks or of a single network; the key point is that the twin branches share the same neural network parameters.
As shown in fig. 1 and 2, the method includes the following steps:
s110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
in this step, the text sequence refers to text data that has been preprocessed so as to satisfy an input format compatible with the embedding layer. In a specific embodiment, the pre-treatment comprises:
carrying out data cleaning on the original text data; reading preset special symbols, stop words and a user dictionary word list, removing the special symbols in the text data, segmenting words of the text sequence by combining the read user dictionary, and removing the stop words in the text data. And converting the text data into a plurality of sub-text sequences, sequencing and splicing the plurality of sub-text sequences according to the length, and cutting according to the preset data size of the training batch to obtain a plurality of text sequences as training data.
Because the training method provided by this embodiment trains a neural network encoder model for calculating the similarity of text sequences, the label of each group of text sequences is the true similarity between the two different text sequences in that group. The groups of text sequences selected as input are converted into integer data before entering the embedding layer; in a preferred embodiment, a Tokenizer may be employed to convert the text data into integer data.
The embedding layer is used for converting an input text sequence into a vector with a fixed size, specifically mapping the text sequence into a vector space, thereby obtaining text sequence vectors of two text sequences.
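As a rough illustration of the preprocessing and embedding steps described above, the following Python sketch shows cleaning, stop-word removal, conversion to integer ids, and the embedding lookup; the vocabulary, stop-word list, sequence length, and embedding dimension are placeholder assumptions and not values taken from the patent.

```python
import re
import torch
import torch.nn as nn

STOP_WORDS = {"the", "a", "of"}             # placeholder stop-word list
SPECIAL = r"[^\w\u4e00-\u9fff]+"            # pattern matching special symbols to strip
vocab = {"<pad>": 0, "<unk>": 1}            # toy vocabulary; a Tokenizer would normally build this

def preprocess(text, max_len=32):
    """Clean the raw text, drop stop words, and map tokens to integer ids of a fixed length."""
    text = re.sub(SPECIAL, " ", text.lower())
    tokens = [t for t in text.split() if t and t not in STOP_WORDS]
    ids = [vocab.setdefault(t, len(vocab)) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))  # pad up to the preset sequence length

embedding = nn.Embedding(num_embeddings=30000, embedding_dim=128, padding_idx=0)
seq1 = torch.tensor([preprocess("first text sequence")])
seq2 = torch.tensor([preprocess("second text sequence")])
x1, x2 = embedding(seq1), embedding(seq2)    # two text sequence vectors, shape (1, 32, 128)
```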
S120, inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors;
In this step, after receiving the two text sequence vectors, the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters. The neural network parameters refer to the parameters of the backbone network of the neural network encoder model. The hidden state is a high-dimensional vector obtained by a series of matrix operations and nonlinear transformations in the neural network.
When the neural network encoder model is initialized, video memory is allocated for each internal module, pre-trained parameters are loaded, and the neural network parameters are read. In a specific embodiment, the neural network encoder model may be implemented with a BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model; when the model is initialized, the pre-trained BERT parameters are loaded and the neural network parameters are then read.
As shown in fig. 3, in a specific implementation process, the neural network encoder model is composed of N neural network encoder sub-modules, and is used for iteratively calculating the hidden state of the text sequence vector.
A single encoder sub-module in the neural network encoder model receives the two text sequence vectors x1 and x2. It first determines the hidden state of each text sequence vector and applies layer normalization to the resulting hidden states to mitigate gradient explosion during model training. The layer-normalized hidden states are then fed into the residual module of the sub-module, which avoids gradient vanishing caused by the large number of layers of the neural network encoder model. The hidden states output by the residual module are processed by the fully connected layer of the sub-module, so that the encoder sub-module outputs the hidden state u1 corresponding to the text sequence vector x1 and the hidden state u2 corresponding to the text sequence vector x2.
The N encoder sub-modules are connected in series. Each encoder model sub-module calculates hidden states of the text sequence vectors based on its own internal neural network parameters and outputs the hidden states it produces to the next encoder model sub-module as that sub-module's input, until the last encoder model sub-module outputs the hidden states of the text sequence vectors as the hidden states output by the final model.
In particular, each encoder model sub-module in the neural network encoder model may determine the hidden state of a text sequence vector according to the equation

u = f(Attention(x; W))

where u is the hidden state of the text sequence vector, f is a non-linear activation function, Attention(·) is the attention-mechanism transformation, W denotes the neural network parameters, and x is the input text sequence vector.
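For concreteness, a minimal PyTorch-style sketch of one such encoder sub-module (attention, layer normalization, residual connection, fully connected layer), stacked N times, is given below; the hidden size, number of heads, number of layers, and the exact ordering of operations are illustrative assumptions and a simplification of the description above.

```python
import torch
import torch.nn as nn

class EncoderSubModule(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)                  # layer normalization eases gradient explosion
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h, _ = self.attn(x, x, x)                       # u = f(Attention(x; W))
        x = self.norm1(x + h)                           # residual connection against gradient vanishing
        return self.norm2(x + self.ffn(x))              # fully connected layer of the sub-module

class Encoder(nn.Module):
    def __init__(self, n_layers=6, dim=128):
        super().__init__()
        self.blocks = nn.ModuleList([EncoderSubModule(dim) for _ in range(n_layers)])

    def forward(self, x):
        for block in self.blocks:                       # N sub-modules in series, iterating the hidden state
            x = block(x)
        return x                                        # hidden state output by the last sub-module

encoder = Encoder()
x1, x2 = torch.randn(1, 32, 128), torch.randn(1, 32, 128)   # two embedded text sequence vectors
u1, u2 = encoder(x1), encoder(x2)                            # the same (twin) parameters applied to both
```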
S130, constructing a self-supervised loss function of the neural network encoder model according to the neural network parameters;
The variables of the self-supervised loss function are the neural network parameters of the neural network encoder model, and the function is used to update the neural network parameters by gradient descent so that the loss function reaches its minimum value.
In a specific embodiment, the self-supervised loss function is:

L_self = − Σ_{x ∈ D_MLM} log p(x̂ = x | θ, θ_MLM) − Σ_{c ∈ D_NSP} log p(ĉ = c | θ, θ_NSP)

where p denotes the probability density function, θ denotes the neural network parameters, θ_MLM and θ_NSP denote the parameters of the output layers corresponding to the masked language model and to the next-sentence prediction model respectively, D_MLM and D_NSP are the training data sets of the masked language model and of the next-sentence prediction model, x̂ and x are the word predicted at a masked position and the true word at that position, ĉ denotes the connection relationship between the two preceding and following text sequences output by the next-sentence prediction model, and c denotes the true connection relationship between the two text sequences. The masked language model (MLM) randomly masks some positions in an input text sequence and then predicts the tokens at the masked positions. The next-sentence prediction model (NSP) predicts whether two sentences are consecutive.
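For intuition, a hedged sketch of how such a combined MLM + NSP self-supervised loss could be computed with cross-entropy in PyTorch follows; the output heads, label conventions, and tensor shapes are assumptions for illustration rather than the patent's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 30000, 128
mlm_head = nn.Linear(dim, vocab_size)   # masked-language-model output layer (theta_MLM)
nsp_head = nn.Linear(dim, 2)            # next-sentence-prediction output layer (theta_NSP)

def self_supervised_loss(hidden, mlm_labels, nsp_label):
    """Negative log-likelihood of the masked tokens plus the NSP classification loss."""
    mlm_logits = mlm_head(hidden)                       # (batch, seq_len, vocab)
    mlm_loss = F.cross_entropy(mlm_logits.flatten(0, 1), mlm_labels.flatten(),
                               ignore_index=-100)       # -100 marks unmasked positions
    nsp_logits = nsp_head(hidden[:, 0])                 # sentence-level representation
    nsp_loss = F.cross_entropy(nsp_logits, nsp_label)   # predicted vs. true connection relation
    return mlm_loss + nsp_loss

hidden = torch.randn(2, 32, dim)                        # dummy encoder hidden states
mlm_labels = torch.full((2, 32), -100); mlm_labels[:, 3] = 17   # one masked position per sequence
loss_self = self_supervised_loss(hidden, mlm_labels, torch.tensor([1, 0]))
```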
S140, inputting the hidden states of the two text sequence vectors output by the neural network encoder model into a pooling layer, so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
in this step, after receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states to a semantic vector space with a fixed size, so as to obtain semantic vectors of the text sequence vectors in a uniform size, that is, the text sequence vectors after pooling. The fixed size is preset.
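A possible pooling sketch consistent with this description is shown below; the choice of mean pooling over the sequence dimension is an assumption (max pooling or taking the first token would fit the same interface).

```python
import torch

def pool(hidden):
    """Map hidden states of shape (batch, seq_len, dim) to fixed-size semantic vectors (batch, dim)."""
    return hidden.mean(dim=1)

u1, u2 = torch.randn(1, 32, 128), torch.randn(1, 32, 128)   # hidden states from the encoder
v1, v2 = pool(u1), pool(u2)                                  # pooled text sequence vectors
```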
S150, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment;
In this step, the similarity of the two text sequences can be determined by any method for calculating the similarity between two vectors that is commonly used in the prior art. In a particular embodiment, the similarity of the two text sequences may be determined by the formula

sim(A, B) = (A · B) / (‖A‖ · ‖B‖)

where sim(A, B) is the similarity of the two text sequences, A and B are the two pooled text sequence vectors representing the two text sequences, A · B is the vector product of the two pooled text sequence vectors, and ‖A‖ · ‖B‖ is the product of the moduli of the two pooled text sequence vectors.
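The formula above is the cosine similarity between the two pooled vectors; a one-line PyTorch equivalent (a sketch assuming pooled vectors of shape (batch, dim)) is:

```python
import torch
import torch.nn.functional as F

v1, v2 = torch.randn(1, 128), torch.randn(1, 128)     # two pooled text sequence vectors
similarity = F.cosine_similarity(v1, v2, dim=-1)       # (A · B) / (|A| · |B|)
```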
S160, constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
the supervised loss function is constructed by the similarity and the real similarity of two text sequences determined by a neural network encoder model, the similarity of the two text sequences is calculated based on pooled text sequence vectors, the pooled text sequence vectors are obtained based on a hidden state output by the neural network encoder model, and the hidden state is obtained based on neural network parameters, so that the neural network parameters certainly influence the similarity calculation of the two text sequences.
In a specific embodiment, the supervised loss function is:
Figure 957278DEST_PATH_IMAGE024
wherein,
Figure 727788DEST_PATH_IMAGE025
is composed of
Figure 88362DEST_PATH_IMAGE026
And
Figure 209902DEST_PATH_IMAGE027
the degree of similarity of the real text of (c),
Figure 300742DEST_PATH_IMAGE028
is the number of text sequences that are grabbed each time a training operation is performed.
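A hedged sketch of one way to realize such a supervised loss is a mean squared error between the predicted and the true similarity over a batch of N pairs; the squared-error form mirrors the reconstruction above and is an assumption consistent with the listed variables, not a confirmed detail of the patent.

```python
import torch
import torch.nn.functional as F

pred_sim = torch.tensor([0.82, 0.10, 0.55])   # similarities predicted by the model for N = 3 pairs
true_sim = torch.tensor([1.00, 0.00, 0.50])   # true text similarity labels
loss_sup = F.mse_loss(pred_sim, true_sim)     # averaged over the N text sequence pairs in the batch
```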
S170, determining a loss function of the neural network encoder model according to the self-supervised loss function and the supervised loss function, so that the neural network encoder model updates the neural network parameters according to the loss function;
In this step, the loss function of the neural network encoder model is constructed by combining the self-supervised loss function and the supervised loss function; that is, the neural network encoder is trained jointly in a combined self-supervised and supervised manner, which helps obtain the optimal solution of the neural network parameters. The two loss functions may be combined by adding them or by performing any other suitable operation on them.
In a specific embodiment, the loss function is

L = λ · L_sup + (1 − λ) · L_self

where L_self is the self-supervised loss function, L_sup is the supervised loss function, and λ is a hyperparameter used to adjust the weights, i.e. by adjusting λ the weights that the supervised and self-supervised loss functions take in the overall loss function can be adjusted; λ satisfies λ < 1.
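Combining the two terms with the weighting hyperparameter could then look as follows; the convex-combination form mirrors the reconstruction above and is likewise an assumption.

```python
import torch

loss_self = torch.tensor(2.3)   # self-supervised (MLM + NSP) loss from S130
loss_sup = torch.tensor(0.4)    # supervised similarity loss from S160
lam = 0.7                       # weighting hyperparameter, required to be less than 1
loss = lam * loss_sup + (1 - lam) * loss_self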
And S180, judging whether the value of the loss function has reached its minimum; if not, updating the neural network parameters and re-executing step S110 on two new different text sequences; if so, obtaining the trained neural network encoder model.
Because only one group of two different text sequences is input into the neural network encoder model in the above steps, step S110 needs to be executed again and new text sequences are continuously input to train the neural network encoder model. During training the neural network parameters are continuously updated by gradient descent until the value of the loss function reaches its minimum, at which point training is complete and the trained neural network encoder model is obtained.
The training method of the deep neural network encoder model provided by this embodiment trains a twin neural network encoder model. The trained model greatly improves inference throughput when calculating the semantic similarity between text sequences, and accurate calculation of the similarity between two text sequences can be achieved with it. Meanwhile, during training, the loss function of the neural network encoder model is constructed by combining self-supervision and supervision to train the model jointly, and the finally updated neural network parameters help improve the accuracy of the model's semantic-level similarity calculation. Because the neural network encoder model captures contextual semantic information well, when it is applied to multi-turn conversation scenarios such as communication lines, it can distinguish different conversation scenarios more intelligently and automatically, discover abnormal communication behaviors in time, and improve the degree of intelligence of voice service management.
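Before turning to the prediction method, a compressed sketch of the overall training loop of steps S110-S180 is given below; the stand-in components, optimizer, learning rate, loss weighting, and synthetic batch are all assumptions used only to show how the pieces fit together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal stand-ins for the components described above (dimensions are illustrative).
embedding = nn.Embedding(30000, 128, padding_idx=0)
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(128, 4, batch_first=True), num_layers=2)
mlm_head, nsp_head = nn.Linear(128, 30000), nn.Linear(128, 2)
pool = lambda h: h.mean(dim=1)

params = [p for m in (embedding, encoder, mlm_head, nsp_head) for p in m.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-4)

# One synthetic batch: two text sequences, MLM/NSP labels, and true similarity labels.
seq1, seq2 = torch.randint(1, 30000, (2, 16)), torch.randint(1, 30000, (2, 16))
mlm_labels = torch.full((2, 16), -100); mlm_labels[:, 3] = 7
nsp_label, true_sim = torch.tensor([1, 0]), torch.tensor([1.0, 0.0])
lam = 0.7                                                         # weighting hyperparameter (< 1)

for step in range(100):                                           # keep feeding pairs of text sequences
    u1, u2 = encoder(embedding(seq1)), encoder(embedding(seq2))   # twin: same parameters for both
    mlm_loss = F.cross_entropy(mlm_head(u1).flatten(0, 1), mlm_labels.flatten(), ignore_index=-100)
    nsp_loss = F.cross_entropy(nsp_head(u1[:, 0]), nsp_label)
    pred_sim = F.cosine_similarity(pool(u1), pool(u2), dim=-1)
    loss = lam * F.mse_loss(pred_sim, true_sim) + (1 - lam) * (mlm_loss + nsp_loss)
    optimizer.zero_grad(); loss.backward(); optimizer.step()      # gradient-descent parameter update
```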
Example 2
Based on the same concept as embodiment 1, this embodiment provides a method for predicting the similarity of text sequences, which mainly predicts the similarity of two different text sequences by using the neural network encoder model obtained with the training method provided in embodiment 1.
As shown in fig. 3 and 4, the method includes:
s210, inputting two different text sequences into an embedding layer for vectorization to obtain two text sequence vectors;
before this step is performed, two types of text data requiring prediction similarity may be determined, and may be preprocessed by serialization or the like, so that the two types of text data become two types of text sequences and are compatible with the embedding layer, the neural network encoder model, and the pooling layer.
S220, inputting the two text sequence vectors into the trained neural network encoder model so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
After the trained neural network encoder model receives the two text sequence vectors, each encoder model sub-module of the neural network encoder model determines the hidden state of a text sequence vector according to the formula

u = f(Attention(x; W))

where u is the hidden state of the text sequence vector, f is a non-linear activation function, Attention(·) is the attention-mechanism transformation, W denotes the neural network parameters, and x is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder model sub-modules connected in series, with the output of one sub-module serving as the input of the next, so that the hidden states of the text sequence vectors are calculated iteratively; the last encoder model sub-module outputs the hidden states of the text sequence vectors as the hidden states output by the final model.
S230, inputting the hidden states of the two text sequence vectors into a pooling layer so that the pooling layer performs pooling treatment on the two text sequence vectors according to the hidden states of the two text sequence vectors;
After receiving the hidden states of the two text sequence vectors, the pooling layer maps the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain semantic vectors of a uniform size.
And S240, determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
In this step, the similarity of the two text sequences is determined by the formula

sim(A, B) = (A · B) / (‖A‖ · ‖B‖)

where sim(A, B) is the similarity of the two text sequences, A and B are the two pooled text sequence vectors representing the two text sequences, A · B is the vector product of the two pooled text sequence vectors, and ‖A‖ · ‖B‖ is the product of the moduli of the two pooled text sequence vectors.
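Putting S210-S240 together, inference with a trained model might look like the following sketch; the dummy embedding layer and encoder are stand-ins only, and the mean-pooling choice is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

@torch.no_grad()
def predict_similarity(embedding, encoder, seq1, seq2):
    """S210-S240: embed, encode with the shared (twin) parameters, pool, then cosine similarity."""
    u1, u2 = encoder(embedding(seq1)), encoder(embedding(seq2))   # hidden states of both sequences
    v1, v2 = u1.mean(dim=1), u2.mean(dim=1)                       # pooled semantic vectors of fixed size
    return F.cosine_similarity(v1, v2, dim=-1)                    # similarity of the two text sequences

# Dummy stand-ins for a trained embedding layer and encoder, just to exercise the function.
emb = nn.Embedding(30000, 128)
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(128, 4, batch_first=True), 2)
score = predict_similarity(emb, enc, torch.randint(0, 30000, (1, 16)), torch.randint(0, 30000, (1, 16)))
```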
Based on the determined neural network parameters, the twin neural network encoder model obtained by the training method provided in embodiment 1 achieves high accuracy in semantic-level similarity calculation. When the input text sequences are conversation contents monitored on a communication line, the neural network encoder model can distinguish different conversation scenarios more intelligently and automatically, discover abnormal communication behaviors in time, and improve the degree of intelligence of voice service management.
The method for predicting similarity of text sequences provided in this embodiment is based on the same concept as that of embodiment 1, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and beneficial effects thereof as those in embodiment 1 can be referred to the description in embodiment 1, and are not repeated in this embodiment.
Example 3
Based on the same concept as that in embodiments 1 and 2, the present embodiment provides a text sequence similarity prediction system, which mainly predicts the similarity of two different text sequences by using a neural network encoder model obtained by training through the neural network encoder model training method provided in embodiment 1.
As shown in fig. 6, the system includes: the word input module 310, the word embedding module 320, the neural network encoder model trained by the training method provided in embodiment 1, the hidden state pooling module 330, and the vector similarity calculation module 340.
The word input module 310 is configured to receive two types of text data input from the outside, serialize the two types of text data to obtain two different text sequences, and output the two different text sequences to the word embedding module 320.
The word embedding module 320 is configured to vectorize the two text sequences, specifically by mapping the text sequences into a vector space, so as to obtain the text sequence vectors of the two text sequences, and to output the text sequence vectors to the neural network encoder model. The neural network encoder model is used to determine the hidden states of the two text sequence vectors based on the neural network parameters and output them to the hidden state pooling module 330.
After the trained neural network encoder model receives the two text sequence vectors, each encoder model sub-module of the neural network encoder model determines the hidden state of a text sequence vector according to the formula

u = f(Attention(x; W))

where u is the hidden state of the text sequence vector, f is a non-linear activation function, Attention(·) is the attention-mechanism transformation, W denotes the neural network parameters, and x is the input text sequence vector.
In a specific implementation, the neural network encoder model comprises a plurality of encoder model sub-modules connected in series, with the output of one sub-module serving as the input of the next, so that the hidden states of the text sequence vectors are calculated iteratively; the last encoder model sub-module outputs the hidden states of the text sequence vectors as the hidden states output by the final model.
The hidden state pooling module 330 is configured to pool the two text sequence vectors according to the hidden states of the two text sequence vectors, specifically, map the hidden states of the two text sequences to a semantic vector space with a fixed size to obtain semantic vectors with a uniform size, and output the semantic vectors to the vector similarity calculation module 340 as the text sequence vectors after pooling.
The vector similarity calculation module 340 is configured to determine the similarity between two text sequences according to the two text sequence vectors after the pooling process.
The vector similarity calculation module 340 is specifically configured to determine the similarity of the two text sequences by the formula

sim(A, B) = (A · B) / (‖A‖ · ‖B‖)

where sim(A, B) is the similarity of the two text sequences, A and B are the two pooled text sequence vectors representing the two text sequences, A · B is the vector product of the two pooled text sequence vectors, and ‖A‖ · ‖B‖ is the product of the moduli of the two pooled text sequence vectors.
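One way to picture the five modules as a single pipeline is sketched below; the class name, method names, and dimensions are invented for illustration, and the patent does not prescribe this interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityPredictionSystem(nn.Module):
    """Word input -> word embedding -> twin encoder -> hidden-state pooling -> vector similarity."""
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.vocab = vocab                                        # word input module's vocabulary
        self.word_embedding = nn.Embedding(len(vocab) + 2, dim)   # word embedding module
        self.encoder = nn.TransformerEncoder(                     # twin neural network encoder model
            nn.TransformerEncoderLayer(dim, 4, batch_first=True), 2)

    def word_input(self, text):
        ids = [self.vocab.get(t, 1) for t in text.split()]        # serialize external text data
        return torch.tensor([ids])

    def forward(self, text_a, text_b):
        u1 = self.encoder(self.word_embedding(self.word_input(text_a)))
        u2 = self.encoder(self.word_embedding(self.word_input(text_b)))
        v1, v2 = u1.mean(dim=1), u2.mean(dim=1)                   # hidden state pooling module
        return F.cosine_similarity(v1, v2, dim=-1)                # vector similarity calculation module

system = SimilarityPredictionSystem({"hello": 2, "world": 3})
print(system("hello world", "hello hello"))
```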
The similarity prediction system for text sequences provided in this embodiment is based on the same concept as that of embodiments 1 and 2, and therefore, the same steps and terms, definitions, explanations, specific/preferred embodiments, and the beneficial effects thereof as those of embodiments 1 and 2 can be referred to the descriptions in embodiments 1 and 2, and are not repeated in this embodiment.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention claims should be included in the protection scope of the present invention claims.

Claims (8)

1. A training method of a deep neural network encoder model is characterized by comprising the following steps:
performing training operations on two different text sequences;
the training operation is as follows:
inputting the two text sequences into an embedded layer for vectorization to obtain two text sequence vectors;
inputting the two text sequence vectors into a twin neural network encoder model so that the neural network encoder model determines the hidden states of the two text sequence vectors based on the same neural network parameters;
simultaneously constructing an auto-supervision loss function of the neural network encoder model according to the neural network parameters;
the auto-supervision loss function is:
L_self = − Σ_{x ∈ D_MLM} log p(x̂ = x | θ, θ_MLM) − Σ_{c ∈ D_NSP} log p(ĉ = c | θ, θ_NSP)

wherein p represents the probability density function, θ is the neural network parameter, θ_MLM and θ_NSP respectively represent the output-layer parameters corresponding to the masked language model and to the next-sentence prediction model, D_MLM and D_NSP are respectively the training data sets of the masked language model and of the next-sentence prediction model, x̂ and x are respectively the predicted word and the real word of the masked language model, ĉ represents the connection relationship between the two preceding and following text sequences output by the next-sentence prediction model, and c represents the real connection relationship between the two preceding and following text sequences;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors, and determining similarity of the two text sequences according to the two text sequence vectors after the pooling processing;
constructing a supervised loss function of the neural network encoder model according to the similarity of the two text sequences;
the supervised loss function is:
L_sup = (1/N) Σ_{i=1}^{N} ( sim(x1_i, x2_i) − y_i )²

wherein y_i is the real text similarity of x1_i and x2_i, sim(x1_i, x2_i) is the similarity of the two text sequences determined by the model, and N is the number of text sequences captured each time a training operation is performed;
determining a loss function of the neural network encoder model according to the self-supervision loss function and the supervision loss function, so that the neural network encoder model updates neural network parameters according to the loss function;
and continuing to execute the training operation on the new two different text sequences until the numerical value of the loss function is the minimum value, so as to obtain a trained neural network encoder model.
2. The method for training a deep neural network encoder model according to claim 1, wherein
determining a loss function of the neural network encoder model according to the auto-supervised loss function and the supervised loss function, specifically comprising: taking the sum of the auto-supervised loss function and the supervised loss function as a loss function of the neural network encoder model.
3. The method for training a deep neural network encoder model according to claim 1, wherein
determining the similarity of the two text sequences according to the two text sequence vectors after the pooling process specifically comprises: determining the similarity of the two text sequences by the formula

sim(A, B) = (A · B) / (‖A‖ · ‖B‖)

wherein sim(A, B) is the similarity of the two text sequences, A and B respectively represent the two text sequences, A · B is the vector product of the two pooled text sequence vectors, and ‖A‖ · ‖B‖ is the product of the moduli of the two pooled text sequence vectors.
4. The method of claim 1, wherein the loss function is:
L = λ · L_sup + (1 − λ) · L_self

wherein L_self is the auto-supervision loss function, L_sup is the supervised loss function, and λ is a hyperparameter used to adjust the weights of the supervised and self-supervised loss functions; λ satisfies λ < 1.
5. The method for training the deep neural network encoder model according to claim 1, wherein the neural network encoder model determines hidden states of two text sequence vectors based on the same neural network parameters, and specifically comprises:
the neural network encoder model determines the hidden states of the two text sequence vectors by using the formula

u = f(Attention(x; W))

wherein u is the hidden state of the text sequence vector, f is a non-linear activation function, Attention(·) is the attention-mechanism transformation, W denotes the neural network parameters, and x is the input text sequence vector.
6. A method for predicting similarity of text sequences is characterized in that,
inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
inputting two text sequence vectors into a twin neural network encoder model obtained by training the deep neural network encoder model according to any one of claims 1-5, so that the neural network encoder model outputs the hidden states of the two text sequence vectors;
inputting the hidden states of the two text sequence vectors into a pooling layer, so that the pooling layer performs pooling processing on the two text sequence vectors according to the hidden states of the two text sequence vectors;
and determining the similarity of the two text sequences according to the two text sequence vectors after the pooling treatment.
7. A system for predicting similarity of text sequences, comprising: the device comprises a word input module, a word embedding module, a twin neural network encoder model obtained by training the deep neural network encoder model according to any one of claims 1-5, a hidden state pooling module and a vector similarity calculation module;
the word input module is used for serializing two different text data input from the outside to obtain two different text sequences and outputting the two different text sequences to the word embedding module;
the word embedding module is used for vectorizing the two text sequences to obtain two text sequence vectors and outputting the two text sequence vectors to the neural network encoder model;
the neural network encoder model is used for determining the hidden states of the two text sequence vectors based on the neural network parameters and outputting the hidden states to a hidden state pooling module;
the hidden state pooling module is used for pooling the two text sequence vectors according to the hidden states of the two text sequence vectors and outputting the pooled text sequence vectors to the vector similarity calculation module;
and the vector similarity calculation module is used for determining the similarity of the two text sequences according to the two text sequence vectors after the pooling processing.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for training a deep neural network encoder model according to any one of claims 1 to 5 and/or the method for predicting similarity of text sequences according to claim 6.
CN202210360834.8A 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity Active CN114490950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210360834.8A CN114490950B (en) 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210360834.8A CN114490950B (en) 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity

Publications (2)

Publication Number Publication Date
CN114490950A CN114490950A (en) 2022-05-13
CN114490950B true CN114490950B (en) 2022-07-12

Family

ID=81487384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210360834.8A Active CN114490950B (en) 2022-04-07 2022-04-07 Method and storage medium for training encoder model, and method and system for predicting similarity

Country Status (1)

Country Link
CN (1) CN114490950B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743545B (en) * 2022-06-14 2022-09-02 联通(广东)产业互联网有限公司 Dialect type prediction model training method and device and storage medium
CN115618950A (en) * 2022-09-30 2023-01-17 华为技术有限公司 Data processing method and related device
CN115357690B (en) * 2022-10-19 2023-04-07 有米科技股份有限公司 Text repetition removing method and device based on text mode self-supervision
CN115660871B (en) * 2022-11-08 2023-06-06 上海栈略数据技术有限公司 Unsupervised modeling method for medical clinical process, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model
CN113159945A (en) * 2021-03-12 2021-07-23 华东师范大学 Stock fluctuation prediction method based on multitask self-supervision learning
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN114003698A (en) * 2021-12-27 2022-02-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11080587B2 (en) * 2015-02-06 2021-08-03 Deepmind Technologies Limited Recurrent neural networks for data item generation
CN108388888B (en) * 2018-03-23 2022-04-05 腾讯科技(深圳)有限公司 Vehicle identification method and device and storage medium
CN109614471B (en) * 2018-12-07 2021-07-02 北京大学 Open type problem automatic generation method based on generation type countermeasure network
CN110009013B (en) * 2019-03-21 2021-04-27 腾讯科技(深圳)有限公司 Encoder training and representation information extraction method and device
US11227179B2 (en) * 2019-09-27 2022-01-18 Intel Corporation Video tracking with deep Siamese networks and Bayesian optimization
CN111144565B (en) * 2019-12-27 2020-10-27 中国人民解放军军事科学院国防科技创新研究院 Self-supervision field self-adaptive deep learning method based on consistency training
JP7505025B2 (en) * 2020-04-21 2024-06-24 グーグル エルエルシー Supervised Contrastive Learning Using Multiple Positive Examples
CN112149689B (en) * 2020-09-28 2022-12-09 上海交通大学 Unsupervised domain adaptation method and system based on target domain self-supervised learning
CN112396479B (en) * 2021-01-20 2021-05-25 成都晓多科技有限公司 Clothing matching recommendation method and system based on knowledge graph
CN113553906B (en) * 2021-06-16 2024-10-29 之江实验室 Discrimination non-supervision cross-domain pedestrian re-identification method based on class center domain alignment
CN113936647B (en) * 2021-12-17 2022-04-01 中国科学院自动化研究所 Training method of voice recognition model, voice recognition method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model
CN113159945A (en) * 2021-03-12 2021-07-23 华东师范大学 Stock fluctuation prediction method based on multitask self-supervision learning
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN114003698A (en) * 2021-12-27 2022-02-01 成都晓多科技有限公司 Text retrieval method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN114490950A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN114490950B (en) Method and storage medium for training encoder model, and method and system for predicting similarity
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
CN110648659B (en) Voice recognition and keyword detection device and method based on multitask model
CN111930914B (en) Problem generation method and device, electronic equipment and computer readable storage medium
CN111813954B (en) Method and device for determining relationship between two entities in text statement and electronic equipment
CN112632996A (en) Entity relation triple extraction method based on comparative learning
CN110162628A (en) A kind of content identification method and device
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN113254615A (en) Text processing method, device, equipment and medium
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN115204143A (en) Method and system for calculating text similarity based on prompt
CN110942774A (en) Man-machine interaction system, and dialogue method, medium and equipment thereof
CN115757695A (en) Log language model training method and system
CN113177113B (en) Task type dialogue model pre-training method, device, equipment and storage medium
CN114880991A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN114925681B (en) Knowledge graph question-answering question-sentence entity linking method, device, equipment and medium
CN114003708B (en) Automatic question-answering method and device based on artificial intelligence, storage medium and server
CN116432660A (en) Pre-training method and device for emotion analysis model and electronic equipment
CN115600584A (en) Mongolian emotion analysis method combining DRCNN-BiGRU dual channels with GAP
CN114333790A (en) Data processing method, device, equipment, storage medium and program product
CN115495579A (en) Method and device for classifying text of 5G communication assistant, electronic equipment and storage medium
CN113761874A (en) Event reality prediction method and device, electronic equipment and storage medium
CN113327581A (en) Recognition model optimization method and system for improving speech recognition accuracy
CN112949313A (en) Information processing model training method, device, equipment and storage medium
CN112015894A (en) Text single classification method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant