WO2020237869A1 - Question intent recognition method and apparatus, computer device and storage medium - Google Patents
Question intent recognition method and apparatus, computer device and storage medium
- Publication number
- WO2020237869A1 (PCT/CN2019/102922, CN2019102922W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- question
- target
- sample
- intention
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
Definitions
- This application relates to the field of deep learning technology, and in particular to a question intent recognition method and apparatus, a computer device, and a storage medium.
- In recent years, demand for voice assistants has grown steadily. Users can ask a voice assistant questions and receive answers that help resolve their doubts. However, existing voice assistants can only search for answers based on the literal meaning of the user's question, so the answer often misses the point of the question and fails to meet the user's needs. Finding a method that can accurately understand the intent behind a user's question has therefore become an urgent problem for those skilled in the art.
- The embodiments of the present application provide a question intent recognition method and apparatus, a computer device, and a storage medium, to solve the problem that existing technical means have difficulty understanding the real intent behind a user's question.
- A question intent recognition method, comprising:
- obtaining the target question voice asked by a user;
- performing speech-to-text processing on the target question voice to obtain a target question text;
- vectorizing the target question text to obtain a target question vector;
- feeding the target question vector as input into a pre-trained attention-based deep learning model, and obtaining question probability values output by the deep learning model, one for each preset user intent, where each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that probability value;
- selecting, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- A question intent recognition device, comprising:
- a question voice acquisition module, configured to obtain the target question voice asked by a user;
- a speech-to-text module, configured to perform speech-to-text processing on the target question voice to obtain a target question text;
- a text vectorization module, configured to vectorize the target question text to obtain a target question vector;
- a question recognition module, configured to feed the target question vector as input into a pre-trained attention-based deep learning model and obtain question probability values output by the deep learning model, one for each preset user intent, where each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that probability value;
- a real intent selection module, configured to select, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- A computer device, including a memory, a processor, and computer-readable instructions that are stored in the memory and executable on the processor, where the processor implements the steps of the above question intent recognition method when executing the computer-readable instructions.
- One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above question intent recognition method.
- FIG. 1 is a schematic diagram of an application environment of the question intent recognition method in an embodiment of the present application;
- FIG. 2 is a flowchart of the question intent recognition method in an embodiment of the present application;
- FIG. 3 is a schematic flowchart of step 103 of the question intent recognition method in an application scenario in an embodiment of the present application;
- FIG. 4 is a schematic flowchart of pre-training the deep learning model in an application scenario of the question intent recognition method in an embodiment of the present application;
- FIG. 5 is a schematic flowchart of step 104 of the question intent recognition method in an application scenario in an embodiment of the present application;
- FIG. 6 is a schematic flowchart of providing an answer to the user in an application scenario of the question intent recognition method in an embodiment of the present application;
- FIG. 7 is a schematic structural diagram of the question intent recognition device in an application scenario in an embodiment of the present application;
- FIG. 8 is a schematic structural diagram of the question intent recognition device in another application scenario in an embodiment of the present application;
- FIG. 9 is a schematic structural diagram of the question recognition module in an embodiment of the present application;
- FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
- The question intent recognition method provided in this application can be applied in the application environment shown in FIG. 1, where a client communicates with a server through a network.
- The client can be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device.
- The server can be implemented as an independent server or as a server cluster composed of multiple servers.
- In one embodiment, as shown in FIG. 2, a question intent recognition method is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
- 101. Obtain the target question voice asked by a user.
- In this embodiment, the server may obtain the target question voice asked by the user according to the actual needs of the use case or application scenario.
- For example, the server can communicate with a client that lets users at a certain venue ask questions: the user inputs speech through the client's microphone, and the client uploads the speech to the server, so the speech obtained by the server is the target question voice.
- Alternatively, the server can perform intent-recognition tasks on large batches of recorded utterances: a database collects a large number of recordings of users' questions in advance and transmits these recordings to the server over the network, so the recordings obtained by the server are the target question voices asked by users.
- It is understandable that the server can also obtain the target question voice asked by the user in various other ways, which will not be elaborated here.
- It should be noted that the target question voice in this embodiment generally refers to the sound data collected when the user asks a question.
- 102. Perform speech-to-text processing on the target question voice to obtain a target question text.
- It is easy to understand that after obtaining the target question voice, the server can convert it into text to obtain the target question text. For example, the server may use ASR (Automatic Speech Recognition) technology to recognize the target question voice and complete the speech-to-text processing, obtaining the target question text.
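- As a concrete illustration, the following is a minimal speech-to-text sketch. It uses the open-source SpeechRecognition package and Google's free web recognizer purely as stand-ins; the embodiment only requires some ASR engine, and the file name target_question.wav is hypothetical.

```python
# Minimal speech-to-text sketch (assumption: the SpeechRecognition package
# stands in for whatever ASR engine the server actually uses).
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("target_question.wav") as source:  # hypothetical file name
    audio = recognizer.record(source)  # read the entire target question voice

# Chinese questions are assumed here, matching the patent's examples.
target_question_text = recognizer.recognize_google(audio, language="zh-CN")
print(target_question_text)  # e.g. "小明吃饭吗"
```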
- 103. Vectorize the target question text to obtain a target question vector.
- After obtaining the target question text, to facilitate recognition by the subsequent deep learning model, the server needs to vectorize the target question text, i.e., convert it into a vector representation, thereby obtaining the target question vector. Specifically, the server may record the target question text in the form of a data matrix, in which each word of the target question text is mapped to one row vector.
- For ease of understanding, as shown in FIG. 3, step 103 may further include:
- 201. Convert each target word in the target question text into a GloVe word vector to obtain an initial question vector;
- 202. Judge whether each target word is covered by a GloVe word vector; if so, perform step 203; if not, perform step 204;
- 203. Determine the initial question vector as the target question vector;
- 204. Convert the target words not covered by GloVe word vectors into TransE word vectors to obtain a supplementary vector;
- 205. Add the supplementary vector to the initial question vector to obtain the target question vector.
- Regarding step 201: GloVe stands for Global Vectors for Word Representation. It is an existing word representation tool based on global word-frequency statistics (count-based & overall statistics) that can express a word as a vector of real numbers.
- In this embodiment, the server uses GloVe to convert each target word in the target question text into a word vector, thereby obtaining the initial question vector.
- Regarding step 202: a user's question may contain proper nouns, such as names and locations, which are difficult for GloVe to cover completely. The server therefore judges whether each target word is covered by a GloVe word vector. If all target words in the target question text are covered, step 203 can be performed to directly determine the initial question vector as the target question vector; otherwise, if the target words are not fully covered, the subsequent steps 204 and 205 need to be performed.
- Regarding step 203: as follows from the above, if each target word is covered by a GloVe word vector, the server can determine the initial question vector as the target question vector.
- Regarding step 204: if any target word is not covered by a GloVe word vector, the target question text contains words that GloVe cannot represent. To make up for this gap, the server converts the target words not covered by GloVe word vectors into TransE word vectors, obtaining a supplementary vector.
- It should be noted that TransE, also known as the knowledge-base method, is an existing algorithm model that learns proper nouns effectively and can convert learned words into distributed vector representations.
- In this embodiment, the server can use TransE to perform vector conversion on the target words not covered by GloVe word vectors to obtain the supplementary vector.
- Regarding step 205: after obtaining the supplementary vector, the server adds it to the initial question vector to fill in what the initial question vector lacks, thereby obtaining the target question vector corresponding to the target question text.
- For example, suppose the target question text is "小明吃饭吗" ("Does Xiaoming eat?").
- The sentence includes three target words: "小明" (Xiaoming), "吃饭" (eat), and "吗" (question particle).
- The server uses GloVe to convert "吃饭" and "吗" into word vectors, [1234] and [1235] respectively. Since "小明" is not covered by GloVe, the server uses TransE to convert it into a word vector, obtaining [1236], and then adds [1236] to [1234] and [1235], so the target question vector consists of [1236], [1234], and [1235].
- The target question vector can be expressed as a one-dimensional vector, i.e., [1236], [1234], and [1235] are merged into [1236 1234 1235]; it can also be expressed as a two-dimensional vector, i.e., [1236], [1234], and [1235] each form one row vector, giving the target vector:
- [[1236], [1234], [1235]]
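- A minimal sketch of steps 201-205 follows. The GLOVE_VECTORS and TRANSE_VECTORS lookup tables are hypothetical stand-ins for the real pre-trained GloVe and TransE embeddings, and the toy four-dimensional vectors simply mirror the example above.

```python
# Hypothetical embedding tables standing in for pre-trained GloVe / TransE.
GLOVE_VECTORS = {"吃饭": [1, 2, 3, 4], "吗": [1, 2, 3, 5]}
TRANSE_VECTORS = {"小明": [1, 2, 3, 6]}  # proper nouns GloVe misses

def vectorize_question(target_words):
    """Steps 201-205: GloVe first, TransE as the fallback for words
    GloVe does not cover; rows are stacked into the question vector."""
    question_vector = []
    for word in target_words:
        if word in GLOVE_VECTORS:                 # step 202: covered by GloVe
            question_vector.append(GLOVE_VECTORS[word])
        else:                                     # step 204: TransE supplement
            question_vector.append(TRANSE_VECTORS[word])
    return question_vector                        # step 205: two-dimensional form

print(vectorize_question(["小明", "吃饭", "吗"]))
# [[1, 2, 3, 6], [1, 2, 3, 4], [1, 2, 3, 5]]
```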
- 104. Feed the target question vector as input into the pre-trained attention-based deep learning model, and obtain the question probability values output by the deep learning model, one for each preset user intent, where each question probability value represents the probability that the target question text belongs to the corresponding preset user intent.
- It is understandable that these question probability values correspond one-to-one to the preset user intents: the larger a question probability value, the more likely it is that the user's question belongs to the preset user intent corresponding to that probability value.
- For ease of understanding, the training process of the attention-based deep learning model is described in detail below. As shown in FIG. 4, the deep learning model includes four parts: a first two-layer recurrent neural network, a Key-Value memory network, a second two-layer recurrent neural network, and a content similarity calculation network. The deep learning model is pre-trained through the following steps:
- 301. Collect sample question voices belonging to each preset user intent;
- 302. Perform speech-to-text processing on the collected sample question voices to obtain sample question texts;
- 303. Vectorize the sample question texts to obtain sample question vectors;
- 304. For each sample question vector, set a label value for each preset user intent, obtaining the label values of each sample question vector, where the label value of the preset user intent corresponding to each sample question vector is the largest;
- 305. Input all sample question vectors into the first two-layer recurrent neural network for encoding, obtaining the context feature vector corresponding to each sample question vector;
- 306. Feed each context feature vector into the Key-Value memory network for iterative calculation until a preset number of iterations is reached or the model converges, and then output the iterated context feature vectors, where the Key values participating in the iterative calculation in the Key-Value memory network are the vectors corresponding to the subject and predicate of each sample question text;
- 307. Input each iterated context feature vector into the second two-layer recurrent neural network for decoding, obtaining sample result vectors;
- 308. For each sample result vector, calculate the similarity between that sample result vector and each intent vector through the content similarity calculation network, obtaining the similarity values corresponding to that sample result vector as sample probability values, where each intent vector refers to the vector obtained by vectorizing the corresponding preset user intent;
- 309. Taking the sample probability values corresponding to each sample result vector as the adjustment target, adjust the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity calculation network, to minimize the error between the sample probability values corresponding to each sample result vector and the target label values, where the target label values refer to the label values of the sample question vector corresponding to that sample result vector;
- 310. If the error between the sample probability values corresponding to each sample result vector and the target label values satisfies a preset training termination condition, determine that the deep learning model has been trained.
- Regarding step 301: for the actual application scenario, staff can set on the server, in advance, the preset user intents to be trained, which may include intents such as "say hello", "hope to hang up", and "politely decline". For these preset user intents, the staff also need to collect the corresponding user utterances in the specific application scenario, i.e., users' speech, as sample question voices, such as recordings of questions users actually asked.
- When collecting sample question voices, the server may gather sample question voices belonging to each preset user intent through channels such as professional consulting platforms and online customer service.
- It should be noted that the sample question voices for each preset user intent should reach a certain order of magnitude. The numbers of sample question voices for different preset user intents may differ somewhat, but should not differ too much, to avoid harming the training of the deep learning model.
- For example, the collected sample question voices could be: 1 million for "say hello", 200,000 for "hope to hang up", and 300,000 for "politely decline".
- Regarding step 302: as in step 102, the server performs speech-to-text processing on the collected sample question voices to obtain sample question texts, which is not repeated here.
- Regarding step 303: as in step 103, the server also vectorizes the sample question texts to obtain sample question vectors.
- Regarding step 304: it is understandable that the sample question vectors need to be labeled before training. Since multiple preset user intents are trained in this embodiment, a label value should be set separately for each preset user intent.
- For example, suppose there are three preset user intents, namely "say hello", "hope to hang up", and "politely decline", and 1 million sample question vectors in total.
- For sample question vector No. 1, since the real intent of its corresponding sample question text is "say hello", its label value for "say hello" is set to 1, and its label values for "hope to hang up" and "politely decline" are both set to 0.
- For sample question vector No. 2, since the real intent of its corresponding sample question text is "hope to hang up", its label value for "hope to hang up" is set to 1, and its label values for "say hello" and "politely decline" are both set to 0.
- In the same way, label values are set for each preset user intent for all 1 million sample question vectors, completing the sample labeling work before training.
- It should be noted that, in the above example, the label value of the preset user intent corresponding to a sample question vector is recorded as 1 and the label values of the other preset user intents as 0, but this is only one way of setting label values. For instance, the label value of the corresponding preset user intent may also be recorded as 0.9 and the label values of the other preset user intents as 0.8, 0.7, 0.6, and so on; any values smaller than 0.9 work, as long as the label value of the preset user intent corresponding to the sample question vector is the largest among all its label values.
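- A minimal labeling sketch for step 304 follows, using the three example intents above; the helper name label_sample is hypothetical.

```python
INTENTS = ["say hello", "hope to hang up", "politely decline"]

def label_sample(true_intent, hi=1.0, lo=0.0):
    """Step 304: give the true intent the largest label value and every
    other preset user intent a smaller one (one-hot by default)."""
    return {intent: (hi if intent == true_intent else lo) for intent in INTENTS}

print(label_sample("hope to hang up"))
# {'say hello': 0.0, 'hope to hang up': 1.0, 'politely decline': 0.0}
```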
- Regarding step 305: in this embodiment, the deep learning model is provided with a first two-layer recurrent neural network as the encoder and a second two-layer recurrent neural network as the decoder, where the first two-layer recurrent neural network is a two-layer RNN.
- In step 305, the sample question vector is first input into the first RNN layer for convolution calculation; by setting the convolution kernel of the first RNN layer appropriately, a preliminary feature representation (a vector) is computed. The preliminary feature representation is then input into the second RNN layer for another convolution calculation, completing the feature traversal of the preliminary representation; the resulting feature representation can be regarded as the context feature vector corresponding to the sample question vector. Since the convolution calculation in step 305 completes the feature extraction and encoding of the sample question vector, the first two-layer recurrent neural network can be regarded as the encoder in the deep learning model.
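- The following is a minimal PyTorch sketch of such a two-layer recurrent encoder. GRU cells, the layer sizes, and taking the final top-layer hidden state as the context feature vector are all assumptions for illustration; the patent does not name a specific RNN cell.

```python
import torch
import torch.nn as nn

class TwoLayerEncoder(nn.Module):
    """Sketch of the first two-layer recurrent network (the encoder).
    Assumption: GRU cells; the final top-layer hidden state serves as
    the context feature vector of the question."""
    def __init__(self, word_dim=4, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(word_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, question_vector):
        # question_vector: (batch, num_words, word_dim), one row per word
        _, hidden = self.rnn(question_vector)
        return hidden[-1]  # (batch, hidden_dim) context feature vector

encoder = TwoLayerEncoder()
sample = torch.tensor([[[1., 2., 3., 6.], [1., 2., 3., 4.], [1., 2., 3., 5.]]])
print(encoder(sample).shape)  # torch.Size([1, 64])
```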
- Regarding step 306: in this embodiment, the subject, predicate, and object of each sample question vector are stored in advance as the Key-Value pairs of the Key-Value memory network.
- The Key is composed of the partial vectors corresponding to the subject and predicate in the sample question vector, expressed as (subject, predicate), and the Value is composed of the partial vector corresponding to the object, expressed as (object).
- It should be noted that the subject, predicate, and object mentioned above are all expressed as vectors, specifically the word vectors of the corresponding words in a sample question text.
- For example, for the sample question text "小明吃饭吗", the subject is the vector [1236] corresponding to the word "小明" and the predicate is the vector [1234] corresponding to the word "吃饭", so the Key of this sample question text can be expressed as ([1236], [1234]).
- Specifically, in step 306 the server computes the similarity between each context feature vector and the Key of each Key-Value pair in the Key-Value memory network, stores the computed similarity values as the attention weights between that context feature vector and each Key, and then uses the attention weights to update the context feature vector. This is repeated iteratively until the preset number of iterations is reached or the model converges, after which the server outputs the context feature vector obtained after the iterations. For ease of understanding, the above process can be expressed by Formula 1 below:
- Formula 1 (the standard Key-Value memory network update): p_{i,j} = Softmax(q_j^T · A·Φ_K(k_i)), q_{j+1} = R_j · (q_j + Σ_i p_{i,j} · A·Φ_V(v_i))
- where q_j denotes the context feature vector at the j-th iteration and q_{j+1} the context feature vector at the (j+1)-th iteration; i indexes the Key-Value pairs, whose number equals the number of sample question vectors; Φ_K(k_i) and Φ_V(v_i) denote the Key and Value of the i-th pair; A and R_j are the network parameters of the Key-Value memory network; and Softmax is the normalized exponential function, Softmax(z_i) = e^{z_i} / Σ_k e^{z_k}.
- It should be noted that the preset number of iterations in this embodiment can be set according to actual conditions; for example, it can be set to 10.
- As for judging when the Key-Value memory network has converged, an existing loss function can be used; for example, the cross-entropy loss (Softmax Loss) can be used to judge the convergence of the Key-Value memory network, which is not expanded upon here.
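- A minimal PyTorch sketch of the iterative calculation in Formula 1 follows; the tensor shapes and the per-iteration list of R_j matrices are assumptions for illustration.

```python
import torch

def kv_memory_hops(q, keys, values, A, R_list):
    """Iterative calculation of Formula 1.
    q: (d,) context feature vector; keys, values: (n, d_kv) memory slots."""
    for R in R_list:                          # R_j: (d, d), one per iteration j
        k_emb = keys @ A.T                    # A·Φ_K(k_i) for every slot: (n, d)
        v_emb = values @ A.T                  # A·Φ_V(v_i) for every slot: (n, d)
        p = torch.softmax(k_emb @ q, dim=0)   # attention weights p_{i,j}
        o = p @ v_emb                         # weighted sum of Value embeddings
        q = R @ (q + o)                       # updated context feature vector
    return q

d, d_kv, n = 64, 8, 5
q = torch.randn(d)
keys, values = torch.randn(n, d_kv), torch.randn(n, d_kv)
A = torch.randn(d, d_kv)
R_list = [torch.randn(d, d) for _ in range(10)]  # e.g. 10 preset iterations
print(kv_memory_hops(q, keys, values, A, R_list).shape)  # torch.Size([64])
```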
- Regarding step 307: it is understandable that after the server finishes step 306, it obtains the iterated context feature vectors, which then need to be input into the second two-layer recurrent neural network for decoding.
- In this embodiment, the second two-layer recurrent neural network is similar in principle to the first one: it is likewise equipped with two RNN layers, and each layer performs a convolution calculation on the context feature vectors with an appropriately chosen convolution kernel, completing the decoding of each context feature vector and finally yielding the sample result vectors. It is understandable that the dimensions and size of the sample result vectors obtained by the server are consistent with those of the aforementioned sample question vectors.
- Regarding step 308: the preset user intents to be trained, such as "say hello", "hope to hang up", and "politely decline", are preset on the server. During training, to measure the degree of similarity between these preset user intents and the sample result vectors, the server needs to vectorize each preset user intent to obtain the intent vectors.
- After obtaining the sample result vectors, the server can, for each sample result vector, calculate the similarity between that sample result vector and each intent vector through the content similarity calculation network, obtaining the similarity values corresponding to that sample result vector; each similarity value is regarded as the probability that the output result corresponds to the respective preset user intent and is recorded as a sample probability value.
- Specifically, step 308 can calculate the similarity between each sample result vector and each intent vector by the following formula, i.e., the content similarity calculation network can be expressed as:
- exp(e_{i,m}) = ω^T · tanh(W·s_{m-1} + V·h_m + b), α_{i,m} = exp(e_{i,m}) / Σ_{m'=1}^{M} exp(e_{i,m'})
- where ω and b are vectors, W and V are matrices, and ω, b, W, and V are the network parameters of the content similarity calculation network; α_{i,m} denotes the similarity value, i.e., the sample probability value, between the sample result vector s_{m-1} output by the second two-layer recurrent neural network and the intent vector h_m; m is the index of the intent vector; and M is the number of intent vectors, i.e., there are M preset user intents in total.
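- A minimal PyTorch sketch of this content similarity calculation follows; reading the formula as a softmax-normalized additive-attention score is an assumption, as are the dimensions.

```python
import torch

def sample_probabilities(s_prev, intent_vecs, W, V, omega, b):
    """Sketch of the content similarity calculation network: an
    additive-attention score against each intent vector h_m, normalized
    so the M values sum to 1 (assumed reading of the formula above)."""
    scores = torch.stack([omega @ torch.tanh(W @ s_prev + V @ h + b)
                          for h in intent_vecs])   # e_{i,m}, one per intent
    return torch.softmax(scores, dim=0)            # alpha_{i,m}

d, M = 64, 3
W, V = torch.randn(d, d), torch.randn(d, d)
omega, b = torch.randn(d), torch.randn(d)
probs = sample_probabilities(torch.randn(d),
                             [torch.randn(d) for _ in range(M)],
                             W, V, omega, b)
print(probs)  # M sample probability values summing to 1
```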
- Regarding step 309: it is understandable that in the process of training the deep learning model, its parameters need to be adjusted; specifically, in this embodiment, the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity calculation network, such as the aforementioned ω, b, W, V, A, and R_j, are adjusted.
- Adjusting these network parameters affects the final output of the deep learning model, so that the error between the sample probability values corresponding to each sample result vector and the target label values is minimized.
- Regarding step 310: in the process of adjusting the above network parameters, it can be judged whether the error between the sample probability values corresponding to each sample result vector and the target label values satisfies the preset training termination condition. If it does, the network parameters of the deep learning model have been adjusted into place and the model can be determined to be trained; otherwise, the deep learning model still needs further training.
- The training termination condition can be preset according to actual usage. Specifically, it can be set as: if the errors between the sample probability values corresponding to each sample result vector and the target label values are all smaller than a specified error value, the preset training termination condition is considered satisfied. Alternatively, it can be set as: perform the above steps 301-309 with the sample question voices in a verification set, and if the errors between the sample probability values output by the deep learning model and the label values are within a certain range, the preset training termination condition is considered satisfied.
- The sample question voices in the verification set are collected similarly to step 301. Specifically, after step 301 is performed to collect the sample question voices of each preset user intent, a certain proportion of the collected sample question voices is divided into the training set and the remainder into the verification set. For example, a random 80% of the collected sample question voices can be used as training-set samples for subsequently training the deep learning model, and the other 20% as verification-set samples for subsequently verifying whether the deep learning model has finished training, i.e., whether the preset training termination condition is satisfied.
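- A minimal sketch of this 80/20 split follows; the fixed random seed is an assumption added for reproducibility.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=42):
    """Randomly split the collected sample question voices into a
    training set and a verification set, as described above."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train_set, verification_set = split_samples(list(range(100)))
print(len(train_set), len(verification_set))  # 80 20
```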
- The pre-training of the attention-based deep learning model was described above. For ease of understanding, and following on from the training process, the recognition of the target question vector in actual use is described in detail below. As shown in FIG. 5, the deep learning model includes four parts: the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity calculation network, and step 104 may include:
- 401. Input the target question vector into the first two-layer recurrent neural network for encoding, obtaining a target context feature vector;
- 402. Feed the target context feature vector into the Key-Value memory network for iterative calculation until the preset number of iterations is reached or the model converges, and then output the iterated target context feature vector;
- 403. Input the iterated target context feature vector into the second two-layer recurrent neural network for decoding, obtaining a target result vector;
- 404. Calculate the similarity between the target result vector and each intent vector through the content similarity calculation network, obtaining the similarity values corresponding to the target result vector as the question probability values.
- Steps 401-404 are similar in principle to steps 305-308 above. In steps 401-404, the server first inputs the target question vector into the first two-layer recurrent neural network for encoding to obtain a target context feature vector; then feeds the target context feature vector into the Key-Value memory network for iterative calculation until the preset number of iterations is reached or the model converges, and outputs the iterated target context feature vector; next, inputs the iterated target context feature vector into the second two-layer recurrent neural network for decoding to obtain a target result vector; finally, the server calculates the similarity between the target result vector and each intent vector through the content similarity calculation network, obtaining the similarity values corresponding to the target result vector as the question probability values.
- It can be seen that the question probability values obtained here represent the probabilities that the target question voice asked by the user belongs to each preset user intent: the larger a question probability value, the more likely the corresponding preset user intent is the real intent behind the user's question.
- 105. Select, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- After the server obtains the question probability values output by the deep learning model, one for each preset user intent, since each question probability value represents the probability that the target question text belongs to the corresponding preset user intent, the server can select from the preset user intents the one with the largest question probability value. The preset user intent selected here is, among all preset user intents, the most likely real intent of the user's question, and is therefore determined as the user's real question intent.
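- A one-line sketch of step 105 follows; the dictionary of question probability values is assumed as the model's output format.

```python
def select_real_intent(question_probs):
    """Step 105: pick the preset user intent with the largest question
    probability value as the user's real question intent."""
    return max(question_probs, key=question_probs.get)

print(select_real_intent({"say hello": 0.82, "hope to hang up": 0.11,
                          "politely decline": 0.07}))  # say hello
```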
- In this embodiment, after determining the user's real question intent, the server can also preset multiple question-answer tuples, select from them the answer corresponding to the real question intent, and provide it to the user, thereby directly resolving the user's question. As shown in FIG. 6, after step 105, the method may further include:
- 501. Obtain the preset question-answer tuples, where each question-answer tuple is composed of one preset user intent and the answer corresponding to that preset user intent, and the preset user intents of the question-answer tuples are all different;
- 502. Select, from the question-answer tuples, the question-answer tuple whose preset user intent is the same as the real question intent, as the hit tuple;
- 503. Feed the answer of the hit tuple back to the user.
- Regarding step 501: it is understandable that question-answer tuples can be preset on the server, each composed of one preset user intent and the answer corresponding to it. For example, a question-answer tuple could be (say hello; thank you, and you?), where "say hello" is the preset user intent in the tuple and "thank you, and you?" is the answer. Moreover, so that the question-answer tuples cover all preset user intents, the preset user intents of the question-answer tuples on the server are all different, i.e., one question-answer tuple is set for each preset user intent.
- It should be noted that, to make the answers in these question-answer tuples more accurate and representative, the answers can be manually screened and deduplicated in advance, so that the most suitable answer for each preset user intent is set in its question-answer tuple.
- For example, in one practical application scenario, 5.2k preset user intents and 30.8k answers can be preset. Of the 5.2k preset user intents, only the 330 that are meaningful for this application scenario are kept; similarly, of the 30.8k answers, 642 are manually selected for setting up the question-answer tuples. Finally, after the answers are deduplicated and screened, there can be about 1k question-answer tuples, covering the real intents of 1k questions users may ask and giving accurate answers.
- Regarding steps 502 and 503: it is easy to understand that after determining the user's real question intent, the server can select, from the question-answer tuples, the question-answer tuple whose preset user intent is the same as the real question intent as the hit tuple, and then feed the answer of the hit tuple back to the user, completing the answer to the user's question.
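- A minimal sketch of steps 501-503 follows; the first tuple reuses the example above, while the second tuple's answer is invented example data, and the helper name answer_user is hypothetical.

```python
# One question-answer tuple per preset user intent (example data).
ANSWER_TUPLES = {
    "say hello": "thank you, and you?",
    "hope to hang up": "understood, sorry to disturb you; goodbye.",
}

def answer_user(real_intent):
    """Steps 502-503: hit the tuple whose preset user intent matches the
    real question intent and feed its answer back to the user."""
    return ANSWER_TUPLES.get(real_intent)

print(answer_user("say hello"))  # thank you, and you?
```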
- In the embodiments of this application: first, the target question voice asked by the user is obtained; then speech-to-text processing is performed on the target question voice to obtain the target question text; next, the target question text is vectorized to obtain the target question vector; further, the target question vector is fed as input into the pre-trained attention-based deep learning model to obtain the question probability values output by the deep learning model, one for each preset user intent, where each question probability value represents the probability that the target question text belongs to the corresponding preset user intent; finally, the preset user intent with the largest question probability value is selected from the preset user intents as the user's real question intent.
- It can be seen that, through the attention-based deep learning model, this application can accurately recognize the user's real intent from the target question voice asked by the user, improving the accuracy of intent recognition; when applied to speech recognition situations such as voice assistants, it can greatly reduce the occurrence of answers that miss the point of the question.
- It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
- In one embodiment, a question intent recognition device is provided, and it corresponds one-to-one to the question intent recognition method in the above embodiments. As shown in FIG. 7, the question intent recognition device includes a question voice acquisition module 601, a speech-to-text module 602, a text vectorization module 603, a question recognition module 604, and a real intent selection module 605. Each functional module is described in detail as follows:
- the question voice acquisition module 601 is configured to obtain the target question voice asked by the user;
- the speech-to-text module 602 is configured to perform speech-to-text processing on the target question voice to obtain the target question text;
- the text vectorization module 603 is configured to vectorize the target question text to obtain the target question vector;
- the question recognition module 604 is configured to feed the target question vector as input into the pre-trained attention-based deep learning model and obtain the question probability values output by the deep learning model, one for each preset user intent, where each question probability value represents the probability that the target question text belongs to the corresponding preset user intent;
- the real intent selection module 605 is configured to select, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- As shown in FIG. 8, the deep learning model includes four parts: the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity calculation network, and the deep learning model can be pre-trained through the following modules:
- a sample collection module 606, configured to collect sample question voices belonging to each preset user intent;
- a sample speech-to-text module 607, configured to perform speech-to-text processing on the collected sample question voices to obtain sample question texts;
- a sample vectorization module 608, configured to vectorize the sample question texts to obtain sample question vectors;
- a sample labeling module 609, configured to set, for each sample question vector, a label value for each preset user intent, obtaining the label values of each sample question vector, where the label value of the preset user intent corresponding to each sample question vector is the largest;
- a vector encoding module 610, configured to input all sample question vectors into the first two-layer recurrent neural network for encoding, obtaining the context feature vector corresponding to each sample question vector;
- an iterative calculation module 611, configured to feed each context feature vector into the Key-Value memory network for iterative calculation until the preset number of iterations is reached or the model converges, and then output the iterated context feature vectors, where the Key values participating in the iterative calculation in the Key-Value memory network are the vectors corresponding to the subject and predicate of each sample question text;
- a vector decoding module 612, configured to input each iterated context feature vector into the second two-layer recurrent neural network for decoding, obtaining sample result vectors;
- a similarity value calculation module 613, configured to calculate, for each sample result vector, the similarity between that sample result vector and each intent vector through the content similarity calculation network, obtaining the similarity values corresponding to that sample result vector as sample probability values, where each intent vector refers to the vector obtained by vectorizing the corresponding preset user intent;
- a network parameter adjustment module 614, configured to take the sample probability values corresponding to each sample result vector as the adjustment target and adjust the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity calculation network, to minimize the error between the sample probability values corresponding to each sample result vector and the target label values, where the target label values refer to the label values of the sample question vector corresponding to that sample result vector;
- a training completion determining module 615, configured to determine that the deep learning model has been trained if the error between the sample probability values corresponding to each sample result vector and the target label values satisfies the preset training termination condition.
- As shown in FIG. 9, the question recognition module 604 may include:
- an encoding unit 6041, configured to input the target question vector into the first two-layer recurrent neural network for encoding, obtaining a target context feature vector;
- a vector iterative calculation unit 6042, configured to feed the target context feature vector into the Key-Value memory network for iterative calculation until the preset number of iterations is reached or the model converges, and then output the iterated target context feature vector;
- a decoding unit 6043, configured to input the iterated target context feature vector into the second two-layer recurrent neural network for decoding, obtaining a target result vector;
- a similarity calculation unit 6044, configured to calculate the similarity between the target result vector and each intent vector through the content similarity calculation network, obtaining the similarity values corresponding to the target result vector as the question probability values.
- Further, the text vectorization module may include:
- a first conversion unit, configured to convert each target word in the target question text into a GloVe word vector, obtaining an initial question vector;
- a word judgment unit, configured to judge whether each target word is covered by a GloVe word vector;
- a question vector determining unit, configured to determine the initial question vector as the target question vector if the judgment result of the word judgment unit is yes;
- a second conversion unit, configured to convert the target words not covered by GloVe word vectors into TransE word vectors if the judgment result of the word judgment unit is no, obtaining a supplementary vector;
- a vector adding unit, configured to add the supplementary vector to the initial question vector, obtaining the target question vector.
- Further, the question intent recognition device may also include:
- an answer tuple obtaining module, configured to obtain the preset question-answer tuples, where each question-answer tuple is composed of one preset user intent and the answer corresponding to that preset user intent, and the preset user intents of the question-answer tuples are all different;
- an answer tuple selection module, configured to select, from the question-answer tuples, the question-answer tuple whose preset user intent is the same as the real question intent, as the hit tuple;
- an answer feedback module, configured to feed the answer of the hit tuple back to the user.
- For the specific limitations of the question intent recognition device, refer to the above limitations of the question intent recognition method, which are not repeated here. Each module in the above question intent recognition device can be implemented in whole or in part by software, hardware, or a combination thereof.
- The above modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
- In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10.
- The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is configured to provide computing and control capabilities.
- The memory of the computer device includes a readable storage medium and an internal memory.
- The readable storage medium stores an operating system, computer-readable instructions, and a database.
- The internal memory provides an environment for running the operating system and the computer-readable instructions in the readable storage medium.
- The database of the computer device is used to store the data involved in the question intent recognition method.
- The network interface of the computer device is used to communicate with external terminals through a network connection.
- The computer-readable instructions, when executed by the processor, implement a question intent recognition method.
- The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
- In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the question intent recognition method in the above embodiments are implemented, for example, steps 101 to 105 shown in FIG. 2. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the question intent recognition device in the above embodiments are implemented, for example, the functions of modules 601 to 605 shown in FIG. 7. To avoid repetition, they are not elaborated here.
- In one embodiment, one or more computer-readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the question intent recognition method in the above method embodiments, or implement the functions of the modules/units of the question intent recognition device in the above device embodiments. To avoid repetition, they are not elaborated here.
- The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
- Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through computer-readable instructions; the computer-readable instructions can be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments.
- Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory.
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a question intent recognition method and apparatus, a computer device, and a storage medium, which are applied in the field of deep learning technology and are used to solve the problem that existing technical means have difficulty understanding the real intent behind a user's question. The method provided by this application includes: obtaining the target question voice asked by a user; performing speech-to-text processing on the target question voice to obtain a target question text; vectorizing the target question text to obtain a target question vector; feeding the target question vector as input into a pre-trained attention-based deep learning model to obtain question probability values output by the deep learning model, one for each preset user intent, where each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that probability value; and selecting, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
Description
This application is based on, and claims priority to, Chinese invention patent application No. 201910467185.X, filed on May 31, 2019 and entitled "Question intent recognition method and apparatus, computer device and storage medium".
本申请涉及深度学习技术领域,尤其涉及一种问题意图识别方法、装置、计算机设备及存储介质。
近年来,语音助手的使用需求逐渐增加,用户可以通过向语音助手提问得到反馈的答案,帮助用户解决疑问。然而,现有语音助手仅能从用户问题的字面意思出发来搜索答案,常常出现答非所问的情况,难以满足用户的需求。
因此,寻找一种能够准确理解用户问题意图的方法成为本领域技术人员亟需解决的问题。
发明内容
本申请实施例提供一种问题意图识别方法、装置、计算机设备及存储介质,以解决现有技术手段难以理解用户问题的真实意图的问题。
一种问题意图识别方法,其特征在于,包括:
获取用户提问的目标问题语音;
对所述目标问题语音进行音转字处理,得到目标问题文本;
对所述目标问题文本进行向量化处理,得到目标问题向量;
将所述目标问题向量作为输入投入至预先训练好的基于注意力的深度学习模型,得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值,每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率;
从所述各个预设用户意图中选取出问题概率值最大的预设用户意图,作为所述用户的真实问题意图。
一种问题意图识别装置,其特征在于,包括:
问题语音获取模块,用于获取用户提问的目标问题语音;
音转字模块,用于对所述目标问题语音进行音转字处理,得到目标问题文本;
文本向量化模块,用于对所述目标问题文本进行向量化处理,得到目标问题向量;
问题识别模块,用于将所述目标问题向量作为输入投入至预先训练好的基于注意力的深度学习模型,得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值,每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率;
真实意图选取模块,用于从所述各个预设用户意图中选取出问题概率值最大的预设用户意图,作为所述用户的真实问题意图。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现上述问题意图识别方法的步骤。
一个或多个存储有计算机可读指令的可读存储介质,所述计算机可读存储介质存储 有计算机可读指令,使得所述一个或多个处理器执行上述问题意图识别方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出,本申请的其他特征和优点将从说明书、附图以及权利要求变得明显。
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例中问题意图识别方法的一应用环境示意图;
图2是本申请一实施例中问题意图识别方法的一流程图;
图3是本申请一实施例中问题意图识别方法步骤103在一个应用场景下的流程示意图;
图4是本申请一实施例中问题意图识别方法在一个应用场景下预先训练深度学习模型的流程示意图;
图5是本申请一实施例中问题意图识别方法步骤104在一个应用场景下的流程示意图;
图6是本申请一实施例中问题意图识别方法在一个应用场景下提供答案给用户的流程示意图;
图7是本申请一实施例中问题意图识别装置在一个应用场景下的结构示意图;
图8是本申请一实施例中问题意图识别装置在另一个应用场景下的结构示意图;
图9是本申请一实施例中问题识别模块的结构示意图;
图10是本申请一实施例中计算机设备的一示意图。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请提供的问题意图识别方法,可应用在如图1的应用环境中,其中,客户端通过网络与服务器进行通信。其中,该客户端可以但不限于各种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可穿戴设备。服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
在一实施例中,如图2所示,提供一种问题意图识别方法,以该方法应用在图1中的服务器为例进行说明,包括如下步骤:
101、获取用户提问的目标问题语音;
本实施例中,服务器可以根据实际使用的需要或者应用场景的需要获取用户提问的目标问题语音。例如,服务器可以与客户端通信连接,该客户端提供给某场所内的用户咨询问题,用户通过客户端的麦克风输入语音,客户端将该语音上传给服务器,从而服务器获取到的该语音即为目标问题语音。或者,服务器也可以执行对大批量的话术录音识别用户意图的任务,某数据库预先收集大量的来自用户提问的话术录音,然后通过网络将这些话术录音传输给服务器,从而服务器获取到的这些话术录音即为用户提问的目标问题语音。
可以理解的是,服务器还可以通过多种方式获取到用户提问的目标问题语音,对此不再过多赘述。
需要说明的是,本实施例中所说的目标问题语音一般是指用户提问时采集的声音数 据。
102、对所述目标问题语音进行音转字处理,得到目标问题文本;
容易理解的是,服务器在获取到目标问题语音之后,可以将该目标问题语音转换为文字,即可得到目标问题文本。比如,服务器可以采用ASR(Automatic Speech Recognition)自动语音识别技术对该目标问题语音进行识别,完成音转字处理,得到该目标问题文本。
103、对所述目标问题文本进行向量化处理,得到目标问题向量;
在得到目标问题文本之后,为了便于后续深度学习模型的识别,服务器需要对该目标问题文本进行向量化处理,即将目标问题文本转化为向量的方式表示,从而得到目标问题向量。具体地,服务器可以将目标问题文本以数据矩阵的形式记载,在数据矩阵中,目标问题文本中的每个字词映射为该数据矩阵中的一个行向量。
为便于理解,如图3所示,进一步地,步骤103可以包括:
201、将所述目标问题文本中的各个目标字词分别转换为GloVe(Global Vectors for Word Representation)词向量,得到初始问题向量;
202、判断所述各个目标字词是否均被GloVe词向量覆盖,若是,则执行步骤203,若否,则执行步骤204;
203、确定所述初始问题向量为目标问题向量;
204、将未被GloVe词向量覆盖的目标字词转换为TransE词向量,得到补充向量;
205、将所述补充向量添加至所述初始问题向量,得到目标问题向量。
对于上述步骤201,GloVe的全称叫Global Vectors for Word Representation,它是现有一个基于全局词频统计(count-based&overall statistics)的词表征(word representation)工具,它可以把一个单词表达成一个由实数组成的向量。本实施例中,服务器使用GloVe将所述目标问题文本中的各个目标字词分别转换成词向量,从而得到初始问题向量。
对于上述步骤202,考虑到用户提出的问题中可能包含有专有名词,例如姓名、地点等,这些专有名词难以被GloVe全覆盖。因此,服务器可以判断所述各个目标字词是否均被GloVe词向量覆盖,如果该目标问题文本中的各个目标字词均已被覆盖,则可以执行步骤203,直接确定所述初始问题向量为目标问题向量;反之,如果该目标问题文本中的各个目标字词未被全覆盖,则需要执行后续步骤204和步骤205。
对于上述步骤203,由上述内容可知,若所述各个目标字词均被GloVe词向量覆盖,则服务器可以确定所述初始问题向量为目标问题向量。
对于上述步骤204,若所述各个目标字词中任一个目标字词未被GloVe词向量覆盖,则可知该目标问题文本中存在无法被GloVe词向量覆盖的目标字词,为了补充这一部分的缺失,服务器可以将未被GloVe词向量覆盖的目标字词转换为TransE词向量,得到补充向量。
需要说明的,TransE,又称知识库方法,是一种现有的有效学习专有名词的算法模型,可以将学习到的字词转换为分布式向量表示。本实施例中,服务器可以采用TransE将未被GloVe词向量覆盖的目标字词进行向量转换,得到补充向量。
对于步骤205,可知,服务器在得到补充向量之后,可以使用该补充向量添加至该初始问题向量中,以填补初始问题向量的缺失,从而得到该目标问题文本对应的目标问题向量。举例说明,假设该目标问题文本为“小明吃饭吗”,该句子中包括“小明”、“吃饭”和“吗”三个目标字词。服务器使用GloVe将“吃饭”和“吗”转换为词向量,分别为[1234]和[1235],对于“小明”一词,服务器使用TransE将其转换为词向量,得到[1236],再将[1236]添加至[1234]和[1235]中,得到该目标问题向量为[1236]、[1234]和[1235],其中,该目标问题向量可以以一维向量的形式表达,即[1236]、[1234]和[1235]合并为[123612341235]作为该目标问题向量,也可以以二维向量的形式表达,即[1236]、[1234]和[1235]分别作为一个二维向 量的行向量,得到目标向量为:
104、将所述目标问题向量作为输入投入至预先训练好的基于注意力的深度学习模型,得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值,每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率;
可以理解的是,服务器在得到目标问题向量之后,可以将所述目标问题向量作为输入投入至预先训练好的基于注意力的深度学习模型,得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值。其中,每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率。可以理解的是,这些问题概率值与各个预设用户意图一一对应,当某个问题概率值越大,则代表了用户提问的问题属于该问题概率值对应的预设用户意图的可能性越高。
为便于理解,下面将对基于注意力的深度学习模型的训练过程进行详细描述。如图4所示,进一步地,所述深度学习模型包括第一双层循环神经网络、Key-Value记忆网络、第二双层循环神经网络和内容相似度计算网络四个部分,所述深度学习模型通过以下步骤预先训练好:
301、收集属于所述各个预设用户意图的样本问题语音;
302、对收集到的样本问题语音分别进行音转字处理,得到样本问题文本;
303、对所述样本问题文本进行向量化处理,得到样本问题向量;
304、针对每个样本问题向量,为所述每个样本问题向量分别针对各个预设用户意图设定标记值,得到所述每个样本问题向量的各个标记值,其中,与所述每个样本问题向量对应的预设用户意图的标记值最大;
305、将所有所述样本问题向量分别输入第一双层循环神经网络进行编码,得到各个样本问题向量各自对应的语境特征向量;
306、将各个所述语境特征向量分别投入Key-Value记忆网络中进行迭代计算,直到达到预设的迭代次数或模型收敛,然后输出迭代计算后的各个语境特征向量,所述Key-Value记忆网络中参与迭代计算各个Key值为各个样本问题文本中主语和谓语对应的向量;
307、将迭代计算后的各个语境特征向量分别输入第二双层循环神经网络进行解码,得到各个样本结果向量;
308、针对每个样本结果向量,通过内容相似度计算网络计算所述每个样本结果向量与各个意图向量之间的相似度,得到所述每个样本结果向量对应的各个相似度值,作为样本概率值,所述各个意图向量是指所述各个预设用户意图向量化后的向量值;
309、以所述每个样本结果向量对应的各个样本概率值为调整目标,调整所述第一双层循环神经网络、所述Key-Value记忆网络、所述第二双层循环神经网络和所述内容相似度计算网络的网络参数,以最小化所述每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差,所述各个目标标记值是指所述每个样本结果向量对应的样本问题向量的各个标记值;
310、若每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差满足预设的训练终止条件,则确定所述深度学习模型已训练好。
对于步骤301,本实施例中,针对实际应用场景,工作人员可以预先在服务器上设置好需要训练的各个预设用户意图,例如可以包括“问好”、“希望挂断”、“委婉拒绝”等意图,针对这些预设用户意图,工作人员还需要在具体应用场景下收集各自对应的用户话术,即用户的语音作为样本问题语音,比如用户实际咨询的问题语音。在收集样本问题语音时, 服务器可以通过专业咨询平台、网络客服等渠道收集属于各个预设用户意图的样本问题语音。需要说明的是,每个预设用户意图对应的样本问题语音应当达到一定的数量级,各个预设用户意图之间样本问题语音的数量可以有一定差距,但不应相差过远,避免影响对深度学习模型的训练效果。例如,可以收集到的样本问题语音为:“问好”对应的样本问题语音的数量为100万条,“希望挂断”对应的样本问题语音的数量为20万条,“委婉拒绝”对应的样本问题语音的数量为30万条。
对于步骤302,与上述步骤102同理,服务器可以对收集到的样本问题语音分别进行音转字处理,得到样本问题文本,此处不再赘述。
对于步骤303,与上述步骤103同理,服务器也可以对所述样本问题文本进行向量化处理,得到样本问题向量。
对于步骤304,可以理解的是,在训练之前,需要对样本问题向量进行标记,本实施例中由于需要针对多个预设用户意图进行训练,因此应当针对不同的预设用户意图分别进行设定标记值。举例说明,假设共3个预设用户意图,分别为“问好”、“希望挂断”和“委婉拒绝”,假设共100万个样本问题向量,针对1号样本问题向量,由于该样本问题向量对应的样本问题文本的真实意图为“问好”,则将1号样本问题向量对“问好”的标记值设为1,对“希望挂断”和“委婉拒绝”的标记值均设为0;针对2号样本问题向量,由于该样本问题向量对应的样本问题文本的真实意图为“希望挂断”,则将2号样本问题向量对“希望挂断”的标记值设为1,对“问好”和“委婉拒绝”的标记值均设为0;同理将所有100万个样本问题向量均分别针对各个预设用户意图设定标记值,即完成训练前的样本标注工作。
需要说明的是,上述举例中将样本问题向量对应的预设用户意图的标记值记为1,其它的预设用户意图的标记值记为0,这只是其中一种标记值的设定方式。比如,也可以将样本问题向量对应的预设用户意图的标记值记为0.9,其它的预设用户意图的标记值记为0.8、0.7、0.6等等,只要比0.9小即可,保证样本问题向量对应的预设用户意图的标记值在所有标记值中最大即可。
对于步骤305,本实施例中,该深度学习模型中设有第一双层循环神经网络作为编码器,第二双层循环神经网络作为解码器,其中,该第一双层循环神经网络即为两层的RNN。步骤305中,首先,将样本问题向量输入到第一层RNN中进行卷积计算,通过合理设置第一层RNN的卷积核,计算出初步的特征表示(向量),然后将初步的特征表示再输入第二层RNN做卷积计算,完成对该初步的特征表示的特征遍历,计算得到的特征表示结果可以认为是该样本问题向量对应的语境特征向量。步骤305的卷积计算过程由于完成了对样本问题向量的特征提取和编码,因此可以认为该第一双层循环神经网络为深度学习模型中的编码器。
对于步骤306,本实施例中,预先将各个样本问题向量的主语、谓语和宾语存储为Key-Value记忆网络中的各个Key-Value对。其中,Key值由样本问题向量中主语和谓语对应的部分向量组成,表示为(主语,谓语),Value值由样本问题向量中宾语对应的部分向量组成,示为(宾语)。需要说明的是,上述所说的主语、谓语和宾语均以向量的形式表示,具体是指一个样本问题文本中主语、谓语和宾语对应字词的词向量。例如,对于“小明吃饭吗”这一样本问题文本,主语为“小明”一词对应的向量[1236],谓语为“吃饭”一词对应的向量[1234],从而该样本问题文本的Key值可以表示为([1236],[1234])。
具体地,步骤306中,服务器是将每个语境特征向量与Key-Value记忆网络中各个Key-Value对的Key值进行相似度计算,并将计算得到的相似度值存储为所述每个语境特征向量与各个Key值之间的注意力权重,然后利用该注意力权重来更新所述每个语境特征向量,如此反复迭代,直到达到预设的迭代次数或模型收敛,服务器再输出经过多次迭代计算后的语境特征向量。
为便于理解,上述过程可以通过下述公式一进行表达:
其中,q
j表示第j次迭代计算时语境特征向量,q
j+1表示第j+1次迭代计算时语境特征向量,i各个Key-Value对序号,也等于所述各个样本问题向量的数量,Φ
K为表示Key值,A和R
j为Key-Value记忆力网络的网络参数,Softmax函数为归一化指数函数,具体为
需要说明的是,本实施例中预设的迭代次数具体可以根据实际情况设定,比如可以设定为10次。而关于Key-Value记忆网络何时达到模型收敛的判断,本实施例中可以采用现有的损失函数(Loss function)来进行判定,比如可以采用互熵损失函数(Softmax Loss)来实现对Key-Value记忆网络的模型收敛判断,此处不再展开描述。
对于步骤307,可以理解的是,服务器执行不住306之后,得到迭代计算后的各个语境特征向量,此时,还需要将这些语境特征向量输入到第二双层循环神经网络中进行解码。本实施例中,该第二双层循环神经网络与上述的第一双层循环神经网络原理类似,同样是设置两层RNN,每层RNN通过合理设置卷积核对所述各个语境特征向量进行卷积计算,完成对各个语境特征向量的解码操作,从而最后得到各个样本结果向量。可以理解的是,服务器得到的样本结果向量的维度、尺寸与上述样本问题向量的维度、尺寸一致。
对于步骤308,服务器上预设了需要训练的各个预设用户意图,比如“问好”、“希望挂断”、“委婉拒绝”等意图,在训练时,为了衡量这些预设用户意图与样本结果向量之间的相似程度,服务器需要将各个预设用户意图向量化,得到各个意图向量。服务器在得到各个样本结果向量之后,可以针对每个样本结果向量,通过内容相似度计算网络计算所述每个样本结果向量与各个意图向量之间的相似度,得到所述每个样本结果向量对应的各个相似度值,并将这里的相似度值看作输出结果与预设用户意图之间的对应的可能性,记为样本概率值。
具体地,步骤308可以通过如下公式计算所述每个样本结果向量与各个意图向量之间的相似度,即该内容相似度计算网络可以通过以下公式进行表达:
exp(e
i,m)=ω
Τtanh(Ws
m-1+Vh
m+b)
其中,ω和b均为向量,W和V为矩阵,ω、b、W、V为内容相似度计算网络的网络参数,α
i,m表示第二双层循环神经网络输出的样本结果向量s
m-1与意图向量h
m之间的相似度值,也即样本概率值,m为意图向量的序号,M为所述各个意图向量的数量,即共M个预设用户意图。
对于步骤309,可以理解的是,在训练深度学习模型的过程中,需要调整该深度学习模型的参数,具体在本实施例中,即调整所述第一双层循环神经网络、所述Key-Value记忆网络、所述第二双层循环神经网络和所述内容相似度计算网络的网络参数,比如上述的 ω、b、W、V、A和R
j,等等。通过调整这些网络参数可以影响该深度学习模型最终的输出结果,使得每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差最小化。
对于上述步骤310,在调节上述网络参数的过程中,可以判断每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差满足预设的训练终止条件,若满足,则说明该深度学习模型中的各个网络参数已经调整到位,可以确定该深度学习模型已训练完成;反之,若不满足,则说明该深度学习模型还需要继续训练。
其中,该训练终止条件可以根据实际使用情况预先设定,具体地,可以将该训练终止条件设定为:若每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差均小于指定误差值,则认为其满足该预设的训练终止条件。或者,也可以将其设为:使用验证集中的样本问题语音执行上述步骤301-309,若深度学习模型输出的样本概率值与标记值之间的误差在一定范围内,则认为其满足该预设的训练终止条件。其中,该验证集中的样本问题语音的收集与上述步骤301类似,具体地,可以执行上述步骤301收集得到各个预设用户意图的样本问题语音后,将收集得到的样本问题语音中的一定比例划分为训练集,剩余的样本问题语音划分为验证集。比如,可以将收集得到的样本问题语音中随机划分80%作为后续训练深度学习模型的训练集的样本,将其它的20%划分为后续验证深度学习模型是否训练完成,也即是否满足预设训练终止条件的验证集的样本。
上面描述了基于注意力的深度学习模型的预先训练过程,为便于理解,下面承接上述训练过程的内容,详细描述一下使用该深度学习模型在实际使用中对目标问题向量的识别过程。如图5所示,更进一步地,所述深度学习模型包括第一双层循环神经网络、Key-Value记忆网络、第二双层循环神经网络和内容相似度计算网络四个部分,步骤104可以包括:
401、将所述目标问题向量输入所述第一双层循环神经网络进行编码,得到目标语境特征向量;
402、将所述目标语境特征向量投入所述Key-Value记忆网络中进行迭代计算,直到达到预设的迭代次数或模型收敛,然后输出迭代计算后的目标语境特征向量;
403、将迭代计算后的目标语境特征向量输入所述第二双层循环神经网络进行解码,得到目标结果向量;
404、通过所述内容相似度计算网络计算所述目标结果向量与各个意图向量之间的相似度,得到所述目标结果向量对应的各个相似度值,作为各个问题概率值。
上述步骤401-404与上述步骤305-308的原理类似,此处不再过多赘述。
在步骤401-404中,服务器先将所述目标问题向量输入所述第一双层循环神经网络进行编码,得到目标语境特征向量;然后,将所述目标语境特征向量投入所述Key-Value记忆网络中进行迭代计算,直到达到预设的迭代次数或模型收敛,然后输出迭代计算后的目标语境特征向量;接着,将迭代计算后的目标语境特征向量输入所述第二双层循环神经网络进行解码,得到目标结果向量;最后,服务器通过所述内容相似度计算网络计算所述目标结果向量与各个意图向量之间的相似度,得到所述目标结果向量对应的各个相似度值,作为各个问题概率值。可知,这里得到的各个问题概率值表征了用户提问的目标问题语音分别属于各个预设用户意图的可能性大小,某个问题概率值越大,则表示该问题概率值对应的预设用户意图越有可能是用户本次提供的真实意图。
105、从所述各个预设用户意图中选取出问题概率值最大的预设用户意图,作为所述用户的真实问题意图。
服务器在得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值之后,由于每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率,因此,服务器可以从所述各个预设用户意图中选取出问题概 率值最大的预设用户意图,这里选取出的预设用户意图在所有预设用户意图中最有可能是用户提问的真实意图,因此将其确定为所述用户的真实问题意图。
本实施例中,服务器在确定出用户的真实问题意图之后,还可以通过预先设置多个问题答案元组,从这些问题答案元组中选取出与真实问题意图对应的答案提供给用户,进而直接解决用户的提问。如图6所示,进一步地,在步骤105之后,本方法还可以包括:
501、获取预设的各个问题答案元组,每个问题答案元组由一个预设用户意图和与所述一个预设用户意图对应的答案组成,其中,所述各个问题答案元组的预设用户意图各不相同;
502、从所述各个问题答案元组中选取出预设用户意图与所述真实问题意图相同的一个问题答案元组,作为命中元组;
503、将所述命中元组的答案反馈至所述用户。
对于步骤501,可以理解的是,服务器上可以预先设置各个问题答案元组,每个问题答案元组由一个预设用户意图和与所述一个预设用户意图对应的答案组成,比如,某个问题答案元组为(问好;谢谢,你呢?),其中,“问好”为该元组中的预设用户意图,“谢谢,你呢?”为该元组中的答案。并且,为了让所述各个问题答案元组覆盖所有预设用户意图,因此服务器上的所述各个问题答案元组的预设用户意图各不相同,即针对每个预设用户意图设置一个问题答案元组。
需要说明的是,为了让问题答案元组中的答案更加准确且具有代表性,这些问题答案元组中的答案可以预先经过人工筛选和唯一处理,将针对预设用户意图的最合适答案设置在问题答案元组中。比如,在一个实际应用场景下,可以预设5.2k个预设用户意图和30.8k个答案。在这5.2k预设用户意图中,仅仅保留其中的330个对本应用场景有意义的数据。相似地,30.8k答案中,人为挑选其中642个答案用作设定各个问题答案元组。最后,经过对答案的唯一处理和筛选,可以大约有1k个问题答案元组,覆盖了1k个用户可能提问的问题的真实意图并给出准确的答案。
对于步骤502和步骤503,容易理解的是,服务器在确定出用户的真实问题意图之后,可以从所述各个问题答案元组中选取出预设用户意图与所述真实问题意图相同的一个问题答案元组,作为命中元组,然后将所述命中元组的答案反馈至所述用户,即可完成对用户提问的解答。
本申请实施例中,首先,获取用户提问的目标问题语音;然后,对所述目标问题语音进行音转字处理,得到目标问题文本;接着,对所述目标问题文本进行向量化处理,得到目标问题向量;再之,将所述目标问题向量作为输入投入至预先训练好的基于注意力的深度学习模型,得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值,每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率;最后,从所述各个预设用户意图中选取出问题概率值最大的预设用户意图,作为所述用户的真实问题意图。可见,本申请通过基于注意力的深度学习模型可以准确地从用户提问的目标问题语音出发识别出用户的真实意图,提升了意图识别的准确性,当应用于语音助手等语音识别情境时,可以大大减少答非所问的情况出现。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种问题意图识别装置,该问题意图识别装置与上述实施例中问题意图识别方法一一对应。如图7所示,该问题意图识别装置包括问题语音获取模块601、音转字模块602、文本向量化模块603、问题识别模块604和真实意图选取模块605。各功能模块详细说明如下:
问题语音获取模块601,用于获取用户提问的目标问题语音;
音转字模块602,用于对所述目标问题语音进行音转字处理,得到目标问题文本;
文本向量化模块603,用于对所述目标问题文本进行向量化处理,得到目标问题向量;
问题识别模块604,用于将所述目标问题向量作为输入投入至预先训练好的基于注意力的深度学习模型,得到所述深度学习模型输出的、与各个预设用户意图分别对应的各个问题概率值,每个问题概率值各自表征了所述目标问题文本属于与所述每个问题概率值对应的预设用户意图的概率;
真实意图选取模块605,用于从所述各个预设用户意图中选取出问题概率值最大的预设用户意图,作为所述用户的真实问题意图。
如图8所示,进一步地,所述深度学习模型包括第一双层循环神经网络、Key-Value记忆网络、第二双层循环神经网络和内容相似度计算网络四个部分,所述深度学习模型可以通过以下模块预先训练好:
样本收集模块606,用于收集属于所述各个预设用户意图的样本问题语音;
样本音转字模块607,用于对收集到的样本问题语音分别进行音转字处理,得到样本问题文本;
样本向量化模块608,用于对所述样本问题文本进行向量化处理,得到样本问题向量;
样本标记模块609,用于针对每个样本问题向量,为所述每个样本问题向量分别针对各个预设用户意图设定标记值,得到所述每个样本问题向量的各个标记值,其中,与所述每个样本问题向量对应的预设用户意图的标记值最大;
向量编码模块610,用于将所有所述样本问题向量分别输入第一双层循环神经网络进行编码,得到各个样本问题向量各自对应的语境特征向量;
迭代计算模块611,用于将各个所述语境特征向量分别投入Key-Value记忆网络中进行迭代计算,直到达到预设的迭代次数或模型收敛,然后输出迭代计算后的各个语境特征向量,所述Key-Value记忆网络中参与迭代计算各个Key值为各个样本问题文本中主语和谓语对应的向量;
向量解码模块612,用于将迭代计算后的各个语境特征向量分别输入第二双层循环神经网络进行解码,得到各个样本结果向量;
相似度值计算模块613,用于针对每个样本结果向量,通过内容相似度计算网络计算所述每个样本结果向量与各个意图向量之间的相似度,得到所述每个样本结果向量对应的各个相似度值,作为样本概率值,所述各个意图向量是指所述各个预设用户意图向量化后的向量值;
网络参数调整模块614,用于以所述每个样本结果向量对应的各个样本概率值为调整目标,调整所述第一双层循环神经网络、所述Key-Value记忆网络、所述第二双层循环神经网络和所述内容相似度计算网络的网络参数,以最小化所述每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差,所述各个目标标记值是指所述每个样本结果向量对应的样本问题向量的各个标记值;
训练完成确定模块615,用于若每个样本结果向量对应的各个样本概率值与各个目标标记值之间的误差满足预设的训练终止条件,则确定所述深度学习模型已训练好。
如图9所示,进一步地,所述问题识别模块604可以包括:
编码单元6041,用于将所述目标问题向量输入所述第一双层循环神经网络进行编码,得到目标语境特征向量;
向量迭代计算单元6042,用于将所述目标语境特征向量投入所述Key-Value记忆网络中进行迭代计算,直到达到预设的迭代次数或模型收敛,然后输出迭代计算后的目标语境特征向量;
解码单元6043,用于将迭代计算后的目标语境特征向量输入所述第二双层循环神经 网络进行解码,得到目标结果向量;
相似度计算单元6044,用于通过所述内容相似度计算网络计算所述目标结果向量与各个意图向量之间的相似度,得到所述目标结果向量对应的各个相似度值,作为各个问题概率值。
进一步地,所述文本向量化模块可以包括:
第一转换单元,用于将所述目标问题文本中的各个目标字词分别转换为GloVe词向量,得到初始问题向量;
字词判断单元,用于判断所述各个目标字词是否均被GloVe词向量覆盖;
问题向量确定单元,用于若所述字词判断单元的判断结果为是,则确定所述初始问题向量为目标问题向量;
第二转换单元,用于若所述字词判断单元的判断结果为否,则将未被GloVe词向量覆盖的目标字词转换为TransE词向量,得到补充向量;
向量添加单元,用于将所述补充向量添加至所述初始问题向量,得到目标问题向量。
进一步地,所述问题意图识别装置还可以包括:
答案元组获取模块,用于获取预设的各个问题答案元组,每个问题答案元组由一个预设用户意图和与所述一个预设用户意图对应的答案组成,其中,所述各个问题答案元组的预设用户意图各不相同;
答案元组选取模块,用于从所述各个问题答案元组中选取出预设用户意图与所述真实问题意图相同的一个问题答案元组,作为命中元组;
答案反馈模块,用于将所述命中元组的答案反馈至所述用户。
For specific limitations of the question intent recognition apparatus, reference may be made to the limitations of the question intent recognition method above, which are not repeated here. Each of the modules in the question intent recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device is configured to store the data involved in the question intent recognition method. The network interface of the computer device is configured to communicate with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a question intent recognition method. The readable storage medium provided in this embodiment includes non-volatile readable storage media and volatile readable storage media.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the steps of the question intent recognition method in the above embodiments, for example, steps 101 to 105 shown in FIG. 2; alternatively, when executing the computer-readable instructions, the processor implements the functions of the modules/units of the question intent recognition apparatus in the above embodiments, for example, the functions of modules 601 to 605 shown in FIG. 7. To avoid repetition, details are not repeated here.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the steps of the question intent recognition method in the above method embodiments, or the functions of the modules/units of the question intent recognition apparatus in the above apparatus embodiments. To avoid repetition, details are not repeated here. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by instructing the relevant hardware through computer-readable instructions, which may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only given as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.
Claims (20)
- A question intent recognition method, comprising: obtaining target question speech asked by a user; performing speech-to-text processing on the target question speech to obtain target question text; vectorizing the target question text to obtain a target question vector; feeding the target question vector as input into a pre-trained attention-based deep learning model to obtain question probability values output by the deep learning model, each corresponding to one preset user intent, wherein each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that question probability value; and selecting, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- The question intent recognition method according to claim 1, wherein the deep learning model comprises four parts: a first two-layer recurrent neural network, a Key-Value memory network, a second two-layer recurrent neural network, and a content similarity computation network, and the deep learning model is pre-trained through the following steps: collecting sample question speech belonging to each of the preset user intents; performing speech-to-text processing on the collected sample question speech to obtain sample question texts; vectorizing the sample question texts to obtain sample question vectors; setting, for each sample question vector, a label value for each preset user intent to obtain the label values of that sample question vector, wherein the label value of the preset user intent corresponding to that sample question vector is the largest; inputting each of the sample question vectors into the first two-layer recurrent neural network for encoding to obtain the context feature vector corresponding to each sample question vector; feeding each of the context feature vectors into the Key-Value memory network for iterative computation until a preset number of iterations is reached or the model converges, and then outputting the iterated context feature vectors, wherein the Key values participating in the iterative computation in the Key-Value memory network are the vectors corresponding to the subjects and predicates in the sample question texts; inputting the iterated context feature vectors into the second two-layer recurrent neural network for decoding to obtain sample result vectors; computing, for each sample result vector, the similarity between that sample result vector and each intent vector through the content similarity computation network to obtain the similarity values corresponding to that sample result vector as sample probability values, wherein the intent vectors are the vectorized values of the preset user intents; adjusting, with the sample probability values corresponding to each sample result vector as the adjustment target, the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity computation network, so as to minimize the errors between the sample probability values corresponding to each sample result vector and the target label values, wherein the target label values are the label values of the sample question vector corresponding to that sample result vector; and determining that the deep learning model has been trained if the errors between the sample probability values corresponding to each sample result vector and the target label values satisfy a preset training termination condition.
- The question intent recognition method according to claim 2, wherein feeding the target question vector as input into the pre-trained attention-based deep learning model to obtain the question probability values output by the deep learning model, each corresponding to one preset user intent, comprises: inputting the target question vector into the first two-layer recurrent neural network for encoding to obtain a target context feature vector; feeding the target context feature vector into the Key-Value memory network for iterative computation until the preset number of iterations is reached or the model converges, and then outputting the iterated target context feature vector; inputting the iterated target context feature vector into the second two-layer recurrent neural network for decoding to obtain a target result vector; and computing, through the content similarity computation network, the similarity between the target result vector and each intent vector to obtain the similarity values corresponding to the target result vector as the question probability values.
- The question intent recognition method according to claim 1, wherein vectorizing the target question text to obtain the target question vector comprises: converting each target word in the target question text into a GloVe word vector to obtain an initial question vector; judging whether all of the target words are covered by GloVe word vectors; if all of the target words are covered by GloVe word vectors, determining the initial question vector as the target question vector; if any of the target words is not covered by a GloVe word vector, converting the target words not covered by GloVe word vectors into TransE word vectors to obtain supplementary vectors; and adding the supplementary vectors to the initial question vector to obtain the target question vector.
- The question intent recognition method according to any one of claims 1 to 4, further comprising, after selecting from the preset user intents the preset user intent with the largest question probability value as the user's real question intent: obtaining preset question-answer tuples, wherein each question-answer tuple consists of one preset user intent and an answer corresponding to that preset user intent, and the preset user intents of the question-answer tuples are all different from one another; selecting, from the question-answer tuples, the question-answer tuple whose preset user intent is identical to the real question intent, as a hit tuple; and feeding the answer of the hit tuple back to the user.
- A question intent recognition apparatus, comprising: a question speech obtaining module configured to obtain target question speech asked by a user; a speech-to-text module configured to perform speech-to-text processing on the target question speech to obtain target question text; a text vectorization module configured to vectorize the target question text to obtain a target question vector; a question recognition module configured to feed the target question vector as input into a pre-trained attention-based deep learning model to obtain question probability values output by the deep learning model, each corresponding to one preset user intent, wherein each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that question probability value; and a real intent selection module configured to select, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- The question intent recognition apparatus according to claim 6, wherein the deep learning model comprises four parts: a first two-layer recurrent neural network, a Key-Value memory network, a second two-layer recurrent neural network, and a content similarity computation network, and the deep learning model is pre-trained by the following modules: a sample collection module configured to collect sample question speech belonging to each of the preset user intents; a sample speech-to-text module configured to perform speech-to-text processing on the collected sample question speech to obtain sample question texts; a sample vectorization module configured to vectorize the sample question texts to obtain sample question vectors; a sample labeling module configured to set, for each sample question vector, a label value for each preset user intent to obtain the label values of that sample question vector, wherein the label value of the preset user intent corresponding to that sample question vector is the largest; a vector encoding module configured to input each of the sample question vectors into the first two-layer recurrent neural network for encoding to obtain the context feature vector corresponding to each sample question vector; an iterative computation module configured to feed each of the context feature vectors into the Key-Value memory network for iterative computation until a preset number of iterations is reached or the model converges, and then output the iterated context feature vectors, wherein the Key values participating in the iterative computation in the Key-Value memory network are the vectors corresponding to the subjects and predicates in the sample question texts; a vector decoding module configured to input the iterated context feature vectors into the second two-layer recurrent neural network for decoding to obtain sample result vectors; a similarity value computation module configured to compute, for each sample result vector, the similarity between that sample result vector and each intent vector through the content similarity computation network to obtain the similarity values corresponding to that sample result vector as sample probability values, wherein the intent vectors are the vectorized values of the preset user intents; a network parameter adjustment module configured to adjust, with the sample probability values corresponding to each sample result vector as the adjustment target, the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity computation network, so as to minimize the errors between the sample probability values corresponding to each sample result vector and the target label values, wherein the target label values are the label values of the sample question vector corresponding to that sample result vector; and a training completion determination module configured to determine that the deep learning model has been trained if the errors between the sample probability values corresponding to each sample result vector and the target label values satisfy a preset training termination condition.
- The question intent recognition apparatus according to claim 7, wherein the question recognition module comprises: an encoding unit configured to input the target question vector into the first two-layer recurrent neural network for encoding to obtain a target context feature vector; a vector iterative computation unit configured to feed the target context feature vector into the Key-Value memory network for iterative computation until the preset number of iterations is reached or the model converges, and then output the iterated target context feature vector; a decoding unit configured to input the iterated target context feature vector into the second two-layer recurrent neural network for decoding to obtain a target result vector; and a similarity computation unit configured to compute, through the content similarity computation network, the similarity between the target result vector and each intent vector to obtain the similarity values corresponding to the target result vector as the question probability values.
- The question intent recognition apparatus according to claim 6, wherein the text vectorization module comprises: a first conversion unit configured to convert each target word in the target question text into a GloVe word vector to obtain an initial question vector; a word judgment unit configured to judge whether all of the target words are covered by GloVe word vectors; a question vector determination unit configured to determine the initial question vector as the target question vector if the judgment result of the word judgment unit is yes; a second conversion unit configured to convert the target words not covered by GloVe word vectors into TransE word vectors to obtain supplementary vectors if the judgment result of the word judgment unit is no; and a vector adding unit configured to add the supplementary vectors to the initial question vector to obtain the target question vector.
- The question intent recognition apparatus according to any one of claims 6 to 9, further comprising: an answer tuple obtaining module configured to obtain preset question-answer tuples, wherein each question-answer tuple consists of one preset user intent and an answer corresponding to that preset user intent, and the preset user intents of the question-answer tuples are all different from one another; an answer tuple selection module configured to select, from the question-answer tuples, the question-answer tuple whose preset user intent is identical to the real question intent, as a hit tuple; and an answer feedback module configured to feed the answer of the hit tuple back to the user.
- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: obtaining target question speech asked by a user; performing speech-to-text processing on the target question speech to obtain target question text; vectorizing the target question text to obtain a target question vector; feeding the target question vector as input into a pre-trained attention-based deep learning model to obtain question probability values output by the deep learning model, each corresponding to one preset user intent, wherein each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that question probability value; and selecting, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- The computer device according to claim 11, wherein the deep learning model comprises four parts: a first two-layer recurrent neural network, a Key-Value memory network, a second two-layer recurrent neural network, and a content similarity computation network, and the deep learning model is pre-trained through the following steps: collecting sample question speech belonging to each of the preset user intents; performing speech-to-text processing on the collected sample question speech to obtain sample question texts; vectorizing the sample question texts to obtain sample question vectors; setting, for each sample question vector, a label value for each preset user intent to obtain the label values of that sample question vector, wherein the label value of the preset user intent corresponding to that sample question vector is the largest; inputting each of the sample question vectors into the first two-layer recurrent neural network for encoding to obtain the context feature vector corresponding to each sample question vector; feeding each of the context feature vectors into the Key-Value memory network for iterative computation until a preset number of iterations is reached or the model converges, and then outputting the iterated context feature vectors, wherein the Key values participating in the iterative computation in the Key-Value memory network are the vectors corresponding to the subjects and predicates in the sample question texts; inputting the iterated context feature vectors into the second two-layer recurrent neural network for decoding to obtain sample result vectors; computing, for each sample result vector, the similarity between that sample result vector and each intent vector through the content similarity computation network to obtain the similarity values corresponding to that sample result vector as sample probability values, wherein the intent vectors are the vectorized values of the preset user intents; adjusting, with the sample probability values corresponding to each sample result vector as the adjustment target, the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity computation network, so as to minimize the errors between the sample probability values corresponding to each sample result vector and the target label values, wherein the target label values are the label values of the sample question vector corresponding to that sample result vector; and determining that the deep learning model has been trained if the errors between the sample probability values corresponding to each sample result vector and the target label values satisfy a preset training termination condition.
- The computer device according to claim 12, wherein feeding the target question vector as input into the pre-trained attention-based deep learning model to obtain the question probability values output by the deep learning model, each corresponding to one preset user intent, comprises: inputting the target question vector into the first two-layer recurrent neural network for encoding to obtain a target context feature vector; feeding the target context feature vector into the Key-Value memory network for iterative computation until the preset number of iterations is reached or the model converges, and then outputting the iterated target context feature vector; inputting the iterated target context feature vector into the second two-layer recurrent neural network for decoding to obtain a target result vector; and computing, through the content similarity computation network, the similarity between the target result vector and each intent vector to obtain the similarity values corresponding to the target result vector as the question probability values.
- The computer device according to claim 11, wherein vectorizing the target question text to obtain the target question vector comprises: converting each target word in the target question text into a GloVe word vector to obtain an initial question vector; judging whether all of the target words are covered by GloVe word vectors; if all of the target words are covered by GloVe word vectors, determining the initial question vector as the target question vector; if any of the target words is not covered by a GloVe word vector, converting the target words not covered by GloVe word vectors into TransE word vectors to obtain supplementary vectors; and adding the supplementary vectors to the initial question vector to obtain the target question vector.
- The computer device according to any one of claims 11 to 14, wherein, after selecting from the preset user intents the preset user intent with the largest question probability value as the user's real question intent, the processor, when executing the computer-readable instructions, further implements the following steps: obtaining preset question-answer tuples, wherein each question-answer tuple consists of one preset user intent and an answer corresponding to that preset user intent, and the preset user intents of the question-answer tuples are all different from one another; selecting, from the question-answer tuples, the question-answer tuple whose preset user intent is identical to the real question intent, as a hit tuple; and feeding the answer of the hit tuple back to the user.
- One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining target question speech asked by a user; performing speech-to-text processing on the target question speech to obtain target question text; vectorizing the target question text to obtain a target question vector; feeding the target question vector as input into a pre-trained attention-based deep learning model to obtain question probability values output by the deep learning model, each corresponding to one preset user intent, wherein each question probability value represents the probability that the target question text belongs to the preset user intent corresponding to that question probability value; and selecting, from the preset user intents, the preset user intent with the largest question probability value as the user's real question intent.
- The readable storage media according to claim 16, wherein the deep learning model comprises four parts: a first two-layer recurrent neural network, a Key-Value memory network, a second two-layer recurrent neural network, and a content similarity computation network, and the deep learning model is pre-trained through the following steps: collecting sample question speech belonging to each of the preset user intents; performing speech-to-text processing on the collected sample question speech to obtain sample question texts; vectorizing the sample question texts to obtain sample question vectors; setting, for each sample question vector, a label value for each preset user intent to obtain the label values of that sample question vector, wherein the label value of the preset user intent corresponding to that sample question vector is the largest; inputting each of the sample question vectors into the first two-layer recurrent neural network for encoding to obtain the context feature vector corresponding to each sample question vector; feeding each of the context feature vectors into the Key-Value memory network for iterative computation until a preset number of iterations is reached or the model converges, and then outputting the iterated context feature vectors, wherein the Key values participating in the iterative computation in the Key-Value memory network are the vectors corresponding to the subjects and predicates in the sample question texts; inputting the iterated context feature vectors into the second two-layer recurrent neural network for decoding to obtain sample result vectors; computing, for each sample result vector, the similarity between that sample result vector and each intent vector through the content similarity computation network to obtain the similarity values corresponding to that sample result vector as sample probability values, wherein the intent vectors are the vectorized values of the preset user intents; adjusting, with the sample probability values corresponding to each sample result vector as the adjustment target, the network parameters of the first two-layer recurrent neural network, the Key-Value memory network, the second two-layer recurrent neural network, and the content similarity computation network, so as to minimize the errors between the sample probability values corresponding to each sample result vector and the target label values, wherein the target label values are the label values of the sample question vector corresponding to that sample result vector; and determining that the deep learning model has been trained if the errors between the sample probability values corresponding to each sample result vector and the target label values satisfy a preset training termination condition.
- The readable storage media according to claim 17, wherein feeding the target question vector as input into the pre-trained attention-based deep learning model to obtain the question probability values output by the deep learning model, each corresponding to one preset user intent, comprises: inputting the target question vector into the first two-layer recurrent neural network for encoding to obtain a target context feature vector; feeding the target context feature vector into the Key-Value memory network for iterative computation until the preset number of iterations is reached or the model converges, and then outputting the iterated target context feature vector; inputting the iterated target context feature vector into the second two-layer recurrent neural network for decoding to obtain a target result vector; and computing, through the content similarity computation network, the similarity between the target result vector and each intent vector to obtain the similarity values corresponding to the target result vector as the question probability values.
- The readable storage media according to claim 16, wherein vectorizing the target question text to obtain the target question vector comprises: converting each target word in the target question text into a GloVe word vector to obtain an initial question vector; judging whether all of the target words are covered by GloVe word vectors; if all of the target words are covered by GloVe word vectors, determining the initial question vector as the target question vector; if any of the target words is not covered by a GloVe word vector, converting the target words not covered by GloVe word vectors into TransE word vectors to obtain supplementary vectors; and adding the supplementary vectors to the initial question vector to obtain the target question vector.
- The readable storage media according to any one of claims 16 to 19, wherein, after selecting from the preset user intents the preset user intent with the largest question probability value as the user's real question intent, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps: obtaining preset question-answer tuples, wherein each question-answer tuple consists of one preset user intent and an answer corresponding to that preset user intent, and the preset user intents of the question-answer tuples are all different from one another; selecting, from the question-answer tuples, the question-answer tuple whose preset user intent is identical to the real question intent, as a hit tuple; and feeding the answer of the hit tuple back to the user.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910467185.X | 2019-05-31 | |
CN201910467185.XA CN110287285B (zh) | 2019-05-31 | 2019-05-31 | Question intent recognition method and apparatus, computer device, and storage medium
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020237869A1 true WO2020237869A1 (zh) | 2020-12-03 |
Family
ID=68002985
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
PCT/CN2019/102922 WO2020237869A1 (zh) | 2019-05-31 | 2019-08-28 | Question intent recognition method and apparatus, computer device, and storage medium
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110287285B (zh) |
WO (1) | WO2020237869A1 (zh) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110955831B (zh) * | 2019-11-25 | 2023-04-14 | 北京三快在线科技有限公司 | Item recommendation method and apparatus, computer device, and storage medium
CN110909144A (zh) * | 2019-11-28 | 2020-03-24 | 中信银行股份有限公司 | Question-answering dialogue method and apparatus, electronic device, and computer-readable storage medium
CN111126071B (zh) * | 2019-12-02 | 2023-05-12 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for determining question text data, and data processing method for customer service groups
CN111026853B (zh) * | 2019-12-02 | 2023-10-27 | 支付宝(杭州)信息技术有限公司 | Target question determination method and apparatus, server, and customer service robot
CN110781402A (zh) * | 2020-01-02 | 2020-02-11 | 南京创维信息技术研究院有限公司 | Multi-round deep retrieval system and method for television based on Tmall Genie
CN111259625B (zh) * | 2020-01-16 | 2023-06-27 | 平安科技(深圳)有限公司 | Intent recognition method, apparatus, and device, and computer-readable storage medium
CN112597290B (zh) * | 2020-12-25 | 2023-08-01 | 携程计算机技术(上海)有限公司 | Context-aware intent recognition method and system, electronic device, and storage medium
CN112764760B (zh) * | 2021-01-25 | 2023-12-26 | 中国科学院自动化研究所 | Auxiliary answering system based on program evaluation
CN113408278B (zh) * | 2021-06-22 | 2023-01-20 | 平安科技(深圳)有限公司 | Intent recognition method, apparatus, device, and storage medium
CN113609851B (zh) * | 2021-07-09 | 2024-07-02 | 浙江连信科技有限公司 | Method, apparatus, and electronic device for recognizing cognitive bias in thoughts in psychology
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180330718A1 (en) * | 2017-05-11 | 2018-11-15 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for End-to-End speech recognition |
CN108304513B (zh) * | 2018-01-23 | 2020-08-11 | 义语智能科技(上海)有限公司 | Method and device for increasing the diversity of generative dialogue model results
CN108984535B (zh) * | 2018-06-25 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Sentence translation method, translation model training method, device, and storage medium
CN109241251B (zh) * | 2018-07-27 | 2022-05-27 | 众安信息技术服务有限公司 | Conversation interaction method
CN109657229A (zh) * | 2018-10-31 | 2019-04-19 | 北京奇艺世纪科技有限公司 | Intent recognition model generation method, intent recognition method, and apparatus
CN109741751A (zh) * | 2018-12-11 | 2019-05-10 | 上海交通大学 | Intent recognition method and apparatus for intelligent speech control
2019
- 2019-05-31 CN CN201910467185.XA patent/CN110287285B/zh active Active
- 2019-08-28 WO PCT/CN2019/102922 patent/WO2020237869A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108073574A (zh) * | 2016-11-16 | 2018-05-25 | 三星电子株式会社 | Method and device for processing natural language and training a natural language model
US20180144385A1 (en) * | 2016-11-18 | 2018-05-24 | Wal-Mart Stores, Inc. | Systems and methods for mapping a predicted entity to a product based on an online query
US20190139537A1 (en) * | 2017-11-08 | 2019-05-09 | Kabushiki Kaisha Toshiba | Dialogue system and dialogue method
CN108920622A (zh) * | 2018-06-29 | 2018-11-30 | 北京奇艺世纪科技有限公司 | Training method, training apparatus, and recognition apparatus for intent recognition
CN109376361A (zh) * | 2018-11-16 | 2019-02-22 | 北京九狐时代智能科技有限公司 | Intent recognition method and apparatus
CN109800306A (zh) * | 2019-01-10 | 2019-05-24 | 深圳Tcl新技术有限公司 | Intent analysis method and apparatus, display terminal, and computer-readable storage medium
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112686052B (zh) * | 2020-12-28 | 2023-12-01 | 科大讯飞股份有限公司 | Test question recommendation and related model training method, electronic device, and storage device
CN112686052A (zh) * | 2020-12-28 | 2021-04-20 | 科大讯飞股份有限公司 | Test question recommendation and related model training method, electronic device, and storage device
CN112862096A (zh) * | 2021-02-04 | 2021-05-28 | 百果园技术(新加坡)有限公司 | Model training and data processing method, apparatus, device, and medium
CN113469237B (zh) * | 2021-06-28 | 2023-09-15 | 平安科技(深圳)有限公司 | User intent recognition method and apparatus, electronic device, and storage medium
CN113469237A (zh) * | 2021-06-28 | 2021-10-01 | 平安科技(深圳)有限公司 | User intent recognition method and apparatus, electronic device, and storage medium
CN113569918A (zh) * | 2021-07-05 | 2021-10-29 | 北京淇瑀信息科技有限公司 | Classification temperature adjustment method and apparatus, electronic device, and medium
CN113905135B (zh) * | 2021-10-14 | 2023-10-20 | 天津车之家软件有限公司 | User intent recognition method and apparatus for an intelligent outbound-call robot
CN113905135A (zh) * | 2021-10-14 | 2022-01-07 | 天津车之家软件有限公司 | User intent recognition method and apparatus for an intelligent outbound-call robot
CN114155844A (zh) * | 2021-12-21 | 2022-03-08 | 科大讯飞股份有限公司 | Scoring method and apparatus, computing device, and storage medium
CN114881348A (zh) * | 2022-05-26 | 2022-08-09 | 中国平安人寿保险股份有限公司 | Intelligent outbound-call time slot prediction method and apparatus, computer device, and storage medium
CN116663537A (zh) * | 2023-07-26 | 2023-08-29 | 中信联合云科技有限责任公司 | Topic selection and planning information processing method and system based on big data analysis
CN116663537B (zh) * | 2023-07-26 | 2023-11-03 | 中信联合云科技有限责任公司 | Topic selection and planning information processing method and system based on big data analysis
CN117648930A (zh) * | 2023-11-22 | 2024-03-05 | 平安创科科技(北京)有限公司 | Joint task implementation method, apparatus, device, and medium
CN117725414A (zh) * | 2023-12-13 | 2024-03-19 | 北京海泰方圆科技股份有限公司 | Method for training a content generation model, method for determining output content, apparatus, and device
Also Published As
Publication number | Publication date |
---|---|
CN110287285A (zh) | 2019-09-27 |
CN110287285B (zh) | 2023-06-16 |
Similar Documents
Publication | Title
---|---
WO2020237869A1 (zh) | Question intent recognition method and apparatus, computer device, and storage medium
WO2020232877A1 (zh) | Question answer selection method and apparatus, computer device, and storage medium
WO2020119031A1 (zh) | Deep learning-based question-answering feedback method, apparatus, device, and storage medium
WO2020119030A1 (zh) | Model training method, apparatus, device, and storage medium for answering questions
US11556786B2 | Attention-based decoder-only sequence transduction neural networks
WO2020177230A1 (zh) | Machine learning-based medical data classification method and apparatus, computer device, and storage medium
WO2018133761A1 (zh) | Human-machine dialogue method and apparatus
CN112016295B (zh) | Symptom data processing method and apparatus, computer device, and storage medium
WO2018157805A1 (zh) | Automatic question-answering processing method and automatic question-answering system
WO2021000497A1 (zh) | Retrieval method and apparatus, computer device, and storage medium
CN110083693B (zh) | Robot dialogue reply method and apparatus
WO2020147395A1 (zh) | Emotion-based text classification processing method, apparatus, and computer device
WO2021114620A1 (zh) | Medical record quality control method and apparatus, computer device, and storage medium
WO2020215560A1 (zh) | Autoencoder neural network processing method and apparatus, computer device, and storage medium
WO2020151310A1 (zh) | Text generation method and apparatus, computer device, and medium
WO2021027125A1 (zh) | Sequence labeling method and apparatus, computer device, and storage medium
CN109857846B (zh) | Method and apparatus for matching user questions with knowledge points
CN111694940A (zh) | User report generation method and terminal device
CN111078847A (zh) | Electric power user intent recognition method and apparatus, computer device, and storage medium
CN110598210B (zh) | Entity recognition model training and entity recognition method, apparatus, device, and medium
CN113886531B (zh) | Intelligent question-answering script determination method and apparatus, computer device, and storage medium
CN111611383A (zh) | User intent recognition method and apparatus, computer device, and storage medium
CN110377618B (zh) | Adjudication result analysis method and apparatus, computer device, and storage medium
CN112307048A (zh) | Semantic matching model training method, matching method, apparatus, device, and storage medium
WO2020177378A1 (zh) | Text information feature extraction method and apparatus, computer device, and storage medium
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19930964; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19930964; Country of ref document: EP; Kind code of ref document: A1