CN109977404A - Answer extracting method, apparatus and storage medium based on deep learning
- Publication number
- CN109977404A (application number CN201910225135.0A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- document
- word
- determining
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0281—Customer communication at a business location, e.g. providing product or service information, consulting
Abstract
This application relates to a deep-learning-based answer extraction method, apparatus and storage medium. The method includes: acquiring a user question and, according to the user question, acquiring document content related to the user question; determining an extraction starting position and an extraction ending position in the document content based on a deep learning model; and determining the document content between the extraction starting position and the extraction ending position as the answer corresponding to the user question, and displaying the answer. The application requires no manual feature engineering to formulate matching rules for answer extraction: the acquired user question and the related document content are fed directly into the deep learning model, which obtains the best-matching answer from the document content. This simplifies the answer extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automatic customer service.
Description
Technical Field
The present application relates to the field of natural language understanding technologies, and in particular, to a method and an apparatus for extracting answers based on deep learning, and a storage medium.
Background
At present, in order to reduce the workload of customer service staff and improve office efficiency, many merchants use intelligent customer service to automatically answer customers' questions, and such intelligent customer service is mostly a document-based automatic question-answering system. Document-based automatic question-answering systems typically include three modules: question processing, chapter search, and answer processing. The workflow is as follows: a user poses a question in natural language, and the question processing module processes it; the chapter search module then retrieves relevant documents containing answers from a massive document set according to the processed question; finally, the answer processing module extracts document blocks containing the answers from the related documents through answer extraction techniques and returns them to the user.
In the related art, the answer processing module of an automatic question answering system usually applies different answer extraction methods to different types of questions. For example, for simple factoid questions, answers may be matched based on a bag-of-words model, i.e., named entities consistent with the expected answer type are extracted from document passages as candidate answers. Answers can also be matched based on surface patterns; the basic idea is that the answer to a question and the keywords of the question sentence often stand in certain specific surface relations, so the algorithm uses little deep language processing and extracts candidate answers satisfying the surface pattern rules from document passages. These answer extraction methods require manually engineered features to formulate the various matching rules, which makes the answer extraction process cumbersome, reduces the accuracy of the extracted answers, and degrades the efficiency and quality of automatic customer service.
Disclosure of Invention
To overcome, at least to some extent, the problems in the related art, the present application provides a method, an apparatus, and a storage medium for answer extraction based on deep learning.
According to a first aspect of embodiments of the present application, there is provided an answer extraction method based on deep learning, including:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
Optionally, the determining, based on the deep learning model, an extraction starting position and an extraction ending position in the document content includes:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix;
performing, based on an attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
performing, based on an attention mechanism, self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining, based on a pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix.
Optionally, the obtaining the user question to be processed and the document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated copies to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of the document contents.
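The splicing step above can be sketched as follows; the function name and the use of plain string concatenation are illustrative assumptions, since the text does not fix the joining details:

```python
def build_inputs(user_question, documents):
    """Sketch of the preprocessing described above: splice all retrieved
    document contents into one to-be-processed document, and repeat the
    user question once per document (the repetition count equals the
    total number of document contents)."""
    document_to_process = "".join(documents)              # spliced document contents
    question_to_process = user_question * len(documents)  # question repeated N times
    return question_to_process, document_to_process

q, d = build_inputs("How do I return an item? ", ["Doc A. ", "Doc B. "])
```

Both outputs are then segmented into words and converted to word vectors as described above.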
Optionally, the processing the first document matrix so that the processed first document matrix contains question information includes:
determining word co-occurrence features, and splicing the word co-occurrence features to the tails of the corresponding document word vectors in the first document matrix to obtain the processed first document matrix.
Optionally, the word co-occurrence feature includes a first word co-occurrence feature and/or a second word co-occurrence feature, and the determining word co-occurrence features and splicing them to the tails of the corresponding document word vectors in the first document matrix includes:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that the first word co-occurrence feature corresponding to the word is a first value, and otherwise determining that it is a second value, wherein the first value and the second value are fixed values respectively indicating that the word in the document content does or does not appear in the user question, and splicing the first word co-occurrence feature to the tail of the word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
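A minimal numpy sketch of the two word co-occurrence features described above. The dot-product similarity, the softmax normalization, the values 1.0/0.0 for the two fixed values, and the aggregation of the normalized similarities into a single appended value are all illustrative assumptions; the text leaves these choices open:

```python
import numpy as np

def add_cooccurrence_features(doc_vecs, q_vecs, doc_words, q_words):
    # First feature: 1.0 if the document word also appears in the question, else 0.0.
    q_set = set(q_words)
    flag = np.array([[1.0] if w in q_set else [0.0] for w in doc_words])
    # Second feature: similarity of each document word vector to every question
    # word vector, normalized per document word (softmax), here aggregated by
    # taking the largest normalized similarity (one plausible reading).
    sim = doc_vecs @ q_vecs.T
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    norm = e / e.sum(axis=1, keepdims=True)
    second = norm.max(axis=1, keepdims=True)
    # Splice both features to the tail of each document word vector.
    return np.hstack([doc_vecs, flag, second])

P = add_cooccurrence_features(np.random.randn(5, 8), np.random.randn(3, 8),
                              ["refund", "within", "30", "days", "."],
                              ["refund", "policy", "?"])
```

The processed first document matrix `P` has two extra columns per word, carrying the question information into the encoder.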
Optionally, the respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output-layer output of the first GRU network as the second document matrix; and,
determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output-layer output of the second GRU network as the second question matrix.
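The GRU encoding step can be sketched with a single-direction GRU cell in numpy. The parameter names and the zero initial state are assumptions; a real implementation would use a trained (possibly bidirectional) GRU layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(X, Wz, Uz, Wr, Ur, Wh, Uh):
    """Run a GRU over the rows of X (one word vector per row) and return
    the sequence of hidden states -- the 'output-layer output' used as the
    second document/question matrix."""
    h = np.zeros(Uz.shape[0])
    outputs = []
    for x in X:
        z = sigmoid(Wz @ x + Uz @ h)              # update gate
        r = sigmoid(Wr @ x + Ur @ h)              # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
        h = (1 - z) * h + z * h_cand
        outputs.append(h)
    return np.stack(outputs)

d_in, d_h = 10, 6
params = [np.random.randn(d_h, d_in) * 0.1 if i % 2 == 0
          else np.random.randn(d_h, d_h) * 0.1
          for i in range(6)]
H = gru_encode(np.random.randn(7, d_in), *params)   # one hidden state per word
```

Separate parameter sets correspond to the case where the first and second GRU networks are different; sharing them corresponds to the same-network variant below.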
Optionally, the determining an input question matrix according to the first question matrix includes:
if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Optionally, the performing, based on the attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix includes:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
Optionally, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix containing comparison information between the document and the question includes:
calculating a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on an attention mechanism, performing a weighting operation with the first normalization matrix and the second question matrix, and performing a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to respectively obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interaction attention matrix, the dot-product matrix of the second document matrix and the first interaction attention matrix, and the dot-product matrix of the second document matrix and the second interaction attention matrix, and determining the spliced matrix as the interaction matrix.
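One plausible reading of this interaction step (it closely mirrors BiDAF-style attention flow) in numpy; the dot-product similarity and the exact form of the second weighting are assumptions:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_matrix(P, Q):
    """P: second document matrix (n_doc_words, d); Q: second question
    matrix (n_q_words, d). Returns the spliced interaction matrix."""
    S = P @ Q.T                  # word-pair similarity matrix
    S1 = softmax(S, axis=1)      # first normalization matrix (over question words)
    S2 = softmax(S, axis=0)      # second normalization matrix (over document words)
    A1 = S1 @ Q                  # first interaction attention matrix
    A2 = S1 @ S2.T @ P           # second interaction attention matrix
    # Sequentially splice [P; A1; P*A1; P*A2] along the feature axis.
    return np.hstack([P, A1, P * A1, P * A2])

G = interaction_matrix(np.random.randn(6, 4), np.random.randn(3, 4))
```

The spliced matrix `G` (here 4x the feature width) is what the third GRU network would consume.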
Optionally, the performing, based on the attention mechanism, a self-matching process on the third document matrix to obtain a fourth document matrix includes:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Optionally, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
calculating a self-matching similarity matrix of the document according to the third document matrix, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Optionally, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as a self-matching matrix; or,
based on a gating mechanism, performing weighting processing on the spliced matrix, and determining the weighted matrix as the self-matching matrix.
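A numpy sketch covering both variants of the self-matching step above; dot-product similarity and the sigmoid-gate parameterization are illustrative assumptions:

```python
import numpy as np

def self_match(P, Wg=None):
    """P: third document matrix (n, d). With Wg=None the spliced matrix is
    used directly as the self-matching matrix; otherwise a sigmoid gate
    weights it, matching the gating variant described above."""
    S = P @ P.T                                   # self-matching similarity matrix
    e = np.exp(S - S.max(axis=1, keepdims=True))
    W = e / e.sum(axis=1, keepdims=True)          # self-matching weighting matrix
    C = W @ P                                     # self-matching attention matrix
    M = np.hstack([P, C])                         # spliced matrix
    if Wg is None:
        return M
    gate = 1.0 / (1.0 + np.exp(-(M @ Wg)))        # gating mechanism
    return gate * M

P3 = np.random.randn(5, 4)
M_plain = self_match(P3)                          # variant 1: spliced matrix as-is
M_gated = self_match(P3, np.random.randn(8, 8) * 0.1)  # variant 2: gated
```

Either result would then be fed to the bidirectional recurrent neural network to produce the fourth document matrix.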
Optionally, the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix includes:
reducing the second question matrix to obtain a reduced question matrix;
for each document content, calculating an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and includes an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculating a probability value for each document word according to the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of the different document contents, and determining the extraction starting position and ending position of the selected document content as the extraction starting position and ending position finally adopted.
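The span selection at the end of the pipeline can be sketched as follows; the logits here are stand-ins for the attention-derived scores the pointer network would produce at its two time steps:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pick_span(start_logits_per_doc, end_logits_per_doc):
    """For each candidate document content, turn the pointer network's
    first- and second-step logits into probabilities, take the most
    probable start and end words, and keep the document whose
    start*end probability product is largest."""
    best = None
    for i, (sl, el) in enumerate(zip(start_logits_per_doc, end_logits_per_doc)):
        ps, pe = softmax(sl), softmax(el)
        s, e = int(ps.argmax()), int(pe.argmax())
        score = float(ps[s] * pe[e])
        if best is None or score > best[0]:
            best = (score, i, s, e)
    _, doc_idx, start, end = best
    return doc_idx, start, end

# Two candidate documents; the second has a much more confident span.
doc, s, e = pick_span(
    [np.array([0.1, 0.2, 0.1]), np.array([5.0, 0.0, 0.0])],
    [np.array([0.1, 0.1, 0.3]), np.array([0.0, 0.0, 5.0])],
)
```

The words between the returned start and end positions in the chosen document are what is displayed as the answer.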
According to a second aspect of embodiments of the present application, there is provided an answer extraction device based on deep learning, including:
the acquisition module is used for acquiring user questions and acquiring document contents related to the user questions according to the user questions;
the processing module is used for determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and the display module is used for determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question and displaying the answer.
Optionally, the processing module includes:
the first processing unit is used for respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
the second processing unit is used for processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix;
the third processing unit is used for performing, based on an attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
the fourth processing unit is used for performing, based on an attention mechanism, self-matching processing on the third document matrix to obtain a fourth document matrix;
and the fifth processing unit is used for determining, based on a pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix.
Optionally, the first processing unit is specifically configured to:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated copies to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of the document contents.
Optionally, the second processing unit is specifically configured to:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Optionally, the word co-occurrence feature includes the first word co-occurrence feature and/or the second word co-occurrence feature, and the second processing unit is specifically configured to:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determine that the first word co-occurrence feature corresponding to the word is a first value, and otherwise determine that it is a second value, wherein the first value and the second value are fixed values respectively indicating that the word in the document content does or does not appear in the user question, and splice the first word co-occurrence feature to the tail of the word vector corresponding to the word in the first document matrix; and/or,
respectively calculate similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalize the similarity values corresponding to each word vector in the first document matrix, and splice the normalized similarity values, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
Optionally, the second processing unit is specifically configured to:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output-layer output of the first GRU network as the second document matrix; and,
determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output-layer output of the second GRU network as the second question matrix.
Optionally, the second processing unit is specifically configured to:
if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Optionally, the third processing unit is specifically configured to:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
Optionally, the third processing unit is specifically configured to:
calculate a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determine a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on an attention mechanism, perform a weighting operation with the first normalization matrix and the second question matrix, and perform a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to respectively obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splice the second document matrix, the first interaction attention matrix, the dot-product matrix of the second document matrix and the first interaction attention matrix, and the dot-product matrix of the second document matrix and the second interaction attention matrix, and determine the spliced matrix as the interaction matrix.
Optionally, the fourth processing unit is specifically configured to:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Optionally, the fourth processing unit is specifically configured to:
calculate a self-matching similarity matrix of the document according to the third document matrix, and determine a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, perform a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splice the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determine the self-matching matrix according to the spliced matrix.
Optionally, the fourth processing unit is specifically configured to:
determining the spliced matrix as a self-matching matrix; or,
based on a gating mechanism, perform weighting processing on the spliced matrix, and determine the weighted matrix as the self-matching matrix.
Optionally, the fifth processing unit is specifically configured to:
reduce the second question matrix to obtain a reduced question matrix;
for each document content, calculate an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and includes an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculate a probability value for each document word according to the attention matrix, determine the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determine the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determine the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of the different document contents, and determine the extraction starting position and ending position of the selected document content as the extraction starting position and ending position finally adopted.
According to a third aspect of embodiments of the present application, there is provided a storage medium storing a computer program which, when executed by a processor, implements each step in an answer extraction method based on deep learning. The method comprises the following steps:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
Optionally, the determining, based on the deep learning model, an extraction starting position and an extraction ending position in the document content includes:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix;
performing, based on an attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
performing, based on an attention mechanism, self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining, based on a pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix.
Optionally, the obtaining the user question to be processed and the document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated copies to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of the document contents.
Optionally, the processing the first document matrix, so that the processed first document matrix includes question information, includes:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Optionally, the determining word co-occurrence characteristics and splicing the word co-occurrence characteristics to the tails of the corresponding document word vectors in the first document matrix comprises:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
Optionally, the encoding the processed first document matrix and the processed first question matrix respectively to obtain a second document matrix and a second question matrix respectively includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as the second document matrix; and,
and determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second question matrix.
Optionally, the determining an input question matrix according to the first question matrix includes:
determining the first question matrix as the input question matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Optionally, the performing, based on the attention mechanism, an interaction process on the second document matrix and the second question matrix to obtain a third document matrix includes:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix comprises comparison information of the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix by adopting the third GRU network, and determining the output layer output of the third GRU network as a third document matrix.
Optionally, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix includes comparison information of the document and the question, including:
calculating to obtain a document-question word pair similarity matrix according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word pair similarity matrix;
based on an attention mechanism, respectively adopting the first normalization matrix and the second question matrix to carry out a weighting operation, and adopting the first normalization matrix, the second normalization matrix and the second document matrix to carry out a weighting operation, and respectively calculating to obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as an interactive matrix.
Optionally, the performing, based on the attention mechanism, a self-matching process on the third document matrix to obtain a fourth document matrix includes:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Optionally, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
according to the third document matrix, calculating to obtain a self-matching similarity matrix of the documents, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing weighting operation on the third document matrix by adopting the self-matching weighting matrix, and calculating to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Optionally, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as a self-matching matrix; or,
and based on a gate control mechanism, carrying out weighting processing on the spliced matrix, and determining the weighted matrix as a self-matching matrix.
Optionally, the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix includes:
reducing the second question matrix to obtain a reduced question matrix;
corresponding to each document content, calculating to obtain an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix is used for representing the semantic representation of the question word vectors on the document word vectors and comprises an attention matrix at a first moment and an attention matrix at a second moment;
corresponding to each document content, calculating a probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as an extraction starting position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as an extraction ending position of the corresponding document content;
determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position corresponding to different document contents, and determining the extraction starting position and the extraction ending position corresponding to the document content to be selected as the extraction starting position and the extraction ending position which are finally adopted.
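The span-selection logic described above can be sketched as follows. This is an illustrative sketch only, with made-up probability vectors; the real probabilities would come from the pointer network. For each candidate document content, the word with the maximum probability at the first moment gives the extraction starting position, the word with the maximum probability at the second moment gives the extraction ending position, and the document content whose product of the two probabilities is largest is finally adopted.

```python
# Hypothetical sketch of span selection across multiple candidate documents.
# Probability vectors here are made up; the method itself produces them via
# the pointer network described above.
import numpy as np

def select_answer_span(start_probs_per_doc, end_probs_per_doc):
    """Return (doc_index, start_pos, end_pos) with the highest start*end product."""
    best_score, best_span = -1.0, None
    for d, (p_start, p_end) in enumerate(zip(start_probs_per_doc, end_probs_per_doc)):
        s = int(np.argmax(p_start))      # extraction starting position (first moment)
        e = int(np.argmax(p_end))        # extraction ending position (second moment)
        score = p_start[s] * p_end[e]    # product of the two probability values
        if score > best_score:
            best_score, best_span = score, (d, s, e)
    return best_span
```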
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
firstly, obtaining a user question, and obtaining document content related to the user question according to the user question; then, based on a deep learning model, determining an extraction starting position and an extraction ending position in the document content; and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer. By adopting the answer extraction method, various matching rules are formulated without manually extracting features to extract answers, the obtained user questions and the document contents related to the user questions are directly input into the deep learning model, so that the most appropriate answers matched with the user questions can be obtained from the document contents, the answer extraction process is simplified, the answer accuracy is improved, and the efficiency and the quality of automatic customer service are greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating an answer extraction method based on deep learning according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method for determining an extraction start position and an extraction end position in document contents based on a deep learning model according to an exemplary embodiment.
Fig. 3 is a schematic diagram illustrating a structure of a deep learning network model according to another exemplary embodiment.
Fig. 4 is a schematic structural diagram illustrating an answer extraction apparatus based on deep learning according to another exemplary embodiment.
Fig. 5 is a schematic structural diagram illustrating processing modules in an answer extraction device based on deep learning according to another exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 illustrates a method for answer extraction based on deep learning according to an exemplary embodiment.
As shown in fig. 1, the method provided by this embodiment includes the following steps:
step S11, obtaining user questions, and obtaining document contents related to the user questions according to the user questions;
for example, the customer service robot may receive user questions entered by a user, the user may enter in natural language, the input may be in the form of text, voice, etc. For example, the user question is "can stock be bought and sold on the day? ".
After the user question is obtained, the customer service robot may obtain the related document content by using a related technology; for example, the related document content is the document content containing the user question. The specific acquisition of the document content related to the user question can be realized with known technology and is not detailed here. Based on the above user question, a piece of related document content might be: "'T+0' trading is a trading method introduced by the Shenzhen Stock Exchange in 1993, meaning that after an investor buys (sells) stocks (or futures) and the transaction is confirmed on the same day, the stocks bought (sold) that day can be sold (bought back) on the same day. From January 1, 1995, China has implemented a 'T+1' trading system, i.e., stock bought on a given day cannot be sold until the next trading day, in order to ensure the stability of the stock market and prevent excessive speculation. Meanwhile, funds still follow 'T+0', i.e., money recouped on the same day is immediately available. B shares follow 'T+1', and their funds are subject to 'T+3'."
Step S12, determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
further details of this step can be found in the related description that follows.
Step S13, determining the document content between the extraction start position and the extraction end position as an answer corresponding to the user question, and displaying the answer.
For example, if, based on the deep learning model, the extraction starting position and the extraction ending position are determined to be "1995" and "sold", respectively, then the passage "From January 1, 1995, China has implemented a 'T+1' trading system, i.e., stock bought on a given day cannot be sold until the next trading day, in order to ensure the stability of the stock market and prevent excessive speculation." is determined to be the answer corresponding to the user question "can stock be bought and sold on the day?". The answer may then be presented, in the form of text or voice.
In the embodiment, a user question is firstly obtained, and document content related to the user question is obtained according to the user question; then, based on a deep learning model, determining an extraction starting position and an extraction ending position in the document content; and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer. By adopting the answer extraction method, various matching rules are formulated without manually extracting features to extract answers, the obtained user questions and the document contents related to the user questions are directly input into the deep learning model, so that the most appropriate answers matched with the user questions can be obtained from the document contents, the answer extraction process is simplified, the answer accuracy is improved, and the efficiency and the quality of automatic customer service are greatly improved.
Further, referring to fig. 2, the determining an extraction starting position and an extraction ending position in the document content based on the deep learning model includes the following steps:
step S21, respectively obtaining a user question to be processed and a document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
in conjunction with the structural diagram of the deep learning network model shown in fig. 3, in S31, a first question matrix and a first document matrix, which are represented by word vectors in fig. 3, may be obtained corresponding to the document content and the user question.
Step S22, processing the first document matrix to make the processed first document matrix contain question information, and coding the processed first document matrix and the processed first question matrix respectively to obtain a second document matrix and a second question matrix respectively;
as shown in fig. 3, the second document matrix and the second question matrix may be obtained by self-encoding the first question matrix and the first document matrix, respectively, in S32.
Step S23, based on the attention mechanism, performing interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
as shown in fig. 3, after the attention-based mixed interaction process is performed in S33, a third document matrix is obtained.
Step S24, based on the attention mechanism, the third document matrix is subjected to self-matching processing to obtain a fourth document matrix;
as shown in fig. 3, self-attention processing is performed in S34 to obtain a fourth document matrix.
And step S25, determining an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix based on a pointer network.
As shown in fig. 3, prediction is performed in S35 to obtain the final presented answer.
In step S21, obtaining a user question to be processed and a document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain the document content to be processed; and/or,
and repeating the user question multiple times, and splicing the repeated user questions to obtain the user question to be processed, wherein the number of repetitions of the user question is the total number of the document contents.
Specifically, the document content to be processed is formed by splicing the contents of a plurality of documents related to the question asked by the user. For example, according to the key words in the question asked by the user, the k documents closest to the question are retrieved from the massive documents on the network, and all the contents of the k documents are spliced together to form the document content to be fed into the deep learning model; correspondingly, the user question to be processed is formed by splicing k copies of the single question asked by the user.
After obtaining the document content to be processed and the user question to be processed, word segmentation is performed on the user question and the document content respectively, and word vector conversion is performed on each segmented word. The word vector conversion is completed by a pre-trained word vector model: the user question to be processed (denoted q) and the document content to be processed (denoted c) are fed into the pre-trained word vector model, and each word in the user question and the document content is mapped by the word vector model into a 300-dimensional real-valued vector, namely a word vector (also called a parameter matrix), so that a first question matrix (denoted q_emb) and a first document matrix (denoted c_emb) are obtained; the word vector model represents the mapping from words to word vectors.
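The word segmentation and word-vector conversion step can be illustrated with the following sketch. This is not the patent's implementation: toy 4-dimensional random embeddings stand in for the 300-dimensional pre-trained word vectors, and whitespace splitting stands in for a real word segmenter.

```python
# Illustrative sketch: map segmented question and document words to fixed-size
# vectors via a lookup table, yielding a first question matrix (q_emb) and a
# first document matrix (c_emb). All embedding values are random stand-ins.
import numpy as np

EMB_DIM = 4
rng = np.random.default_rng(0)
vocab = {}   # word -> row index in the embedding table

def embed(tokens, table):
    """Map a token list to a [len(tokens), EMB_DIM] matrix (word vector conversion)."""
    rows = []
    for w in tokens:
        if w not in vocab:
            vocab[w] = len(table)
            table.append(rng.standard_normal(EMB_DIM))
        rows.append(table[vocab[w]])
    return np.stack(rows)

table = []
q_emb = embed("can stock be sold the same day".split(), table)        # first question matrix
c_emb = embed("the stock bought the day must be sold".split(), table) # first document matrix
```

Note that the same word always maps to the same row of the table, so shared words in the question and document receive identical vectors.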
In step S22, the processing the first document matrix so that the processed first document matrix contains question information includes:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Further, the determining word co-occurrence characteristics and splicing the word co-occurrence characteristics to the tails of the corresponding document word vectors in the first document matrix comprises:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
Specifically, in order to enable the document content to be processed to contain some information of the user question, two word co-occurrence features need to be added to each word vector in the first document matrix. This mirrors how people read: if a word from the user question appears at a certain position in the document content, the content near that position is likely to be the answer. Taking the addition of two word co-occurrence features as an example, a first word co-occurrence feature and a second word co-occurrence feature are added to each word vector in the first document matrix, where the first word co-occurrence feature is used to indicate whether a word in the document content appears in the user question, and the second word co-occurrence feature is used to indicate the similarity between a word in the document content and the words in the user question.
It is to be understood that the first value may be 1 and the second value may be 0.
In particular, corresponding to each word vector in the first document matrix, if the word it represents in the document content to be processed is the same as the word represented by at least one word vector in the first question matrix, the first word co-occurrence characteristic (denoted wiq_b) is determined to be the first value, i.e., 1; otherwise, the first word co-occurrence characteristic is determined to be the second value, i.e., 0. That is, it is judged whether the word in the document content to be processed appears in the user question to be processed: if so, the first word co-occurrence feature is 1, otherwise it is 0; the determined first word co-occurrence feature is then spliced to the tail of the corresponding word vector in the first document matrix.
It should be noted that the formula for calculating the first word co-occurrence feature is as follows: wiq_b(j) = 1 if there exists an i such that c_j = q_i, and wiq_b(j) = 0 otherwise. The meaning of this formula is: if the jth word in the document content to be processed is the same as the ith word in the user question to be processed, the first word co-occurrence feature corresponding to the jth word in the document is 1; otherwise, it is 0.
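A minimal sketch of this binary word-in-question feature (the token lists in the usage are illustrative only):

```python
# wiq_b[j] is 1 if the j-th document word also appears anywhere in the
# question, else 0 -- the first word co-occurrence feature described above.
def first_word_cooccurrence(doc_words, question_words):
    qset = set(question_words)
    return [1 if w in qset else 0 for w in doc_words]
```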
Then, similarity values between the word vectors in the first document matrix and the word vectors in the first question matrix are respectively calculated, the calculated similarity values are normalized, and the normalized values are determined as the second word co-occurrence characteristics.
It should be noted that the value range of the co-occurrence feature of the second word is [0, 1 ]; the calculation formula of the co-occurrence feature of the second word is as follows:
sim(i, j) = v_wiq · (x_j ⊙ q_i), v_wiq ∈ R^n

α(i, j) = softmax_j(sim(i, j)), wiq_w(j) = Σ_i α(i, j)

wherein v_wiq is a parameter vector obtained by pre-training, x_j represents the jth word vector among the document word vectors, q_i represents the ith word vector among the question word vectors, sim(i, j) represents the similarity score of the jth word in the document content and the ith word in the user question, α(i, j) represents the softmax normalization of sim(i, j), and wiq_w(j) represents the second word co-occurrence feature of the jth word in the document.
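Under one possible reading of the normalization described here (softmax over document positions for each question word, then summation over question words), the weighted feature can be sketched as follows; v, X and Q below are random stand-ins for the pre-trained parameter vector and the word-vector matrices.

```python
# Sketch of the second (weighted) word co-occurrence feature:
#   sim[i, j] = v . (x_j * q_i), softmax-normalized over document positions j
#   for each question word i, then summed over i per document word j.
import numpy as np

def second_word_cooccurrence(X, Q, v):
    """X: [m, n] document word vectors, Q: [k, n] question word vectors, v: [n]."""
    sim = np.einsum('n,jn,in->ij', v, X, Q)           # sim[i, j] = v.(x_j * q_i)
    e = np.exp(sim - sim.max(axis=1, keepdims=True))  # softmax over j, stabilized
    alpha = e / e.sum(axis=1, keepdims=True)
    return alpha.sum(axis=0)                          # wiq_w[j], one value per doc word
```

Because each question word distributes unit attention mass over the document, the feature values sum to the number of question words.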
After determining a first word co-occurrence feature and a second word co-occurrence feature corresponding to each word vector in the first document matrix, splicing the first word co-occurrence feature and the second word co-occurrence feature to the tail of the corresponding word vector in the first document matrix to obtain a processed first document matrix; for example, the dimension of each word vector is 300, and the dimension of the document word vector before splicing is [ batch _ size, seq _ len,300], so that the dimension of the document word vector after splicing two word co-occurrence features after each word vector in the document word vector becomes [ batch _ size, seq _ len,302 ].
The processed first document matrix obtained through the above process contains question information.
It should be noted that after the document content and the user question are mapped into the document word vector and the question word vector, batch processing needs to be performed on the document word vector and the question word vector. The batch _ size represents the number of each batch of the obtained document word vectors after batch processing; seq _ len represents the document word vector length.
It should be noted that the first word co-occurrence feature and the second word co-occurrence feature each have length equal to the paragraph length, i.e., one value per document word. If a word vector in the first document matrix matches at least one word vector in the first question matrix, the first word co-occurrence feature (value 1) is added at the tail of the corresponding word vector in the first document matrix, that is, the value of the added dimension corresponding to that word vector is 1; otherwise, the value of the added dimension corresponding to that word vector is 0. Similarly, if the normalized similarity value for a word vector in the first document matrix with respect to the first question matrix is 0.6, the second word co-occurrence feature added to the tail of the corresponding word vector in the first document matrix is 0.6, i.e., the value of the tail of the corresponding word vector in the added dimension is 0.6.
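The splicing of the two scalar features onto each document word vector can be illustrated as follows; the feature values and the zero document matrix are made up, and only the shapes follow the dimensions quoted above (300 embedding dimensions plus two feature dimensions gives 302).

```python
# Toy illustration: appending the two scalar features to the tail of each
# document word vector turns a [seq_len, 300] matrix into [seq_len, 302].
import numpy as np

seq_len, dim = 6, 300
c_emb = np.zeros((seq_len, dim))                 # first document matrix (stand-in values)
wiq_b = np.array([1, 0, 1, 0, 0, 1], float)      # first word co-occurrence feature
wiq_w = np.array([0.3, 0.1, 0.4, 0.05, 0.05, 0.1])  # second word co-occurrence feature
c_proc = np.concatenate([c_emb, wiq_b[:, None], wiq_w[:, None]], axis=1)
```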
Further, the encoding the processed first document matrix and the processed first question matrix respectively to obtain a second document matrix and a second question matrix respectively includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as the second document matrix; and,
and determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second question matrix.
Further, the determining an input question matrix according to the first question matrix includes:
determining the first question matrix as the input question matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Specifically, after the word co-occurrence features are added, the dimension of the processed first document matrix is different from that of the first question matrix, and matrices with different dimensions need different encoders. If the processed first document matrix and the first question matrix are to be encoded with the same encoder, the tail of each word vector in the first question matrix needs to be spliced with two preset features to obtain the processed first question matrix. In this way, the dimension of the obtained processed first question matrix is the same as the dimension of the processed first document matrix. Both preset features may be, but are not limited to, 1, i.e., the value in both dimensions added at the end of each word vector in the first question matrix is 1.
In one embodiment, the encoder is a Gated Recurrent Unit (GRU), and the processed first document matrix and the processed first question matrix are encoded through GRU networks with a preset number of layers to obtain a second document matrix (denoted C) and a second question matrix (denoted Q). The second document matrix obtained at this point contains information of the user question.
It should be noted that the preset number may be, but is not limited to, 2, or 3, or 4.
Step S22 is equivalent to "reading with the question in mind": the question and the document are understood through GRU encoding.
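The GRU encoding step can be sketched with a minimal single-layer GRU cell in NumPy. The weights below are random, untrained stand-ins; a real encoder would be trained, may be bidirectional, and may be stacked the preset number of layers deep, as noted above.

```python
# Minimal GRU cell: the processed document matrix [T, d_in] is fed through the
# gated recurrence, and the per-step outputs form the encoded matrix [T, H].
import numpy as np

def gru_encode(X, H, rng):
    """X: [T, d_in] input matrix -> [T, H] encoded output matrix."""
    d = X.shape[1]
    Wz, Wr, Wh = (rng.standard_normal((H, d + H)) * 0.1 for _ in range(3))
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    h = np.zeros(H)
    outs = []
    for x in X:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                               # update gate
        r = sigmoid(Wr @ xh)                               # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h])) # candidate state
        h = (1 - z) * h + z * h_tilde                      # new hidden state
        outs.append(h)
    return np.stack(outs)
```

Since each new state is a convex combination of the previous state and a tanh-bounded candidate, all outputs stay strictly inside (-1, 1).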
In step S23, based on an attention mechanism, interaction processing is performed on the second document matrix and the second question matrix to obtain a third document matrix; the method comprises the following steps:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix comprises comparison information of the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix by adopting the third GRU network, and determining the output layer output of the third GRU network as a third document matrix.
Further, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix includes comparison information of the document and the question, includes:
calculating to obtain a document-question word pair similarity matrix according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word pair similarity matrix;
based on an attention mechanism, respectively adopting the first normalization matrix and the second question matrix to carry out a weighting operation, and adopting the first normalization matrix, the second normalization matrix and the second document matrix to carry out a weighting operation, and respectively calculating to obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as an interactive matrix.
Specifically, a similarity matrix of document-question word pairs is calculated from the second document matrix and the second question matrix using a trilinear function. The formula is f(q, c) = W_0[q, c, q ⊙ c]; wherein W_0 is a weight matrix, q is the word vector representation of each word in the second question matrix obtained in step S22, c is the word vector representation of each word in the second document matrix obtained in step S22, and f(q, c) represents the similarity matrix of document-question word pairs.
In one embodiment, the weight matrix W0 is obtained by random initialization.
Then the rows and the columns of the similarity matrix of the document-question word pairs are respectively normalized, so as to obtain a row normalization matrix (denoted S̄) and a column normalization matrix (denoted S̿).
The row normalization matrix is a first normalization matrix, and the column normalization matrix is a second normalization matrix.
Calculating to obtain a document-problem attention matrix according to the row normalization matrix and the transpose matrix of the second problem matrix; the calculation formula is A = S̄·Q^T; wherein Q^T represents the transpose matrix of the second problem matrix, and A represents the document-problem attention matrix;
calculating to obtain a problem-document attention matrix according to the row normalization matrix, the column normalization matrix and the transpose matrix of the second document matrix; the calculation formula is B = S̄·S̿^T·C^T; wherein C^T represents the transpose matrix of the second document matrix, S̿^T represents the transpose matrix of the column normalization matrix, and B represents the problem-document attention matrix;
then, a second document matrix C, a document-question attention matrix A, a matrix obtained by performing point multiplication on the second document matrix C and the document-question attention matrix A, and a matrix obtained by performing point multiplication on the second document matrix C and the question-document attention matrix B are sequentially spliced to obtain a spliced matrix, namely the spliced matrixAnd encoding the spliced matrix through a GRU network to obtain a third document matrix, wherein the third document matrix integrates the comparison information of the problem and the document.
It should be noted that the dimension of the document matrix C is [batch_size, paragraph_length, hidden] and the dimension of the question matrix is [batch_size, question_length, hidden]; when the matrices are spliced, each matrix is sequentially spliced on the last dimension of the previous matrix, so the dimension of the spliced matrix is [batch_size, paragraph_length, hidden×4], wherein hidden represents the number of hidden-layer neurons after encoding.
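As an illustrative sketch only (batch dimension omitted, toy randomly initialized weights, not the patent's implementation), the interaction described above can be written as:

```python
import numpy as np

def interact(C, Q, rng=np.random.default_rng(0)):
    """Trilinear similarity, row/column softmax, attention matrices A and B,
    and the four-way splice [C, A, C*A, C*B]. Toy weights, no batch axis."""
    n, h = C.shape            # n document words, hidden size h
    m = Q.shape[0]            # m question words
    W0 = rng.standard_normal(3 * h)
    # Trilinear similarity f(q, c) = W0 [q, c, q*c] for every word pair.
    S = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            S[i, j] = W0 @ np.concatenate([Q[j], C[i], Q[j] * C[i]])
    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)
    S_row = softmax(S, axis=1)          # row normalization matrix
    S_col = softmax(S, axis=0)          # column normalization matrix
    A = S_row @ Q                       # document-question attention
    B = S_row @ S_col.T @ C             # question-document attention
    # Splice [C, A, C*A, C*B] on the last dimension -> [n, 4*hidden].
    return np.concatenate([C, A, C * A, C * B], axis=1)

G = interact(np.random.rand(6, 16), np.random.rand(4, 16))
assert G.shape == (6, 64)               # hidden x 4 on the last dimension
```

The spliced result would then be fed to the third GRU network, which this sketch omits.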
Step S23 corresponds to "reading the document with the question" and "reading the question with the document". The second document matrix C obtained in the previous step is compared with the second question matrix Q: through the attention mechanism, the attention distribution of each word in the document content with respect to each word in the user question is calculated, namely the document word vectors are measured with the question word vectors; and for each word in the user question, its attention distribution with respect to each word in the document content is calculated, namely the question word vectors are measured with the document word vectors. In this manner, a connection is established between the user question and the relevant content of the document, so as to locate, within the document content, the portion that is truly useful for answering the user question.
In step S24, based on the attention mechanism, performing self-matching processing on the third document matrix to obtain a fourth document matrix; the method comprises the following steps:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Further, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
according to the third document matrix, calculating to obtain a self-matching similarity matrix of the documents, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing weighting operation on the third document matrix by adopting the self-matching weighting matrix, and calculating to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Specifically, according to the third document matrix and parameter matrices obtained by pre-training, a self-matching similarity matrix of the document with itself is calculated; the formula is as follows: s_j^t = v^T·tanh(W_v^P·v_j^P + W̃_v^P·v_t^P); wherein v^T, W_v^P and W̃_v^P are all parameter matrices obtained by random initialization, v^P is the vector representation of the third document matrix obtained in step S23, in which the alignment information of the question and the document is fused, and j represents a word index.
Normalizing the self-matching similarity matrix of the document with itself to obtain a normalized matrix; the formula is as follows: a_i^t = exp(s_i^t)/Σ_j exp(s_j^t); wherein a^t represents the normalized matrix.
According to the vector representation of the third document matrix and the normalized matrix, the self-matching attention matrix of each word in the document corresponding to the whole document is calculated; the formula is as follows: c_t = Σ_i a_i^t·v_i^P. The meaning is as follows: a weighted sum is carried out on the weights and the word vectors in the third document matrix corresponding to the weights, obtaining the self-matching attention matrix c_t, which represents the semantic vector of each word in the document corresponding to the entire document. The third document matrix and the self-matching attention matrix are then spliced to obtain a spliced matrix [v_t^P, c_t].
Further, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as a self-matching matrix; or,
and based on a gate control mechanism, carrying out weighting processing on the spliced matrix, and determining the weighted matrix as a self-matching matrix.
Specifically, the obtained spliced matrix can be directly determined as the self-matching matrix; alternatively, in order to control the importance of different parts of the document content, a gating mechanism may be additionally introduced to control this importance adaptively. That is, first, a weight matrix (denoted g_t) is calculated according to the sigmoid function, the self-matching attention matrix c_t corresponding to each word in the document and the third document matrix, so as to control the importance of different parts of the document; then g_t and the resulting spliced matrix [v_t^P, c_t] are subjected to dot multiplication to obtain a new matrix, and the obtained new matrix is determined as the self-matching matrix. The formula is as follows: g_t = sigmoid(W_g·[v_t^P, c_t]), [v_t^P, c_t]* = g_t ⊙ [v_t^P, c_t];
wherein W_g is a parameter matrix obtained by random initialization, [v_t^P, c_t] represents the matrix obtained after splicing, and [v_t^P, c_t]* represents the self-matching matrix after processing by the gating mechanism.
After the self-matching matrix is obtained, the self-matching matrix is used as an input quantity and is input into a bidirectional Recurrent Neural Network (BiRNN) to obtain the output values of the hidden-layer nodes of the BiRNN, and the output values are determined as the fourth document matrix. It is expressed as: h_t^P = BiRNN(h_{t-1}^P, [v_t^P, c_t]*);
wherein h_t^P is the output of the BiRNN hidden-layer node at time t.
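A minimal NumPy sketch of the self-matching computation above, with randomly initialized stand-in weights and the final BiRNN encoding omitted:

```python
import numpy as np

def self_match(V, rng=np.random.default_rng(1)):
    """Additive self-attention of the third document matrix V against itself,
    followed by the gated splice [v_t, c_t]* = g_t * [v_t, c_t].
    All parameter matrices are toy random stand-ins."""
    n, h = V.shape
    Wv, Wt = rng.standard_normal((h, h)), rng.standard_normal((h, h))
    v = rng.standard_normal(h)
    Wg = rng.standard_normal((2 * h, 2 * h))
    out = np.empty((n, 2 * h))
    for t in range(n):
        # s_j = v^T tanh(Wv v_j + Wt v_t): similarity of word t to every word.
        s = np.tanh(V @ Wv.T + V[t] @ Wt.T) @ v
        a = np.exp(s - s.max()); a /= a.sum()        # normalized weights a^t
        c_t = a @ V                                  # self-matching attention
        x = np.concatenate([V[t], c_t])              # splice [v_t, c_t]
        g = 1.0 / (1.0 + np.exp(-(Wg @ x)))          # gate g_t = sigmoid(Wg x)
        out[t] = g * x                               # gated self-matching row
    return out

M = self_match(np.random.rand(5, 8))
assert M.shape == (5, 16)                            # [n, 2*hidden]
```

In the patent's flow, each row of this gated matrix would be passed into the BiRNN to yield the fourth document matrix.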
Step S24 corresponds to "reading the document for the third time", i.e., reading the document again with respect to the question to deepen the understanding of the document and the question. The word representations of the document content are adjusted by using the context of the document content to obtain more context information; that is, words that are far apart in the same document content are compared, so that the current word can be distinguished from other words with similar meanings in the rest of the document.
In step S25, determining an extraction start position and an extraction end position in the document content according to the fourth document matrix and the second question matrix based on a pointer network, including:
reducing the second problem matrix to obtain a reduced problem matrix;
corresponding to each document content, calculating to obtain an attention matrix according to the fourth document matrix and the reduced problem matrix, wherein the attention matrix is used for representing the semantic representation of the problem word vectors on the document word vectors, and comprises an attention matrix at a first moment and an attention matrix at a second moment;
corresponding to each document content, calculating a probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as an extraction starting position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as an extraction ending position of the corresponding document content;
determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position corresponding to different document contents, and determining the extraction starting position and the extraction ending position corresponding to the document content to be selected as the extraction starting position and the extraction ending position which are finally adopted.
In particular, since the user question is formed by splicing several individual questions asked initially by the user, the user question is first reduced to a single question; that is, the encoded second problem matrix obtained in step S22 is reduced to a single-question matrix. Then, for the fourth document matrix obtained in step S24, the attention matrix with respect to the single-question matrix is calculated, and the start/end positions corresponding to the maximum probability values in the attention matrix corresponding to the fourth document matrix are respectively calculated as candidate start/end positions of answer extraction. Because the document content is formed by splicing the contents of a plurality of related documents, a plurality of pairs of start positions and end positions are correspondingly obtained; the product of the probability values corresponding to each pair of start and end positions is calculated, and the pair with the largest probability product is selected as the final answer extraction start/end positions. For example, if the document content is obtained by splicing the contents of 5 related documents, 5 pairs of start and end positions are finally obtained; the probability values corresponding to the 5 start positions and the 5 end positions are multiplied pairwise to obtain 5 probability products, and the start and end positions corresponding to the largest product are selected as the final answer extraction start/end positions.
The detailed formula calculation process is as follows:
The semantic vector of each word vector in the document with respect to the document itself is calculated according to the fourth document matrix and used as the input quantity of an RNN to obtain the output values of the RNN hidden-layer nodes; the formula is as follows: h_t^a = RNN(h_{t-1}^a, c_t);
wherein h^P is the vector representation corresponding to the fourth document matrix obtained in step S24, c_t is the semantic vector of each word vector in the document at time t with respect to the document itself, and the initial hidden state of the RNN is a problem vector r^Q generated with the attention of the document, wherein the calculation process of r^Q is as follows:
s_j = v^T·tanh(W_u^Q·u_j^Q + W_v^Q·V_r^Q), a_i = exp(s_i)/Σ_j exp(s_j), r^Q = Σ_i a_i·u_i^Q;
wherein v^T, W_u^Q, W_v^Q and V_r^Q are all parameters obtained by random initialization, u^Q represents the vector representation of the problem matrix obtained after encoding in step S22 and having been reduced to a single-problem length, s_j represents the similarity matrix of the single problem with the single problem itself, a_i represents the normalized matrix obtained by normalizing the similarity matrix, and r^Q represents the problem vector obtained by weighted summation of the single-problem matrix.
The problem matrix u^Q, obtained after encoding in step S22 and having been reduced to a single-problem length, is thus subjected to similarity calculation, normalization and weighting to obtain the problem vector r^Q, which serves as the initial hidden state and is transmitted into the RNN together with the semantic vector c_t of each word vector in the document with respect to the document itself.
According to the fourth document matrix and the output values of the RNN hidden-layer nodes, a similarity matrix of the document and the problem is calculated; the formula is as follows: s_j^t = v^T·tanh(W_h^P·h_j^P + W_h^a·h_{t-1}^a);
wherein h^P represents the fourth document matrix, s_j^t represents the similarity matrix of the updated document vector and the restored single-question word vector, and W_h^P and W_h^a are both parameter matrices obtained by random initialization.
Normalizing the similarity matrix of the document and the problem to obtain a normalized similarity matrix of the document and the problem; the formula is as follows: a_i^t = exp(s_i^t)/Σ_j exp(s_j^t); wherein a^t represents the normalized similarity matrix of the document and the problem.
The maximum value of the normalized similarity matrix of the document and the problem is then calculated, and the positions corresponding to the maximum values are determined as the extraction starting position and the extraction ending position. The formula is as follows: p^t = argmax(a_1^t, ..., a_n^t);
wherein p^t is the start/end position of the corresponding answer in the document content, the position corresponding to the maximum value of a^1 is the answer extraction starting position, the position corresponding to the maximum value of a^2 is the answer extraction ending position, and n represents the document length.
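One pointer step under the formulas above can be sketched as follows; the weights are randomly initialized stand-ins and the function name `point` is illustrative only:

```python
import numpy as np

def point(Hp, h_a, rng=np.random.default_rng(2)):
    """Additive attention between the fourth document matrix Hp and the
    answer-RNN state h_a; returns the argmax position (start at t=1, end at
    t=2) and the normalized distribution over document words. Toy weights."""
    n, h = Hp.shape
    Wh, Wa = rng.standard_normal((h, h)), rng.standard_normal((h, h))
    v = rng.standard_normal(h)
    # s_j^t = v^T tanh(Wh h_j^P + Wa h_{t-1}^a)
    s = np.tanh(Hp @ Wh.T + h_a @ Wa.T) @ v
    a = np.exp(s - s.max()); a /= a.sum()    # normalized similarity a^t
    return int(np.argmax(a)), a              # p^t = argmax(a_1^t, ..., a_n^t)

pos, dist = point(np.random.rand(7, 8), np.random.rand(8))
assert 0 <= pos < 7 and abs(dist.sum() - 1.0) < 1e-9
```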
For example, 5 groups of probability values are obtained, one group for each of the 5 related documents, each group consisting of a starting-position probability value and an ending-position probability value; the product of the two probability values in each of the 5 groups is respectively calculated, the group whose product is the largest is selected, and the document words respectively corresponding to the two probability values of that group are taken as the answer extraction starting position and ending position.
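The selection among the candidate pairs can be sketched as:

```python
# Choose the (start, end) pair whose start/end probability product is largest.
def select_span(candidates):
    """candidates: list of (start_pos, end_pos, p_start, p_end), one pair
    per spliced document content."""
    best = max(candidates, key=lambda c: c[2] * c[3])
    return best[0], best[1]

# Toy example with 5 candidate pairs (positions and probabilities invented).
pairs = [(3, 7, 0.2, 0.5), (10, 12, 0.9, 0.8), (1, 4, 0.6, 0.6),
         (20, 25, 0.4, 0.3), (8, 9, 0.7, 0.2)]
start, end = select_span(pairs)
assert (start, end) == (10, 12)   # 0.9 * 0.8 = 0.72 is the largest product
```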
It should be noted that t in the above formulas represents a time index. The value of t in the formulas referred to in step S25 takes two values, namely the first moment and the second moment: the document word corresponding to the maximum probability value at the first moment is determined as the extraction starting position of the corresponding document content, and the document word corresponding to the maximum probability value at the second moment is determined as the extraction ending position of the corresponding document content.
It should be noted that, in the above formula, the subscripts i and j each represent an index of a word in the corresponding vector or matrix.
Fig. 4 is a schematic structural diagram illustrating an answer extraction apparatus based on deep learning according to another exemplary embodiment.
As shown in fig. 4, the answer extraction device based on deep learning provided in this embodiment includes:
an obtaining module 41, configured to obtain a user question, and obtain document content related to the user question according to the user question;
a processing module 42, configured to determine an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and a displaying module 43, configured to determine the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and display the answer.
Further, referring to fig. 5, the processing module 42 includes:
the first processing unit 421 is configured to obtain a user question to be processed and a document content to be processed according to the user question and the document content, perform word segmentation on the user question to be processed and the document content to be processed, and perform word vector conversion on each word to obtain a first question matrix and a first document matrix;
a second processing unit 422, configured to process the first document matrix, so that the processed first document matrix includes problem information, and encode the processed first document matrix and the processed first problem matrix respectively to obtain a second document matrix and a second problem matrix respectively;
a third processing unit 423, configured to perform interaction processing on the second document matrix and the second problem matrix based on an attention mechanism, so as to obtain a third document matrix;
a fourth processing unit 424, configured to perform self-matching processing on the third document matrix based on an attention mechanism, to obtain a fourth document matrix;
a fifth processing unit 425, configured to determine, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second problem matrix.
Further, the first processing unit 421 is specifically configured to:
splicing all the document contents to obtain document contents to be processed; and/or,
and repeating the user problems for multiple times, and splicing the repeated user problems to obtain the user problems to be processed, wherein the repeated times of the user problems are the total number of the document contents.
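A minimal illustration of this preprocessing (the delimiter used for splicing is an assumption; the patent does not specify one):

```python
# Build the to-be-processed inputs: splice all document contents into one
# document, and repeat the user question once per document content.
def build_inputs(user_question, documents):
    doc_to_process = " ".join(documents)                      # spliced documents
    question_to_process = " ".join([user_question] * len(documents))
    return question_to_process, doc_to_process

q, d = build_inputs("how to reset my password",
                    ["doc one ...", "doc two ...", "doc three ..."])
assert q.count("how to reset my password") == 3   # repeated once per document
assert "doc two" in d
```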
Further, the second processing unit 422 is specifically configured to:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Further, the word co-occurrence characteristics include: the first word co-occurrence feature and/or the second word co-occurrence feature, the second processing unit 422 is specifically configured to:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in a user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first problem matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
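The two word co-occurrence features can be sketched as below. This is illustrative only: the dot product is assumed as the similarity measure, and reducing the normalized similarities to the best-match weight is an assumption, since the patent fixes the normalization but not the final scalar form:

```python
import numpy as np

def add_cooccurrence_features(doc_words, q_words, doc_vecs, q_vecs):
    """Append the two co-occurrence features to each document word vector.
    Feature forms beyond what the patent states are assumptions."""
    q_set = set(q_words)
    # Feature 1: binary flag -- does the document word appear in the question?
    exact = np.array([1.0 if w in q_set else 0.0 for w in doc_words])
    # Feature 2: dot-product similarities, softmax-normalized per document
    # word; the best-match weight is kept as the scalar feature (assumption).
    sims = doc_vecs @ q_vecs.T                          # [doc_len, q_len]
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    norm = e / e.sum(axis=1, keepdims=True)
    soft = norm.max(axis=1)
    # Splice both features to the tail of each document word vector.
    return np.concatenate([doc_vecs, exact[:, None], soft[:, None]], axis=1)

doc_vecs = np.random.rand(4, 8)
q_vecs = np.random.rand(3, 8)
out = add_cooccurrence_features(["a", "b", "c", "d"], ["b", "e", "f"],
                                doc_vecs, q_vecs)
assert out.shape == (4, 10)          # 8 dims + 2 co-occurrence features
assert out[1, 8] == 1.0 and out[0, 8] == 0.0
```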
Further, the second processing unit 422 is specifically configured to:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as a second document matrix; and,
and determining an input problem matrix according to the first problem matrix, taking the input problem matrix as the input of a preset second GRU network, processing the input problem matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second problem matrix.
Further, the second processing unit 422 is specifically configured to:
determining the first problem matrix as an input problem matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features at the tail part of each word vector corresponding to each word vector in the first problem matrix to obtain a spliced problem matrix, and determining the spliced problem matrix as an input problem matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
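When the two GRU networks are the same, the question matrix must be padded to the same width as the co-occurrence-augmented document matrix. A minimal sketch, assuming the preset features are zeros (the patent fixes only their count, not their values):

```python
import numpy as np

# Pad the first problem matrix with preset features so its width matches the
# document matrix after the word co-occurrence features were spliced on.
# Zero values are an assumption; the patent only fixes the feature count.
def pad_question(Q1, n_features=2):
    pad = np.zeros((Q1.shape[0], n_features))
    return np.concatenate([Q1, pad], axis=1)

Qp = pad_question(np.random.rand(4, 8))   # 8-dim word vectors, 2 features
assert Qp.shape == (4, 10)
```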
Further, the third processing unit 423 is specifically configured to:
processing the second document matrix and the second problem matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix comprises comparison information of documents and problems;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix by adopting the third GRU network, and determining the output layer output of the third GRU network as a third document matrix.
Further, the third processing unit 423 is specifically configured to:
calculating to obtain a word pair similarity matrix of the documents and the problems according to the second document matrix and the second problem matrix, and determining a first normalization matrix and a second normalization matrix according to the word pair similarity matrix;
based on an attention mechanism, respectively adopting the first normalization matrix and the second problem matrix to carry out weighting operation, and adopting the first normalization matrix, the second normalization matrix and the second document matrix to carry out weighting operation, and respectively calculating to obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as an interactive matrix.
Further, the fourth processing unit 424 is specifically configured to:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Further, the fourth processing unit 424 is specifically configured to:
according to the third document matrix, calculating to obtain a self-matching similarity matrix of the documents, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing weighting operation on the third document matrix by adopting the self-matching weighting matrix, and calculating to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Further, the fourth processing unit 424 is specifically configured to:
determining the spliced matrix as a self-matching matrix; or,
and based on a gate control mechanism, carrying out weighting processing on the spliced matrix, and determining the weighted matrix as a self-matching matrix.
Further, the fifth processing unit 425 is specifically configured to:
reducing the second problem matrix to obtain a reduced problem matrix;
corresponding to each document content, calculating to obtain an attention matrix according to the fourth document matrix and the reduced problem matrix, wherein the attention matrix is used for representing the semantic representation of the problem word vectors on the document word vectors, and comprises an attention matrix at a first moment and an attention matrix at a second moment;
corresponding to each document content, calculating a probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as an extraction starting position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as an extraction ending position of the corresponding document content;
determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position corresponding to different document contents, and determining the extraction starting position and the extraction ending position corresponding to the document content to be selected as the extraction starting position and the extraction ending position which are finally adopted.
In the embodiment, a user question is firstly obtained through an obtaining module, and document content related to the user question is obtained according to the user question; then determining an extraction starting position and an extraction ending position in the document content through a processing module based on a deep learning model; and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question through a display module, and displaying the answer. By adopting the answer extraction device, various matching rules are formulated without manually extracting features to extract answers, the obtained user questions and the document contents related to the user questions are directly input into the deep learning model, so that the most appropriate answers matched with the user questions can be obtained from the document contents, the answer extraction process is simplified, the answer accuracy is improved, and the efficiency and the quality of automatic customer service are greatly improved.
It should be noted that, for parts that are not described in detail in this embodiment, reference may be made to descriptions in the embodiment related to the method, and details are not described herein again.
Another embodiment of the present application provides a storage medium storing a computer program, which when executed by a processor, implements each step of an answer extraction method based on deep learning. The method comprises the following steps:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
Further, the determining an extraction starting position and an extraction ending position in the document content based on the deep learning model includes:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix to enable the processed first document matrix to contain problem information, and respectively coding the processed first document matrix and the processed first problem matrix to respectively obtain a second document matrix and a second problem matrix;
based on an attention mechanism, performing interactive processing on the second document matrix and the second problem matrix to obtain a third document matrix;
based on an attention mechanism, performing self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second problem matrix based on a pointer network.
Further, the obtaining the user question to be processed and the document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain document contents to be processed; and/or,
and repeating the user problems for multiple times, and splicing the repeated user problems to obtain the user problems to be processed, wherein the repeated times of the user problems are the total number of the document contents.
Further, the processing the first document matrix, so that the processed first document matrix includes problem information, includes:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Further, the word co-occurrence characteristics include: the determining word co-occurrence characteristics and splicing the word co-occurrence characteristics to the tails of the corresponding document word vectors in the first document matrix comprises:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in a user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first problem matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
Further, the encoding the processed first document matrix and the processed first problem matrix respectively to obtain a second document matrix and a second problem matrix respectively includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as a second document matrix; and,
and determining an input problem matrix according to the first problem matrix, taking the input problem matrix as the input of a preset second GRU network, processing the input problem matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second problem matrix.
Further, the determining an input problem matrix according to the first problem matrix includes:
determining the first problem matrix as an input problem matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features at the tail part of each word vector corresponding to each word vector in the first problem matrix to obtain a spliced problem matrix, and determining the spliced problem matrix as an input problem matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Further, the performing, based on the attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix includes:
processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
Further, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix containing comparison information between the document and the question includes:
calculating a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on the attention mechanism, performing a weighting operation with the first normalization matrix and the second question matrix, and a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to obtain a first interactive attention matrix and a second interactive attention matrix respectively;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as the interaction matrix.
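One plausible reading of this interaction step is BiDAF/co-attention-style processing. The sketch below assumes dot-product similarity and row/column softmax for the two normalization matrices, which the patent does not spell out; it is an illustration, not the patent's exact formulation.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interaction_matrix(H, Q):
    """H: second document matrix (n_doc, d); Q: second question matrix (n_q, d)."""
    S = H @ Q.T                    # word-pair similarity matrix (n_doc, n_q)
    A1 = softmax(S, axis=1)        # first normalization matrix: over question words
    A2 = softmax(S, axis=0)        # second normalization matrix: over document words
    attn1 = A1 @ Q                 # first interactive attention matrix (n_doc, d)
    attn2 = A1 @ A2.T @ H          # second interactive attention matrix (n_doc, d)
    # Sequentially splice H, attn1, and the two element-wise (dot-product) matrices.
    return np.concatenate([H, attn1, H * attn1, H * attn2], axis=1)   # (n_doc, 4d)
```

The spliced result (width 4d) is what the preset third GRU network then re-encodes into the third document matrix.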
Further, the performing self-matching processing on the third document matrix based on the attention mechanism to obtain a fourth document matrix includes:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Further, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
calculating a self-matching similarity matrix of the document according to the third document matrix, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on the attention mechanism, performing a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Further, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as the self-matching matrix; or,
based on a gating mechanism, performing weighting processing on the spliced matrix, and determining the weighted matrix as the self-matching matrix.
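A sketch of this self-matching step, in the spirit of gated self-matching networks; the dot-product similarity, the sigmoid gate, and the random gate weights are assumptions for illustration, not the patent's trained parameters.

```python
import numpy as np

def row_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def self_matching_matrix(H3, use_gate=True, seed=0):
    """H3: third document matrix (n, d). Each word attends over the whole document."""
    S = H3 @ H3.T                          # self-matching similarity matrix (n, n)
    W = row_softmax(S)                     # self-matching weighting matrix
    C = W @ H3                             # self-matching attention matrix (n, d)
    M = np.concatenate([H3, C], axis=1)    # spliced matrix (n, 2d)
    if use_gate:
        rng = np.random.default_rng(seed)
        Wg = rng.normal(0.0, 0.1, (M.shape[1], M.shape[1]))  # hypothetical gate weights
        gate = 1.0 / (1.0 + np.exp(-(M @ Wg)))               # sigmoid gate in (0, 1)
        M = gate * M                       # gated variant of the spliced matrix
    return M
```

With `use_gate=False` the spliced matrix itself is the self-matching matrix, matching the first alternative above; the gated variant implements the second.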
Further, the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix includes:
reducing the second question matrix to obtain a reduced question matrix;
for each document content, calculating an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and includes an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculating a probability value for each document word according to the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determining the candidate document content according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of each document content, and determining the extraction starting position and the extraction ending position of the candidate document content as the positions finally adopted.
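The final selection over candidate document contents can be sketched as follows, taking the per-word probability vectors at the two decoding time steps as given; this follows the literal argmax-and-product reading of the text above (the function names are illustrative).

```python
import numpy as np

def extract_span(p_start, p_end):
    # Per document content: argmax word at the first time step (start) and
    # at the second time step (end), plus the product of the two probabilities.
    s = int(np.argmax(p_start))
    e = int(np.argmax(p_end))
    return s, e, float(p_start[s] * p_end[e])

def select_answer(candidates):
    # candidates: one (p_start, p_end) pair per candidate document content.
    # Returns (document index, (start, end)) with the largest probability product.
    spans = [extract_span(ps, pe) for ps, pe in candidates]
    best = max(range(len(spans)), key=lambda i: spans[i][2])
    return best, (spans[best][0], spans[best][1])
```

For example, a document whose start/end argmax probabilities multiply to 0.64 wins over one whose product is 0.30, and its span becomes the answer boundaries finally adopted.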
In this embodiment, a user question is first obtained, and document content related to the user question is obtained according to the user question; then, based on a deep learning model, an extraction starting position and an extraction ending position in the document content are determined; and the document content between the extraction starting position and the extraction ending position is determined as the answer corresponding to the user question and displayed. With this answer extraction method, there is no need to manually extract features or formulate various matching rules to extract answers: the obtained user question and the related document content are input directly into the deep learning model, and the most appropriate answer matching the user question is obtained from the document content. This simplifies the answer extraction process, improves answer accuracy, and greatly improves the efficiency and quality of automatic customer service.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (15)
1. An answer extraction method based on deep learning is characterized by comprising the following steps:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
2. The method according to claim 1, wherein the determining an extraction starting position and an extraction ending position in the document content based on the deep learning model comprises:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the first question matrix to obtain a second document matrix and a second question matrix;
based on an attention mechanism, performing interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
based on the attention mechanism, performing self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix based on a pointer network.
3. The method according to claim 2, wherein the obtaining the user question to be processed and the document content to be processed according to the user question and the document content comprises:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated user questions to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of document contents.
4. The method of claim 2, wherein processing the first document matrix such that the processed first document matrix contains question information comprises:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
5. The method of claim 4, wherein the determining word co-occurrence features and splicing the word co-occurrence features to the tails of the corresponding document word vectors in the first document matrix comprises:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that a first word co-occurrence feature corresponding to the word is a first value, and otherwise a second value, wherein the first value and the second value are fixed values respectively indicating that the word in the document content does or does not appear in the user question, and splicing the first word co-occurrence feature to the tail of the word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values, as second word co-occurrence features, to the tail of the corresponding word vector in the first document matrix.
6. The method of claim 2, wherein the encoding the processed first document matrix and the first question matrix respectively to obtain a second document matrix and a second question matrix comprises:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output-layer output of the first GRU network as the second document matrix; and,
determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output-layer output of the second GRU network as the second question matrix.
7. The method of claim 6, wherein the determining an input question matrix from the first question matrix comprises:
if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of preset features is the same as the number of word co-occurrence features.
8. The method of claim 2, wherein the interactively processing the second document matrix and the second question matrix based on the attention mechanism to obtain a third document matrix comprises:
processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
9. The method of claim 8, wherein the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix containing comparison information between the document and the question comprises:
calculating a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on the attention mechanism, performing a weighting operation with the first normalization matrix and the second question matrix, and a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to obtain a first interactive attention matrix and a second interactive attention matrix respectively;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as the interaction matrix.
10. The method of claim 2, wherein the performing self-matching processing on the third document matrix based on the attention mechanism to obtain a fourth document matrix comprises:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
11. The method of claim 10, wherein the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix comprises:
calculating a self-matching similarity matrix of the document according to the third document matrix, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on the attention mechanism, performing a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
12. The method of claim 11, wherein the determining a self-matching matrix from the spliced matrix comprises:
determining the spliced matrix as a self-matching matrix; or,
and based on a gating mechanism, performing weighting processing on the spliced matrix, and determining the weighted matrix as the self-matching matrix.
13. The method of claim 2, wherein the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix comprises:
reducing the second question matrix to obtain a reduced question matrix;
for each document content, calculating an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and comprises an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculating a probability value for each document word according to the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determining the candidate document content according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of each document content, and determining the extraction starting position and the extraction ending position of the candidate document content as the positions finally adopted.
14. An answer extraction device based on deep learning, comprising:
the acquisition module is used for acquiring user questions and acquiring document contents related to the user questions according to the user questions;
the processing module is used for determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and the display module is used for determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question and displaying the answer.
15. A storage medium storing a computer program which, when executed by a processor, implements the steps of the answer extraction method based on deep learning according to any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910225135.0A CN109977404A (en) | 2019-03-22 | 2019-03-22 | Answer extracting method, apparatus and storage medium based on deep learning |
PCT/CN2020/075553 WO2020192307A1 (en) | 2019-03-22 | 2020-02-17 | Answer extraction method and apparatus based on deep learning, and computer device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910225135.0A CN109977404A (en) | 2019-03-22 | 2019-03-22 | Answer extracting method, apparatus and storage medium based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977404A true CN109977404A (en) | 2019-07-05 |
Family
ID=67080278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910225135.0A Pending CN109977404A (en) | 2019-03-22 | 2019-03-22 | Answer extracting method, apparatus and storage medium based on deep learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109977404A (en) |
WO (1) | WO2020192307A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110825870A (en) * | 2019-10-31 | 2020-02-21 | 腾讯科技(深圳)有限公司 | Document abstract acquisition method and device, storage medium and electronic device |
CN111078854A (en) * | 2019-12-13 | 2020-04-28 | 北京金山数字娱乐科技有限公司 | Question-answer prediction model training method and device and question-answer prediction method and device |
WO2020192307A1 (en) * | 2019-03-22 | 2020-10-01 | 深圳追一科技有限公司 | Answer extraction method and apparatus based on deep learning, and computer device and storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417094B (en) * | 2020-11-17 | 2024-04-05 | 华东理工大学 | Answer selection method, device, server and storage medium based on web text |
CN112541350B (en) * | 2020-12-04 | 2024-06-14 | 支付宝(杭州)信息技术有限公司 | Variant text reduction method, device and equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108415977A (en) * | 2018-02-09 | 2018-08-17 | 华南理工大学 | One is read understanding method based on the production machine of deep neural network and intensified learning |
US20180300312A1 (en) * | 2017-04-13 | 2018-10-18 | Baidu Usa Llc | Global normalized reader systems and methods |
CN108959246A (en) * | 2018-06-12 | 2018-12-07 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on improved attention mechanism |
CN109376222A (en) * | 2018-09-27 | 2019-02-22 | 国信优易数据有限公司 | Question and answer matching degree calculation method, question and answer automatic matching method and device |
CN109478204A (en) * | 2016-05-17 | 2019-03-15 | 马鲁巴公司 | The machine of non-structured text understands |
CN109492227A (en) * | 2018-11-16 | 2019-03-19 | 大连理工大学 | It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977404A (en) * | 2019-03-22 | 2019-07-05 | 深圳追一科技有限公司 | Answer extracting method, apparatus and storage medium based on deep learning |
- 2019-03-22: CN application CN201910225135.0A filed (publication CN109977404A); status: Pending
- 2020-02-17: PCT application PCT/CN2020/075553 filed (publication WO2020192307A1); status: Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109478204A (en) * | 2016-05-17 | 2019-03-15 | 马鲁巴公司 | The machine of non-structured text understands |
US20180300312A1 (en) * | 2017-04-13 | 2018-10-18 | Baidu Usa Llc | Global normalized reader systems and methods |
CN108415977A (en) * | 2018-02-09 | 2018-08-17 | 华南理工大学 | One is read understanding method based on the production machine of deep neural network and intensified learning |
CN108959246A (en) * | 2018-06-12 | 2018-12-07 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on improved attention mechanism |
CN109376222A (en) * | 2018-09-27 | 2019-02-22 | 国信优易数据有限公司 | Question and answer matching degree calculation method, question and answer automatic matching method and device |
CN109492227A (en) * | 2018-11-16 | 2019-03-19 | 大连理工大学 | It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations |
Non-Patent Citations (2)
Title |
---|
WENHUI WANG 等: "Gated Self-Matching Networks for Reading Comprehension and Question Answering", 《PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 * |
朱国轩: "基于深度学习的任务导向型机器阅读理解", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020192307A1 (en) * | 2019-03-22 | 2020-10-01 | 深圳追一科技有限公司 | Answer extraction method and apparatus based on deep learning, and computer device and storage medium |
CN110825870A (en) * | 2019-10-31 | 2020-02-21 | 腾讯科技(深圳)有限公司 | Document abstract acquisition method and device, storage medium and electronic device |
CN110825870B (en) * | 2019-10-31 | 2023-07-14 | 腾讯科技(深圳)有限公司 | Method and device for acquiring document abstract, storage medium and electronic device |
CN111078854A (en) * | 2019-12-13 | 2020-04-28 | 北京金山数字娱乐科技有限公司 | Question-answer prediction model training method and device and question-answer prediction method and device |
CN111078854B (en) * | 2019-12-13 | 2023-10-27 | 北京金山数字娱乐科技有限公司 | Training method and device of question-answer prediction model, and question-answer prediction method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2020192307A1 (en) | 2020-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977404A (en) | Answer extracting method, apparatus and storage medium based on deep learning | |
CN111783474B (en) | Comment text viewpoint information processing method and device and storage medium | |
KR102213478B1 (en) | A system for tracking user knowledge based on artificial intelligence learning and method thereof | |
US10540967B2 (en) | Machine reading method for dialog state tracking | |
WO2024011814A1 (en) | Image-text mutual retrieval method, system and device, and nonvolatile readable storage medium | |
CN111401928B (en) | Method and device for determining semantic similarity of text based on graph data | |
CN110990555B (en) | End-to-end retrieval type dialogue method and system and computer equipment | |
CN111309887B (en) | Method and system for training text key content extraction model | |
CN115186110B (en) | Multi-modal knowledge graph completion method and system based on relationship-enhanced negative sampling | |
CN111159367A (en) | Information processing method and related equipment | |
CN110413743A (en) | A kind of key message abstracting method, device, equipment and storage medium | |
CN111105013A (en) | Optimization method of countermeasure network architecture, image description generation method and system | |
CN111897954A (en) | User comment aspect mining system, method and storage medium | |
CN117891939A (en) | Text classification method combining particle swarm algorithm with CNN convolutional neural network | |
CN114386426B (en) | Gold medal speaking skill recommendation method and device based on multivariate semantic fusion | |
CN114492451A (en) | Text matching method and device, electronic equipment and computer readable storage medium | |
CN114743029A (en) | Image text matching method | |
CN112131363B (en) | Automatic question and answer method, device, equipment and storage medium | |
CN113705159A (en) | Merchant name labeling method, device, equipment and storage medium | |
CN113449103A (en) | Bank transaction flow classification method and system integrating label and text interaction mechanism | |
CN111460113A (en) | Data interaction method and related equipment | |
CN117236384A (en) | Training and predicting method and device for terminal machine change prediction model and storage medium | |
CN110334204A (en) | A kind of exercise similarity calculation recommended method based on user record | |
CN116910190A (en) | Method, device and equipment for acquiring multi-task perception model and readable storage medium | |
CN115293818A (en) | Advertisement putting and selecting method and device, equipment and medium thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190705 |