CN109977404A - Answer extracting method, apparatus and storage medium based on deep learning
- Publication number
- CN109977404A (application number CN201910225135.0A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- document
- word
- determining
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0281—Customer communication at a business location, e.g. providing product or service information, consulting
Abstract
This application relates to a deep-learning-based answer extraction method, apparatus and storage medium. The method includes: acquiring a user question and, according to the user question, acquiring document content related to the user question; determining an extraction starting position and an extraction ending position in the document content based on a deep learning model; and determining the document content between the extraction starting position and the extraction ending position as the answer corresponding to the user question, and displaying the answer. The application requires no manual feature engineering to formulate matching rules for answer extraction: the acquired user question and the related document content are fed directly into the deep learning model, which obtains the best-matching answer from the document content. This simplifies the answer extraction process and improves answer accuracy, thereby greatly improving the efficiency and quality of automatic customer service.
Description
Technical Field
The present application relates to the field of natural language understanding technologies, and in particular, to a method and an apparatus for extracting answers based on deep learning, and a storage medium.
Background
At present, in order to reduce the workload of customer service staff and improve office efficiency, many merchants use intelligent customer service to automatically answer customers' questions, and such intelligent customer service is mostly a document-based automatic question-answering system. Document-based automatic question-answering systems typically include three modules: question processing, chapter search, and answer processing. The workflow is as follows: a user poses a question in natural language, and the question processing module processes it; the chapter search module then retrieves relevant documents containing answers from a massive document set according to the processed question; finally, the answer processing module extracts document blocks containing the answers from the related documents through answer extraction techniques and returns them to the user.
In the related art, the answer processing module of an automatic question answering system usually applies different answer extraction methods to different types of questions. For example, for simple factoid questions, answers may be matched based on a bag-of-words model, i.e., named entities consistent with the expected answer type are extracted from document passages as candidate answers. Answers can also be matched based on surface patterns; the basic idea is that the answer to a question and the keywords of the question sentence often stand in certain specific surface relations, so the algorithm uses little deep language processing and extracts candidate answers satisfying the surface pattern rules from document passages. These answer extraction methods require manually engineered features to formulate the various matching rules, which makes the answer extraction process cumbersome, reduces the accuracy of the extracted answers, and degrades the efficiency and quality of automatic customer service.
Disclosure of Invention
To overcome, at least to some extent, the problems in the related art, the present application provides a method, an apparatus, and a storage medium for answer extraction based on deep learning.
According to a first aspect of embodiments of the present application, there is provided an answer extraction method based on deep learning, including:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
Optionally, the determining, based on the deep learning model, an extraction starting position and an extraction ending position in the document content includes:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix;
performing, based on an attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
performing, based on an attention mechanism, self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining, based on a pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix.
Optionally, the obtaining the user question to be processed and the document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated copies to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of the document contents.
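The splicing step above can be sketched as follows; the function name and the use of plain string concatenation are illustrative assumptions, since the text does not fix the joining details:

```python
def build_inputs(user_question, documents):
    """Sketch of the preprocessing described above: splice all retrieved
    document contents into one to-be-processed document, and repeat the
    user question once per document (the repetition count equals the
    total number of document contents)."""
    document_to_process = "".join(documents)              # spliced document contents
    question_to_process = user_question * len(documents)  # question repeated N times
    return question_to_process, document_to_process

q, d = build_inputs("How do I return an item? ", ["Doc A. ", "Doc B. "])
```

Both outputs are then segmented into words and converted to word vectors as described above.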
Optionally, the processing the first document matrix so that the processed first document matrix contains question information includes:
determining word co-occurrence features, and splicing the word co-occurrence features to the tails of the corresponding document word vectors in the first document matrix to obtain the processed first document matrix.
Optionally, the word co-occurrence feature includes a first word co-occurrence feature and/or a second word co-occurrence feature, and the determining word co-occurrence features and splicing them to the tails of the corresponding document word vectors in the first document matrix includes:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that the first word co-occurrence feature corresponding to the word is a first value, and otherwise determining that it is a second value, wherein the first value and the second value are fixed values respectively indicating that the word in the document content does or does not appear in the user question, and splicing the first word co-occurrence feature to the tail of the word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
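A minimal numpy sketch of the two word co-occurrence features described above. The dot-product similarity, the softmax normalization, the values 1.0/0.0 for the two fixed values, and the aggregation of the normalized similarities into a single appended value are all illustrative assumptions; the text leaves these choices open:

```python
import numpy as np

def add_cooccurrence_features(doc_vecs, q_vecs, doc_words, q_words):
    # First feature: 1.0 if the document word also appears in the question, else 0.0.
    q_set = set(q_words)
    flag = np.array([[1.0] if w in q_set else [0.0] for w in doc_words])
    # Second feature: similarity of each document word vector to every question
    # word vector, normalized per document word (softmax), here aggregated by
    # taking the largest normalized similarity (one plausible reading).
    sim = doc_vecs @ q_vecs.T
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    norm = e / e.sum(axis=1, keepdims=True)
    second = norm.max(axis=1, keepdims=True)
    # Splice both features to the tail of each document word vector.
    return np.hstack([doc_vecs, flag, second])

P = add_cooccurrence_features(np.random.randn(5, 8), np.random.randn(3, 8),
                              ["refund", "within", "30", "days", "."],
                              ["refund", "policy", "?"])
```

The processed first document matrix `P` has two extra columns per word, carrying the question information into the encoder.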
Optionally, the respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output-layer output of the first GRU network as the second document matrix; and,
determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output-layer output of the second GRU network as the second question matrix.
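The GRU encoding step can be sketched with a single-direction GRU cell in numpy. The parameter names and the zero initial state are assumptions; a real implementation would use a trained (possibly bidirectional) GRU layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(X, Wz, Uz, Wr, Ur, Wh, Uh):
    """Run a GRU over the rows of X (one word vector per row) and return
    the sequence of hidden states -- the 'output-layer output' used as the
    second document/question matrix."""
    h = np.zeros(Uz.shape[0])
    outputs = []
    for x in X:
        z = sigmoid(Wz @ x + Uz @ h)              # update gate
        r = sigmoid(Wr @ x + Ur @ h)              # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
        h = (1 - z) * h + z * h_cand
        outputs.append(h)
    return np.stack(outputs)

d_in, d_h = 10, 6
params = [np.random.randn(d_h, d_in) * 0.1 if i % 2 == 0
          else np.random.randn(d_h, d_h) * 0.1
          for i in range(6)]
H = gru_encode(np.random.randn(7, d_in), *params)   # one hidden state per word
```

Separate parameter sets correspond to the case where the first and second GRU networks are different; sharing them corresponds to the same-network variant below.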
Optionally, the determining an input question matrix according to the first question matrix includes:
if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Optionally, the performing, based on the attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix includes:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
Optionally, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix containing comparison information between the document and the question includes:
calculating a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on an attention mechanism, performing a weighting operation with the first normalization matrix and the second question matrix, and performing a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to respectively obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interaction attention matrix, the dot-product matrix of the second document matrix and the first interaction attention matrix, and the dot-product matrix of the second document matrix and the second interaction attention matrix, and determining the spliced matrix as the interaction matrix.
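One plausible reading of this interaction step (it closely mirrors BiDAF-style attention flow) in numpy; the dot-product similarity and the exact form of the second weighting are assumptions:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_matrix(P, Q):
    """P: second document matrix (n_doc_words, d); Q: second question
    matrix (n_q_words, d). Returns the spliced interaction matrix."""
    S = P @ Q.T                  # word-pair similarity matrix
    S1 = softmax(S, axis=1)      # first normalization matrix (over question words)
    S2 = softmax(S, axis=0)      # second normalization matrix (over document words)
    A1 = S1 @ Q                  # first interaction attention matrix
    A2 = S1 @ S2.T @ P           # second interaction attention matrix
    # Sequentially splice [P; A1; P*A1; P*A2] along the feature axis.
    return np.hstack([P, A1, P * A1, P * A2])

G = interaction_matrix(np.random.randn(6, 4), np.random.randn(3, 4))
```

The spliced matrix `G` (here 4x the feature width) is what the third GRU network would consume.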
Optionally, the performing, based on the attention mechanism, a self-matching process on the third document matrix to obtain a fourth document matrix includes:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Optionally, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
calculating a self-matching similarity matrix of the document according to the third document matrix, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Optionally, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as a self-matching matrix; or,
based on a gating mechanism, performing weighting processing on the spliced matrix, and determining the weighted matrix as the self-matching matrix.
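A numpy sketch covering both variants of the self-matching step above; dot-product similarity and the sigmoid-gate parameterization are illustrative assumptions:

```python
import numpy as np

def self_match(P, Wg=None):
    """P: third document matrix (n, d). With Wg=None the spliced matrix is
    used directly as the self-matching matrix; otherwise a sigmoid gate
    weights it, matching the gating variant described above."""
    S = P @ P.T                                   # self-matching similarity matrix
    e = np.exp(S - S.max(axis=1, keepdims=True))
    W = e / e.sum(axis=1, keepdims=True)          # self-matching weighting matrix
    C = W @ P                                     # self-matching attention matrix
    M = np.hstack([P, C])                         # spliced matrix
    if Wg is None:
        return M
    gate = 1.0 / (1.0 + np.exp(-(M @ Wg)))        # gating mechanism
    return gate * M

P3 = np.random.randn(5, 4)
M_plain = self_match(P3)                          # variant 1: spliced matrix as-is
M_gated = self_match(P3, np.random.randn(8, 8) * 0.1)  # variant 2: gated
```

Either result would then be fed to the bidirectional recurrent neural network to produce the fourth document matrix.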
Optionally, the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix includes:
reducing the second question matrix to obtain a reduced question matrix;
for each document content, calculating an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and includes an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculating a probability value for each document word according to the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of the different document contents, and determining the extraction starting position and ending position of the selected document content as the extraction starting position and ending position finally adopted.
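The span selection at the end of the pipeline can be sketched as follows; the logits here are stand-ins for the attention-derived scores the pointer network would produce at its two time steps:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pick_span(start_logits_per_doc, end_logits_per_doc):
    """For each candidate document content, turn the pointer network's
    first- and second-step logits into probabilities, take the most
    probable start and end words, and keep the document whose
    start*end probability product is largest."""
    best = None
    for i, (sl, el) in enumerate(zip(start_logits_per_doc, end_logits_per_doc)):
        ps, pe = softmax(sl), softmax(el)
        s, e = int(ps.argmax()), int(pe.argmax())
        score = float(ps[s] * pe[e])
        if best is None or score > best[0]:
            best = (score, i, s, e)
    _, doc_idx, start, end = best
    return doc_idx, start, end

# Two candidate documents; the second has a much more confident span.
doc, s, e = pick_span(
    [np.array([0.1, 0.2, 0.1]), np.array([5.0, 0.0, 0.0])],
    [np.array([0.1, 0.1, 0.3]), np.array([0.0, 0.0, 5.0])],
)
```

The words between the returned start and end positions in the chosen document are what is displayed as the answer.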
According to a second aspect of embodiments of the present application, there is provided an answer extraction device based on deep learning, including:
the acquisition module is used for acquiring user questions and acquiring document contents related to the user questions according to the user questions;
the processing module is used for determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and the display module is used for determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question and displaying the answer.
Optionally, the processing module includes:
the first processing unit is used for respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
the second processing unit is used for processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix;
the third processing unit is used for performing, based on an attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
the fourth processing unit is used for performing, based on an attention mechanism, self-matching processing on the third document matrix to obtain a fourth document matrix;
and the fifth processing unit is used for determining, based on a pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix.
Optionally, the first processing unit is specifically configured to:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated copies to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of the document contents.
Optionally, the second processing unit is specifically configured to:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Optionally, the word co-occurrence feature includes the first word co-occurrence feature and/or the second word co-occurrence feature, and the second processing unit is specifically configured to:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determine that the first word co-occurrence feature corresponding to the word is a first value, and otherwise determine that it is a second value, wherein the first value and the second value are fixed values respectively indicating that the word in the document content does or does not appear in the user question, and splice the first word co-occurrence feature to the tail of the word vector corresponding to the word in the first document matrix; and/or,
respectively calculate similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalize the similarity values corresponding to each word vector in the first document matrix, and splice the normalized similarity values, as the second word co-occurrence feature, to the tail of the corresponding word vector in the first document matrix.
Optionally, the second processing unit is specifically configured to:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output-layer output of the first GRU network as the second document matrix; and,
determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output-layer output of the second GRU network as the second question matrix.
Optionally, the second processing unit is specifically configured to:
if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Optionally, the third processing unit is specifically configured to:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
Optionally, the third processing unit is specifically configured to:
calculate a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determine a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on an attention mechanism, perform a weighting operation with the first normalization matrix and the second question matrix, and perform a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to respectively obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splice the second document matrix, the first interaction attention matrix, the dot-product matrix of the second document matrix and the first interaction attention matrix, and the dot-product matrix of the second document matrix and the second interaction attention matrix, and determine the spliced matrix as the interaction matrix.
Optionally, the fourth processing unit is specifically configured to:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Optionally, the fourth processing unit is specifically configured to:
calculate a self-matching similarity matrix of the document according to the third document matrix, and determine a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, perform a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splice the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determine the self-matching matrix according to the spliced matrix.
Optionally, the fourth processing unit is specifically configured to:
determining the spliced matrix as a self-matching matrix; or,
based on a gating mechanism, perform weighting processing on the spliced matrix, and determine the weighted matrix as the self-matching matrix.
Optionally, the fifth processing unit is specifically configured to:
reduce the second question matrix to obtain a reduced question matrix;
for each document content, calculate an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and includes an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculate a probability value for each document word according to the attention matrix, determine the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determine the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determine the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of the different document contents, and determine the extraction starting position and ending position of the selected document content as the extraction starting position and ending position finally adopted.
According to a third aspect of embodiments of the present application, there is provided a storage medium storing a computer program which, when executed by a processor, implements each step in an answer extraction method based on deep learning. The method comprises the following steps:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
Optionally, the determining, based on the deep learning model, an extraction starting position and an extraction ending position in the document content includes:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the processed first question matrix to respectively obtain a second document matrix and a second question matrix;
performing, based on an attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
performing, based on an attention mechanism, self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining, based on a pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix.
Optionally, the obtaining the user question to be processed and the document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated copies to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of the document contents.
Optionally, the processing the first document matrix, so that the processed first document matrix includes question information, includes:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Optionally, the determining word co-occurrence characteristics and splicing the word co-occurrence characteristics to the tails of the corresponding document word vectors in the first document matrix comprises:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
Optionally, the encoding the processed first document matrix and the processed first question matrix respectively to obtain a second document matrix and a second question matrix respectively includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as the second document matrix; and,
and determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second question matrix.
Optionally, the determining an input question matrix according to the first question matrix includes:
determining the first question matrix as the input question matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Optionally, the performing, based on the attention mechanism, an interaction process on the second document matrix and the second question matrix to obtain a third document matrix includes:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix comprises comparison information of the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix by adopting the third GRU network, and determining the output layer output of the third GRU network as a third document matrix.
Optionally, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix includes comparison information of the document and the question, including:
calculating to obtain a document-question word pair similarity matrix according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word pair similarity matrix;
based on an attention mechanism, respectively adopting the first normalization matrix and the second question matrix to carry out a weighting operation, and adopting the first normalization matrix, the second normalization matrix and the second document matrix to carry out a weighting operation, and respectively calculating to obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as an interactive matrix.
Optionally, the performing, based on the attention mechanism, a self-matching process on the third document matrix to obtain a fourth document matrix includes:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Optionally, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
according to the third document matrix, calculating to obtain a self-matching similarity matrix of the documents, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing weighting operation on the third document matrix by adopting the self-matching weighting matrix, and calculating to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Optionally, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as a self-matching matrix; or,
and based on a gate control mechanism, carrying out weighting processing on the spliced matrix, and determining the weighted matrix as a self-matching matrix.
Optionally, the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix includes:
reducing the second question matrix to obtain a reduced question matrix;
corresponding to each document content, calculating to obtain an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix is used for representing the semantic representation of the question word vectors on the document word vectors and comprises an attention matrix at a first moment and an attention matrix at a second moment;
corresponding to each document content, calculating a probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as an extraction starting position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as an extraction ending position of the corresponding document content;
determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position corresponding to different document contents, and determining the extraction starting position and the extraction ending position corresponding to the document content to be selected as the extraction starting position and the extraction ending position which are finally adopted.
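The span-selection logic described above can be sketched as follows. This is an illustrative sketch only, with made-up probability vectors; the real probabilities would come from the pointer network. For each candidate document content, the word with the maximum probability at the first moment gives the extraction starting position, the word with the maximum probability at the second moment gives the extraction ending position, and the document content whose product of the two probabilities is largest is finally adopted.

```python
# Hypothetical sketch of span selection across multiple candidate documents.
# Probability vectors here are made up; the method itself produces them via
# the pointer network described above.
import numpy as np

def select_answer_span(start_probs_per_doc, end_probs_per_doc):
    """Return (doc_index, start_pos, end_pos) with the highest start*end product."""
    best_score, best_span = -1.0, None
    for d, (p_start, p_end) in enumerate(zip(start_probs_per_doc, end_probs_per_doc)):
        s = int(np.argmax(p_start))      # extraction starting position (first moment)
        e = int(np.argmax(p_end))        # extraction ending position (second moment)
        score = p_start[s] * p_end[e]    # product of the two probability values
        if score > best_score:
            best_score, best_span = score, (d, s, e)
    return best_span
```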
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
firstly, obtaining a user question, and obtaining document content related to the user question according to the user question; then, based on a deep learning model, determining an extraction starting position and an extraction ending position in the document content; and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer. By adopting the answer extraction method, various matching rules are formulated without manually extracting features to extract answers, the obtained user questions and the document contents related to the user questions are directly input into the deep learning model, so that the most appropriate answers matched with the user questions can be obtained from the document contents, the answer extraction process is simplified, the answer accuracy is improved, and the efficiency and the quality of automatic customer service are greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating an answer extraction method based on deep learning according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method for determining an extraction start position and an extraction end position in document contents based on a deep learning model according to an exemplary embodiment.
Fig. 3 is a schematic diagram illustrating a structure of a deep learning network model according to another exemplary embodiment.
Fig. 4 is a schematic structural diagram illustrating an answer extraction apparatus based on deep learning according to another exemplary embodiment.
Fig. 5 is a schematic structural diagram illustrating processing modules in an answer extraction device based on deep learning according to another exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 illustrates a method for answer extraction based on deep learning according to an exemplary embodiment.
As shown in fig. 1, the method provided by this embodiment includes the following steps:
step S11, obtaining user questions, and obtaining document contents related to the user questions according to the user questions;
for example, the customer service robot may receive user questions entered by a user, the user may enter in natural language, the input may be in the form of text, voice, etc. For example, the user question is "can stock be bought and sold on the day? ".
After the user question is obtained, the customer service robot may obtain the related document content by using a related technology; for example, the related document content is the document content containing the user question. The specific acquisition of the document content related to the user question can be realized with known technology and is not detailed here. Based on the above user question, a piece of related document content might be: "'T+0' trading is a trading method introduced by the Shenzhen Stock Exchange in 1993, meaning that after an investor buys (sells) stocks (or futures) and the transaction is confirmed on the same day, the stocks bought (sold) that day can be sold (bought back) on the same day. From January 1, 1995, China has implemented a 'T+1' trading system, i.e., stock bought on a given day cannot be sold until the next trading day, in order to ensure the stability of the stock market and prevent excessive speculation. Meanwhile, funds still follow 'T+0', i.e., money recouped on the same day is immediately available. B shares follow 'T+1', and their funds are subject to 'T+3'."
Step S12, determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
further details of this step can be found in the related description that follows.
Step S13, determining the document content between the extraction start position and the extraction end position as an answer corresponding to the user question, and displaying the answer.
For example, if, based on the deep learning model, the extraction starting position and the extraction ending position are determined to be "1995" and "sold", respectively, then the passage "From January 1, 1995, China has implemented a 'T+1' trading system, i.e., stock bought on a given day cannot be sold until the next trading day, in order to ensure the stability of the stock market and prevent excessive speculation." is determined to be the answer corresponding to the user question "can stock be bought and sold on the day?". The answer may then be presented, in the form of text or voice.
In the embodiment, a user question is firstly obtained, and document content related to the user question is obtained according to the user question; then, based on a deep learning model, determining an extraction starting position and an extraction ending position in the document content; and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer. By adopting the answer extraction method, various matching rules are formulated without manually extracting features to extract answers, the obtained user questions and the document contents related to the user questions are directly input into the deep learning model, so that the most appropriate answers matched with the user questions can be obtained from the document contents, the answer extraction process is simplified, the answer accuracy is improved, and the efficiency and the quality of automatic customer service are greatly improved.
Further, referring to fig. 2, the determining an extraction starting position and an extraction ending position in the document content based on the deep learning model includes the following steps:
step S21, respectively obtaining a user question to be processed and a document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
in conjunction with the structural diagram of the deep learning network model shown in fig. 3, in S31, a first question matrix and a first document matrix, which are represented by word vectors in fig. 3, may be obtained corresponding to the document content and the user question.
Step S22, processing the first document matrix to make the processed first document matrix contain question information, and coding the processed first document matrix and the processed first question matrix respectively to obtain a second document matrix and a second question matrix respectively;
as shown in fig. 3, the second document matrix and the second question matrix may be obtained by self-encoding the first question matrix and the first document matrix, respectively, in S32.
Step S23, based on the attention mechanism, performing interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
as shown in fig. 3, after the attention-based mixed interaction process is performed in S33, a third document matrix is obtained.
Step S24, based on the attention mechanism, the third document matrix is subjected to self-matching processing to obtain a fourth document matrix;
as shown in fig. 3, self-attention processing is performed in S34 to obtain a fourth document matrix.
And step S25, determining an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix based on a pointer network.
As shown in fig. 3, prediction is performed in S35 to obtain the final presented answer.
In step S21, obtaining a user question to be processed and a document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain the document content to be processed; and/or,
and repeating the user question multiple times, and splicing the repeated user questions to obtain the user question to be processed, wherein the number of repetitions of the user question is the total number of the document contents.
Specifically, the document content to be processed is formed by splicing the contents of a plurality of documents related to the question asked by the user. For example, according to the key words in the question asked by the user, the k documents closest to the question are retrieved from the massive documents on the network, and all the contents of the k documents are spliced together to form the document content to be fed into the deep learning model; correspondingly, the user question to be processed is formed by splicing k copies of the single question asked by the user.
After obtaining the document content to be processed and the user question to be processed, word segmentation is performed on the user question and the document content respectively, and word vector conversion is performed on each segmented word. The word vector conversion is completed by a pre-trained word vector model: the user question to be processed (denoted q) and the document content to be processed (denoted c) are fed into the pre-trained word vector model, and each word in the user question and the document content is mapped by the word vector model into a 300-dimensional real-valued vector, namely a word vector (also called a parameter matrix), so that a first question matrix (denoted q_emb) and a first document matrix (denoted c_emb) are obtained; the word vector model represents the mapping from words to word vectors.
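The word segmentation and word-vector conversion step can be illustrated with the following sketch. This is not the patent's implementation: toy 4-dimensional random embeddings stand in for the 300-dimensional pre-trained word vectors, and whitespace splitting stands in for a real word segmenter.

```python
# Illustrative sketch: map segmented question and document words to fixed-size
# vectors via a lookup table, yielding a first question matrix (q_emb) and a
# first document matrix (c_emb). All embedding values are random stand-ins.
import numpy as np

EMB_DIM = 4
rng = np.random.default_rng(0)
vocab = {}   # word -> row index in the embedding table

def embed(tokens, table):
    """Map a token list to a [len(tokens), EMB_DIM] matrix (word vector conversion)."""
    rows = []
    for w in tokens:
        if w not in vocab:
            vocab[w] = len(table)
            table.append(rng.standard_normal(EMB_DIM))
        rows.append(table[vocab[w]])
    return np.stack(rows)

table = []
q_emb = embed("can stock be sold the same day".split(), table)        # first question matrix
c_emb = embed("the stock bought the day must be sold".split(), table) # first document matrix
```

Note that the same word always maps to the same row of the table, so shared words in the question and document receive identical vectors.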
In step S22, the processing the first document matrix so that the processed first document matrix contains question information includes:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Further, the determining word co-occurrence characteristics and splicing the word co-occurrence characteristics to the tails of the corresponding document word vectors in the first document matrix comprises:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
Specifically, in order to enable the document content to be processed to contain some information of the user question, two word co-occurrence features need to be added to each word vector in the first document matrix. This mirrors how people read: if a word from the user question appears at a certain position in the document content, the content near that position is likely to be the answer. Taking the addition of two word co-occurrence features as an example, a first word co-occurrence feature and a second word co-occurrence feature are added to each word vector in the first document matrix, where the first word co-occurrence feature is used to indicate whether a word in the document content appears in the user question, and the second word co-occurrence feature is used to indicate the similarity between a word in the document content and the words in the user question.
It is to be understood that the first value may be 1 and the second value may be 0.
In particular, corresponding to each word vector in the first document matrix, if the word it represents in the document content to be processed is the same as the word represented by at least one word vector in the first question matrix, the first word co-occurrence characteristic (denoted wiq_b) is determined to be the first value, i.e., 1; otherwise, the first word co-occurrence characteristic is determined to be the second value, i.e., 0. That is, it is judged whether the word in the document content to be processed appears in the user question to be processed: if so, the first word co-occurrence feature is 1, otherwise it is 0; the determined first word co-occurrence feature is then spliced to the tail of the corresponding word vector in the first document matrix.
It should be noted that the formula for calculating the first word co-occurrence feature is as follows: wiq_b(j) = 1 if there exists an i such that c_j = q_i, and wiq_b(j) = 0 otherwise. The meaning of this formula is: if the jth word in the document content to be processed is the same as the ith word in the user question to be processed, the first word co-occurrence feature corresponding to the jth word in the document is 1; otherwise, it is 0.
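A minimal sketch of this binary word-in-question feature (the token lists in the usage are illustrative only):

```python
# wiq_b[j] is 1 if the j-th document word also appears anywhere in the
# question, else 0 -- the first word co-occurrence feature described above.
def first_word_cooccurrence(doc_words, question_words):
    qset = set(question_words)
    return [1 if w in qset else 0 for w in doc_words]
```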
Then, similarity values between the word vectors in the first document matrix and the word vectors in the first question matrix are respectively calculated, the calculated similarity values are normalized, and the normalized values are determined as the second word co-occurrence characteristics.
It should be noted that the value range of the co-occurrence feature of the second word is [0, 1 ]; the calculation formula of the co-occurrence feature of the second word is as follows:
sim(i, j) = v_wiq · (x_j ⊙ q_i), v_wiq ∈ R^n

α(i, j) = softmax_j(sim(i, j)), wiq_w(j) = Σ_i α(i, j)

wherein v_wiq is a parameter vector obtained by pre-training, x_j represents the jth word vector among the document word vectors, q_i represents the ith word vector among the question word vectors, sim(i, j) represents the similarity score of the jth word in the document content and the ith word in the user question, α(i, j) represents the softmax normalization of sim(i, j), and wiq_w(j) represents the second word co-occurrence feature of the jth word in the document.
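Under one possible reading of the normalization described here (softmax over document positions for each question word, then summation over question words), the weighted feature can be sketched as follows; v, X and Q below are random stand-ins for the pre-trained parameter vector and the word-vector matrices.

```python
# Sketch of the second (weighted) word co-occurrence feature:
#   sim[i, j] = v . (x_j * q_i), softmax-normalized over document positions j
#   for each question word i, then summed over i per document word j.
import numpy as np

def second_word_cooccurrence(X, Q, v):
    """X: [m, n] document word vectors, Q: [k, n] question word vectors, v: [n]."""
    sim = np.einsum('n,jn,in->ij', v, X, Q)           # sim[i, j] = v.(x_j * q_i)
    e = np.exp(sim - sim.max(axis=1, keepdims=True))  # softmax over j, stabilized
    alpha = e / e.sum(axis=1, keepdims=True)
    return alpha.sum(axis=0)                          # wiq_w[j], one value per doc word
```

Because each question word distributes unit attention mass over the document, the feature values sum to the number of question words.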
After determining a first word co-occurrence feature and a second word co-occurrence feature corresponding to each word vector in the first document matrix, splicing the first word co-occurrence feature and the second word co-occurrence feature to the tail of the corresponding word vector in the first document matrix to obtain a processed first document matrix; for example, the dimension of each word vector is 300, and the dimension of the document word vector before splicing is [ batch _ size, seq _ len,300], so that the dimension of the document word vector after splicing two word co-occurrence features after each word vector in the document word vector becomes [ batch _ size, seq _ len,302 ].
The processed first document matrix obtained through the above process contains question information.
It should be noted that after the document content and the user question are mapped into the document word vector and the question word vector, batch processing needs to be performed on the document word vector and the question word vector. The batch _ size represents the number of each batch of the obtained document word vectors after batch processing; seq _ len represents the document word vector length.
It should be noted that the first word co-occurrence feature and the second word co-occurrence feature each have length equal to the paragraph length, i.e., one value per document word. If a word vector in the first document matrix matches at least one word vector in the first question matrix, the first word co-occurrence feature (value 1) is added at the tail of the corresponding word vector in the first document matrix, that is, the value of the added dimension corresponding to that word vector is 1; otherwise, the value of the added dimension corresponding to that word vector is 0. Similarly, if the normalized similarity value for a word vector in the first document matrix with respect to the first question matrix is 0.6, the second word co-occurrence feature added to the tail of the corresponding word vector in the first document matrix is 0.6, i.e., the value of the tail of the corresponding word vector in the added dimension is 0.6.
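The splicing of the two scalar features onto each document word vector can be illustrated as follows; the feature values and the zero document matrix are made up, and only the shapes follow the dimensions quoted above (300 embedding dimensions plus two feature dimensions gives 302).

```python
# Toy illustration: appending the two scalar features to the tail of each
# document word vector turns a [seq_len, 300] matrix into [seq_len, 302].
import numpy as np

seq_len, dim = 6, 300
c_emb = np.zeros((seq_len, dim))                 # first document matrix (stand-in values)
wiq_b = np.array([1, 0, 1, 0, 0, 1], float)      # first word co-occurrence feature
wiq_w = np.array([0.3, 0.1, 0.4, 0.05, 0.05, 0.1])  # second word co-occurrence feature
c_proc = np.concatenate([c_emb, wiq_b[:, None], wiq_w[:, None]], axis=1)
```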
Further, the encoding the processed first document matrix and the processed first question matrix respectively to obtain a second document matrix and a second question matrix respectively includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as the second document matrix; and,
and determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second question matrix.
Further, the determining an input question matrix according to the first question matrix includes:
determining the first question matrix as the input question matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Specifically, after the word co-occurrence features are added, the dimension of the processed first document matrix is different from that of the first question matrix, and matrices with different dimensions need different encoders. If the processed first document matrix and the first question matrix are to be encoded with the same encoder, the tail of each word vector in the first question matrix needs to be spliced with two preset features to obtain the processed first question matrix. In this way, the dimension of the obtained processed first question matrix is the same as the dimension of the processed first document matrix. Both preset features may be, but are not limited to, 1, i.e., the value in both dimensions added at the end of each word vector in the first question matrix is 1.
In one embodiment, the encoder is a Gated Recurrent Unit (GRU), and the processed first document matrix and the processed first question matrix are encoded through GRU networks with a preset number of layers to obtain a second document matrix (denoted C) and a second question matrix (denoted Q). The second document matrix obtained at this point contains information of the user question.
It should be noted that the preset number may be, but is not limited to, 2, or 3, or 4.
Step S22 is equivalent to "reading with the question in mind": the question and the document are understood through GRU encoding.
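The GRU encoding step can be sketched with a minimal single-layer GRU cell in NumPy. The weights below are random, untrained stand-ins; a real encoder would be trained, may be bidirectional, and may be stacked the preset number of layers deep, as noted above.

```python
# Minimal GRU cell: the processed document matrix [T, d_in] is fed through the
# gated recurrence, and the per-step outputs form the encoded matrix [T, H].
import numpy as np

def gru_encode(X, H, rng):
    """X: [T, d_in] input matrix -> [T, H] encoded output matrix."""
    d = X.shape[1]
    Wz, Wr, Wh = (rng.standard_normal((H, d + H)) * 0.1 for _ in range(3))
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    h = np.zeros(H)
    outs = []
    for x in X:
        xh = np.concatenate([x, h])
        z = sigmoid(Wz @ xh)                               # update gate
        r = sigmoid(Wr @ xh)                               # reset gate
        h_tilde = np.tanh(Wh @ np.concatenate([x, r * h])) # candidate state
        h = (1 - z) * h + z * h_tilde                      # new hidden state
        outs.append(h)
    return np.stack(outs)
```

Since each new state is a convex combination of the previous state and a tanh-bounded candidate, all outputs stay strictly inside (-1, 1).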
In step S23, based on an attention mechanism, interaction processing is performed on the second document matrix and the second question matrix to obtain a third document matrix; the method comprises the following steps:
processing the second document matrix and the second question matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix comprises comparison information of the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix by adopting the third GRU network, and determining the output layer output of the third GRU network as a third document matrix.
Further, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix includes comparison information of the document and the question, includes:
calculating to obtain a document-question word pair similarity matrix according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word pair similarity matrix;
based on an attention mechanism, respectively adopting the first normalization matrix and the second question matrix to carry out a weighting operation, and adopting the first normalization matrix, the second normalization matrix and the second document matrix to carry out a weighting operation, and respectively calculating to obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as an interactive matrix.
Specifically, a similarity matrix of document-question word pairs is calculated from the second document matrix and the second question matrix using a trilinear function. The formula is f(q, c) = W_0[q, c, q ⊙ c]; wherein W_0 is a weight matrix, q is the word vector representation of each word in the second question matrix obtained in step S22, c is the word vector representation of each word in the second document matrix obtained in step S22, and f(q, c) represents the similarity matrix of document-question word pairs.
In one embodiment, the weight matrix W0 is obtained by random initialization.
Then the rows and the columns of the similarity matrix of the document-question word pairs are respectively normalized, so as to obtain a row normalization matrix (denoted S̄) and a column normalization matrix (denoted S̿).
The row normalization matrix is a first normalization matrix, and the column normalization matrix is a second normalization matrix.
Calculating to obtain a document-problem attention matrix according to the row normalization matrix and the transpose matrix of the second problem matrix; the calculation formula is A = S̄·Q^T; wherein Q^T represents the transpose matrix of the second problem matrix, and A represents the document-problem attention matrix;
calculating to obtain a problem-document attention matrix according to the row normalization matrix, the column normalization matrix and the transpose matrix of the second document matrix; the calculation formula is B = S̄·S̿^T·C^T; wherein C^T represents the transpose matrix of the second document matrix, S̿^T represents the transpose matrix of the column normalization matrix, and B represents the problem-document attention matrix;
then, a second document matrix C, a document-question attention matrix A, a matrix obtained by performing point multiplication on the second document matrix C and the document-question attention matrix A, and a matrix obtained by performing point multiplication on the second document matrix C and the question-document attention matrix B are sequentially spliced to obtain a spliced matrix, namely the spliced matrixAnd encoding the spliced matrix through a GRU network to obtain a third document matrix, wherein the third document matrix integrates the comparison information of the problem and the document.
It should be noted that the dimension of the document matrix C is [batch_size, paragraph_length, hidden] and the dimension of the question matrix is [batch_size, question_length, hidden]; when the matrices are spliced, each matrix is sequentially spliced on the last dimension of the previous matrix, so the dimension of the spliced matrix is [batch_size, paragraph_length, hidden×4], wherein hidden represents the number of hidden-layer neurons after encoding.
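As an illustrative sketch only (batch dimension omitted, toy randomly initialized weights, not the patent's implementation), the interaction described above can be written as:

```python
import numpy as np

def interact(C, Q, rng=np.random.default_rng(0)):
    """Trilinear similarity, row/column softmax, attention matrices A and B,
    and the four-way splice [C, A, C*A, C*B]. Toy weights, no batch axis."""
    n, h = C.shape            # n document words, hidden size h
    m = Q.shape[0]            # m question words
    W0 = rng.standard_normal(3 * h)
    # Trilinear similarity f(q, c) = W0 [q, c, q*c] for every word pair.
    S = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            S[i, j] = W0 @ np.concatenate([Q[j], C[i], Q[j] * C[i]])
    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)
    S_row = softmax(S, axis=1)          # row normalization matrix
    S_col = softmax(S, axis=0)          # column normalization matrix
    A = S_row @ Q                       # document-question attention
    B = S_row @ S_col.T @ C             # question-document attention
    # Splice [C, A, C*A, C*B] on the last dimension -> [n, 4*hidden].
    return np.concatenate([C, A, C * A, C * B], axis=1)

G = interact(np.random.rand(6, 16), np.random.rand(4, 16))
assert G.shape == (6, 64)               # hidden x 4 on the last dimension
```

The spliced result would then be fed to the third GRU network, which this sketch omits.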
Step S23 corresponds to "reading the document with the question" and "reading the question with the document". The second document matrix C obtained in the previous step is compared with the second question matrix Q: through the attention mechanism, the attention distribution of each word in the document content with respect to each word in the user question is calculated, namely the document word vectors are measured with the question word vectors; and for each word in the user question, its attention distribution with respect to each word in the document content is calculated, namely the question word vectors are measured with the document word vectors. In this manner, a connection is established between the user question and the relevant content of the document, so as to locate, within the document content, the portion that is truly useful for answering the user question.
In step S24, based on the attention mechanism, performing self-matching processing on the third document matrix to obtain a fourth document matrix; the method comprises the following steps:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Further, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
according to the third document matrix, calculating to obtain a self-matching similarity matrix of the documents, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing weighting operation on the third document matrix by adopting the self-matching weighting matrix, and calculating to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Specifically, according to the third document matrix and parameter matrices obtained by pre-training, a self-matching similarity matrix of the document with itself is calculated; the formula is as follows: s_j^t = v^T·tanh(W_v^P·v_j^P + W̃_v^P·v_t^P); wherein v^T, W_v^P and W̃_v^P are all parameter matrices obtained by random initialization, v^P is the vector representation of the third document matrix obtained in step S23, in which the alignment information of the question and the document is fused, and j represents a word index.
Normalizing the self-matching similarity matrix of the document with itself to obtain a normalized matrix; the formula is as follows: a_i^t = exp(s_i^t)/Σ_j exp(s_j^t); wherein a^t represents the normalized matrix.
According to the vector representation of the third document matrix and the normalized matrix, the self-matching attention matrix of each word in the document corresponding to the whole document is calculated; the formula is as follows: c_t = Σ_i a_i^t·v_i^P. The meaning is as follows: a weighted sum is carried out on the weights and the word vectors in the third document matrix corresponding to the weights, obtaining the self-matching attention matrix c_t, which represents the semantic vector of each word in the document corresponding to the entire document. The third document matrix and the self-matching attention matrix are then spliced to obtain a spliced matrix [v_t^P, c_t].
Further, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as a self-matching matrix; or,
and based on a gate control mechanism, carrying out weighting processing on the spliced matrix, and determining the weighted matrix as a self-matching matrix.
Specifically, the obtained spliced matrix can be directly determined as the self-matching matrix; alternatively, in order to control the importance of different parts of the document content, a gating mechanism may be additionally introduced to control this importance adaptively. That is, first, a weight matrix (denoted g_t) is calculated according to the sigmoid function, the self-matching attention matrix c_t corresponding to each word in the document and the third document matrix, so as to control the importance of different parts of the document; then g_t and the resulting spliced matrix [v_t^P, c_t] are subjected to dot multiplication to obtain a new matrix, and the obtained new matrix is determined as the self-matching matrix. The formula is as follows: g_t = sigmoid(W_g·[v_t^P, c_t]), [v_t^P, c_t]* = g_t ⊙ [v_t^P, c_t];
wherein W_g is a parameter matrix obtained by random initialization, [v_t^P, c_t] represents the matrix obtained after splicing, and [v_t^P, c_t]* represents the self-matching matrix after processing by the gating mechanism.
After the self-matching matrix is obtained, the self-matching matrix is used as an input quantity and is input into a bidirectional Recurrent Neural Network (BiRNN) to obtain the output values of the hidden-layer nodes of the BiRNN, and the output values are determined as the fourth document matrix. It is expressed as: h_t^P = BiRNN(h_{t-1}^P, [v_t^P, c_t]*);
wherein h_t^P is the output of the BiRNN hidden-layer node at time t.
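A minimal NumPy sketch of the self-matching computation above, with randomly initialized stand-in weights and the final BiRNN encoding omitted:

```python
import numpy as np

def self_match(V, rng=np.random.default_rng(1)):
    """Additive self-attention of the third document matrix V against itself,
    followed by the gated splice [v_t, c_t]* = g_t * [v_t, c_t].
    All parameter matrices are toy random stand-ins."""
    n, h = V.shape
    Wv, Wt = rng.standard_normal((h, h)), rng.standard_normal((h, h))
    v = rng.standard_normal(h)
    Wg = rng.standard_normal((2 * h, 2 * h))
    out = np.empty((n, 2 * h))
    for t in range(n):
        # s_j = v^T tanh(Wv v_j + Wt v_t): similarity of word t to every word.
        s = np.tanh(V @ Wv.T + V[t] @ Wt.T) @ v
        a = np.exp(s - s.max()); a /= a.sum()        # normalized weights a^t
        c_t = a @ V                                  # self-matching attention
        x = np.concatenate([V[t], c_t])              # splice [v_t, c_t]
        g = 1.0 / (1.0 + np.exp(-(Wg @ x)))          # gate g_t = sigmoid(Wg x)
        out[t] = g * x                               # gated self-matching row
    return out

M = self_match(np.random.rand(5, 8))
assert M.shape == (5, 16)                            # [n, 2*hidden]
```

In the patent's flow, each row of this gated matrix would be passed into the BiRNN to yield the fourth document matrix.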
Step S24 corresponds to "reading the document for the third time", i.e., reading the document again with respect to the question to deepen the understanding of the document and the question. The word representations of the document content are adjusted by using the context of the document content to obtain more context information; that is, words that are far apart in the same document content are compared, so that the current word can be distinguished from other words with similar meanings in the rest of the document.
In step S25, determining an extraction start position and an extraction end position in the document content according to the fourth document matrix and the second question matrix based on a pointer network, including:
reducing the second problem matrix to obtain a reduced problem matrix;
corresponding to each document content, calculating to obtain an attention matrix according to the fourth document matrix and the reduced problem matrix, wherein the attention matrix is used for representing the semantic representation of the problem word vectors on the document word vectors, and comprises an attention matrix at a first moment and an attention matrix at a second moment;
corresponding to each document content, calculating a probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as an extraction starting position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as an extraction ending position of the corresponding document content;
determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position corresponding to different document contents, and determining the extraction starting position and the extraction ending position corresponding to the document content to be selected as the extraction starting position and the extraction ending position which are finally adopted.
In particular, since the user question is formed by splicing several individual questions asked initially by the user, the user question is first reduced to a single question; that is, the encoded second problem matrix obtained in step S22 is reduced to a single-question matrix. Then, for the fourth document matrix obtained in step S24, the attention matrix with respect to the single-question matrix is calculated, and the start/end positions corresponding to the maximum probability values in the attention matrix corresponding to the fourth document matrix are respectively calculated as candidate start/end positions of answer extraction. Because the document content is formed by splicing the contents of a plurality of related documents, a plurality of pairs of start positions and end positions are correspondingly obtained; the product of the probability values corresponding to each pair of start and end positions is calculated, and the pair with the largest probability product is selected as the final answer extraction start/end positions. For example, if the document content is obtained by splicing the contents of 5 related documents, 5 pairs of start and end positions are finally obtained; the probability values corresponding to the 5 start positions and the 5 end positions are multiplied pairwise to obtain 5 probability products, and the start and end positions corresponding to the largest product are selected as the final answer extraction start/end positions.
The detailed formula calculation process is as follows:
The semantic vector of each word vector in the document with respect to the document itself is calculated according to the fourth document matrix and used as the input quantity of an RNN to obtain the output values of the RNN hidden-layer nodes; the formula is as follows: h_t^a = RNN(h_{t-1}^a, c_t);
wherein h^P is the vector representation corresponding to the fourth document matrix obtained in step S24, c_t is the semantic vector of each word vector in the document at time t with respect to the document itself, and the initial hidden state of the RNN is a problem vector r^Q generated with the attention of the document, wherein the calculation process of r^Q is as follows:
s_j = v^T·tanh(W_u^Q·u_j^Q + W_v^Q·V_r^Q), a_i = exp(s_i)/Σ_j exp(s_j), r^Q = Σ_i a_i·u_i^Q;
wherein v^T, W_u^Q, W_v^Q and V_r^Q are all parameters obtained by random initialization, u^Q represents the vector representation of the problem matrix obtained after encoding in step S22 and having been reduced to a single-problem length, s_j represents the similarity matrix of the single problem with the single problem itself, a_i represents the normalized matrix obtained by normalizing the similarity matrix, and r^Q represents the problem vector obtained by weighted summation of the single-problem matrix.
The problem matrix u^Q, obtained after encoding in step S22 and having been reduced to a single-problem length, is thus subjected to similarity calculation, normalization and weighting to obtain the problem vector r^Q, which serves as the initial hidden state and is transmitted into the RNN together with the semantic vector c_t of each word vector in the document with respect to the document itself.
According to the fourth document matrix and the output values of the RNN hidden-layer nodes, a similarity matrix of the document and the problem is calculated; the formula is as follows: s_j^t = v^T·tanh(W_h^P·h_j^P + W_h^a·h_{t-1}^a);
wherein h^P represents the fourth document matrix, s_j^t represents the similarity matrix of the updated document vector and the restored single-question word vector, and W_h^P and W_h^a are both parameter matrices obtained by random initialization.
Normalizing the similarity matrix of the document and the problem to obtain a normalized similarity matrix of the document and the problem; the formula is as follows: a_i^t = exp(s_i^t)/Σ_j exp(s_j^t); wherein a^t represents the normalized similarity matrix of the document and the problem.
The maximum value of the normalized similarity matrix of the document and the problem is then calculated, and the positions corresponding to the maximum values are determined as the extraction starting position and the extraction ending position. The formula is as follows: p^t = argmax(a_1^t, ..., a_n^t);
wherein p^t is the start/end position of the corresponding answer in the document content, the position corresponding to the maximum value of a^1 is the answer extraction starting position, the position corresponding to the maximum value of a^2 is the answer extraction ending position, and n represents the document length.
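One pointer step under the formulas above can be sketched as follows; the weights are randomly initialized stand-ins and the function name `point` is illustrative only:

```python
import numpy as np

def point(Hp, h_a, rng=np.random.default_rng(2)):
    """Additive attention between the fourth document matrix Hp and the
    answer-RNN state h_a; returns the argmax position (start at t=1, end at
    t=2) and the normalized distribution over document words. Toy weights."""
    n, h = Hp.shape
    Wh, Wa = rng.standard_normal((h, h)), rng.standard_normal((h, h))
    v = rng.standard_normal(h)
    # s_j^t = v^T tanh(Wh h_j^P + Wa h_{t-1}^a)
    s = np.tanh(Hp @ Wh.T + h_a @ Wa.T) @ v
    a = np.exp(s - s.max()); a /= a.sum()    # normalized similarity a^t
    return int(np.argmax(a)), a              # p^t = argmax(a_1^t, ..., a_n^t)

pos, dist = point(np.random.rand(7, 8), np.random.rand(8))
assert 0 <= pos < 7 and abs(dist.sum() - 1.0) < 1e-9
```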
For example, 5 groups of probability values are obtained, one group for each of the 5 related documents, each group consisting of a starting-position probability value and an ending-position probability value; the product of the two probability values in each of the 5 groups is respectively calculated, the group whose product is the largest is selected, and the document words respectively corresponding to the two probability values of that group are taken as the answer extraction starting position and ending position.
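The selection among the candidate pairs can be sketched as:

```python
# Choose the (start, end) pair whose start/end probability product is largest.
def select_span(candidates):
    """candidates: list of (start_pos, end_pos, p_start, p_end), one pair
    per spliced document content."""
    best = max(candidates, key=lambda c: c[2] * c[3])
    return best[0], best[1]

# Toy example with 5 candidate pairs (positions and probabilities invented).
pairs = [(3, 7, 0.2, 0.5), (10, 12, 0.9, 0.8), (1, 4, 0.6, 0.6),
         (20, 25, 0.4, 0.3), (8, 9, 0.7, 0.2)]
start, end = select_span(pairs)
assert (start, end) == (10, 12)   # 0.9 * 0.8 = 0.72 is the largest product
```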
It should be noted that t in the above formulas represents a time index. The value of t in the formulas referred to in step S25 takes two values, namely the first moment and the second moment: the document word corresponding to the maximum probability value at the first moment is determined as the extraction starting position of the corresponding document content, and the document word corresponding to the maximum probability value at the second moment is determined as the extraction ending position of the corresponding document content.
It should be noted that, in the above formula, the subscripts i and j each represent an index of a word in the corresponding vector or matrix.
Fig. 4 is a schematic structural diagram illustrating an answer extraction apparatus based on deep learning according to another exemplary embodiment.
As shown in fig. 4, the answer extraction device based on deep learning provided in this embodiment includes:
an obtaining module 41, configured to obtain a user question, and obtain document content related to the user question according to the user question;
a processing module 42, configured to determine an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and a displaying module 43, configured to determine the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and display the answer.
Further, referring to fig. 5, the processing module 42 includes:
the first processing unit 421 is configured to obtain a user question to be processed and a document content to be processed according to the user question and the document content, perform word segmentation on the user question to be processed and the document content to be processed, and perform word vector conversion on each word to obtain a first question matrix and a first document matrix;
a second processing unit 422, configured to process the first document matrix, so that the processed first document matrix includes problem information, and encode the processed first document matrix and the processed first problem matrix respectively to obtain a second document matrix and a second problem matrix respectively;
a third processing unit 423, configured to perform interaction processing on the second document matrix and the second problem matrix based on an attention mechanism, so as to obtain a third document matrix;
a fourth processing unit 424, configured to perform self-matching processing on the third document matrix based on an attention mechanism, to obtain a fourth document matrix;
a fifth processing unit 425, configured to determine, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second problem matrix.
Further, the first processing unit 421 is specifically configured to:
splicing all the document contents to obtain document contents to be processed; and/or,
and repeating the user problems for multiple times, and splicing the repeated user problems to obtain the user problems to be processed, wherein the repeated times of the user problems are the total number of the document contents.
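A minimal illustration of this preprocessing (the delimiter used for splicing is an assumption; the patent does not specify one):

```python
# Build the to-be-processed inputs: splice all document contents into one
# document, and repeat the user question once per document content.
def build_inputs(user_question, documents):
    doc_to_process = " ".join(documents)                      # spliced documents
    question_to_process = " ".join([user_question] * len(documents))
    return question_to_process, doc_to_process

q, d = build_inputs("how to reset my password",
                    ["doc one ...", "doc two ...", "doc three ..."])
assert q.count("how to reset my password") == 3   # repeated once per document
assert "doc two" in d
```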
Further, the second processing unit 422 is specifically configured to:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Further, the word co-occurrence characteristics include: the first word co-occurrence feature and/or the second word co-occurrence feature, the second processing unit 422 is specifically configured to:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in a user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first problem matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
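The two word co-occurrence features can be sketched as below. This is illustrative only: the dot product is assumed as the similarity measure, and reducing the normalized similarities to the best-match weight is an assumption, since the patent fixes the normalization but not the final scalar form:

```python
import numpy as np

def add_cooccurrence_features(doc_words, q_words, doc_vecs, q_vecs):
    """Append the two co-occurrence features to each document word vector.
    Feature forms beyond what the patent states are assumptions."""
    q_set = set(q_words)
    # Feature 1: binary flag -- does the document word appear in the question?
    exact = np.array([1.0 if w in q_set else 0.0 for w in doc_words])
    # Feature 2: dot-product similarities, softmax-normalized per document
    # word; the best-match weight is kept as the scalar feature (assumption).
    sims = doc_vecs @ q_vecs.T                          # [doc_len, q_len]
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    norm = e / e.sum(axis=1, keepdims=True)
    soft = norm.max(axis=1)
    # Splice both features to the tail of each document word vector.
    return np.concatenate([doc_vecs, exact[:, None], soft[:, None]], axis=1)

doc_vecs = np.random.rand(4, 8)
q_vecs = np.random.rand(3, 8)
out = add_cooccurrence_features(["a", "b", "c", "d"], ["b", "e", "f"],
                                doc_vecs, q_vecs)
assert out.shape == (4, 10)          # 8 dims + 2 co-occurrence features
assert out[1, 8] == 1.0 and out[0, 8] == 0.0
```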
Further, the second processing unit 422 is specifically configured to:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as a second document matrix; and,
and determining an input problem matrix according to the first problem matrix, taking the input problem matrix as the input of a preset second GRU network, processing the input problem matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second problem matrix.
Further, the second processing unit 422 is specifically configured to:
determining the first problem matrix as an input problem matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features at the tail part of each word vector corresponding to each word vector in the first problem matrix to obtain a spliced problem matrix, and determining the spliced problem matrix as an input problem matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
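When the two GRU networks are the same, the question matrix must be padded to the same width as the co-occurrence-augmented document matrix. A minimal sketch, assuming the preset features are zeros (the patent fixes only their count, not their values):

```python
import numpy as np

# Pad the first problem matrix with preset features so its width matches the
# document matrix after the word co-occurrence features were spliced on.
# Zero values are an assumption; the patent only fixes the feature count.
def pad_question(Q1, n_features=2):
    pad = np.zeros((Q1.shape[0], n_features))
    return np.concatenate([Q1, pad], axis=1)

Qp = pad_question(np.random.rand(4, 8))   # 8-dim word vectors, 2 features
assert Qp.shape == (4, 10)
```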
Further, the third processing unit 423 is specifically configured to:
processing the second document matrix and the second problem matrix based on an attention mechanism to obtain an interaction matrix, so that the interaction matrix comprises comparison information of documents and problems;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix by adopting the third GRU network, and determining the output layer output of the third GRU network as a third document matrix.
Further, the third processing unit 423 is specifically configured to:
calculating to obtain a word pair similarity matrix of the documents and the problems according to the second document matrix and the second problem matrix, and determining a first normalization matrix and a second normalization matrix according to the word pair similarity matrix;
based on an attention mechanism, respectively adopting the first normalization matrix and the second problem matrix to carry out weighting operation, and adopting the first normalization matrix, the second normalization matrix and the second document matrix to carry out weighting operation, and respectively calculating to obtain a first interaction attention matrix and a second interaction attention matrix;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as an interactive matrix.
Further, the fourth processing unit 424 is specifically configured to:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Further, the fourth processing unit 424 is specifically configured to:
according to the third document matrix, calculating to obtain a self-matching similarity matrix of the documents, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on an attention mechanism, performing weighting operation on the third document matrix by adopting the self-matching weighting matrix, and calculating to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Further, the fourth processing unit 424 is specifically configured to:
determining the spliced matrix as a self-matching matrix; or,
and based on a gate control mechanism, carrying out weighting processing on the spliced matrix, and determining the weighted matrix as a self-matching matrix.
Further, the fifth processing unit 425 is specifically configured to:
reducing the second problem matrix to obtain a reduced problem matrix;
corresponding to each document content, calculating to obtain an attention matrix according to the fourth document matrix and the reduced problem matrix, wherein the attention matrix is used for representing the semantic representation of the problem word vectors on the document word vectors, and comprises an attention matrix at a first moment and an attention matrix at a second moment;
corresponding to each document content, calculating a probability value corresponding to each document word according to the attention matrix, determining the document word corresponding to the maximum probability value at the first moment as an extraction starting position of the corresponding document content, and determining the document word corresponding to the maximum probability value at the second moment as an extraction ending position of the corresponding document content;
determining the document content to be selected according to the product of the probability values corresponding to the extraction starting position and the extraction ending position corresponding to different document contents, and determining the extraction starting position and the extraction ending position corresponding to the document content to be selected as the extraction starting position and the extraction ending position which are finally adopted.
In the embodiment, a user question is firstly obtained through an obtaining module, and document content related to the user question is obtained according to the user question; then determining an extraction starting position and an extraction ending position in the document content through a processing module based on a deep learning model; and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question through a display module, and displaying the answer. By adopting the answer extraction device, various matching rules are formulated without manually extracting features to extract answers, the obtained user questions and the document contents related to the user questions are directly input into the deep learning model, so that the most appropriate answers matched with the user questions can be obtained from the document contents, the answer extraction process is simplified, the answer accuracy is improved, and the efficiency and the quality of automatic customer service are greatly improved.
It should be noted that, for parts that are not described in detail in this embodiment, reference may be made to descriptions in the embodiment related to the method, and details are not described herein again.
Another embodiment of the present application provides a storage medium storing a computer program, which when executed by a processor, implements each step of an answer extraction method based on deep learning. The method comprises the following steps:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
Further, the determining an extraction starting position and an extraction ending position in the document content based on the deep learning model includes:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix to enable the processed first document matrix to contain problem information, and respectively coding the processed first document matrix and the processed first problem matrix to respectively obtain a second document matrix and a second problem matrix;
based on an attention mechanism, performing interactive processing on the second document matrix and the second problem matrix to obtain a third document matrix;
based on an attention mechanism, performing self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second problem matrix based on a pointer network.
Further, the obtaining the user question to be processed and the document content to be processed according to the user question and the document content respectively includes:
splicing all the document contents to obtain document contents to be processed; and/or,
and repeating the user problems for multiple times, and splicing the repeated user problems to obtain the user problems to be processed, wherein the repeated times of the user problems are the total number of the document contents.
Further, the processing the first document matrix, so that the processed first document matrix includes problem information, includes:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
Further, the word co-occurrence characteristics include: the determining word co-occurrence characteristics and splicing the word co-occurrence characteristics to the tails of the corresponding document word vectors in the first document matrix comprises:
corresponding to each word in the document content to be processed, if the word is the same as at least one word in a user question to be processed, determining that a first word co-occurrence characteristic corresponding to the word in the document content to be processed is a first value, otherwise, determining that the first word co-occurrence characteristic is a second value, wherein the first value and the second value are fixed values and are respectively used for indicating that the word in the document content appears or does not appear in the user question, and splicing the first word co-occurrence characteristic to the tail of a word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first problem matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values serving as second word co-occurrence characteristics to the tail of the corresponding word vector in the first document matrix.
Further, the encoding the processed first document matrix and the processed first problem matrix respectively to obtain a second document matrix and a second problem matrix respectively includes:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix by adopting the first GRU network, and determining the output layer output of the first GRU network as a second document matrix; and,
and determining an input problem matrix according to the first problem matrix, taking the input problem matrix as the input of a preset second GRU network, processing the input problem matrix by adopting the second GRU network, and determining the output layer output of the second GRU network as the second problem matrix.
Further, the determining an input problem matrix according to the first problem matrix includes:
determining the first problem matrix as an input problem matrix if the first GRU network is different from the second GRU network; or,
and if the first GRU network is the same as the second GRU network, splicing preset features at the tail part of each word vector corresponding to each word vector in the first problem matrix to obtain a spliced problem matrix, and determining the spliced problem matrix as an input problem matrix, wherein the number of the preset features is the same as the number of the word co-occurrence features.
Further, the performing, based on the attention mechanism, interactive processing on the second document matrix and the second question matrix to obtain a third document matrix includes:
processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
Further, the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix containing comparison information between the document and the question includes:
calculating a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on the attention mechanism, performing a weighting operation with the first normalization matrix and the second question matrix, and a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to obtain a first interactive attention matrix and a second interactive attention matrix respectively;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as the interaction matrix.
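One plausible reading of this interaction step is BiDAF/co-attention-style processing. The sketch below assumes dot-product similarity and row/column softmax for the two normalization matrices, which the patent does not spell out; it is an illustration, not the patent's exact formulation.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interaction_matrix(H, Q):
    """H: second document matrix (n_doc, d); Q: second question matrix (n_q, d)."""
    S = H @ Q.T                    # word-pair similarity matrix (n_doc, n_q)
    A1 = softmax(S, axis=1)        # first normalization matrix: over question words
    A2 = softmax(S, axis=0)        # second normalization matrix: over document words
    attn1 = A1 @ Q                 # first interactive attention matrix (n_doc, d)
    attn2 = A1 @ A2.T @ H          # second interactive attention matrix (n_doc, d)
    # Sequentially splice H, attn1, and the two element-wise (dot-product) matrices.
    return np.concatenate([H, attn1, H * attn1, H * attn2], axis=1)   # (n_doc, 4d)
```

The spliced result (width 4d) is what the preset third GRU network then re-encodes into the third document matrix.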
Further, the performing self-matching processing on the third document matrix based on the attention mechanism to obtain a fourth document matrix includes:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
Further, the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix includes:
calculating a self-matching similarity matrix of the document according to the third document matrix, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on the attention mechanism, performing a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
Further, the determining a self-matching matrix according to the spliced matrix includes:
determining the spliced matrix as the self-matching matrix; or,
based on a gating mechanism, performing weighting processing on the spliced matrix, and determining the weighted matrix as the self-matching matrix.
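A sketch of this self-matching step, in the spirit of gated self-matching networks; the dot-product similarity, the sigmoid gate, and the random gate weights are assumptions for illustration, not the patent's trained parameters.

```python
import numpy as np

def row_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def self_matching_matrix(H3, use_gate=True, seed=0):
    """H3: third document matrix (n, d). Each word attends over the whole document."""
    S = H3 @ H3.T                          # self-matching similarity matrix (n, n)
    W = row_softmax(S)                     # self-matching weighting matrix
    C = W @ H3                             # self-matching attention matrix (n, d)
    M = np.concatenate([H3, C], axis=1)    # spliced matrix (n, 2d)
    if use_gate:
        rng = np.random.default_rng(seed)
        Wg = rng.normal(0.0, 0.1, (M.shape[1], M.shape[1]))  # hypothetical gate weights
        gate = 1.0 / (1.0 + np.exp(-(M @ Wg)))               # sigmoid gate in (0, 1)
        M = gate * M                       # gated variant of the spliced matrix
    return M
```

With `use_gate=False` the spliced matrix itself is the self-matching matrix, matching the first alternative above; the gated variant implements the second.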
Further, the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix includes:
reducing the second question matrix to obtain a reduced question matrix;
for each document content, calculating an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and includes an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculating a probability value for each document word according to the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determining the candidate document content according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of each document content, and determining the extraction starting position and the extraction ending position of the candidate document content as the positions finally adopted.
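The final selection over candidate document contents can be sketched as follows, taking the per-word probability vectors at the two decoding time steps as given; this follows the literal argmax-and-product reading of the text above (the function names are illustrative).

```python
import numpy as np

def extract_span(p_start, p_end):
    # Per document content: argmax word at the first time step (start) and
    # at the second time step (end), plus the product of the two probabilities.
    s = int(np.argmax(p_start))
    e = int(np.argmax(p_end))
    return s, e, float(p_start[s] * p_end[e])

def select_answer(candidates):
    # candidates: one (p_start, p_end) pair per candidate document content.
    # Returns (document index, (start, end)) with the largest probability product.
    spans = [extract_span(ps, pe) for ps, pe in candidates]
    best = max(range(len(spans)), key=lambda i: spans[i][2])
    return best, (spans[best][0], spans[best][1])
```

For example, a document whose start/end argmax probabilities multiply to 0.64 wins over one whose product is 0.30, and its span becomes the answer boundaries finally adopted.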
In this embodiment, a user question is first obtained, and document content related to the user question is obtained according to the user question; then, based on a deep learning model, an extraction starting position and an extraction ending position in the document content are determined; and the document content between the extraction starting position and the extraction ending position is determined as the answer corresponding to the user question and displayed. With this answer extraction method, there is no need to manually extract features or formulate various matching rules to extract answers: the obtained user question and the related document content are input directly into the deep learning model, and the most appropriate answer matching the user question is obtained from the document content. This simplifies the answer extraction process, improves answer accuracy, and greatly improves the efficiency and quality of automatic customer service.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (15)
1. An answer extraction method based on deep learning is characterized by comprising the following steps:
acquiring a user question, and acquiring document content related to the user question according to the user question;
determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question, and displaying the answer.
2. The method according to claim 1, wherein the determining an extraction starting position and an extraction ending position in the document content based on the deep learning model comprises:
respectively obtaining a user question to be processed and document content to be processed according to the user question and the document content, respectively performing word segmentation on the user question to be processed and the document content to be processed, and performing word vector conversion on each word to obtain a first question matrix and a first document matrix;
processing the first document matrix so that the processed first document matrix contains question information, and respectively encoding the processed first document matrix and the first question matrix to obtain a second document matrix and a second question matrix;
based on an attention mechanism, performing interactive processing on the second document matrix and the second question matrix to obtain a third document matrix;
based on the attention mechanism, performing self-matching processing on the third document matrix to obtain a fourth document matrix;
and determining an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix based on a pointer network.
3. The method according to claim 2, wherein the obtaining the user question to be processed and the document content to be processed according to the user question and the document content comprises:
splicing all the document contents to obtain the document content to be processed; and/or,
repeating the user question multiple times and splicing the repeated user questions to obtain the user question to be processed, wherein the number of repetitions of the user question equals the total number of document contents.
4. The method of claim 2, wherein processing the first document matrix such that the processed first document matrix contains question information comprises:
determining word co-occurrence characteristics, and splicing the word co-occurrence characteristics to the tail of the corresponding document word vector in the first document matrix to obtain the processed first document matrix.
5. The method of claim 4, wherein the determining word co-occurrence features and splicing the word co-occurrence features to the tails of the corresponding document word vectors in the first document matrix comprises:
for each word in the document content to be processed, if the word is the same as at least one word in the user question to be processed, determining that a first word co-occurrence feature corresponding to the word is a first value, and otherwise a second value, wherein the first value and the second value are fixed values respectively indicating that the word in the document content does or does not appear in the user question, and splicing the first word co-occurrence feature to the tail of the word vector corresponding to the word in the first document matrix; and/or,
respectively calculating similarity values between each word vector in the first document matrix and each word vector in the first question matrix, normalizing the similarity values corresponding to each word vector in the first document matrix, and splicing the normalized similarity values, as second word co-occurrence features, to the tail of the corresponding word vector in the first document matrix.
6. The method of claim 2, wherein the encoding the processed first document matrix and the first question matrix respectively to obtain a second document matrix and a second question matrix comprises:
taking the processed first document matrix as the input of a preset first GRU network, processing the processed first document matrix with the first GRU network, and determining the output-layer output of the first GRU network as the second document matrix; and,
determining an input question matrix according to the first question matrix, taking the input question matrix as the input of a preset second GRU network, processing the input question matrix with the second GRU network, and determining the output-layer output of the second GRU network as the second question matrix.
7. The method of claim 6, wherein the determining an input question matrix from the first question matrix comprises:
if the first GRU network is different from the second GRU network, determining the first question matrix as the input question matrix; or,
if the first GRU network is the same as the second GRU network, splicing preset features to the tail of each word vector in the first question matrix to obtain a spliced question matrix, and determining the spliced question matrix as the input question matrix, wherein the number of preset features is the same as the number of word co-occurrence features.
8. The method of claim 2, wherein the interactively processing the second document matrix and the second question matrix based on the attention mechanism to obtain a third document matrix comprises:
processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix, so that the interaction matrix contains comparison information between the document and the question;
and taking the interaction matrix as the input of a preset third GRU network, processing the interaction matrix with the third GRU network, and determining the output-layer output of the third GRU network as the third document matrix.
9. The method of claim 8, wherein the processing the second document matrix and the second question matrix based on the attention mechanism to obtain an interaction matrix containing comparison information between the document and the question comprises:
calculating a word-pair similarity matrix of the document and the question according to the second document matrix and the second question matrix, and determining a first normalization matrix and a second normalization matrix according to the word-pair similarity matrix;
based on the attention mechanism, performing a weighting operation with the first normalization matrix and the second question matrix, and a weighting operation with the first normalization matrix, the second normalization matrix and the second document matrix, to obtain a first interactive attention matrix and a second interactive attention matrix respectively;
and sequentially splicing the second document matrix, the first interactive attention matrix, the dot-product matrix of the second document matrix and the first interactive attention matrix, and the dot-product matrix of the second document matrix and the second interactive attention matrix, and determining the spliced matrix as the interaction matrix.
10. The method of claim 2, wherein the performing self-matching processing on the third document matrix based on the attention mechanism to obtain a fourth document matrix comprises:
processing the third document matrix based on an attention mechanism to obtain a self-matching matrix;
and taking the self-matching matrix as the input of a preset bidirectional recurrent neural network, processing the self-matching matrix by adopting the bidirectional recurrent neural network, and determining the hidden layer output of the bidirectional recurrent neural network as a fourth document matrix.
11. The method of claim 10, wherein the processing the third document matrix based on the attention mechanism to obtain a self-matching matrix comprises:
calculating a self-matching similarity matrix of the document according to the third document matrix, and determining a self-matching weighting matrix according to the self-matching similarity matrix;
based on the attention mechanism, performing a weighting operation on the third document matrix with the self-matching weighting matrix to obtain a self-matching attention matrix;
and splicing the third document matrix and the self-matching attention matrix to obtain a spliced matrix, and determining the self-matching matrix according to the spliced matrix.
12. The method of claim 11, wherein the determining a self-matching matrix from the spliced matrix comprises:
determining the spliced matrix as a self-matching matrix; or,
and based on a gating mechanism, performing weighting processing on the spliced matrix, and determining the weighted matrix as the self-matching matrix.
13. The method of claim 2, wherein the determining, based on the pointer network, an extraction starting position and an extraction ending position in the document content according to the fourth document matrix and the second question matrix comprises:
reducing the second question matrix to obtain a reduced question matrix;
for each document content, calculating an attention matrix according to the fourth document matrix and the reduced question matrix, wherein the attention matrix represents the semantic representation of the question word vectors over the document word vectors and comprises an attention matrix at a first time step and an attention matrix at a second time step;
for each document content, calculating a probability value for each document word according to the attention matrix, determining the document word with the maximum probability value at the first time step as the extraction starting position of that document content, and determining the document word with the maximum probability value at the second time step as the extraction ending position of that document content;
and determining the candidate document content according to the product of the probability values corresponding to the extraction starting position and the extraction ending position of each document content, and determining the extraction starting position and the extraction ending position of the candidate document content as the positions finally adopted.
14. An answer extraction device based on deep learning, comprising:
the acquisition module is used for acquiring user questions and acquiring document contents related to the user questions according to the user questions;
the processing module is used for determining an extraction starting position and an extraction ending position in the document content based on a deep learning model;
and the display module is used for determining the document content between the extraction starting position and the extraction ending position as an answer corresponding to the user question and displaying the answer.
15. A storage medium storing a computer program which, when executed by a processor, implements the steps of the answer extraction method based on deep learning according to any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910225135.0A CN109977404A (en) | 2019-03-22 | 2019-03-22 | Answer extracting method, apparatus and storage medium based on deep learning |
PCT/CN2020/075553 WO2020192307A1 (en) | 2019-03-22 | 2020-02-17 | Answer extraction method and apparatus based on deep learning, and computer device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910225135.0A CN109977404A (en) | 2019-03-22 | 2019-03-22 | Answer extracting method, apparatus and storage medium based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977404A true CN109977404A (en) | 2019-07-05 |
Family
ID=67080278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910225135.0A Pending CN109977404A (en) | 2019-03-22 | 2019-03-22 | Answer extracting method, apparatus and storage medium based on deep learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109977404A (en) |
WO (1) | WO2020192307A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110825870A (en) * | 2019-10-31 | 2020-02-21 | 腾讯科技(深圳)有限公司 | Document abstract acquisition method and device, storage medium and electronic device |
CN111078854A (en) * | 2019-12-13 | 2020-04-28 | 北京金山数字娱乐科技有限公司 | Question-answer prediction model training method and device and question-answer prediction method and device |
WO2020192307A1 (en) * | 2019-03-22 | 2020-10-01 | 深圳追一科技有限公司 | Answer extraction method and apparatus based on deep learning, and computer device and storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112417094B (en) * | 2020-11-17 | 2024-04-05 | 华东理工大学 | Answer selection method, device, server and storage medium based on web text |
CN112541350B (en) * | 2020-12-04 | 2024-06-14 | 支付宝(杭州)信息技术有限公司 | Variant text reduction method, device and equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108415977A (en) * | 2018-02-09 | 2018-08-17 | 华南理工大学 | One is read understanding method based on the production machine of deep neural network and intensified learning |
US20180300312A1 (en) * | 2017-04-13 | 2018-10-18 | Baidu Usa Llc | Global normalized reader systems and methods |
CN108959246A (en) * | 2018-06-12 | 2018-12-07 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on improved attention mechanism |
CN109376222A (en) * | 2018-09-27 | 2019-02-22 | 国信优易数据有限公司 | Question and answer matching degree calculation method, question and answer automatic matching method and device |
CN109478204A (en) * | 2016-05-17 | 2019-03-15 | 马鲁巴公司 | The machine of non-structured text understands |
CN109492227A (en) * | 2018-11-16 | 2019-03-19 | 大连理工大学 | It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977404A (en) * | 2019-03-22 | 2019-07-05 | 深圳追一科技有限公司 | Answer extracting method, apparatus and storage medium based on deep learning |
- 2019-03-22: CN application CN201910225135.0A filed (publication CN109977404A); status: Pending
- 2020-02-17: PCT application PCT/CN2020/075553 filed (publication WO2020192307A1); status: Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109478204A (en) * | 2016-05-17 | 2019-03-15 | 马鲁巴公司 | The machine of non-structured text understands |
US20180300312A1 (en) * | 2017-04-13 | 2018-10-18 | Baidu Usa Llc | Global normalized reader systems and methods |
CN108415977A (en) * | 2018-02-09 | 2018-08-17 | 华南理工大学 | One is read understanding method based on the production machine of deep neural network and intensified learning |
CN108959246A (en) * | 2018-06-12 | 2018-12-07 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on improved attention mechanism |
CN109376222A (en) * | 2018-09-27 | 2019-02-22 | 国信优易数据有限公司 | Question and answer matching degree calculation method, question and answer automatic matching method and device |
CN109492227A (en) * | 2018-11-16 | 2019-03-19 | 大连理工大学 | It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations |
Non-Patent Citations (2)
Title |
---|
WENHUI WANG 等: "Gated Self-Matching Networks for Reading Comprehension and Question Answering", 《PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 * |
朱国轩: "基于深度学习的任务导向型机器阅读理解", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020192307A1 (en) * | 2019-03-22 | 2020-10-01 | 深圳追一科技有限公司 | Answer extraction method and apparatus based on deep learning, and computer device and storage medium |
CN110825870A (en) * | 2019-10-31 | 2020-02-21 | 腾讯科技(深圳)有限公司 | Document abstract acquisition method and device, storage medium and electronic device |
CN110825870B (en) * | 2019-10-31 | 2023-07-14 | 腾讯科技(深圳)有限公司 | Method and device for acquiring document abstract, storage medium and electronic device |
CN111078854A (en) * | 2019-12-13 | 2020-04-28 | 北京金山数字娱乐科技有限公司 | Question-answer prediction model training method and device and question-answer prediction method and device |
CN111078854B (en) * | 2019-12-13 | 2023-10-27 | 北京金山数字娱乐科技有限公司 | Training method and device of question-answer prediction model, and question-answer prediction method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2020192307A1 (en) | 2020-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977404A (en) | Answer extracting method, apparatus and storage medium based on deep learning | |
CN111783474B (en) | Comment text viewpoint information processing method and device and storage medium | |
KR102213478B1 (en) | A system for tracking user knowledge based on artificial intelligence learning and method thereof | |
US10540967B2 (en) | Machine reading method for dialog state tracking | |
WO2024011814A1 (en) | Image-text mutual retrieval method, system and device, and nonvolatile readable storage medium | |
CN111401928B (en) | Method and device for determining semantic similarity of text based on graph data | |
CN110990555B (en) | End-to-end retrieval type dialogue method and system and computer equipment | |
CN111309887B (en) | Method and system for training text key content extraction model | |
CN115186110B (en) | Multi-modal knowledge graph completion method and system based on relationship-enhanced negative sampling | |
CN111159367A (en) | Information processing method and related equipment | |
CN110413743A (en) | A kind of key message abstracting method, device, equipment and storage medium | |
CN111105013A (en) | Optimization method of countermeasure network architecture, image description generation method and system | |
CN111897954A (en) | User comment aspect mining system, method and storage medium | |
CN117891939A (en) | Text classification method combining particle swarm algorithm with CNN convolutional neural network | |
CN114386426B (en) | Gold medal speaking skill recommendation method and device based on multivariate semantic fusion | |
CN114492451A (en) | Text matching method and device, electronic equipment and computer readable storage medium | |
CN114743029A (en) | Image text matching method | |
CN112131363B (en) | Automatic question and answer method, device, equipment and storage medium | |
CN113705159A (en) | Merchant name labeling method, device, equipment and storage medium | |
CN113449103A (en) | Bank transaction flow classification method and system integrating label and text interaction mechanism | |
CN111460113A (en) | Data interaction method and related equipment | |
CN117236384A (en) | Training and predicting method and device for terminal machine change prediction model and storage medium | |
CN110334204A (en) | A kind of exercise similarity calculation recommended method based on user record | |
CN116910190A (en) | Method, device and equipment for acquiring multi-task perception model and readable storage medium | |
CN115293818A (en) | Advertisement putting and selecting method and device, equipment and medium thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190705 |