CN112507074A - Numerical reasoning method and device in machine reading understanding - Google Patents
- Publication number
- CN112507074A (application CN202011436272.8A)
- Authority
- CN
- China
- Prior art keywords
- vector
- node
- current
- characterization
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3344—Query execution using natural language analysis
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/353—Clustering; Classification into predefined classes
- G06F16/367—Ontology
- G06N5/04—Inference or reasoning models
Abstract
The embodiments of this specification provide a numerical reasoning method and device for machine reading understanding. The method comprises the steps of obtaining a current question and a current text; determining the entities and numbers included in the current question and the current text, and the type corresponding to each number; constructing a relational network graph, which comprises entity nodes corresponding to the entities and digital nodes corresponding to the numbers, wherein digital nodes of the same type, and entity nodes and digital nodes having a preset relationship, are made neighbors by connecting edges; determining a first problem characterization vector corresponding to the current question and an initial characterization vector for each node in the relational network graph; and performing a predetermined number of iterations over the nodes of the relational network graph, starting from their initial characterization vectors, to obtain an updated characterization vector for each node. This improves the ability of numerical reasoning in machine reading understanding to handle complex questions.
Description
The present invention is a divisional application of the invention with an application date of 31/07/2020 and application number 202010759810.0, entitled "method and apparatus for numerical value inference in machine reading understanding".
Technical Field
One or more embodiments of the present disclosure relate to the field of computers, and more particularly, to a method and apparatus for numerical reasoning in machine-reading understanding.
Background
Machine reading understanding is a task in natural language processing in which a question and a text are generally given; the text describes the conditions of the question, and an answer to the question is obtained through machine reading understanding. In machine reading understanding, numerical reasoning is an important capability, generally covering reasoning modes such as addition, subtraction, sorting and counting. For a question involving numerical reasoning, how to derive the correct answer from the text is a question of current interest.
In the prior art, numerical reasoning in machine reading understanding frequently fails to obtain correct answers when faced with complex questions.
Accordingly, improved solutions are desired that strengthen the ability of numerical reasoning in machine reading understanding to handle complex questions.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for numerical reasoning in machine reading understanding, which can improve the capability of numerical reasoning in machine reading understanding to handle complex problems.
In a first aspect, a method for numerical reasoning in machine reading understanding is provided, the method comprising:
acquiring a current question and a current text, wherein the current text is used for describing conditions of the current question;
determining each entity and each number included in the current question and the current text, and the type corresponding to each number;
constructing a relational network graph, wherein the relational network graph comprises entity nodes corresponding to the entities and digital nodes corresponding to the numbers, and neighbors are formed between the digital nodes of the same type and between the entity nodes and the digital nodes with preset relations through connecting edges;
inputting the current question and the current text into a language model, and obtaining a first semantic representation vector corresponding to each semantic element position in the current question and the current text through the language model;
determining a first problem representation vector corresponding to the current problem and an initial representation vector of each node in the relational network graph according to each first semantic representation vector;
performing iteration for a preset number of times on each node in the relational network graph based on the initial characterization vector of each node, wherein each iteration comprises the step of performing neighbor node aggregation on each node based on the first problem characterization vector and by using an attention mechanism to obtain an updated characterization vector of each node;
and determining a numerical reasoning answer according to the updated characterization vector of each node after the iteration of the preset times.
In one possible embodiment, the types include at least one of:
amount, time, percentage.
In one possible embodiment, the entity comprises at least one of:
name of person, place name, name of article.
In a possible implementation manner, the determining, according to each first semantic representation vector, a first problem representation vector corresponding to the current problem includes:
and performing mean value pooling on each first semantic representation vector corresponding to each semantic element position in the current problem to obtain a first problem representation vector corresponding to the current problem.
In a possible embodiment, the determining an initial token vector of each node in the relational network graph according to each first semantic token vector includes:
for any node in each node in the relational network graph, determining a plurality of semantic element positions matched with the content of the any node in the current question and the current text, and performing mean pooling on a plurality of first semantic representation vectors corresponding to the semantic element positions to determine an initial representation vector of the any node.
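The mean pooling used for both the first problem characterization vector and the node initial characterization vectors can be sketched in a few lines of numpy. The helper name, the toy dimensions, and the position assignments below are illustrative, not taken from the patent:

```python
import numpy as np

def mean_pool(token_vectors, positions):
    """Average the first semantic representation vectors at the given positions."""
    return np.stack([token_vectors[p] for p in positions]).mean(axis=0)

# toy encoding: 5 semantic-element positions, 3-dimensional vectors
H = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [2., 2., 2.],
              [4., 0., 0.]])

q_vec = mean_pool(H, [0, 1, 2])    # question occupies positions 0-2
node_vec = mean_pool(H, [3, 4])    # a node's content matches positions 3-4
```

The same pooling routine serves both purposes; only the set of positions differs (the question span for the problem vector, the matched positions for a node's initial vector).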
In one possible embodiment, each iteration includes:
determining a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number based on the first problem characterization vector;
determining an intermediate vector for each node based on the initial token vector, the current token vector and the problem driving vector for each node;
respectively transforming the intermediate vectors of each node by using the query matrix, the key matrix and the value matrix to obtain a query vector, a key vector and a value vector which respectively correspond to each node;
similarity calculation is carried out on the query vector corresponding to the first node and the key vector corresponding to the second node, so that the attention score from the second node to the first node is obtained; the first node and the second node are any two nodes which are adjacent to each other in the relational network graph;
taking any node as a target node, carrying out weighted summation of the value vectors of all its neighbors according to the attention scores from each neighbor of the target node to the target node, and determining the updated characterization vector of the target node based on the summation result.
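One iteration of the neighbor aggregation described in these steps can be sketched as simplified single-head attention over the graph edges. The matrix names mirror the query/key/value matrices above; the isolated-node rule and all sizes are assumptions for the sketch:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_neighbors(h, neighbors, Wq, Wk, Wv):
    """h: (n, d) intermediate vectors; neighbors[i]: nodes adjacent to node i.
    Returns the updated characterization vector of every node."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    out = np.zeros_like(h)
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            out[i] = h[i]  # isolated node keeps its vector (an assumption)
            continue
        # attention score of each neighbor j toward node i, then weighted sum
        scores = softmax(np.array([q[i] @ k[j] for j in nbrs]))
        out[i] = sum(s * v[j] for s, j in zip(scores, nbrs))
    return out

d = 4
rng = np.random.default_rng(0)
h = rng.normal(size=(3, d))
Wq = Wk = Wv = np.eye(d)         # identity weights for the toy run
neighbors = [[1], [0, 2], [1]]   # a 3-node path graph
h_new = aggregate_neighbors(h, neighbors, Wq, Wk, Wv)
```

With identity weights and a single neighbor, a node's updated vector is simply its neighbor's vector, which makes the aggregation easy to check by hand.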
Further, the determining, based on the first problem characterization vector, a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number includes:
enabling the first problem characterization vector to pass through a first full connection layer to obtain a first feature vector;
the first feature vector is subjected to an activation function to obtain a second feature vector;
and passing the second eigenvector through a second full-connection layer corresponding to the current iteration number to obtain a problem driving vector corresponding to the current iteration number.
Further, the determining an intermediate vector for each node based on the initial token vector, the current token vector, and the problem driving vector for each node comprises:
splicing the initial characterization vector of each node with the current characterization vector of each node to obtain first spliced vectors corresponding to each node;
and converting the first splicing vector of each node into a preset dimensionality, and then carrying out bit-wise (i.e., element-wise) multiplication with the problem driving vector to obtain the intermediate vector of each node.
Further, the converting the first stitching vector into a preset dimension includes:
and enabling the first splicing vector to pass through a third full-connection layer so as to be converted into a preset dimension, wherein the preset dimension is the same as the dimension of the problem driving vector.
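Taken together, the last few paragraphs describe a small gating computation: the problem driving vector comes from two fully connected layers with an activation in between, and the intermediate vectors come from concatenation, projection, and element-wise multiplication. A numpy sketch under assumed toy dimensions (all weight names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def drive_vector(q_vec, W1, b1, W2, b2):
    """First fully connected layer, activation, then the second fully
    connected layer that corresponds to the current iteration number."""
    return relu(q_vec @ W1 + b1) @ W2 + b2

def intermediate_vectors(h0, h, drive, W3, b3):
    """Concatenate initial and current node vectors, project to the drive
    dimension with a third fully connected layer, multiply element-wise."""
    cat = np.concatenate([h0, h], axis=-1)   # (n, 2d)
    return (cat @ W3 + b3) * drive           # (n, d_drive)

d, d_drive, n = 4, 4, 3
rng = np.random.default_rng(1)
q_vec = rng.normal(size=d)
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, d_drive)), np.zeros(d_drive)
W3, b3 = rng.normal(size=(2 * d, d_drive)), np.zeros(d_drive)
h0 = rng.normal(size=(n, d))   # initial characterization vectors
h = rng.normal(size=(n, d))    # current characterization vectors
m = intermediate_vectors(h0, h, drive_vector(q_vec, W1, b1, W2, b2), W3, b3)
```

In a full implementation, each iteration would use its own W2/b2 pair, matching the per-iteration second fully connected layer described above.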
In a possible implementation manner, the determining a numerical reasoning answer according to the updated characterization vector of each node after the predetermined number of iterations includes:
for any node in each node in the relational network graph, determining a plurality of semantic element positions matched with the content of the node in the current question and the current text, and acquiring a plurality of first semantic representation vectors corresponding to the semantic element positions;
updating a plurality of acquired first semantic representation vectors corresponding to the plurality of semantic element positions according to the updated representation vector of any node after the iteration of the preset times so as to determine second semantic representation vectors respectively corresponding to the plurality of semantic element positions;
determining a first comprehensive representation vector corresponding to the current question and the current text according to each second semantic representation vector corresponding to the current question and the positions of a plurality of semantic elements in the current text and each first semantic representation vector corresponding to other semantic element positions;
determining an answer type corresponding to the numerical reasoning answer by using a first classification model according to the first comprehensive characterization vector;
and determining the numerical reasoning answer by utilizing a second classification model at least according to the answer type and the first comprehensive characterization vector.
Further, the answer types include at least one of:
answer extraction, counting questions, and arithmetic expression class questions.
Further, the answer type is answer extraction;
the determining the numerical reasoning answer by using a second classification model according to at least the answer type and the first comprehensive characterization vector comprises:
determining second problem representation vectors corresponding to the current problem according to the second semantic representation vectors corresponding to the positions of the semantic elements in the current problem respectively and the first semantic representation vectors corresponding to the positions of other semantic elements respectively;
multiplying the first comprehensive characterization vector and the second problem characterization vector bit-wise (element-wise) to obtain a first cross characterization vector;
and splicing the first comprehensive characterization vector and the first cross characterization vector, and inputting the spliced vectors into the second classification model to obtain the numerical reasoning answer.
Further, the second classification model is used for predicting an answer starting position and an answer ending position in each semantic element position, so as to obtain the numerical reasoning answer according to the answer starting position and the answer ending position.
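For the answer-extraction type, the start/end prediction can be illustrated with a brute-force span search over the logits. This is a hedged sketch: real systems limit span length and take the logits from the trained model, and the toy tokens and scores below are invented for illustration:

```python
import numpy as np

def best_span(start_logits, end_logits, tokens):
    """Return the token span whose start-logit + end-logit sum is maximal,
    subject to start <= end."""
    best_score, best = -np.inf, (0, 0)
    for i in range(len(tokens)):
        for j in range(i, len(tokens)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    i, j = best
    return tokens[i:j + 1]

tokens = ["Xiaohong", "has", "5", "yuan"]
span = best_span(np.array([0., 0., 3., 0.]),
                 np.array([0., 0., 1., 4.]),
                 tokens)   # picks the "5 yuan" span
```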
Further, the answer type is a counting question;
the second classification model is used for predicting numbers from 0 to 9 to obtain the numerical reasoning answer.
Further, the answer type is an arithmetic expression type question;
and the second classification model is used for predicting a symbol for each number in the current question and the current text, the symbols comprising a plus sign, a minus sign and 0; the numerical reasoning answer is obtained by combining each number with its symbol in an arithmetic operation.
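The arithmetic-expression case thus reduces to assigning each number a sign in {+1, -1, 0} and summing. A minimal sketch with toy numbers (the example sentence is invented, not from the patent):

```python
def evaluate(numbers, signs):
    """Combine the numbers found in the question/text with their predicted signs."""
    assert len(numbers) == len(signs)
    return sum(s * n for n, s in zip(numbers, signs))

# "Xiaohong has 5 yuan, Xiaoming has 3 yuan. How much more does Xiaohong have?"
answer = evaluate([5, 3], [+1, -1])  # → 2
```

A sign of 0 simply drops a number that is irrelevant to the question from the expression.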
In a second aspect, there is provided a numerical reasoning apparatus for machine-readable understanding, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a current question and a current text, and the current text is used for describing conditions of the current question;
the first determining unit is used for determining each entity and each number included in the current question and the current text acquired by the acquiring unit, and the type corresponding to each number;
the construction unit is used for constructing a relational network graph, and the relational network graph comprises entity nodes corresponding to the entities determined by the first determination unit and digital nodes corresponding to the numbers, and neighbors are formed between the digital nodes of the same type and between the entity nodes and the digital nodes with preset relations through connecting edges;
the first representation unit is used for inputting the current question and the current text acquired by the acquisition unit into a language model, and acquiring first semantic representation vectors corresponding to the positions of semantic elements in the current question and the current text through the language model;
the second characterization unit is used for determining a first problem characterization vector corresponding to the current problem and an initial characterization vector of each node in the relational network graph according to each first semantic characterization vector obtained by the first characterization unit;
the iteration unit is used for performing iteration for a preset number of times on each node in the relational network graph based on the initial characterization vector of each node obtained by the second characterization unit, wherein each iteration comprises the step of performing neighbor node aggregation on each node based on the first problem characterization vector obtained by the second characterization unit and by using an attention mechanism to obtain an updated characterization vector of each node;
and the second determining unit is used for determining a numerical reasoning answer according to the updated characterization vector of each node after the iteration of the preset times, which is obtained by the iteration unit.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
According to the method and the device provided by the embodiments of this specification, a current question and a current text are first obtained, the current text describing the conditions of the current question. The entities and numbers contained in the current question and the current text are then determined, together with the type corresponding to each number. Next, a relational network graph is constructed, comprising entity nodes corresponding to the entities and digital nodes corresponding to the numbers, in which digital nodes of the same type, and entity nodes and digital nodes with a preset relationship, are made neighbors by connecting edges. The relationship between an entity and a number having the preset relationship is established through the relational network graph, and node characterization vectors iterated on the basis of this relationship help to distinguish the relationship between numbers and text. Likewise, the relationship between numbers of the same type is established through the graph, and the iterated node characterization vectors help to distinguish the types of different numbers. In addition, during the iterative process the node vectors are updated on the basis of the problem characterization vector, which reflects the importance of each node for solving the question; it can be understood that nodes related to the question are more important than other nodes, so the iterative process is effectively guided and the iterated node characterization vectors embody the importance of different nodes. Finally, the numerical reasoning answer is determined according to the updated characterization vector of each node after the predetermined number of iterations, so that complex questions can be better solved and the capability of numerical reasoning in machine reading understanding to handle complex questions is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a method of numerical reasoning in machine-read understanding, according to one embodiment;
FIG. 3 illustrates a structural schematic of a relational network diagram according to one embodiment;
FIG. 4 illustrates a coded output diagram of a language model according to one embodiment;
FIG. 5 illustrates a diagram of a multiple iteration process according to one embodiment;
FIG. 6 illustrates a schematic diagram of the manner in which a second semantic representation vector is determined according to one embodiment;
FIG. 7 illustrates an answer prediction process according to one embodiment;
FIG. 8 shows a schematic block diagram of a numerical inference engine in machine-readable understanding, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. The scenario involves numerical reasoning in machine reading understanding. Referring to fig. 1, it will be appreciated that numerical reasoning derives answers from conditions and questions, and in machine reading understanding the conditions are presented textually; that is, the text describes the conditions of the question. For the same question under different texts, the answer usually differs. For example, text 1 (condition 1) states that Xiaohong has 5 yuan and Xiaoming has 3 yuan, and the question is how much money Xiaohong and Xiaoming have together; by addition, the answer under the condition described in text 1 is 8 yuan. Text 2 (condition 2) states that Xiaohong has 2 yuan and Xiaoming has 3 yuan, with the same question; by addition, the answer under the condition described in text 2 is 5 yuan. Under the same condition, the answers to different questions also usually differ. For example, the text states that Xiaohong has 5 yuan and Xiaoming has 3 yuan. Question 1 is how much money Xiaohong and Xiaoming have together, and by addition the answer is 8 yuan; question 2 is how much more money Xiaohong has than Xiaoming, and by subtraction the answer under the condition described by the text is 2 yuan.
It should be noted that the texts and questions described above are only examples and do not limit the embodiments of the present specification; the numerical reasoning method in machine reading understanding provided by the embodiments may be applied to various numerical reasoning questions in various scenarios, and may include answer extraction, counting questions, and the like in addition to the above-mentioned arithmetic expression questions.
FIG. 2 illustrates a flow diagram of a method for numerical reasoning in machine reading understanding according to one embodiment, which may be based on the implementation scenario illustrated in FIG. 1. As shown in fig. 2, the method in this embodiment includes the following steps: step 21, obtaining a current question and a current text, the current text describing the conditions of the current question; step 22, determining the entities and numbers included in the current question and the current text, and the type corresponding to each number; step 23, constructing a relational network graph comprising entity nodes corresponding to the entities and digital nodes corresponding to the numbers, in which digital nodes of the same type, and entity nodes and digital nodes with a preset relationship, are made neighbors by connecting edges; step 24, inputting the current question and the current text into a language model, and obtaining through the language model a first semantic representation vector corresponding to each semantic element position in the current question and the current text; step 25, determining, according to the first semantic representation vectors, a first problem representation vector corresponding to the current question and an initial representation vector for each node in the relational network graph; step 26, performing a predetermined number of iterations over the nodes of the relational network graph starting from their initial characterization vectors, each iteration aggregating, for each node, its neighbor nodes using an attention mechanism based on the first problem characterization vector, to obtain an updated characterization vector for each node; and step 27, determining a numerical reasoning answer according to the updated characterization vectors after the predetermined number of iterations. Specific execution modes of the above steps are described below.
First, in step 21, a current question and a current text are obtained, the current text describing the conditions of the current question. It will be appreciated that at least one of the current question and the current text is associated with numbers; the numbers may appear in both, or in only one of them.
In the embodiment of the present specification, the current text may be a section of speech or an article, and the length of the current text is not limited.
Then, in step 22, the entities and numbers included in the current question and the current text, and the types corresponding to the numbers, are determined. It will be appreciated that this step can be implemented with the aid of existing Chinese word segmentation tools.
In one example, the type includes at least one of:
amount, time, percentage.
For example, the type corresponding to the number 1 in "1 yuan" is the amount; the type corresponding to the number 2019 in "2019 year" is time; the type corresponding to the number 50 in "discount 50 percent" is a percentage.
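A toy type tagger for these three examples, keyed on the unit next to the number, can look as follows. The regex rules are illustrative stand-ins for whatever segmentation/extraction tool the method actually uses:

```python
import re

def number_type(phrase):
    """Classify the number in `phrase` as amount / time / percentage
    (hypothetical heuristic rules, for illustration only)."""
    if re.search(r"\d+\s*(yuan|dollar)", phrase):
        return "amount"
    if re.search(r"\d+\s*(year|month|day)", phrase):
        return "time"
    if re.search(r"\d+\s*percent|\d+\s*%", phrase):
        return "percentage"
    return None

t1 = number_type("1 yuan")               # "amount"
t2 = number_type("2019 year")            # "time"
t3 = number_type("discount 50 percent")  # "percentage"
```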
In one example, the entity includes at least one of:
name of person, place name, name of article.
For example, the text "Xiaohong played last month in Hainan and also bought 5 jin of bananas" includes Xiaohong, Hainan and bananas, wherein Xiaohong is the name of a person, Hainan is the name of a place, and bananas are the names of articles.
Next, in step 23, a relational network graph is constructed, which includes entity nodes corresponding to the entities and digital nodes corresponding to the numbers, and neighbors are formed between the digital nodes of the same type and between the entity nodes and the digital nodes having a preset relationship by connecting edges. It is understood that step 23 is to construct the relational network graph based on the entities and the numbers determined in step 22 and the types corresponding to the numbers.
For example, if the numbers corresponding to two digital nodes are of the same type (for example, both amounts, both times, or both percentages), a connecting edge exists between the two digital nodes; that is, the two digital nodes are neighbors of each other.
In one example, the preset relationship depends on the positions, in the current question or the current text, of the entity corresponding to the entity node and of the number corresponding to the digital node. For example, the preset relationship holds when the entity corresponding to the entity node and the number corresponding to the digital node belong to the same sentence in the current question or the current text.
FIG. 3 illustrates a structural schematic of a relational network diagram according to one embodiment. Referring to fig. 3, the relationship network graph includes entity nodes S1, S2, S3, S4, and digital nodes J1, J2, T1, T2. The types of numbers corresponding to the digital nodes J1 and J2 are money, and a connecting edge is arranged between the digital nodes J1 and J2; the types of the numbers corresponding to the digital nodes T1 and T2 are both time, and a connecting edge is arranged between the digital nodes T1 and T2; the entity node S1 and the digital node J2 have a preset relationship, and a connecting edge is arranged between the entity node S1 and the digital node J2; the entity node S1 and the digital node T1 have a preset relationship, and a connecting edge is arranged between the entity node S1 and the digital node T1; the entity node S2 and the digital node J1 have a preset relationship, and a connecting edge is arranged between the entity node S2 and the digital node J1; the entity node S2 and the digital node T2 have a preset relationship, and a connecting edge is arranged between the entity node S2 and the digital node T2; the entity nodes S3 and S4 do not have a preset relationship with any digital node, and the entity nodes S3 and S4 do not have a connecting edge with any digital node.
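The edge rules of the Fig. 3 graph (same-type digital nodes connected; entity/digital pairs with the preset relation connected) can be reproduced in a few lines. The node names follow the figure; the helper itself is an illustrative sketch:

```python
from itertools import combinations

def build_edges(number_types, preset_relations):
    """number_types: digital node -> type; preset_relations: (entity, number)
    pairs that satisfy the preset (e.g. same-sentence) relationship."""
    edges = set(preset_relations)
    # digital nodes of the same type become neighbors
    for a, b in combinations(sorted(number_types), 2):
        if number_types[a] == number_types[b]:
            edges.add((a, b))
    return edges

types = {"J1": "amount", "J2": "amount", "T1": "time", "T2": "time"}
relations = {("S1", "J2"), ("S1", "T1"), ("S2", "J1"), ("S2", "T2")}
edges = build_edges(types, relations)  # 6 edges; S3 and S4 stay isolated
```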
And step 24, inputting the current question and the current text into a language model, and obtaining, through the language model, a first semantic representation vector corresponding to each semantic element position in the current question and the current text. It will be appreciated that the current question and the current text may be divided into semantic elements, each corresponding to a position in the current question and the current text, which may be referred to as a semantic element position.
In one example, the language model may be implemented based on RoBERTa, and the current question and the current text are spliced and input into the language model.
FIG. 4 illustrates a coded output diagram of a language model according to one embodiment. Referring to fig. 4, the current question and the current text are input into a language model, and a first semantic representation vector corresponding to each semantic element position in the current question and the current text is obtained through the language model, wherein each small circle corresponds to one semantic element position, and the semantic element positions corresponding to the filled black solid small circles correspond to each entity and each number determined in step 22, or correspond to each node in step 23.
And step 25, determining a first problem representation vector corresponding to the current problem and an initial representation vector of each node in the relational network graph according to each first semantic representation vector. It is understood that, a plurality of semantic element positions are generally included in the current problem, and the content of any node generally matches the plurality of semantic element positions, so that based on the first semantic representation vector corresponding to each semantic element position, the first problem representation vector corresponding to the current problem and the initial representation vector of each node in the relational network graph can be determined.
In an example, the first semantic representation vectors corresponding to the semantic element positions in the current problem may be averaged and pooled to obtain the first problem representation vector corresponding to the current problem.
In one example, for any node in the relational network graph, a number of semantic element positions matching the content of that node in the current question and the current text may be determined, and the first semantic representation vectors corresponding to those semantic element positions may be subjected to mean pooling to determine the initial characterization vector of that node.
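Both examples above reduce to the same mean-pooling operation over the first semantic representation vectors at a set of positions. A minimal sketch (function name and toy dimensions are assumptions, not from the patent):

```python
import numpy as np

def mean_pool(semantic_vectors, positions):
    """Mean-pool the first semantic representation vectors at the given
    semantic element positions. Used both for the first problem
    characterization vector (positions = the question's positions) and
    for a node's initial characterization vector (positions = the
    positions matching the node's content)."""
    return np.mean([semantic_vectors[p] for p in positions], axis=0)

# toy example: 5 semantic element positions, 4-dimensional vectors
H = np.arange(20, dtype=float).reshape(5, 4)
q_vec = mean_pool(H, [0, 1, 2])   # question spans positions 0-2
node_vec = mean_pool(H, [3, 4])   # one node matches positions 3-4
```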
And then in step 26, performing iteration for a predetermined number of times on each node in the relational network graph based on the initial characterization vector of each node, wherein each iteration comprises performing neighbor node aggregation on each node based on the first problem characterization vector and by using an attention mechanism, so as to obtain an updated characterization vector of each node. It can be understood that, because the digital nodes of the same type in the relational network graph have connecting edges therebetween, and the entity nodes having the preset relationship and the digital nodes have connecting edges therebetween, the updated characterization vectors of the nodes after iteration can also embody the relationship between the nodes.
FIG. 5 illustrates a diagram of a multiple iteration process, according to one embodiment. Referring to fig. 5, in the embodiment of the present specification, each node in the relational network graph is iterated a predetermined number of times, where the predetermined number of times may be 2 or 3; fig. 5 takes a predetermined number of 2 as an example. When the current iteration number is 1, the current characterization vector of a node is the initial characterization vector of that node; when the current iteration number is 2, the current characterization vector of a node is its updated characterization vector after the first iteration. The iterative process is guided by the first problem characterization vector, it being understood that the nodes associated with the question are more important in the numerical reasoning process. When T = 1, a subgraph related to the question can be determined based on the relational network graph and the first problem characterization vector. When T = 2, an iteration may be performed based on the subgraph determined when T = 1.
In one example, each iteration includes:
determining a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number based on the first problem characterization vector;
determining an intermediate vector of each node based on the initial characterization vector and the current characterization vector of each node and the problem driving vector;
respectively transforming the intermediate vectors of each node by using the query matrix, the key matrix and the value matrix to obtain a query vector, a key vector and a value vector which respectively correspond to each node;
similarity calculation is carried out on the query vector corresponding to the first node and the key vector corresponding to the second node, so that the attention score from the second node to the first node is obtained; the first node and the second node are any two nodes which are adjacent to each other in the relational network graph;
taking any node as a target node, carrying out weighted summation on the value vectors of all neighbors according to the attention scores from all neighbors of the target node to the target node, and determining an updated characterization vector of the target node based on the summation result.
It can be understood that different current iteration numbers correspond to different problem driving vectors, so that in the iteration process, that is, in the graph inference process, each layer of iteration learns a different emphasis in the question, and each layer of reasoning therefore has a different focus.
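The iteration steps listed above can be sketched in NumPy as follows. This is a rough sketch under stated assumptions, not the patent's implementation: it uses dot-product similarity for the attention score (the text later notes many similarity calculations are possible), takes the question-driving vector as a given input, and returns the aggregation result directly as the updated vector (the patent additionally splices it with the current vector and applies a fully-connected layer). All names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def iterate_once(h0, h, q_drive, Wq, Wk, Wv, neighbors):
    """One iteration over the relational network graph.
    h0: initial characterization vectors; h: current characterization
    vectors; q_drive: question-driving vector for this iteration;
    Wq/Wk/Wv: query, key, value matrices; neighbors: node -> neighbor list."""
    # intermediate vector: concat(initial, current) multiplied
    # elementwise with the question-driving vector
    m = {n: np.concatenate([h0[n], h[n]]) * q_drive for n in h}
    q = {n: Wq @ m[n] for n in m}   # query vectors
    k = {n: Wk @ m[n] for n in m}   # key vectors
    v = {n: Wv @ m[n] for n in m}   # value vectors
    updated = {}
    for n in h:
        if not neighbors.get(n):
            updated[n] = h[n]       # isolated node keeps its vector
            continue
        # attention score of each neighbor (dot-product similarity here,
        # softmax-normalized), then weighted sum of neighbor value vectors
        scores = softmax(np.array([q[n] @ k[j] for j in neighbors[n]]))
        updated[n] = sum(w * v[j] for w, j in zip(scores, neighbors[n]))
    return updated

# toy run on a 3-node path graph a - b - c
rng = np.random.default_rng(0)
d = 4
h0 = {n: rng.normal(size=d) for n in "abc"}
h = {n: vec.copy() for n, vec in h0.items()}
q_drive = rng.normal(size=2 * d)           # matches the concat dimension
Wq, Wk, Wv = (rng.normal(size=(d, 2 * d)) for _ in range(3))
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
out = iterate_once(h0, h, q_drive, Wq, Wk, Wv, nbrs)
```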
Further, the determining, based on the first problem characterization vector, a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number includes:
enabling the first problem characterization vector to pass through a first full connection layer to obtain a first feature vector;
the first feature vector is subjected to an activation function to obtain a second feature vector;
and passing the second eigenvector through a second full-connection layer corresponding to the current iteration number to obtain a problem driving vector corresponding to the current iteration number.
In an embodiment of the present specification, the first fully-connected layer is configured to obtain a first feature vector having a spatial distribution different from that of the first problem characterization vector. The first fully-connected layer may be shared when obtaining the problem drive vector corresponding to each current iteration number.
As an example, the activation function may adopt an ELU activation function.
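The three steps above (shared first fully-connected layer, ELU activation, per-iteration second fully-connected layer) can be sketched as below. The function and parameter names are illustrative assumptions; only the layer structure follows the text.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def question_drive_vector(q_vec, W1, b1, W2_t, b2_t):
    """Question-driving vector for iteration t (sketch).
    W1/b1: first fully-connected layer, shared across iterations;
    W2_t/b2_t: second fully-connected layer specific to iteration t."""
    f1 = W1 @ q_vec + b1     # first feature vector, new spatial distribution
    f2 = elu(f1)             # second feature vector
    return W2_t @ f2 + b2_t  # problem driving vector for iteration t

# toy dimensions: 4-dim question vector -> 8-dim hidden -> 6-dim drive
rng = np.random.default_rng(1)
q_vec = rng.normal(size=4)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(6, 8)), rng.normal(size=6)
drive = question_drive_vector(q_vec, W1, b1, W2, b2)
```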
Further, the determining an intermediate vector of each node based on the initial characterization vector, the current characterization vector, and the problem driving vector of each node comprises:
splicing the initial characterization vector of each node with the current characterization vector of each node to obtain first spliced vectors corresponding to each node;
and converting the first splicing vector of each node into a preset dimensionality, and then carrying out bit-wise multiplication on the first splicing vector and the problem driving vector to obtain a middle vector of each node.
Further, the converting the first stitching vector into a preset dimension includes:
and enabling the first splicing vector to pass through a third full-connection layer so as to be converted into a preset dimension, wherein the preset dimension is the same as the dimension of the problem driving vector.
It is understood that if vector A is (a1, a2, …, an) and vector B is (b1, b2, …, bn), the result of bitwise multiplying vector A and vector B is (a1b1, a2b2, …, anbn).
Furthermore, there are many ways to calculate the similarity between the query vector corresponding to the first node and the key vector corresponding to the second node. For example, the query vector corresponding to the first node is first spliced with the key vector corresponding to the second node, and the spliced vector is then passed through a fully-connected layer to obtain the attention score from the second node to the first node.
In order to better highlight the more important nodes among the neighbors of the target node, softmax normalization can be applied to the above attention scores.
Further, the determining the updated characterization vector of the target node based on the summation result may specifically be: and splicing the vector obtained by the summation result with the current characterization vector of the target node, and then obtaining the updated characterization vector of the target node after passing through a full connection layer.
And finally, in step 27, determining a numerical reasoning answer according to the updated characterization vector of each node after the predetermined number of iterations. It can be understood that the updated characterization vector of each node, relative to its initial characterization vector, can better reflect the relationships between entities and numbers and the types of different numbers, thereby helping to determine the numerical reasoning answer.
In one example, the determining a numerical reasoning answer according to the updated characterization vector of each node after the predetermined number of iterations includes:
for any node in each node in the relational network graph, determining a plurality of semantic element positions matched with the content of the node in the current question and the current text, and acquiring a plurality of first semantic representation vectors corresponding to the semantic element positions;
updating a plurality of acquired first semantic representation vectors corresponding to the plurality of semantic element positions according to the updated representation vector of any node after the iteration of the preset times so as to determine second semantic representation vectors respectively corresponding to the plurality of semantic element positions;
determining a first comprehensive representation vector corresponding to the current question and the current text according to each second semantic representation vector corresponding to the current question and the positions of a plurality of semantic elements in the current text and each first semantic representation vector corresponding to other semantic element positions;
determining an answer type corresponding to the numerical reasoning answer by using a first classification model according to the first comprehensive characterization vector;
and determining the numerical reasoning answer by utilizing a second classification model at least according to the answer type and the first comprehensive characterization vector.
FIG. 6 illustrates a schematic diagram of the manner in which a second semantic representation vector is determined, according to one embodiment. Referring to fig. 6, the semantic element positions corresponding to the filled black solid small circles correspond to the nodes in the relational network graph, and only these semantic element positions need to be updated; the updating may sum the updated characterization vector and the first semantic representation vector corresponding to the semantic element position to obtain the second semantic representation vector of that position. In this embodiment of the present specification, a node may correspond to multiple semantic element positions; for example, the number corresponding to a digital node is 1000, and after tokenization this number may correspond to two semantic element positions, one for the piece "100" and one for the piece "0". In this case, the semantic representation vector corresponding to each of these semantic element positions needs to be updated.
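The update described above (add a node's updated characterization vector to the first semantic representation vector at every position that node covers; leave other positions unchanged) can be sketched as follows. Names are illustrative assumptions.

```python
import numpy as np

def second_semantic_vectors(H, node_positions, node_updates):
    """Compute second semantic representation vectors.
    H: (num_positions, dim) first semantic representation vectors;
    node_positions: node id -> list of semantic element positions it covers;
    node_updates: node id -> updated characterization vector (dim,).
    Positions not covered by any node keep their first vector."""
    H2 = H.copy()
    for node, positions in node_positions.items():
        for p in positions:
            H2[p] = H[p] + node_updates[node]
    return H2

# toy example: node "n1" covers positions 1 and 2 (e.g. "100" + "0")
H = np.zeros((5, 4))
H2 = second_semantic_vectors(H, {"n1": [1, 2]}, {"n1": np.ones(4)})
```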
FIG. 7 illustrates an answer prediction process according to one embodiment. Referring to fig. 7, in the embodiment of the present specification, after a predetermined number of iterations, the updated characterization vectors of the nodes are obtained; a first comprehensive characterization vector corresponding to the current question and the current text is determined according to the updated characterization vectors; based on the first comprehensive characterization vector, an answer type is determined by using a first classification model; and then a numerical reasoning answer is determined by using a second classification model according to the first comprehensive characterization vector and the answer type, so that the accuracy of the numerical reasoning answer can be improved.
Further, the answer types include at least one of:
answer extraction, counting questions, and arithmetic expression class questions.
Further, the answer type is answer extraction;
the determining the numerical reasoning answer by using a second classification model according to at least the answer type and the first comprehensive characterization vector comprises:
determining second problem representation vectors corresponding to the current problem according to the second semantic representation vectors corresponding to the positions of the semantic elements in the current problem respectively and the first semantic representation vectors corresponding to the positions of other semantic elements respectively;
performing bitwise multiplication on the first comprehensive characterization vector and the second problem characterization vector to obtain a first cross characterization vector;
and splicing the first comprehensive characterization vector and the first cross characterization vector, and inputting the spliced vectors into the second classification model to obtain the numerical reasoning answer.
Further, the second classification model is used for predicting an answer starting position and an answer ending position in each semantic element position, so as to obtain the numerical reasoning answer according to the answer starting position and the answer ending position.
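Decoding from predicted start and end positions can be sketched as below. This is a common span-decoding scheme under the assumption that the best span maximizes the sum of start and end scores with start ≤ end; the patent does not specify its exact decoder.

```python
import numpy as np

def extract_span(start_logits, end_logits, max_len=None):
    """Return the (start, end) pair with the highest combined score,
    subject to start <= end and an optional maximum span length."""
    best, best_score = (0, 0), -np.inf
    n = len(start_logits)
    for s in range(n):
        stop = n if max_len is None else min(n, s + max_len)
        for e in range(s, stop):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# toy example: the model strongly prefers position 3 ("42")
tokens = ["the", "answer", "is", "42", "dollars"]
start = np.array([0.1, 0.2, 0.1, 3.0, 0.5])
end = np.array([0.0, 0.1, 0.2, 2.5, 0.4])
span = extract_span(start, end)
```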
Further, the answer type is a counting question;
the second classification model is used for predicting numbers from 0 to 9 to obtain the numerical reasoning answer.
Further, the answer type is an arithmetic expression type question;
and the second classification model is used for predicting the symbols of each number in the current question and the current text, the symbols comprise plus signs, minus signs and 0, and numerical reasoning answers are obtained through the operation of each number and the symbols.
According to the method provided by the embodiment of the specification, firstly, a current question and a current text are obtained, wherein the current text is used for describing conditions of the current question. Then, each entity and each number included in the current question and the current text are determined, together with the type corresponding to each number. Next, a relational network graph is constructed, which includes entity nodes corresponding to the entities and digital nodes corresponding to the numbers; neighbors are formed between the digital nodes of the same type, and between the entity nodes and the digital nodes having a preset relationship, by connecting edges. The relationship between an entity and a number having the preset relationship is established through the relational network graph, and the node characterization vectors iterated on the basis of this relationship help to distinguish the relationship between the numbers and the text. Moreover, the relationship between numbers of the same type is established through the relational network graph, and the node characterization vectors iterated on the basis of this relationship help to distinguish the types of different numbers. In addition, in the iterative process of the node vectors, iteration is carried out based on the problem characterization vector, which reflects the importance of each node for answering the question; it can be understood that nodes related to the question are more important than other nodes, so that the iterative process can be effectively guided, and correspondingly the iterated node characterization vectors can embody the importance of different nodes. Finally, a numerical reasoning answer is determined according to the updated characterization vector of each node after the predetermined number of iterations, so that complex questions can be better solved, and the capability of numerical reasoning on complex questions in machine reading understanding can be improved.
According to another aspect of the embodiment, a numerical reasoning apparatus in machine reading understanding is further provided, which is used for executing the numerical reasoning method in machine reading understanding provided by the embodiment of the specification. FIG. 8 shows a schematic block diagram of a numerical reasoning apparatus in machine reading understanding, according to one embodiment. As shown in fig. 8, the apparatus 800 includes:
an obtaining unit 81 configured to obtain a current question and a current text, where the current text is used to describe a condition of the current question;
a first determining unit 82, configured to determine entities and numbers included in the current question and the current text acquired by the acquiring unit 81, and types corresponding to the numbers respectively;
a constructing unit 83 configured to construct a relational network graph, where the relational network graph includes entity nodes corresponding to the entities determined by the first determining unit 82, and digital nodes corresponding to the numbers, and neighbors are formed between digital nodes of the same type and between entity nodes and digital nodes having a preset relationship by connecting edges;
a first representation unit 84, configured to input the current question and the current text acquired by the acquisition unit 81 into a language model, and obtain, through the language model, a first semantic representation vector corresponding to each semantic element position in the current question and the current text;
a second characterization unit 85, configured to determine, according to each first semantic characterization vector obtained by the first characterization unit 84, a first problem characterization vector corresponding to the current problem and an initial characterization vector of each node in the relational network graph;
an iteration unit 86, configured to perform iteration of a predetermined number of times on each node in the relational network graph based on the initial characterization vector of each node obtained by the second characterization unit 85, where each iteration includes, for each node, performing neighbor node aggregation based on the first problem characterization vector obtained by the second characterization unit 85 and using an attention mechanism to obtain an updated characterization vector of each node;
the second determining unit 87 is configured to determine a numerical reasoning answer according to the updated characterization vector of each node after the iteration of the predetermined number of times, which is obtained by the iteration unit 86.
Optionally, as an embodiment, the type includes at least one of:
amount, time, percentage.
Optionally, as an embodiment, the entity includes at least one of:
name of person, place name, name of article.
Optionally, as an embodiment, the second characterization unit 85 is specifically configured to perform mean pooling on each first semantic representation vector corresponding to each semantic element position in the current problem to obtain a first problem representation vector corresponding to the current problem.
Optionally, as an embodiment, the second characterization unit 85 is specifically configured to, for any node in the nodes in the relational network graph, determine a plurality of semantic element positions in the current question and the current text, which are matched with the content of the node, and perform mean pooling on a plurality of first semantic characterization vectors corresponding to the plurality of semantic element positions to determine an initial characterization vector of the node.
Optionally, as an embodiment, the iteration unit 86 includes:
the first determining subunit is used for determining a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number based on the first problem characterization vector;
the second determining subunit is used for determining an intermediate vector of each node based on the initial characterization vector and the current characterization vector of each node and the problem driving vector determined by the first determining subunit;
the conversion subunit is configured to convert the intermediate vector of each node determined by the second determination subunit by using the query matrix, the key matrix, and the value matrix, respectively, to obtain a query vector, a key vector, and a value vector corresponding to each node, respectively;
the similarity calculation subunit is used for performing similarity calculation on the query vector corresponding to the first node obtained by the transformation subunit and the key vector corresponding to the second node to obtain the attention score from the second node to the first node; the first node and the second node are any two nodes which are adjacent to each other in the relational network graph;
and the updating subunit is used for taking any node as a target node, carrying out weighted summation on the value vectors of the neighbors obtained by the transformation subunit according to the attention scores from the neighbors of the target node to the target node, which are obtained by the similarity calculation subunit, and determining the updated characterization vector of the target node based on the summation result.
Further, the first determining subunit includes:
the first conversion module is used for enabling the first problem characterization vector to pass through a first full connection layer to obtain a first feature vector;
the activation module is used for enabling the first feature vector obtained by the first conversion module to pass through an activation function to obtain a second feature vector;
and the second conversion module is used for enabling the second feature vector obtained by the activation module to pass through a second full-connection layer corresponding to the current iteration number to obtain a problem driving vector corresponding to the current iteration number.
Further, the second determining subunit includes:
the splicing module is used for splicing the initial characterization vector of each node with the current characterization vector of each node to obtain a first splicing vector corresponding to each node;
and the intermediate conversion module is used for converting the first splicing vector of each node obtained by the splicing module into a preset dimensionality and then multiplying the first splicing vector by the problem driving vector in a bit-by-bit manner to obtain an intermediate vector of each node.
Further, the intermediate conversion module is specifically configured to pass the first splicing vector through a third fully-connected layer to convert it into a preset dimension, where the preset dimension is the same as the dimension of the problem driving vector.
Optionally, as an embodiment, the second determining unit 87 includes:
an obtaining module, configured to determine, for any node in each node in the relational network graph, the current question and a plurality of semantic element positions in the current text that match the content of the node, and obtain a plurality of first semantic representation vectors corresponding to the semantic element positions;
the updating module is used for updating a plurality of first semantic representation vectors corresponding to the plurality of semantic element positions acquired by the acquiring module according to the updated representation vector of any node after the iteration of the preset times so as to determine second semantic representation vectors respectively corresponding to the plurality of semantic element positions;
the comprehensive characterization module is used for determining first comprehensive characterization vectors corresponding to the current question and the current text according to the second semantic characterization vectors respectively corresponding to the current question and the positions of the semantic elements in the current text obtained by the updating module and the first semantic characterization vectors respectively corresponding to other positions of the semantic elements;
the first determination module is used for determining an answer type corresponding to the numerical reasoning answer by using a first classification model according to the first comprehensive characterization vector obtained by the comprehensive characterization module;
and the second determining module is used for determining the numerical reasoning answer by utilizing a second classification model at least according to the answer type determined by the first determining module and the first comprehensive characterization vector.
Further, the answer types include at least one of:
answer extraction, counting questions, and arithmetic expression class questions.
Further, the answer type is answer extraction;
the second determining module is specifically configured to:
determining second problem representation vectors corresponding to the current problem according to the second semantic representation vectors corresponding to the positions of the semantic elements in the current problem respectively and the first semantic representation vectors corresponding to the positions of other semantic elements respectively;
performing bitwise multiplication on the first comprehensive characterization vector and the second problem characterization vector to obtain a first cross characterization vector;
and splicing the first comprehensive characterization vector and the first cross characterization vector, and inputting the spliced vectors into the second classification model to obtain the numerical reasoning answer.
Further, the second classification model is used for predicting an answer starting position and an answer ending position in each semantic element position, so as to obtain the numerical reasoning answer according to the answer starting position and the answer ending position.
Further, the answer type is a counting question;
the second classification model is used for predicting numbers from 0 to 9 to obtain the numerical reasoning answer.
Further, the answer type is an arithmetic expression type question;
and the second classification model is used for predicting the symbols of each number in the current question and the current text, the symbols comprise plus signs, minus signs and 0, and numerical reasoning answers are obtained through the operation of each number and the symbols.
With the apparatus provided in the embodiment of the present specification, first, the obtaining unit 81 obtains a current question and a current text, where the current text is used to describe conditions of the current question. Then, the first determining unit 82 determines each entity and each number included in the current question and the current text, together with the type corresponding to each number. Next, the construction unit 83 constructs a relational network graph, which includes entity nodes corresponding to the entities and digital nodes corresponding to the numbers; neighbors are formed between the digital nodes of the same type, and between the entity nodes and the digital nodes having a preset relationship, by connecting edges. The relationship between an entity and a number having the preset relationship is established through the relational network graph, and the node characterization vectors iterated by the iteration unit 86 on the basis of this relationship help to distinguish the relationship between the numbers and the text. Moreover, the relationship between numbers of the same type is established through the relational network graph, and the node characterization vectors iterated on the basis of this relationship help to distinguish the types of different numbers. In addition, in the iterative process of the node vectors, iteration is carried out based on the problem characterization vector, which reflects the importance of each node for answering the question; it can be understood that nodes related to the question are more important than other nodes, so that the iterative process can be effectively guided, and correspondingly the iterated node characterization vectors can embody the importance of different nodes. Finally, the second determining unit 87 determines a numerical reasoning answer according to the updated characterization vector of each node after the predetermined number of iterations, so that complex questions can be better solved, and the capability of numerical reasoning on complex questions in machine reading understanding can be improved.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (32)
1. A method of numerical reasoning in machine-reading understanding, the method comprising:
acquiring a current problem and a current text, wherein the current text is used for describing conditions of the current problem;
determining each entity and each number included in the current question and the current text, and the type corresponding to each number;
constructing a relational network graph, wherein the relational network graph comprises entity nodes corresponding to the entities and number nodes corresponding to the numbers, and wherein number nodes of the same type, and entity nodes and number nodes having a preset relation, are connected as neighbors by edges;
inputting the current question and the current text into a language model, and obtaining a first semantic representation vector corresponding to each semantic element position in the current question and the current text through the language model;
determining a first problem representation vector corresponding to the current problem and an initial representation vector of each node in the relational network graph according to each first semantic representation vector;
performing a preset number of iterations on each node in the relational network graph based on the initial characterization vector of each node, wherein each iteration comprises: determining, for each node, a problem driving vector corresponding to the current iteration number based on the first problem characterization vector, and performing neighbor node aggregation with an attention mechanism based on the problem driving vector to obtain an updated characterization vector of each node;
and determining a numerical reasoning answer according to the updated characterization vector of each node after the iteration of the preset times.
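As an editorial illustration of the graph construction in claim 1, the sketch below builds entity and number nodes and connects number nodes of the same type, plus entity-number pairs that share a sentence. The shared-sentence criterion stands in for the unspecified "preset relation", and all function and field names are assumptions for illustration, not the claimed implementation.

```python
def build_relation_graph(entities, numbers):
    # entities: list of (name, sentence_id); numbers: list of (value, num_type, sentence_id)
    ent_nodes = [("ENT", name, sent) for name, sent in entities]
    num_nodes = [("NUM", val, typ, sent) for val, typ, sent in numbers]
    nodes = ent_nodes + num_nodes
    edges = []
    # number nodes of the same type become neighbours
    for i in range(len(num_nodes)):
        for j in range(i + 1, len(num_nodes)):
            if num_nodes[i][2] == num_nodes[j][2]:
                edges.append((len(ent_nodes) + i, len(ent_nodes) + j))
    # an entity node links to a number node when both occur in the same
    # sentence (one possible "preset relation"; assumed for illustration)
    for i, (_, _, e_sent) in enumerate(ent_nodes):
        for j, (_, _, _, n_sent) in enumerate(num_nodes):
            if e_sent == n_sent:
                edges.append((i, len(ent_nodes) + j))
    return nodes, edges
```

For example, with entities `[("Tom", 0)]` and numbers `[(3, "count", 0), (5, "count", 1), (0.2, "percentage", 1)]`, the two "count" numbers become neighbours and the entity links to the number in its own sentence.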
2. The method of claim 1, wherein the type comprises at least one of:
amount, time, percentage.
3. The method of claim 1, wherein the entity comprises at least one of:
name of person, place name, name of article.
4. The method of claim 1, wherein the determining a first problem characterization vector corresponding to the current problem according to each first semantic characterization vector comprises:
and performing mean value pooling on each first semantic representation vector corresponding to each semantic element position in the current problem to obtain a first problem representation vector corresponding to the current problem.
5. The method of claim 1, wherein the determining an initial characterization vector for each node in the relational network graph from each first semantic characterization vector comprises:
for any node in the relational network graph, determining a plurality of semantic element positions in the current question and the current text that match the content of the node, and performing mean pooling on the plurality of first semantic representation vectors corresponding to the semantic element positions to determine the initial characterization vector of the node.
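The mean pooling in claims 4 and 5 reduces to averaging the first semantic representation vectors at the relevant positions. This is a minimal sketch with hypothetical positions and two-dimensional vectors; the position indices and dictionary layout are illustrative assumptions.

```python
def mean_pool(vectors):
    # position-wise average of equal-length token vectors
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

# hypothetical first semantic vectors at the positions matching a node's content
token_vecs = {3: [1.0, 0.0], 7: [0.0, 2.0]}
node_init = mean_pool([token_vecs[p] for p in (3, 7)])   # initial node vector
```

The same routine, applied over the question's token positions, yields the first problem characterization vector of claim 4.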
6. The method of claim 1, wherein each iteration comprises:
determining a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number based on the first problem characterization vector;
determining an intermediate vector of each node based on the initial characterization vector of the node, the current characterization vector of the node, and the problem driving vector;
respectively transforming the intermediate vectors of each node by using the query matrix, the key matrix and the value matrix to obtain a query vector, a key vector and a value vector which respectively correspond to each node;
performing similarity calculation on the query vector corresponding to a first node and the key vector corresponding to a second node to obtain an attention score from the second node to the first node, wherein the first node and the second node are any two mutually adjacent nodes in the relational network graph;
taking any node as a target node, performing weighted summation on the value vectors of its neighbors according to the attention scores from the neighbors of the target node to the target node, and determining the updated characterization vector of the target node based on the summation result.
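The attention step of claim 6 can be sketched as dot-product attention restricted to graph neighbours: project each node's intermediate vector into query, key, and value, score each neighbour, and take the softmax-weighted sum of neighbour values. The toy dimensions and the absence of a scaling factor are illustrative assumptions, not the patent's specified implementation.

```python
import math

def matvec(W, x):
    # multiply matrix W (list of rows) by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(node_vecs, neighbors, Wq, Wk, Wv):
    # project each node's intermediate vector into query / key / value
    q = [matvec(Wq, v) for v in node_vecs]
    k = [matvec(Wk, v) for v in node_vecs]
    vals = [matvec(Wv, v) for v in node_vecs]
    updated = []
    for i, nbrs in enumerate(neighbors):
        # attention score from neighbour j to node i = dot(q_i, k_j)
        scores = [sum(a * b for a, b in zip(q[i], k[j])) for j in nbrs]
        exps = [math.exp(s) for s in scores]
        z = sum(exps)
        # softmax-weighted sum of neighbour value vectors gives the update
        updated.append([sum((e / z) * vals[j][d] for e, j in zip(exps, nbrs))
                        for d in range(len(vals[0]))])
    return updated
```

With identity projections and each of two nodes having only the other as neighbour, every node simply adopts its neighbour's value vector.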
7. The method of claim 6, wherein the determining, based on the first problem characterization vector, a problem drive vector corresponding to a current number of iterations with a neural network corresponding to the current number of iterations comprises:
passing the first problem characterization vector through a first fully connected layer to obtain a first feature vector;
passing the first feature vector through an activation function to obtain a second feature vector;
and passing the second feature vector through a second fully connected layer corresponding to the current iteration number to obtain the problem driving vector corresponding to the current iteration number.
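The per-iteration driving-vector network of claim 7 is a shared first layer, an activation, then a second layer selected by iteration number. In this sketch ReLU is an assumed activation (the claim does not fix the choice), and the per-iteration weights are stored in a hypothetical dictionary keyed by iteration index.

```python
def question_drive(q_vec, W1, b1, W2_per_iter, b2_per_iter, t):
    # shared first fully connected layer
    h = [sum(w * x for w, x in zip(row, q_vec)) + b for row, b in zip(W1, b1)]
    # activation; ReLU is assumed here for illustration
    h = [max(0.0, x) for x in h]
    # second fully connected layer specific to iteration t
    W2, b2 = W2_per_iter[t], b2_per_iter[t]
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W2, b2)]
```

Using a distinct second layer per iteration lets each round of message passing be driven by a differently focused view of the question.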
8. The method of claim 6, wherein determining an intermediate vector for each node based on the initial characterization vector, the current characterization vector, and the problem driving vector comprises:
concatenating the initial characterization vector of each node with the current characterization vector of the node to obtain a first concatenated vector corresponding to each node;
and converting the first concatenated vector of each node into a preset dimension, and then performing element-wise multiplication with the problem driving vector to obtain the intermediate vector of each node.
9. The method of claim 8, wherein the converting the first concatenated vector to a preset dimension comprises:
passing the first concatenated vector through a third fully connected layer to convert it to a preset dimension, wherein the preset dimension is the same as the dimension of the problem driving vector.
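Claims 8 and 9 together describe: concatenate, project to the driving vector's dimension, then multiply element-wise. A minimal sketch, with a hypothetical projection matrix `W3` and no bias term assumed:

```python
def intermediate_vector(init_vec, cur_vec, W3, drive):
    # concatenate the initial and current characterization vectors
    cat = init_vec + cur_vec
    # third fully connected layer: project to the driving vector's dimension
    proj = [sum(w * x for w, x in zip(row, cat)) for row in W3]
    # element-wise product with the question-driving vector
    return [p * d for p, d in zip(proj, drive)]
```

The element-wise product acts as a question-conditioned gate: dimensions the driving vector emphasizes are amplified in the node's intermediate vector.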
10. The method of claim 1, wherein determining a numerical reasoning answer based on the updated characterization vector for each node after the predetermined number of iterations comprises:
for any node in the relational network graph, determining a plurality of semantic element positions in the current question and the current text that match the content of the node, and acquiring the plurality of first semantic representation vectors corresponding to the semantic element positions;
updating the acquired first semantic representation vectors corresponding to the plurality of semantic element positions according to the updated characterization vector of the node after the preset number of iterations, so as to determine second semantic representation vectors respectively corresponding to the plurality of semantic element positions;
determining a first comprehensive characterization vector corresponding to the current question and the current text according to the second semantic representation vectors respectively corresponding to the plurality of semantic element positions in the current question and the current text and the first semantic representation vectors respectively corresponding to the other semantic element positions;
determining an answer type corresponding to the numerical reasoning answer by using a first classification model according to the first comprehensive characterization vector;
and determining the numerical reasoning answer by utilizing a second classification model at least according to the answer type and the first comprehensive characterization vector.
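The first classification model of claim 10 can be read as a linear layer plus softmax over the three answer types of claim 11. The type names, weight layout, and one-dimensional toy input below are assumptions for illustration.

```python
import math

ANSWER_TYPES = ("answer_extraction", "count", "arithmetic_expression")

def classify_answer_type(comp_vec, W, b):
    # linear layer over the first comprehensive characterization vector,
    # softmax over the three answer types, return the argmax type
    logits = [sum(w * x for w, x in zip(row, comp_vec)) + bi
              for row, bi in zip(W, b)]
    exps = [math.exp(v) for v in logits]
    probs = [e / sum(exps) for e in exps]
    return ANSWER_TYPES[probs.index(max(probs))]
```

The predicted type then selects which second classification model (span extraction, counting, or sign prediction) produces the final answer.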
11. The method of claim 10, wherein the answer types include at least one of:
answer extraction, counting questions, and arithmetic expression class questions.
12. The method of claim 10, wherein the answer type is answer extraction;
the determining the numerical reasoning answer by using a second classification model according to at least the answer type and the first comprehensive characterization vector comprises:
determining a second problem characterization vector corresponding to the current question according to the second semantic representation vectors respectively corresponding to the semantic element positions in the current question and the first semantic representation vectors respectively corresponding to the other semantic element positions;
performing element-wise multiplication of the first comprehensive characterization vector and the second problem characterization vector to obtain a first cross characterization vector;
and concatenating the first comprehensive characterization vector and the first cross characterization vector, and inputting the concatenated vector into the second classification model to obtain the numerical reasoning answer.
13. The method of claim 12, wherein the second classification model is used for predicting an answer start position and an answer end position in each semantic element position, so as to obtain the numerical reasoning answer according to the answer start position and the answer end position.
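For the answer-extraction branch of claims 12 and 13, one plausible reading is: form a cross feature per position (the comprehensive vector multiplied element-wise by the question vector, concatenated with the comprehensive vector), score every position for start and end, and pick a span. The linear scorers and the constraint that the end not precede the start are illustrative assumptions.

```python
def extract_span(comp_vecs, q_vec, w_start, w_end):
    # cross vector per position: element-wise product with the question vector,
    # concatenated with the comprehensive vector itself
    feats = [cv + [c * q for c, q in zip(cv, q_vec)] for cv in comp_vecs]
    start_scores = [sum(w * x for w, x in zip(w_start, f)) for f in feats]
    end_scores = [sum(w * x for w, x in zip(w_end, f)) for f in feats]
    start = start_scores.index(max(start_scores))
    # constrain the end position to not precede the start
    end = start + end_scores[start:].index(max(end_scores[start:]))
    return start, end
```

The answer text is then the token span between the predicted start and end positions.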
14. The method of claim 10, wherein the answer type is a counting question;
the second classification model is used for predicting numbers from 0 to 9 to obtain the numerical reasoning answer.
15. The method of claim 10, wherein the answer type is an arithmetic expression class question;
and the second classification model is used for predicting a symbol for each number in the current question and the current text, the symbols comprising a plus sign, a minus sign and 0, and the numerical reasoning answer is obtained through operation on the numbers and their symbols.
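The arithmetic-expression branch of claim 15 assigns each number a symbol in {+, -, 0} (0 meaning the number is not used) and evaluates the resulting signed sum. The three-way logits layout below is a hypothetical decoding of that claim.

```python
def arithmetic_answer(numbers, sign_logits):
    # per number, choose among plus, minus, and 0 ("not used")
    signs = [(1, -1, 0)[max(range(3), key=lambda k: logits[k])]
             for logits in sign_logits]
    # evaluate the induced signed-sum expression
    return sum(s * n for s, n in zip(signs, numbers))
```

For instance, with numbers 8, 3, and 100 and logits selecting plus, minus, and "not used" respectively, the decoded expression is 8 - 3.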
16. A numerical reasoning apparatus in machine-readable understanding, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a current question and a current text, and the current text is used for describing conditions of the current question;
the first determining unit is used for determining each entity and each number included in the current question and the current text acquired by the acquiring unit, and the type corresponding to each number;
the construction unit is used for constructing a relational network graph, wherein the relational network graph comprises entity nodes corresponding to the entities determined by the first determination unit and number nodes corresponding to the numbers, and wherein number nodes of the same type, and entity nodes and number nodes having a preset relation, are connected as neighbors by edges;
the first representation unit is used for inputting the current question and the current text acquired by the acquisition unit into a language model, and acquiring first semantic representation vectors corresponding to the positions of semantic elements in the current question and the current text through the language model;
the second characterization unit is used for determining a first problem characterization vector corresponding to the current problem and an initial characterization vector of each node in the relational network graph according to each first semantic characterization vector obtained by the first characterization unit;
the iteration unit is used for performing iteration of a preset number of times on each node in the relational network graph based on the initial characterization vector of each node obtained by the second characterization unit, wherein each iteration comprises determining a problem driving vector corresponding to the current iteration number for each node based on the first problem characterization vector obtained by the second characterization unit, and performing neighbor node aggregation based on the problem driving vector by using an attention mechanism to obtain an updated characterization vector of each node;
and the second determining unit is used for determining a numerical reasoning answer according to the updated characterization vector of each node after the iteration of the preset times, which is obtained by the iteration unit.
17. The apparatus of claim 16, wherein the type comprises at least one of:
amount, time, percentage.
18. The apparatus of claim 16, wherein the entity comprises at least one of:
name of person, place name, name of article.
19. The apparatus according to claim 16, wherein the second characterization unit is specifically configured to perform mean pooling on each first semantic representation vector corresponding to each semantic element position in the current problem to obtain the first problem representation vector corresponding to the current problem.
20. The apparatus according to claim 16, wherein the second characterization unit is specifically configured to, for any node in the relational network graph, determine a number of semantic element positions in the current question and the current text that match the content of the node, and perform mean pooling on a number of first semantic characterization vectors corresponding to the number of semantic element positions to determine an initial characterization vector of the node.
21. The apparatus of claim 16, wherein the iteration unit comprises:
the first determining subunit is used for determining a problem driving vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number based on the first problem characterization vector;
the second determining subunit is used for determining an intermediate vector of each node based on the initial characterization vector and the current characterization vector of each node and the problem driving vector determined by the first determining subunit;
the conversion subunit is configured to convert the intermediate vector of each node determined by the second determination subunit by using the query matrix, the key matrix, and the value matrix, respectively, to obtain a query vector, a key vector, and a value vector corresponding to each node, respectively;
the similarity calculation subunit is used for performing similarity calculation on the query vector corresponding to the first node obtained by the transformation subunit and the key vector corresponding to the second node to obtain the attention score from the second node to the first node; the first node and the second node are any two nodes which are adjacent to each other in the relational network graph;
and the updating subunit is used for taking any node as a target node, carrying out weighted summation on the value vectors of the neighbors obtained by the transformation subunit according to the attention scores from the neighbors of the target node to the target node, which are obtained by the similarity calculation subunit, and determining the updated characterization vector of the target node based on the summation result.
22. The apparatus of claim 21, wherein the first determining subunit comprises:
the first conversion module is used for passing the first problem characterization vector through a first fully connected layer to obtain a first feature vector;
the activation module is used for passing the first feature vector obtained by the first conversion module through an activation function to obtain a second feature vector;
and the second conversion module is used for passing the second feature vector obtained by the activation module through a second fully connected layer corresponding to the current iteration number to obtain the problem driving vector corresponding to the current iteration number.
23. The apparatus of claim 21, wherein the second determining subunit comprises:
the concatenation module is used for concatenating the initial characterization vector of each node with the current characterization vector of the node to obtain a first concatenated vector corresponding to each node;
and the intermediate conversion module is used for converting the first concatenated vector of each node obtained by the concatenation module into a preset dimension and then performing element-wise multiplication with the problem driving vector to obtain the intermediate vector of each node.
24. The apparatus of claim 23, wherein the intermediate conversion module is specifically configured to pass the first concatenated vector through a third fully connected layer to convert it to a preset dimension, the preset dimension being the same as the dimension of the problem driving vector.
25. The apparatus of claim 16, wherein the second determining unit comprises:
an obtaining module, configured to determine, for any node in the relational network graph, a plurality of semantic element positions in the current question and the current text that match the content of the node, and to acquire the plurality of first semantic representation vectors corresponding to the semantic element positions;
the updating module is used for updating the first semantic representation vectors corresponding to the plurality of semantic element positions acquired by the obtaining module according to the updated characterization vector of the node after the preset number of iterations, so as to determine second semantic representation vectors respectively corresponding to the plurality of semantic element positions;
the comprehensive characterization module is used for determining first comprehensive characterization vectors corresponding to the current question and the current text according to the second semantic characterization vectors respectively corresponding to the current question and the positions of the semantic elements in the current text obtained by the updating module and the first semantic characterization vectors respectively corresponding to other positions of the semantic elements;
the first determination module is used for determining an answer type corresponding to the numerical reasoning answer by using a first classification model according to the first comprehensive characterization vector obtained by the comprehensive characterization module;
and the second determining module is used for determining the numerical reasoning answer by utilizing a second classification model at least according to the answer type determined by the first determining module and the first comprehensive characterization vector.
26. The apparatus of claim 25, wherein the answer type comprises at least one of:
answer extraction, counting questions, and arithmetic expression class questions.
27. The apparatus of claim 25, wherein the answer type is answer extraction;
the second determining module is specifically configured to:
determining a second problem characterization vector corresponding to the current question according to the second semantic representation vectors respectively corresponding to the semantic element positions in the current question and the first semantic representation vectors respectively corresponding to the other semantic element positions;
performing element-wise multiplication of the first comprehensive characterization vector and the second problem characterization vector to obtain a first cross characterization vector;
and concatenating the first comprehensive characterization vector and the first cross characterization vector, and inputting the concatenated vector into the second classification model to obtain the numerical reasoning answer.
28. The apparatus of claim 27, wherein the second classification model is configured to predict an answer start position and an answer end position in each semantic element position, so as to obtain the numerical reasoning answer according to the answer start position and the answer end position.
29. The apparatus of claim 25, wherein the answer type is a counting question;
the second classification model is used for predicting numbers from 0 to 9 to obtain the numerical reasoning answer.
30. The apparatus of claim 25, wherein the answer type is an arithmetic expression class question;
and the second classification model is used for predicting a symbol for each number in the current question and the current text, the symbols comprising a plus sign, a minus sign and 0, and the numerical reasoning answer is obtained through operation on the numbers and their symbols.
31. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-15.
32. A computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of any of claims 1-15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011436272.8A CN112507074A (en) | 2020-07-31 | 2020-07-31 | Numerical reasoning method and device in machine reading understanding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010759810.0A CN111737419B (en) | 2020-07-31 | 2020-07-31 | Numerical reasoning method and device in machine reading understanding |
CN202011436272.8A CN112507074A (en) | 2020-07-31 | 2020-07-31 | Numerical reasoning method and device in machine reading understanding |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010759810.0A Division CN111737419B (en) | 2020-07-31 | 2020-07-31 | Numerical reasoning method and device in machine reading understanding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112507074A true CN112507074A (en) | 2021-03-16 |
Family
ID=72656807
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010759810.0A Active CN111737419B (en) | 2020-07-31 | 2020-07-31 | Numerical reasoning method and device in machine reading understanding |
CN202011436272.8A Pending CN112507074A (en) | 2020-07-31 | 2020-07-31 | Numerical reasoning method and device in machine reading understanding |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN111737419B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111737419B (en) * | 2020-07-31 | 2020-12-04 | 支付宝(杭州)信息技术有限公司 | Numerical reasoning method and device in machine reading understanding |
CN114510941B (en) * | 2022-01-19 | 2024-06-25 | 重庆大学 | Discrete reasoning method and system based on clues |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140229161A1 (en) * | 2013-02-12 | 2014-08-14 | International Business Machines Corporation | Latent semantic analysis for application in a question answer system |
CN108830157A (en) * | 2018-05-15 | 2018-11-16 | 华北电力大学(保定) | Human bodys' response method based on attention mechanism and 3D convolutional neural networks |
CN109325038A (en) * | 2018-09-05 | 2019-02-12 | 天津航旭科技发展有限公司 | Knowledge mapping extended model, structural knowledge storage method and equipment |
CN110377686A (en) * | 2019-07-04 | 2019-10-25 | 浙江大学 | A kind of address information Feature Extraction Method based on deep neural network model |
CN110674279A (en) * | 2019-10-15 | 2020-01-10 | 腾讯科技(深圳)有限公司 | Question-answer processing method, device, equipment and storage medium based on artificial intelligence |
CN111737419A (en) * | 2020-07-31 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Numerical reasoning method and device in machine reading understanding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200044201A (en) * | 2018-10-10 | 2020-04-29 | 한국전자통신연구원 | Neural machine translation model learning method and apparatus for improving translation performance |
CN111291243B (en) * | 2019-12-30 | 2022-07-12 | 浙江大学 | Visual reasoning method for uncertainty of spatiotemporal information of character event |
- 2020-07-31: CN CN202010759810.0A patent CN111737419B (Active)
- 2020-07-31: CN CN202011436272.8A patent CN112507074A (Pending)
Non-Patent Citations (1)
Title |
---|
LIU, Lijia et al.: "Extraction of Attribute Relations of Domain Concept Entities Based on the LM Algorithm", Journal of Chinese Information Processing, no. 06, 30 November 2014 (2014-11-30) *
Also Published As
Publication number | Publication date |
---|---|
CN111737419B (en) | 2020-12-04 |
CN111737419A (en) | 2020-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Diao et al. | Black-box prompt learning for pre-trained language models | |
CN112464641A (en) | BERT-based machine reading understanding method, device, equipment and storage medium | |
CN111506714A (en) | Knowledge graph embedding based question answering | |
CN113535984A (en) | Attention mechanism-based knowledge graph relation prediction method and device | |
CN109582956A (en) | text representation method and device applied to sentence embedding | |
CN112711953A (en) | Text multi-label classification method and system based on attention mechanism and GCN | |
Oncina et al. | Learning stochastic edit distance: Application in handwritten character recognition | |
US20190080352A1 (en) | Segment Extension Based on Lookalike Selection | |
CN111737419B (en) | Numerical reasoning method and device in machine reading understanding | |
CN111160000B (en) | Composition automatic scoring method, device terminal equipment and storage medium | |
Florescu et al. | Algorithmically generating new algebraic features of polynomial systems for machine learning | |
CN110245349A (en) | A kind of syntax dependency parsing method, apparatus and a kind of electronic equipment | |
CN113032676B (en) | Recommendation method and system based on micro-feedback | |
Rai | Advanced deep learning with R: Become an expert at designing, building, and improving advanced neural network models using R | |
Schwier et al. | Zero knowledge hidden markov model inference | |
KR102582779B1 (en) | Knowledge completion method and apparatus through neuro symbolic-based relation embeding | |
CN113806489A (en) | Method, electronic device and computer program product for dataset creation | |
CN117252665B (en) | Service recommendation method and device, electronic equipment and storage medium | |
JP2010272004A (en) | Discriminating apparatus, discrimination method, and computer program | |
CN114936220B (en) | Search method and device for Boolean satisfiability problem solution, electronic equipment and medium | |
WO2024098282A1 (en) | Geometric problem-solving method and apparatus, and device and storage medium | |
Fei et al. | Soft Reasoning on Uncertain Knowledge Graphs | |
CN114676237A (en) | Sentence similarity determining method and device, computer equipment and storage medium | |
Shirbhayye et al. | An accurate prediction of MPG (Miles per Gallon) using linear regression model of machine learning | |
CN113297854A (en) | Method, device and equipment for mapping text to knowledge graph entity and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40049162; Country of ref document: HK |