CN113792550A - Method and device for determining predicted answer and method and device for reading and understanding
- Publication number: CN113792550A
- Application number: CN202111110989.8A
- Authority
- CN
- China
- Prior art keywords
- word
- graph network
- feature vector
- initial
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a method and a device for determining a predicted answer, and a method and a device for reading and understanding. The method for determining the predicted answer includes: converting the value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of that word unit is at least one label; determining the prediction label of the word unit corresponding to each dimension based on the at least one prediction probability of that dimension; and determining a predicted answer based on the prediction labels of the word units. By means of sequence labeling, the prediction label of each word unit can be determined and the predicted answer can be determined from the prediction labels; when the model parameters are adjusted, the prediction labels of a correct predicted answer are driven closer to the correct labels, so the method can improve both the training efficiency and the accuracy in use of the reading understanding model.
Description
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for determining a predicted answer, a method and an apparatus for reading and understanding, a computing device, and a computer-readable storage medium.
Background
Machine reading understanding is a line of research dedicated to teaching machines to read human language and understand its meaning; with the development of natural language processing technology, it has become a popular direction in the field of natural language processing. The machine reading understanding task focuses on understanding a text and learning relevant information from it, so that questions related to the text can be answered.
In the prior art, a method for training a machine to understand a text is mainly to construct a model to be trained, and obtain a reading understanding model meeting requirements by training the model to be trained, so that the reading understanding model can complete a reading understanding task as accurately as possible. Specifically, the sample question and the sample answer may be input into the model to be trained as training samples, the model to be trained may output a predicted answer, and the model to be trained is optimized according to a difference between the predicted answer and the sample answer, so as to obtain a desired reading understanding model.
However, the above method considers only the correlation between questions and answers, which is a rather one-sided signal: the same question may apply to different texts, and the answer obtained from each text is different. In addition, the method determines the predicted answer directly from the sample question and the sample answer, treating the two as a single whole, so the resulting predicted answer has low accuracy, which may increase the number of training iterations required. As a result, training a reading understanding model by the above method is inefficient, and the accuracy with which the trained reading understanding model performs the reading understanding task may be low.
Disclosure of Invention
In view of the above, the present disclosure provides a method for determining a predicted answer. The application also relates to a reading understanding model training method, a reading understanding method, a predicted answer determining device, a reading understanding model training device, a reading understanding device, a computing device and a computer readable storage medium, which are used for solving the technical defects in the prior art.
According to a first aspect of embodiments of the present application, there is provided a method for training a reading understanding model, including:
constructing an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers by a graph construction network layer of a reading understanding model;
inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain a predicted answer;
training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached.
According to a second aspect of embodiments of the present application, there is provided a reading understanding method including:
constructing an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer through a graph construction network layer of a reading understanding model, wherein the reading understanding model is trained by the method of the first aspect;
inputting the target text, the target question and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
and inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain an answer of the target question.
According to a third aspect of embodiments of the present application, there is provided a method for determining a predicted answer, including:
converting the value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label;
determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
and determining a predicted answer based on the predicted label of the word unit corresponding to each dimension.
According to a fourth aspect of embodiments of the present application, there is provided another reading and understanding method, including:
constructing an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer by a graph construction network layer of a reading understanding model, wherein the reading understanding model is obtained by training with the method of the first aspect or the third aspect;
inputting the target text, the target question and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determining a target hidden layer feature vector;
converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label;
determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
and determining an answer of the target question based on the label of the word unit corresponding to each dimension.
According to a fifth aspect of embodiments of the present application, there is provided an apparatus for determining a predicted answer, including:
the first conversion module is configured to convert a value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that a prediction label of the word unit corresponding to each dimension is at least one label;
a first determining module configured to determine a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
a second determining module configured to determine a predicted answer based on the predicted label of the word unit corresponding to each dimension.
According to a sixth aspect of embodiments of the present application, there is provided a reading and understanding apparatus, comprising:
a graph network construction module configured to construct an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer by a graph construction network layer of a reading understanding model, wherein the reading understanding model is trained by the method of the first aspect or the third aspect;
a text processing module configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a third determination module configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine a target hidden layer feature vector;
a second conversion module configured to convert a value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one probability corresponding to each dimension represents a probability that a label of the word unit corresponding to each dimension is at least one label;
a fourth determining module configured to determine a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
a fifth determining module configured to determine an answer to the target question based on the label of the word unit corresponding to each dimension.
According to a seventh aspect of embodiments of the present application, there is provided a training apparatus for reading an understanding model, including:
a first graph network construction module configured to construct, by a graph construction network layer of a reading understanding model, an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers;
a first text processing module configured to input the sample text segment, the sample question, and the sample answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
the prediction module is configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain a predicted answer;
a training module configured to train the reading understanding model based on a difference between the predicted answer and the sample answer until a training stop condition is reached.
According to an eighth aspect of embodiments of the present application, there is provided a reading and understanding apparatus including:
a second graph network construction module configured to construct an initial first graph network of the target text and the target answer and an initial second graph network of the target question and the target answer by a graph construction network layer of the reading understanding model;
a second text processing module configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a determination module configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine an answer to the target question.
According to a ninth aspect of embodiments of the present application, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the method of the first, second, third or fourth aspect when executing the instructions.
According to a tenth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method of the first, second, third or fourth aspect described above.
According to an eleventh aspect of embodiments of the present application, there is provided a chip storing computer instructions that, when executed by the chip, implement the steps of the method according to the first, second, third or fourth aspect.
In the embodiment of the application, an initial first graph network of a sample text fragment and a sample answer and an initial second graph network of a sample question and the sample answer are constructed by a graph construction network layer of a reading understanding model; the sample text segment, the sample question and the sample answer are input into a text processing layer of the reading understanding model, and attention values are added to the nodes and edges included in the initial first graph network and the initial second graph network respectively, to obtain a first graph network and a second graph network; the first graph network and the second graph network are input into a graph convolution network layer of the reading understanding model to obtain a predicted answer; and the reading understanding model is trained based on the difference between the predicted answer and the sample answer until a training stop condition is reached. In this way, the incidence relations among the sample text segment, the sample question and the sample answer can be used effectively, and the reading understanding model is trained in combination with these incidence relations, so the accuracy of the reading understanding task executed by the reading understanding model can be improved.
In the embodiment of the application, the value of each dimension of a target hidden layer feature vector is converted into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of that word unit is at least one label; the prediction label of the word unit corresponding to each dimension is determined based on the at least one prediction probability of that dimension; and the predicted answer is determined based on the prediction labels of the word units. By means of sequence labeling, the prediction label of each word unit can be determined and the predicted answer can be determined from the prediction labels; when the model parameters are adjusted, the prediction labels of a correct predicted answer are driven closer to the correct labels, so the method can improve both the training efficiency and the accuracy in use of the reading understanding model.
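As an illustrative sketch only (not the patent's exact implementation), the sequence labeling step described above can be written roughly as follows; the BIO tag set, the `classifier` linear layer and all other names are assumptions, and the target hidden layer feature vector is assumed to hold one feature vector per word unit:

```python
import torch

def decode_predicted_answer(target_hidden, classifier, word_units, labels=("B", "I", "O")):
    """Sketch: map each word-unit position of the target hidden layer feature vector
    to prediction probabilities over an assumed BIO label set, take the most probable
    label per word unit, and keep the word units tagged as part of the answer."""
    logits = classifier(target_hidden)              # [num_word_units, num_labels]
    probs = torch.softmax(logits, dim=-1)           # prediction probabilities per dimension
    pred_ids = probs.argmax(dim=-1)                 # prediction label of each word unit
    answer_units = [w for w, i in zip(word_units, pred_ids.tolist())
                    if labels[i] in ("B", "I")]     # word units labeled as answer tokens
    return "".join(answer_units), probs
```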
Drawings
FIG. 1 is a block diagram of a computing device according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for training a reading understanding model according to an embodiment of the present application;
FIG. 3 is a data flow diagram between the layers of a reading understanding model during model training according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an initial third graph network provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an initial first graph network provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an initial fourth graph network provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an initial second graph network provided by an embodiment of the present application;
FIG. 8 is a process flow diagram of a reading understanding model training method applied to multiple-choice questions according to an embodiment of the present application;
FIG. 9 is a flow chart of a reading understanding method provided by an embodiment of the present application;
FIG. 10 is a diagram illustrating the flow of data between the layers of a reading understanding model when the model is applied, according to an embodiment of the present application;
FIG. 11 is a schematic diagram of another initial first graph network provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of another initial second graph network provided by an embodiment of the present application;
FIG. 13 is a process flow diagram of a reading understanding model applied to a multiple-choice question according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a reading comprehension model training apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a reading and understanding apparatus according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application can, however, be implemented in many other ways than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; therefore, the application is not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if," as used herein, may be interpreted as "responsive to a determination," depending on the context.
First, the noun terms to which one or more embodiments of the present application relate are explained.
The BERT model: Bidirectional Encoder Representations from Transformers, a dynamic word vector technique. A bidirectional Transformer model is trained on an unlabeled data set, and the feature information of preceding and following words is considered jointly, so that problems such as polysemy can be handled better.
GCN model: Graph Convolutional Network model; it can be used to extract the features of a graph.
Word vector: a representation of a word intended to enable a computer to process the word.
Word embedding: the process of embedding a high-dimensional space whose dimension equals the number of all words into a continuous vector space of much lower dimension, each word or phrase being mapped to a vector over the real numbers.
Word unit: before any actual processing of the input text, the text needs to be segmented into language units such as words, punctuation marks, numbers or letters, and these units are called word units. For English text, a word unit can be a word, a punctuation mark, a number, etc.; for Chinese text, the smallest word unit can be a single character, a punctuation mark, a number, etc.
word2vec: a word embedding method; an efficient word vector training method constructed by Mikolov on the basis of Bengio's Neural Network Language Model (NNLM). It can be used to perform word embedding on a text to obtain the word vectors of the text.
A first word unit: in a model training stage of reading and understanding a model, a first word unit is a word unit obtained after word segmentation processing is carried out on a sample text fragment; in the stage of executing the reading understanding task by the reading understanding model, the first word unit is a word unit obtained after word segmentation processing is carried out on the target text.
The first word unit group: a word unit group composed of a plurality of first word units.
A second word unit: in a model training stage of reading and understanding the model, the second word unit is a word unit obtained after word segmentation processing is carried out on the sample problem; in the stage of executing the reading understanding task by the reading understanding model, the second word unit is a word unit obtained after performing word segmentation processing on the target problem.
The second word unit group: a word unit group composed of a plurality of second word units.
A third word unit: In the model training stage of the reading understanding model, the third word unit is a word unit obtained after word segmentation processing is carried out on the sample answer; in the stage of executing the reading understanding task by the reading understanding model, the third word unit is a word unit obtained after word segmentation processing is carried out on the target answer.
The third word unit group: a word unit group composed of a plurality of third word units.
A first feature vector: in a model training stage of the reading understanding model, a first feature vector is a vector obtained after word embedding processing is carried out on a first word unit in a sample text fragment; in the stage of executing the reading understanding task by the reading understanding model, the first feature vector is a vector obtained by performing word embedding processing on a first word unit of the target text.
First feature vector group: and the characteristic vector group is formed by a plurality of first characteristic vectors.
Second feature vector: In a model training stage of the reading understanding model, the second feature vector is a vector obtained after word embedding processing is carried out on a second word unit in the sample question; in the stage of executing the reading understanding task by the reading understanding model, the second feature vector is a vector obtained by performing word embedding processing on the second word unit of the target question.
Second feature vector group: and the feature vector group is formed by a plurality of second feature vectors.
The third feature vector: in the model training stage of the reading understanding model, the third feature vector is a vector obtained after word embedding processing is carried out on a third word unit in the sample answer; in the stage of executing the reading understanding task by the reading understanding model, the third feature vector is a vector obtained by performing word embedding processing on a third word unit of the target answer.
Third feature vector group: and the characteristic vector group is formed by a plurality of third characteristic vectors.
Initial first graph network: in a model training stage of reading and understanding a model, an initial first graph network is a graph network for representing the incidence relation between a sample text fragment and a sample answer; in the stage of executing the reading understanding task by the reading understanding model, the initial first graph network is a graph network which represents the incidence relation between the target text and the target answer.
Initial second graph network: in a model training stage of reading and understanding the model, the initial second graph network is a graph network for representing the incidence relation between the sample question and the sample answer; in the stage of executing the reading understanding task by the reading understanding model, the initial second graph network is a graph network for representing the incidence relation between the target question and the target answer.
Initial third graph network: in a model training stage of reading and understanding the model, the initial third graph network is a graph network for representing the dependency relationship among word units in the sample text segment; in the stage of executing the reading understanding task by the reading understanding model, the initial third graph network is a graph network for representing the dependency relationship between word units in the target text.
Initial fourth graph network: In a model training phase of the reading understanding model, the initial fourth graph network is a graph network representing the dependency relationships between word units in the sample question; in the stage of executing the reading understanding task by the reading understanding model, the initial fourth graph network is a graph network characterizing the dependency relationships between word units in the target question.
First graph network: the initial first graph network after attention values have been added to its nodes and edges.
Second graph network: the initial second graph network after attention values have been added to its nodes and edges.
First hidden layer feature vector: the vector representation of the first graph network obtained after the graph convolution network layer performs convolution processing on the first graph network.
Second hidden layer feature vector: the vector representation of the second graph network obtained after the graph convolution network layer performs convolution processing on the second graph network.
Target hidden layer feature vector: the vector representation obtained by combining the first hidden layer feature vector and the second hidden layer feature vector.
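For readers unfamiliar with graph convolution, a generic single-layer step of the kind the graph convolution network layer above could apply is sketched below; this follows the standard Kipf-Welling formulation and is an assumed illustration only, not the patent's exact layer:

```python
import torch

def gcn_layer(adjacency, node_feats, weight):
    """One generic graph-convolution step: symmetrically normalized adjacency
    (with self-loops) times node features times a learnable weight matrix,
    followed by a ReLU activation, yielding hidden layer feature vectors."""
    a_hat = adjacency + torch.eye(adjacency.size(0))       # add self-loops
    deg_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))  # D^(-1/2)
    norm_adj = deg_inv_sqrt @ a_hat @ deg_inv_sqrt         # D^(-1/2) (A+I) D^(-1/2)
    return torch.relu(norm_adj @ node_feats @ weight)
```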
The present application provides a reading comprehension model training method, and the present application also relates to a reading comprehension model training device, a computing device, and a computer readable storage medium, which are described in detail in the following embodiments one by one.
FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may execute the steps of the reading understanding model training method shown in FIG. 2. FIG. 2 shows a flowchart of a reading understanding model training method according to an embodiment of the present application, which includes steps 202 to 210.
The reading understanding model is used to execute the reading understanding task: given a text, a question and candidate answers, it can output the correct answer to the question. The sample answer is the correct answer to the sample question corresponding to the sample text fragment. The sample text segment may be any text segment obtained by segmenting the sample text.
The initial first graph network is used for representing the incidence relation between the sample text fragments and the sample answers, and the initial second graph network is used for representing the incidence relation between the sample questions and the sample answers.
In some embodiments, the training data set may be constructed in advance from a plurality of sample texts, a plurality of sample questions, and a plurality of sample answers.
As an example, there is a correspondence among a plurality of sample texts, a plurality of sample questions and a plurality of sample answers. Since sample texts are usually chapter-level texts, their data size is relatively large and they are relatively difficult for the model to process, so each sample text can be segmented into paragraphs or sentences to obtain a plurality of sample text fragments. The sample text fragments of each sample text all correspond to the sample questions and sample answers corresponding to that sample text, so the plurality of sample text fragments, sample questions and sample answers can be stored in the training data set with a correspondence among them, and one sample text fragment, one sample question and one sample answer in this correspondence can be referred to as a set of training data.
As another example, since the sample text is usually a chapter-level text with a large data size that is difficult for the model to process, each sample text may first be segmented into paragraphs or sentences to obtain a plurality of sample text fragments. Taking a reference sample text as an example, its sample text segments may be referred to as reference sample text segments, the sample question corresponding to the reference sample text may be referred to as the reference sample question, and the sample answer corresponding to both may be referred to as the reference sample answer. The reference sample text segments are then matched against the reference sample question to determine a plurality of first similarities, and against the reference sample answer to determine a plurality of second similarities. The reference sample text segments whose first similarity and second similarity are both greater than a similarity threshold are kept; such a segment can be considered strongly associated with both the reference sample question and the reference sample answer, so the obtained reference sample text fragment, the reference sample question and the reference sample answer may be taken as a set of training data. Performing the above processing on each sample text yields multiple groups of training data, and the sample text segment in each group has a high relevance to its corresponding sample question and sample answer.
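A minimal sketch of this second example is given below; the similarity function `sim` and the threshold value are assumptions (any text-similarity measure, such as cosine similarity of sentence embeddings, could be used):

```python
def select_training_groups(fragments, ref_question, ref_answer, sim, threshold=0.5):
    """Keep only the reference sample text segments whose first similarity (to the
    reference sample question) and second similarity (to the reference sample answer)
    both exceed the similarity threshold; each kept segment forms one group of
    training data together with the question and the answer."""
    groups = []
    for frag in fragments:
        first_sim = sim(frag, ref_question)
        second_sim = sim(frag, ref_answer)
        if first_sim > threshold and second_sim > threshold:
            groups.append((frag, ref_question, ref_answer))
    return groups
```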
By the two exemplary ways described above, a training data set including multiple sets of training data may be created, and the multiple sets of training data may be obtained from the training data set and input into the graph building network layer of the reading understanding model.
Illustratively, referring to FIG. 3, a sample text fragment, a sample question and a sample answer may be input into the graph construction network layer of the reading understanding model to construct an initial first graph network based on the sample text fragment and the sample answer, and an initial second graph network based on the sample question and the sample answer.
In implementation, the specific implementation of constructing an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers by a graph construction network layer of the reading understanding model may include: constructing an initial third graph network based on the dependency relationships among the word units in the sample text segment, and constructing an initial fourth graph network based on the dependency relationships among the word units in the sample question; then constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer.
Wherein the initial third graph network is used for characterizing the dependency relationship between word units in the sample text segment. The initial fourth graph network is used to characterize dependencies between word units in the sample problem.
That is, an initial third graph network reflecting the dependency relationship between word units in the sample text segment may be constructed first, and then the first graph network may be constructed based on the initial third graph network and according to the association relationship between the sample answer and the sample text segment. And constructing an initial fourth graph network reflecting the dependency relationship among word units in the sample question, and constructing a second graph network according to the incidence relationship between the sample answer and the sample question on the basis of the initial fourth graph network.
Therefore, the incidence relation between the word unit of the sample text segment and the word unit of the sample answer can be clearly described through the first graph network, the incidence relation between the word unit of the sample question and the word unit of the sample answer can be clearly described through the second graph network, the incidence relation between the word unit of the sample question and the word unit of the sample answer is preliminarily obtained, and preparation is made for further subsequent use.
In some embodiments, constructing the initial third graph network based on the dependencies between word units in the sample text segments may include: taking word units in the sample text fragment as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample text fragment to obtain the initial third graph network.
That is, the initial third graph network that characterizes the dependency relationship between word units in the sample text segment may be constructed by taking the word units in the sample text segment as nodes and the dependency relationship between the word units as edges. Therefore, the association relationship among the word units in the sample text segment can be preliminarily determined, and the learning of the model on the relationship among the word units in the sample text segment can be strengthened.
As an example, dependency analysis may be performed on the sample text segment through the Stanford CoreNLP (Natural Language Processing) algorithm, so as to obtain the dependency relationships between the multiple word units in the sample text segment.
Illustratively, performing dependency analysis on the sample text segment "I love my country" with the Stanford CoreNLP algorithm gives "I" as the subject, "love" as the predicate and "my country" as the object, and yields the dependency relationships among the word units of the segment (one word unit per character in the original Chinese). For example, the subject "I" has a dependency relationship with the predicate "love", and the word units of the object "my country" have dependency relationships with "love" and with one another; based on these dependency relationships, the initial third graph network shown in FIG. 4 can be obtained.
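As an illustrative sketch (using spaCy and networkx as stand-ins for the Stanford CoreNLP dependency analysis described above; all names and the model choice are assumptions), the initial third graph network can be built by taking word units as nodes and dependency relations as edges:

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # any dependency parser would do; the embodiment uses Stanford CoreNLP

def build_dependency_graph(text):
    """Initial third graph network: one node per word unit, one edge per
    dependency relation found by the parser."""
    doc = nlp(text)
    g = nx.Graph()
    for token in doc:
        g.add_node(token.i, text=token.text)
    for token in doc:
        if token.head.i != token.i:                 # skip the root's self-reference
            g.add_edge(token.i, token.head.i, deprel=token.dep_)
    return g
```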
In some embodiments, the specific implementation of constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer may include: and connecting the target node with a node in the initial third graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample text segment to obtain the initial first graph network.
That is, the word unit in the sample answer may be used as a target node, and the target node may be connected to a node corresponding to the word unit of the sample text segment in the initial third graph network, so that the initial first graph network representing the association relationship between the word unit of the sample text segment and the word unit of the sample answer may be obtained, and the model may preliminarily learn the association relationship between the sample text segment and the sample answer.
As an example, a target node corresponding to a word unit in the sample answer may be connected to a node corresponding to each word unit in the sample text segment. Or, as another example, the target node corresponding to the word unit in the sample answer may be connected to the node in the initial third graph network, which has an association relationship with the target node.
Illustratively, taking the sample text segment as "I love my country" and the sample answer as "country" (a two-character word in the original Chinese, so it yields two word units, rendered in this translation as "ancestor" and "country"), each of the two answer word units is connected to every node in the initial third graph network, and the initial first graph network shown in FIG. 5 can be obtained; the bold nodes in FIG. 5 are the target nodes.
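Continuing the sketch above (still an assumed illustration, not the patent's code), the initial first graph network can be obtained by adding the answer word units as target nodes and connecting each of them to every node of the dependency graph:

```python
def attach_answer_nodes(text_graph, answer_units):
    """Initial first graph network: each answer word unit becomes a target node
    connected to every node of the initial third graph network."""
    g = text_graph.copy()
    base_nodes = list(g.nodes)
    for j, unit in enumerate(answer_units):
        target = f"ans_{j}"
        g.add_node(target, text=unit, is_answer=True)
        for n in base_nodes:
            g.add_edge(target, n)
    return g
```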
In some embodiments, the constructing the initial fourth graph network based on the dependencies between word units in the sample question may include: taking word units in the sample problem as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
That is, the initial fourth graph network that characterizes the dependency relationship between word units in the sample problem may be constructed by taking the word units in the sample problem as nodes and the dependency relationship between the word units as edges. Therefore, the incidence relation among the word units in the sample problem can be preliminarily determined, and the learning of the model on the relation among the word units in the sample problem can be strengthened.
As an example, dependency analysis of the sample question can be performed by the Stanford CoreNLP algorithm to obtain the dependency relationships between the multiple word units in the sample question.
As an example, performing dependency analysis on the sample question "Who do I love" through the Stanford CoreNLP algorithm gives "I" as the subject, "love" as the predicate and "who" as the object, and yields the dependency relationships among "I", "love" and "who". For example, in the sample question there is a dependency relationship between "I" and "love", between "love" and "who", and between "I" and "who"; based on these dependency relationships, the initial fourth graph network shown in FIG. 6 can be obtained.
In some embodiments, the specific implementation of constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer may include: and connecting the target node with a node in the initial fourth graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample question to obtain the initial second graph network.
That is, the word unit in the sample answer may be used as the target node, and the target node may be connected to the node corresponding to the word unit of the sample question in the initial fourth graph network, so that the initial second graph network representing the association relationship between the word unit of the sample question and the word unit of the sample answer may be obtained, and the model may preliminarily learn the association relationship between the sample question and the sample answer.
As an example, a target node corresponding to a word unit in the sample answer may be connected to a node corresponding to each word unit in the sample question. Or, as another example, a target node corresponding to the word unit in the sample answer may be connected to a node in the initial fourth graph network, which has an association relationship with the target node.
Illustratively, taking the sample question as "Who do I love" and the sample answer as "country", the two word units of the sample answer ("ancestor" and "country" in this translation) may each be connected to every node in the initial fourth graph network, so that the initial second graph network shown in FIG. 7 can be obtained; the bold nodes in FIG. 7 are the target nodes.
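Putting the two sketches above together for one hypothetical training sample, the initial first and second graph networks of the running example could be built as follows:

```python
text_graph = build_dependency_graph("I love my country")
question_graph = build_dependency_graph("Who do I love")
answer_units = ["country"]                                  # word units of the sample answer

initial_first_graph = attach_answer_nodes(text_graph, answer_units)       # analogue of FIG. 5
initial_second_graph = attach_answer_nodes(question_graph, answer_units)  # analogue of FIG. 7
```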
In the embodiment of the application, the reading understanding model can be trained by fully utilizing the incidence relation between the sample text segment and the sample answer and the incidence relation between the sample text segment and the sample question, so that the accuracy of the reading understanding task executed by the reading understanding model can be improved.
As one example, a feature extraction layer may be used to extract features of the input text. The first feature vector group is the feature vector group obtained after the sample text segment passes through the feature extraction layer, the second feature vector group is the feature vector group obtained after the sample question passes through the feature extraction layer, and the third feature vector group is the feature vector group obtained after the sample answer passes through the feature extraction layer. The first feature vector group comprises a plurality of first feature vectors, each corresponding to one word unit in the sample text segment; the second feature vector group comprises a plurality of second feature vectors, each corresponding to one word unit in the sample question; and the third feature vector group comprises a plurality of third feature vectors, each corresponding to one word unit in the sample answer.
For example, referring to fig. 3, a sample text segment, a sample question, and a sample answer may be input into a feature extraction layer of the reading understanding model to determine a first feature vector group, a second feature vector group, and a third feature vector group, respectively.
In implementation, the specific implementation of this step may include: performing word segmentation processing on the sample text segment, the sample question and the sample answer to respectively obtain a first word unit group, a second word unit group and a third word unit group; performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively; and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
In the embodiment of the present application, the feature extraction layer may include a word embedding processing function and an encoding function. As one example, the feature extraction layer may include a word embedding processing module and an encoding module.
Illustratively, the feature extraction layer may employ the structure of the Bert model. Because the feature vector obtained by the Bert model is the feature vector combined with full-text semantic information, the feature vectors of sample text fragments, sample questions and word units in sample answers can be more fully utilized, and the accuracy of reading and understanding the model can be improved.
As an example, taking a sample text segment as an example: if the sample text segment is Chinese text, each character may be treated as one word unit and each punctuation mark as one word unit; if the sample text segment is foreign-language text, a word can be treated as one word unit and a phrase can also be treated as one word unit; if there are numbers in the sample text segment, each number can be treated as one word unit individually.
Illustratively, assuming the sample text segment is "Li Bai is called the 'Poet Immortal'", seven first word units can be obtained, one for each character of the original Chinese sentence (rendered here roughly as "Li", "Bai", "is", "called", "as", "Poet", "Immortal").
As an example, word embedding processing may be performed in a one-hot encoding manner on each first word unit in the first word unit group to obtain the word vector of each first word unit, on each second word unit in the second word unit group to obtain the word vector of each second word unit, and on each third word unit in the third word unit group to obtain the word vector of each third word unit.
As another example, word embedding processing may be performed on each first word unit in the first word unit group in a word2vec coding manner to obtain a word vector of each first word unit, word embedding processing may be performed on each second word unit in the second word unit group to obtain a word vector of each second word unit, and word embedding processing may be performed on each word unit in the third word unit group to obtain a word vector of each third word unit.
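A minimal sketch of the word2vec option using gensim is shown below; the toy corpus and all parameters are purely illustrative assumptions:

```python
from gensim.models import Word2Vec

corpus = [["i", "love", "my", "country"], ["who", "do", "i", "love"]]  # toy corpus of word units
w2v = Word2Vec(sentences=corpus, vector_size=100, window=2, min_count=1)
love_vector = w2v.wv["love"]    # word vector of one word unit
```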
As an example, each first word vector, each second word vector and each third word vector are encoded, so that for each first word unit a vector representation combined with the full-text semantic information of the sample text fragment, namely a first feature vector, can be obtained; for each second word unit a vector representation combined with the full-text semantic information of the sample question, namely a second feature vector, can be obtained; and for each third word unit a vector representation combined with the full-text semantic information of the sample answer, namely a third feature vector, can be obtained. The first feature vector group, the second feature vector group and the third feature vector group can then be obtained.
Illustratively, taking the sample answer "Li Bai" as an example, "Li Bai" is input into the feature extraction layer and segmented into the word units "Li" and "Bai"; word embedding processing is performed on "Li" and "Bai" respectively to obtain a word vector of "Li" and a word vector of "Bai"; the word vector of "Li" and the word vector of "Bai" are then encoded, so that a third feature vector of "Li" that incorporates the information of "Bai" and a third feature vector of "Bai" that incorporates the information of "Li" are obtained. Assuming that the third feature vector corresponding to "Li" is x and the third feature vector corresponding to "Bai" is y, the third feature vector group may be xy. Similarly, the sample text segment "Li Bai is called the Poet Immortal" is input into the feature extraction layer, and the first feature vector of each word unit in the sample text segment can be output; the sample question "Who is called the Poet Immortal" is input into the feature extraction layer, and the second feature vector of each word unit in the sample question can be output.
Through the feature extraction, a first feature vector capable of accurately reflecting the semantics of each word unit in a sample text fragment, a second feature vector capable of accurately reflecting the semantics of each word unit in a sample question and a third feature vector capable of accurately reflecting the semantics of each word unit in a sample answer can be obtained, namely, the reading understanding model is trained by using more accurate feature vectors, and the accuracy of the trained model can be improved.
It should be noted that, in the embodiment of the present application, the feature extraction layer may adopt the structure of a BERT model that has been pretrained and then fine-tuned on a reading understanding task, so that the obtained first feature vector group, second feature vector group and third feature vector group can respectively reflect the semantic features of the sample text segment, the semantic features of the sample question and the semantic features of the sample answer more accurately, and the training speed and the accuracy of the model in use can be improved.
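Purely as an illustration of how such per-word-unit feature vectors could be obtained, the following sketch uses the Hugging Face transformers library and the publicly available bert-base-chinese checkpoint as stand-ins for the pretrained, fine-tuned BERT structure described above; the three texts are the hypothetical sample text segment, sample question and sample answer.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode(text: str) -> torch.Tensor:
    # Returns one contextual feature vector per token of `text`
    # (the [CLS]/[SEP] positions are included and could be dropped).
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[0]      # (sequence_length, hidden_size)

first_feature_vectors = encode("我爱我的祖国")   # sample text segment
second_feature_vectors = encode("我爱谁")        # sample question
third_feature_vectors = encode("祖国")           # sample answer
```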
The first graph network is the initial first graph network to which attention values of nodes and attention values of edges have been added, and the second graph network is the initial second graph network to which attention values of nodes and attention values of edges have been added.
As an example, the attention layer may employ the structure of the attention layer of the BERT model. Alternatively, the attention layer may adopt any other structure including a model of an attention mechanism, which is not limited in this embodiment of the present application.
As an example, in this step, a first feature vector group, a second feature vector group, a third feature vector group, an initial first graph network, and an initial second graph network may be input into an attention layer of the reading understanding model, an attention value is added to a node and an edge of the initial first graph network based on the first feature vector group and the second feature vector group to obtain the first graph network, and an attention value is added to a node and an edge of the initial second graph network based on the second feature vector group and the third feature vector group to obtain the second graph network. Exemplarily, referring to fig. 3, a first feature vector group, a second feature vector group, a third feature vector group, an initial first graph network, and an initial second graph network may be input into an attention layer of a reading understanding model, and an attention value is added to nodes and edges included in the initial first graph network based on the first feature vector group and the second feature vector group, so as to obtain a first graph network; and adding attention values to the nodes and edges included in the initial second graph network based on the second feature vector group and the third feature vector group to obtain a second graph network.
Or, as another example, in this step, the first feature vector group, the second feature vector group, and the third feature vector group may be input into an attention layer of the reading understanding model, an attention value of a node and an edge included in the initial first graph network is obtained based on the first feature vector group and the second feature vector group, and the attention value is added to the initial first graph network to obtain the first graph network; and obtaining attention values of nodes and edges included in the initial second graph network based on the second feature vector group and the third feature vector group, and adding the attention values to the initial second graph network to obtain the second graph network.
In implementation, the specific implementation of this step may include: adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors; adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
As an example, the initial first graph network characterizes an association between a sample text segment and a sample answer, the first set of feature vectors is a feature representation of the sample text segment, and the third set of feature vectors is a feature representation of the sample answer, so that attention values can be added to nodes and edges of the initial first graph network according to the first set of feature vectors and the third set of feature vectors. Similarly, the initial second graph network represents the incidence relation between the sample question and the sample answer, the second feature vector group is the feature representation of the sample question, and the third feature vector group is the feature representation of the sample answer, so that the attention values can be added to the nodes and the edges of the initial second graph network according to the second feature vector group and the third feature vector group.
The nodes in the initial first graph network are word units of the sample text segments and the sample answers, so that attention values can be added to the nodes and the edges of the initial first graph network at the attention level according to the first feature vector group and the third feature vector group, and the association relationship between the sample text segments and the sample answers can be further captured. Similarly, the nodes in the initial second graph network are word units of the sample question and the sample answer, so that the attention values can be added to the nodes and the edges of the initial second graph network at the attention level according to the second feature vector group and the third feature vector group, and the association relationship between the sample question and the sample answer can be further captured. Therefore, the reading understanding model can further learn the incidence relation among the sample text fragments, the sample answers and the sample questions, and the accuracy of the reading understanding model for processing the reading understanding task is improved.
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial first graph network based on the first feature vector group and the third feature vector group may include: taking a first feature vector in the first feature vector group as the attention value of a first node in the initial first graph network, where the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network; taking a third feature vector in the third feature vector group as the attention value of a second node in the initial first graph network, where the second node is a node corresponding to a word unit of the sample answer in the initial first graph network; determining, based on the first feature vector group, an attention value between two first nodes of the initial first graph network between which an edge exists, and using it as the attention value of the edge; and determining, based on the first feature vector group and the third feature vector group, an attention value between a first node and a second node of the initial first graph network between which an edge exists, and using it as the attention value of the edge.
That is, the first feature vector in the first feature vector group may be used as the attention value of the node corresponding to the word unit of the sample text segment in the initial first graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the sample answer in the initial first graph network. And determining an attention value of an edge between word units of the sample text fragment in the initial first graph network according to the first feature vector group, and determining an attention value of an edge between word units of the sample text fragment and word units of the sample answer in the initial first graph network according to the first feature vector group and the third feature vector group. Therefore, the incidence relation between word units in the sample text segment and the incidence relation between the sample text segment and the sample answers can be further learned, and the accuracy of the reading understanding model obtained through training is convenient to improve.
As an example, for two first nodes between which an edge exists, attention calculation may be performed on the first feature vectors of the word units corresponding to the two first nodes to obtain the attention value of the edge. Specifically, the attention calculation on the two first feature vectors multiplies the two first feature vectors and normalizes the result to obtain the attention value. Referring to fig. 5, there is an edge between "i" and "love" in fig. 5, and "i" and "love" are word units in the sample text segment; the first feature vector of the word unit "i" and the first feature vector of "love" may be obtained from the first feature vector group, the two first feature vectors may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "i" and "love" can be obtained.
As an example, for a first node and a second node where an edge exists, attention calculation may be performed on a first feature vector of a word unit corresponding to the first node and a third feature vector of the word unit corresponding to the second node, and an attention value of the edge may be obtained. Specifically, the attention calculation on the first feature vector and the third feature vector is to multiply the first feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 5, an edge exists between "me" and "ancestor" in fig. 5, and "me" is a word unit in a sample text segment, and "ancestor" is a word unit in a sample answer, a first feature vector of the word unit "me" may be obtained from a first feature vector group, and a third feature vector of the "ancestor" may be obtained from a third feature vector group, and the first feature vector of "me" and the third feature vector of "ancestor" may be multiplied, and the product is normalized, so that an attention value of the edge between "me" and "ancestor" may be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 5 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial second graph network based on the second feature vector group and the third feature vector group may include: taking a second feature vector in the second feature vector group as the attention value of a third node in the initial second graph network, where the third node is a node corresponding to a word unit of the sample question in the initial second graph network; taking a third feature vector in the third feature vector group as the attention value of a fourth node in the initial second graph network, where the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network; determining, based on the second feature vector group, an attention value between two third nodes of the initial second graph network between which an edge exists, and using it as the attention value of the edge; and determining, based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node of the initial second graph network between which an edge exists, and using it as the attention value of the edge.
That is, the second feature vector in the second feature vector group may be used as the attention value of the node corresponding to the word unit of the sample question in the initial second graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the sample answer in the initial second graph network. And determining an attention value of an edge between word units of the sample question in the initial second graph network according to the second feature vector group, and determining an attention value of an edge between word units of the sample question and word units of the sample answer in the initial second graph network according to the second feature vector group and the third feature vector group. Therefore, the incidence relation between word units in the sample question and the incidence relation between the sample question and the sample answer can be further learned, and the accuracy of the reading understanding model obtained through training is convenient to improve.
As an example, for two third nodes where an edge exists, attention calculation may be performed on the second feature vectors of the word units corresponding to the two third nodes, and the attention value of the edge may be obtained. Specifically, the attention calculation on the two second feature vectors is to multiply the two second feature vectors and normalize the result to obtain the attention value. Illustratively, referring to fig. 7, an edge exists between "i" and "who" in fig. 7, and "i" and "who" are word units in the sample question, a second feature vector of the word unit "i" may be obtained from the second feature vector group, and a second feature vector of "who" may be obtained from the second feature vector group, and the second feature vector of "i" and the second feature vector of "who" may be multiplied, and normalization processing may be performed on the products, and a value of attention of the edge between "i" and "who" may be obtained.
As an example, for a third node and a fourth node where an edge exists, attention calculation may be performed on a second feature vector of a word unit corresponding to the third node and a third feature vector of a word unit corresponding to the fourth node, and an attention value of the edge may be obtained. Specifically, the attention calculation for the second feature vector and the third feature vector is to multiply the second feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 7, there is an edge between "who" and "country" in fig. 7, and "who" is a word unit in the sample question, "country" is a word unit in the sample answer, a second feature vector of the word unit "who" can be obtained from the second feature vector group, and a third feature vector of "country" can be obtained from the third feature vector group, the second feature vector of "who" and the third feature vector of "country" can be multiplied, the product is normalized, and the attention value of the edge between "who" and "country" can be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 7 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
Note that, in the embodiment of the present application, attention calculation may be performed on two feature vectors by the following formula (1):

attention = softmax(Q · K^T / √d_k)    (1)

where attention represents the attention value, softmax(·) is a normalization function, Q and K represent the two feature vectors respectively, d_k is a constant, and T denotes the matrix transpose.
For example, referring to fig. 7, an edge exists between "who" and "country" in fig. 7, and "who" is a word unit in the sample question, "country" is a word unit in the sample answer, a second feature vector of the word unit "who" may be obtained as Q from the second feature vector group, and a third feature vector of "country" may be obtained as K from the third feature vector group, and the attention value of the edge between "who" and "country" may be determined by the above formula (1).
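A minimal numpy sketch of formula (1) is given below for illustration; the feature vectors are random stand-ins, and the softmax is applied across all candidate edges of a node so that a meaningful normalization takes place.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def edge_attention(Q, K):
    # Formula (1): attention = softmax(Q K^T / sqrt(d_k)).
    # Entry (i, j) is used as the attention value of the edge between
    # word unit i (a row of Q) and word unit j (a row of K).
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k), axis=-1)

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))   # e.g. first feature vectors of "我 爱 我 的 祖 国"
K = rng.normal(size=(2, 8))   # e.g. third feature vectors of "祖 国"
A = edge_attention(Q, K)      # A[i, j]: attention value of the edge (i, j)
```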
In the embodiment of the application, the incidence relation among the sample text segment, the sample question and the sample answer can be further captured through the attention layer, the incidence relation is converted into the attention value and is given to the initial first graph network and the initial second graph network, the first graph network and the second graph network are obtained, the model further learns the incidence relation among the sample text segment, the sample question and the sample answer, and the accuracy of the reading understanding model obtained through training can be improved.
It should be noted that steps 204 to 206 are a specific implementation of inputting the sample text segment, the sample question and the sample answer into the text processing layer of the reading understanding model and adding attention values to the nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain the first graph network and the second graph network.
And step 208, inputting the first graph network and the second graph network into a graph volume network layer of the reading understanding model to obtain a predicted answer.
As an example, the graph convolution network layer may be a GCN model.
Illustratively, referring to fig. 3, the first graph network and the second graph network may be input into a graph volume network layer of the reading understanding model to obtain the predicted answer.
In an implementation, inputting the first graph network and the second graph network into the graph volume network layer of the reading understanding model, and obtaining a specific implementation of the predicted answer may include: determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network; carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector; determining the predicted answer based on the target hidden layer feature vector.
As an example, the first hidden layer feature vector is a vector representation of the first graph network obtained by performing convolution processing on the first graph network through the graph convolution network layer, and can be regarded as a graph feature vector of the first graph network. The second hidden layer feature vector is a vector representation of the second graph network obtained after the second graph network is subjected to convolution processing by the graph convolution network layer, and can be regarded as a graph feature vector of the second graph network. The target hidden layer feature vector is a vector representation obtained by combining vector representations of the first graph network and the second graph network.
In some embodiments, the graph network may be convolved at the graph convolution network layer by the following formula (2):

h_i^(l+1) = σ( Σ_{j∈N_i} ( C_ij · W_j^(l) · h_j^(l) + b_j^(l) ) )    (2)

where i represents the i-th node in the graph network, j represents the j-th node in the graph network, h_i^(l+1) represents the feature vector input to the i-th node at the (l+1)-th convolutional layer, σ(·) represents a nonlinear transfer function, which may be a ReLU activation function, N_i represents node i and all nodes connected to node i, h_j^(l) represents the feature vector input to the j-th node at the l-th convolutional layer, C_ij represents the attention value of the edge between the i-th node and the j-th node, W_j^(l) represents the weight of the j-th node at the l-th convolutional layer, and b_j^(l) represents the intercept of the j-th node at the l-th convolutional layer.
As an example, the graph convolution network layer may include a plurality of convolutional layers; each convolutional layer includes a preset weight parameter matrix, and the weight of each node in each convolutional layer may be an initial weight in the weight parameter matrix. Similarly, each convolutional layer may include a preset intercept parameter matrix, and the intercept of each node in each convolutional layer may be an initial intercept in the intercept parameter matrix. In the subsequent training process, the weight parameter matrix and the intercept parameter matrix of each convolutional layer can be adjusted according to the training situation.
For example, taking the first graph network as an example, assume that the graph convolution network layer includes two convolutional layers. In the first convolutional layer, the feature vector of each node in the first graph network is used as input, the weight parameter matrix and the intercept parameter matrix of the first convolutional layer are used as preset parameters, and the feature vector input to each node at the second convolutional layer, that is, the feature vector obtained after one convolution of each node in the first graph network, can be determined through the above formula (2). Then, in the second convolutional layer, the feature vector input to each node at the second convolutional layer is used as input, the weight parameter matrix and the intercept parameter matrix of the second convolutional layer are used as preset parameters, and the feature vector output by the second convolutional layer for each node, that is, the feature vector obtained after each node in the first graph network has undergone convolution processing twice, can be determined through formula (2). The feature vectors obtained after the two convolutions of the nodes in the first graph network are spliced to obtain the first hidden layer feature vector of the first graph network.
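Purely for illustration, the per-node update of formula (2) and the two-layer convolution just described can be sketched as follows; the graph, the attention values and the parameter matrices are random stand-ins, and a single weight matrix and intercept are shared across nodes for brevity, whereas formula (2) allows per-node W_j and b_j.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(h, C, W, b, neighbours):
    # Formula (2): h_i^(l+1) = sigma( sum_{j in N_i} ( C_ij * W * h_j^(l) + b ) )
    out = np.zeros((h.shape[0], W.shape[1]))
    for i, N_i in neighbours.items():
        acc = np.zeros(W.shape[1])
        for j in N_i:
            acc += C[i, j] * (h[j] @ W) + b
        out[i] = relu(acc)
    return out

rng = np.random.default_rng(0)
num_nodes, d = 8, 16
h0 = rng.normal(size=(num_nodes, d))             # attention values of the nodes (feature vectors)
C = rng.random(size=(num_nodes, num_nodes))      # attention values of the edges
neighbours = {i: {i, (i + 1) % num_nodes} for i in range(num_nodes)}   # toy N_i
W1, b1 = rng.normal(size=(d, d)) * 0.1, np.zeros(d)
W2, b2 = rng.normal(size=(d, d)) * 0.1, np.zeros(d)

h1 = gcn_layer(h0, C, W1, b1, neighbours)        # first convolutional layer
h2 = gcn_layer(h1, C, W2, b2, neighbours)        # second convolutional layer
first_hidden_layer_feature_vector = h2.reshape(-1)   # splice the node vectors
```

The second hidden layer feature vector would be obtained from the second graph network in the same way, and the two would then be weighted and summed as described below.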
As an example, when the first hidden layer feature vector and the second hidden layer feature vector are subjected to weighted summation, a weight of the first hidden layer feature vector and a weight of the second hidden layer feature vector may be the same or different, and may be set by a user according to actual needs or may be set by a computing device as a default, which is not limited in this embodiment of the present application.
Through the method, the potential association relationship among the nodes in the first graph network and the potential association relationship among the nodes in the second graph network can be obtained, so that the potential association relationship among the sample text fragments, the sample questions and the sample answers can be conveniently read and understood by the model, and the accuracy of the model is improved.
In some embodiments, determining a specific implementation of the predicted answer based on the target hidden layer feature vector may include: converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label; determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension; and determining the predicted answer based on the predicted label of the word unit corresponding to each dimension.
As an example, the sequence labeling function is a function used in sequence labeling and can map an input vector into at least one probability, that is, at least one probability can be obtained for each vector. Sequence labeling may also be referred to as sequence tagging; after the probability corresponding to the vector of each dimension is determined through the sequence labeling function, a preset label may be assigned to each word unit according to the probability.
As an example, the labels may be B, I and O. B may be called Begin and represents the answer beginning word, i.e., the first word of the answer; I may be called Inside and represents the answer middle and ending words, i.e., the second word through the last word of the answer; O may be called Outside and represents a non-answer word, i.e., a word that does not belong to the answer.
It should be noted that the length of the target hidden layer feature vector is the same as the length of the sample text segment, that is, the dimension of the target hidden layer feature vector and the number of word units of the sample text segment can be considered to be the same.
Exemplarily, assuming that the sample text fragment is "i love my country", the target hidden layer feature vector is a 6-dimensional vector, and the 6 dimensions respectively correspond to the word units "i", "love", "me", "of", "ancestor" and "country"; each dimension in the target hidden layer feature vector is converted into 3 prediction probabilities, and each prediction probability corresponds to the possibility of one of the labels "B", "I" and "O". For example, for the word unit "i", assuming that the calculated prediction probabilities are 0.2, 0.3 and 0.5 respectively, it can be determined that the probability that the prediction label is "O" is the largest, and the prediction label corresponding to "i" is "O". Similarly, the prediction labels corresponding to the 6 word units can be determined as "O", "O", "O", "O", "B" and "I", respectively. Since the label "B" represents the answer beginning word and the label "I" represents the answer middle and ending words, "ancestor" and "country" can be considered the predicted answer.
The prediction label of each word unit can be determined in a sequence labeling mode, the prediction answer can be determined according to the prediction label, when the model parameters are adjusted, the prediction label of the correct prediction answer can be closer to the correct label, and the training efficiency and accuracy of the reading understanding model can be improved in the mode.
As an example, the at least one label includes the answer beginning word, the answer middle and ending word, and the non-answer word, and determining the predicted answer based on the prediction label of the word unit corresponding to each dimension may include: taking the word unit corresponding to the answer beginning word and the word units corresponding to the answer middle and ending words as the predicted answer.
That is to say, the answer beginning word and the answer middle and ending words can be spliced to obtain the predicted answer.
Continuing with the above example, if the label of the word unit "ancestor" is "B" and the label of the word unit "nation" is "I", and the label "B" represents the answer beginning word while the label "I" represents the answer middle and ending word, then "ancestor" and "nation" may be spliced and determined as the predicted answer.
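A minimal decoding sketch, with hypothetical probabilities, shows how the prediction labels and the predicted answer could be derived from the per-dimension probabilities described above:

```python
import numpy as np

LABELS = ["B", "I", "O"]

def decode_answer(word_units, label_probs):
    # Pick the most probable label per word unit, then splice the
    # answer beginning word ("B") and answer-inside words ("I").
    labels = [LABELS[int(np.argmax(p))] for p in label_probs]
    answer = "".join(w for w, lab in zip(word_units, labels) if lab in ("B", "I"))
    return labels, answer

word_units = ["我", "爱", "我", "的", "祖", "国"]
probs = [
    [0.2, 0.3, 0.5],   # "我" -> O
    [0.1, 0.2, 0.7],   # "爱" -> O
    [0.1, 0.1, 0.8],   # "我" -> O
    [0.1, 0.1, 0.8],   # "的" -> O
    [0.7, 0.2, 0.1],   # "祖" -> B
    [0.2, 0.7, 0.1],   # "国" -> I
]
labels, predicted = decode_answer(word_units, probs)
print(labels, predicted)   # ['O', 'O', 'O', 'O', 'B', 'I'] 祖国
```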
In some embodiments, a difference between the predicted answer and the sample answer may be determined by a loss function, and the reading understanding model may be trained based on the difference.
As an example, training the reading understanding model based on the difference value mainly means adjusting the parameters of the graph convolution network layer based on the difference value, so that the predicted answer and the sample answer become closer in subsequent training. For example, assuming that the sample answer consists of the word units "ancestor" and "nation", if the probability of the "O" label corresponding to "nation" is the highest during training, the parameters need to be adjusted in model training so that the probability of the "I" label corresponding to "nation" becomes the highest.
For example, referring to fig. 3, a difference value may be determined based on the predicted answer and the sample answer, and a parameter of the graph convolution network layer may be adjusted based on the difference value.
In some embodiments, the training the reading understanding model based on the difference between the predicted answer and the sample answer until reaching the training stop condition may include: if the difference value is smaller than a preset threshold value, stopping training the reading understanding model; and if the difference is larger than or equal to the preset threshold, continuing to train the reading understanding model.
It should be noted that the preset threshold may be set by a user according to actual needs, or may be set by default by a computing device, which is not limited in this embodiment of the application.
That is to say, the reading understanding model may be trained based on a difference between the predicted answer and the sample answer, if the loss value is smaller than the preset threshold value, it may be considered that the current model parameter has substantially met the requirement, it may be considered that the reading understanding model has been trained, and therefore, the training of the reading understanding model may be stopped. If the loss value is greater than or equal to the preset threshold value, it can be considered that the difference between the predicted answer of the model and the sample answer is large, and the current model parameters cannot meet the requirements, so that the reading understanding model needs to be trained continuously.
Whether the reading understanding model is continuously trained or not is determined according to the relation between the difference value and the preset threshold value, the training degree of the reading understanding model can be accurately mastered, and the training efficiency of the model and the accuracy of the reading understanding task processed by the model are improved.
In other embodiments, determining that the training stop condition is reached may include: recording one round of iterative training each time a predicted answer is obtained; and counting the number of training iterations, and if the number of iterations is greater than a time threshold, determining that the training stop condition is reached.
It should be noted that the time threshold may be set by a user according to actual needs, or may be set by default by a computing device, which is not limited in this embodiment of the application.
As an example, each time a predicted answer is obtained, one round of iterative training has been performed, so one may be added to the recorded number of iterations; the number of iterations is counted after each round, and if it is greater than the time threshold, the reading understanding model has been trained sufficiently, that is, the training stop condition is reached, and continuing to train would bring little further improvement, so the training may be stopped. If the number of iterations is less than or equal to the time threshold, the reading understanding model may not yet have been trained enough to meet the actual requirement, so training can be continued based on the difference between the predicted answer and the sample answer.
Whether the reading understanding model is continuously trained or not is determined according to the corresponding relation between the times of iterative training and the time threshold, so that unnecessary iterative training can be reduced, the consumption of computing resources caused by iterative training is reduced, and the training efficiency of the model is improved.
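For illustration only, the two stop conditions can be combined in a training loop along the following lines (a pseudo-PyTorch sketch; the model, loss function, optimiser and thresholds are all hypothetical):

```python
def train(model, samples, loss_fn, optimiser, loss_threshold=0.01, max_iterations=10000):
    iterations = 0
    for sample_text, sample_question, sample_answer in samples:
        predicted_answer = model(sample_text, sample_question, sample_answer)
        loss = loss_fn(predicted_answer, sample_answer)   # difference value
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
        iterations += 1                        # one predicted answer = one iteration
        if loss.item() < loss_threshold:       # difference below the preset threshold
            break
        if iterations > max_iterations:        # iteration count above the time threshold
            break
    return model
```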
In the embodiment of the application, an initial first graph network of a sample text fragment and a sample answer is constructed by reading a graph construction network layer of an understanding model, and an initial second graph network of a sample question and the sample answer is constructed; inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network; inputting the first graph network and the second graph network into a graph volume network layer of the reading understanding model to obtain a predicted answer; training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached. By the method, the incidence relation among the sample text segment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained by combining the incidence relation among the sample text segment, the sample question and the sample answer, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
The following description will further describe the reading understanding model training method provided in the present application with reference to fig. 8 by taking an example of an application of the reading understanding model training method in a reading understanding task. Fig. 8 shows a processing flow chart of a training method applied to a reading understanding model of a choice question according to an embodiment of the present application, which may specifically include the following steps:
For example, assuming that the sample text fragment is "i love my home", the sample question is a choice question, assuming that the sample question is "i love who", the choices include "home", "father", "mother", "family", and the sample answer is "home".
And step 804, inputting the sample text segment, the sample question and the sample answer into a graph construction network layer of the reading understanding model, and constructing an initial third graph network based on the dependency relationship among word units in the sample text segment.
In implementation, words in the sample text segment may be used as nodes to obtain a plurality of nodes, and based on the dependency relationship between word units in the sample text segment, the nodes having the dependency relationship are connected to obtain the initial third graph network.
For example, referring to fig. 4, the nodes of the initial third graph network include word units "i", "love", "i", "of", "ancestor", and "nation" in the sample text segment, and according to the dependency relationships among these six word units, it can be determined that there is an edge between one "i" and "love", one "i" and "of", "ancestor", respectively, one edge between "love" and "ancestor", and one edge between "ancestor" and "nation".
For example, referring to fig. 5, the initial first graph network may be obtained by determining "ancestor" as a target node, "determining" nation "as a target node, connecting the target node" ancestor "to each node in the initial third graph network, and connecting the target node" nation "to each node in the initial third graph network.
In implementation, the words in the sample problem can be used as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
For example, referring to fig. 6, the nodes of the initial fourth graph network include word units "i", "love", "who" in the sample problem, and according to the dependencies between these three word units, it can be determined that there is an edge between "i" and "love", "who", respectively, and there is an edge between "love" and "who".
For example, referring to fig. 7, the initial second graph network may be obtained by determining "ancestor" as a target node, "determining" nation "as a target node, connecting the target node" ancestor "to each node in the initial fourth graph network, and connecting the target node" nation "to each node in the initial fourth graph network.
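As an illustrative sketch only (using the networkx library as a stand-in graph structure; the dependency pairs are hypothetical placeholders for the edges of fig. 4), the initial third graph network and the initial first graph network of steps 804 to 806 could be assembled as follows; the initial fourth and second graph networks of steps 808 to 810 are built in the same way from the sample question.

```python
import networkx as nx

text_units = ["我", "爱", "我", "的", "祖", "国"]          # word units of the sample text segment
dependencies = [(0, 1), (1, 5), (0, 4), (3, 4), (4, 5)]     # hypothetical dependency edges
answer_units = ["祖", "国"]                                 # word units of the sample answer

# Initial third graph network: word units as nodes, dependencies as edges.
initial_third = nx.Graph()
for i, unit in enumerate(text_units):
    initial_third.add_node(i, unit=unit)
initial_third.add_edges_from(dependencies)

# Initial first graph network: connect each answer word unit (target node)
# to every node of the initial third graph network.
initial_first = initial_third.copy()
text_nodes = list(initial_third.nodes)
for k, unit in enumerate(answer_units):
    target = f"answer-{k}"
    initial_first.add_node(target, unit=unit)
    for node in text_nodes:
        initial_first.add_edge(target, node)
```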
It should be noted that steps 802 to 810 are a specific implementation of step 202; the implementation process is the same as that of step 202, so for details reference may be made to the related description of step 202, which is not repeated here.
Continuing with the above example, the sample text segment is segmented to obtain the first word unit group, namely "i", "love", "me", "of", "ancestor" and "nation". Similarly, the sample question is segmented to obtain the second word unit group, namely "i", "love" and "who". The sample answer is segmented to obtain the third word unit group, namely "ancestor" and "nation".
Taking the sample answer as an example, the feature extraction layer may obtain a word vector for each word unit in the sample answer; assume that the third word vector corresponding to "ancestor" is x and the third word vector corresponding to "nation" is y. Similarly, word embedding processing is performed on the sample text segment "i love my country", and the first word vector of each word unit in the sample text segment can be output; word embedding processing is performed on the sample question "i love who", and the second word vector of each word unit in the sample question can be output.
Continuing with the above example, encoding "ancestor" and "nation" in the sample answer can obtain the third feature vector of "ancestor" and the third feature vector of "nation", respectively. Similarly, encoding "i", "love" and "who" in the sample question can obtain the second feature vector of "i", the second feature vector of "love" and the second feature vector of "who". Encoding "i", "love", "me", "of", "ancestor" and "nation" in the sample text segment can obtain the first feature vector of each word unit in the sample text segment.
It should be noted that steps 812 to 816 are a specific implementation of step 204; the implementation process is the same as that of step 204, so for details reference may be made to the related description of step 204, which is not repeated here.
As an example, a first feature vector in the first feature vector group may be used as an attention value of a first node in the initial first graph network, where the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network; taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network; determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge; based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
Illustratively, referring to fig. 5, for two first nodes between which an edge exists, for example the edge between "i" and "love" in fig. 5, where "i" and "love" are word units in the sample text segment, the first feature vector of the word unit "i" and the first feature vector of "love" may be obtained from the first feature vector group, the two first feature vectors may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "i" and "love" can be obtained. For a first node and a second node between which an edge exists, for example the edge between "me" and "ancestor" in fig. 5, where "me" is a word unit in the sample text segment and "ancestor" is a word unit in the sample answer, the first feature vector of the word unit "me" may be obtained from the first feature vector group and the third feature vector of "ancestor" may be obtained from the third feature vector group, the two feature vectors may be multiplied, and normalization processing may be performed on the product, so that the attention value of the edge between "me" and "ancestor" can be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 5 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
And 820, adding attention values to the nodes and edges of the initial second graph network through the attention layer based on the second feature vector group and the third feature vector group to obtain a second graph network.
As an example, a second feature vector in the second feature vector group is taken as the attention value of a third node in the initial second graph network, where the third node is a node corresponding to a word unit of the sample question in the initial second graph network; a third feature vector in the third feature vector group is taken as the attention value of a fourth node in the initial second graph network, where the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network; based on the second feature vector group, an attention value between two third nodes of the initial second graph network between which an edge exists is determined and used as the attention value of the edge; and based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node of the initial second graph network between which an edge exists is determined and used as the attention value of the edge.
Illustratively, referring to fig. 7, for two third nodes with edges, an edge exists between "i" and "who" in fig. 7, and "i" and "who" are word units in the sample question, a second feature vector of the word unit "i" may be obtained from the second feature vector group, and a second feature vector of "who" may be obtained from the second feature vector group, and the second feature vector of "i" and the second feature vector of "who" may be multiplied, and the product is normalized, so that the attention value of the edge between "i" and "who" may be obtained. For the third node and the fourth node where an edge exists, an edge exists between "who" and "country" in fig. 7, and "who" is a word unit in the sample question, "country" is a word unit in the sample answer, a second feature vector of the word unit "who" can be obtained from the second feature vector group, and a third feature vector of "country" can be obtained from the third feature vector group, the second feature vector of "who" and the third feature vector of "country" can be multiplied, the product is normalized, and the attention value of the edge between "who" and "country" can be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 7 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
It should be noted that steps 812 to 820 are a specific implementation of step 206; the implementation process is the same as that of step 206, so for details reference may be made to the related description of step 206, which is not repeated here.
And step 826, converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function.
Each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one prediction probability corresponding to each dimension represents the probability that a prediction label of the word unit corresponding to each dimension is at least one label. In addition, the length of the target hidden layer feature vector is the same as the length of the sample text segment, that is, the dimension of the target hidden layer feature vector and the number of word units of the sample text segment can be considered to be the same.
Exemplarily, assuming that the target hidden layer feature vector is a 6-dimensional vector, and the 6 dimensions respectively correspond to the word units "i", "love", "me", "of", "ancestor" and "country", each dimension in the target hidden layer feature vector is converted into 3 prediction probabilities, and each prediction probability corresponds to the possibility of one of the labels "B", "I" and "O". For example, for the word unit "i", it is assumed that the calculated prediction probabilities are 0.2, 0.3 and 0.5, respectively.
Continuing the above example, since 0.5 is the largest of the three probabilities, it may be determined that the prediction label corresponding to "i" is "O".
And step 830, taking the word unit corresponding to the answer beginning word and the word units corresponding to the answer middle and ending words as the predicted answer.
Continuing the above example, assume that the prediction labels corresponding to the 6 word units are determined to be "O", "O", "O", "O", "B" and "I", respectively. Since the label "B" represents the answer beginning word and the label "I" represents the answer middle and ending word, "ancestor" and "nation" can be considered the predicted answer.
It should be noted that steps 822 to 830 are a specific implementation of step 208; the implementation process is the same as that of step 208, so for details reference may be made to the related description of step 208, which is not repeated here.
And 834, stopping training the reading understanding model if the loss value is smaller than a preset threshold value.
It should be noted that steps 832 to 836 are a specific implementation of step 210; the implementation process is the same as that of step 210, so for details reference may be made to the related description of step 210, which is not repeated here.
In the embodiment of the application, an initial first graph network of a sample text fragment and a sample answer is constructed by reading a graph construction network layer of an understanding model, and an initial second graph network of a sample question and the sample answer is constructed; inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group; inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network; inputting the first graph network and the second graph network into a graph volume network layer of the reading understanding model to obtain a predicted answer; training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached. By the method, the incidence relation among the sample text segment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained by combining the incidence relation among the sample text segment, the sample question and the sample answer, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
Referring to fig. 9, fig. 9 shows a flowchart of a reading understanding method provided according to an embodiment of the present application, including steps 902 to 908.
And 902, constructing an initial first graph network of the target text and the target answer and constructing an initial second graph network of the target question and the target answer by reading the graph construction network layer of the understanding model.
As an example, if the target question is a choice question, the target answer may be a text obtained by splicing multiple options; if the target question is a short answer, the target answer may be a keyword in the target text.
Exemplarily, assuming that the target text states that "Li Bai was good at writing poems and is called the Poet Immortal", the target question is a choice question "which poet is called the Poet Immortal", and the three options are "Li Bai", "Du Fu" and "Su Shi", the three options may be spliced as the target answer, and the target answer may be "Li Bai Du Fu Su Shi".
Illustratively, assuming that the target question is a short-answer question such as "which poem is being described", and the target text states that the poem "Bring in the Wine" is written in a bold and unrestrained style, expresses optimism and self-confidence as well as indignation at social reality, and is a work of the poet Li Bai, keywords such as "Bring in the Wine", "optimism and self-confidence", "poet" and "Li Bai" may be extracted from the target text, and the extracted keywords may be spliced and used as the target answer.
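Sketched below, purely for illustration, is the distinction just described between the two question types; the texts are hypothetical, and the keyword-extraction step itself (e.g. TF-IDF or TextRank over the target text) is only indicated, not implemented.

```python
def build_target_answer(question_type, options=None, keywords=None):
    if question_type == "choice":
        # Choice question: splice the candidate options into one text.
        return "".join(options)
    # Short-answer question: splice keywords extracted from the target text.
    return "".join(keywords)

print(build_target_answer("choice", options=["李白", "杜甫", "苏轼"]))   # "李白杜甫苏轼"
print(build_target_answer("short", keywords=["将进酒", "李白"]))         # hypothetical keywords
```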
As an example, an initial first graph network is used for representing the association relationship between the target text and the target answer, and an initial second graph network is used for representing the association relationship between the target question and the target answer.
Illustratively, referring to fig. 10, a target text, a target question and a target answer may be input into a graph building network layer of the reading understanding model, an initial first graph network is obtained based on the target text and the target answer, and an initial second graph network is obtained based on the target question and the target answer.
In implementation, if the text length of the target text is smaller than the length threshold, the specific implementation of constructing an initial first graph network of the target text and the target answer and constructing an initial second graph network of the target question and the target answer by reading the graph construction network layer of the understanding model may include: and constructing an initial third graph network based on the dependency relationship among the word units in the target text, and constructing an initial fourth graph network based on the dependency relationship among the word units in the target question. And constructing the initial first graph network based on the incidence relation between the initial third graph network and the target answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the target answer.
And the initial third graph network is used for representing the dependency relationship between word units in the target text. The initial fourth graph network is used to characterize the dependencies between word units in the target problem.
That is, if the text length of the target text is smaller than the length threshold, the reading understanding model may process the target text, and may first construct an initial third graph network reflecting the dependency relationship between word units in the target text, and then construct a first graph network according to the association relationship between the target answer and the target text on the basis of the initial third graph network. And constructing an initial fourth graph network reflecting the dependency relationship among word units in the target question, and constructing a second graph network according to the incidence relationship between the target answer and the target question on the basis of the initial fourth graph network.
It should be noted that the length threshold may be set by a user according to actual needs, or may be set by default by a device, which is not limited in this embodiment of the application.
In some embodiments, constructing the initial third graph network based on dependencies between word units in the target text may include: taking word units in the target text as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target text to obtain the initial third graph network.
That is to say, the word units in the target text are taken as nodes, the dependency relationship between the word units is taken as an edge, and the initial third graph network which represents the dependency relationship between the word units in the target text can be constructed.
As an example, dependency analysis can be performed on the target text through a Stanford Core NLP algorithm, and the dependency relationship among multiple word units in the target text can be obtained.
Illustratively, taking the target text "i love my country" as an example, performing dependency analysis on it through the Stanford Core NLP algorithm can determine that "i" is the subject, "love" is the predicate and "my country" is the object, and the dependency relationships among the word units "i", "love", "me", "of", "ancestor" and "country" can be obtained. For example, "i" and "love" have a dependency relationship, "i" and "ancestor" have a dependency relationship, "love" and "ancestor" have a dependency relationship, and "ancestor" and "country" have a dependency relationship in the target text; based on these dependency relationships, the initial third graph network shown in fig. 4 is obtained.
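For illustration, dependency analysis of this kind could be run with the stanza library (the Stanford NLP group's Python toolkit, used here purely as a stand-in for the Stanford Core NLP algorithm named above); the input text and the produced edges are examples only.

```python
import stanza

# stanza.download("zh")   # one-time model download
nlp = stanza.Pipeline(lang="zh", processors="tokenize,pos,lemma,depparse")
doc = nlp("我爱我的祖国")

edges = []
for sentence in doc.sentences:
    for word in sentence.words:
        if word.head > 0:                           # head == 0 marks the root
            head = sentence.words[word.head - 1]
            edges.append((head.text, word.text))    # dependency edge: head -> dependent
print(edges)   # pairs of word units with a dependency relationship
```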
In some embodiments, the specific implementation of constructing the initial first graph network based on the association relationship between the initial third graph network and the target answer may include: and connecting the target node with a node in the initial third graph network by taking the word unit in the target answer as a target node based on the incidence relation between the word unit in the target answer and the word unit in the target text to obtain the initial first graph network.
That is, the word unit in the target answer may be used as the target node, and the target node may be connected to the node corresponding to the word unit of the target text in the initial third graph network, so that the initial first graph network representing the association relationship between the word unit of the target text and the word unit of the target answer may be obtained.
As an example, a target node corresponding to a word unit in the target answer may be connected to a node corresponding to each word unit in the target text. Or, as another example, a target node corresponding to the word unit in the target answer may be connected to a node in the initial third graph network, which has an association relationship with the target node.
Illustratively, taking the target text "i love my country" and the target question being a choice question as an example, assuming that the target answers are the two options "home country" and "hometown", the word units in the target answer may be taken as target nodes: the "ancestor" in the target answer is connected to each node in the initial third graph network, the "country" in the target answer is connected to each node in the initial third graph network, the "home" in the target answer is connected to each node in the initial third graph network, and the "county" in the target answer is connected to each node in the initial third graph network, so that the initial first graph network shown in fig. 11 may be obtained, and the bold nodes in fig. 11 are the target nodes.
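Continuing the sketch above (still an illustrative assumption rather than the prescribed implementation), the initial first graph network can be formed by adding one target node per answer word unit and connecting it to every node of the initial third graph network, which is the first connection strategy described above.

```python
def add_answer_target_nodes(text_graph, answer_units):
    """Connect each answer word unit (target node) to every node of the text graph."""
    graph = text_graph.copy()
    text_nodes = list(graph.nodes)
    for idx, unit in enumerate(answer_units):
        target = f"answer_{idx}"  # keep target node ids distinct from text node ids
        graph.add_node(target, word=unit, is_target=True)
        for node in text_nodes:
            graph.add_edge(target, node)
    return graph

# Word units of the two candidate answers "home country" and "hometown".
initial_first_graph = add_answer_target_nodes(
    initial_third_graph, ["ancestor", "country", "home", "county"])
```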
In some embodiments, the constructing the initial fourth graph network based on the dependencies between word units in the target problem may include: taking word units in the target problem as nodes to obtain a plurality of nodes; and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target problem to obtain the initial fourth graph network.
That is, the initial fourth graph network that characterizes the dependency relationship between word units in the target problem may be constructed by taking the word units in the target problem as nodes and taking the dependency relationship between the word units as edges.
As an example, the dependency analysis of the target problem can be performed by a Stanford Core NLP algorithm, and the dependency relationship between a plurality of word units in the target problem can be obtained.
As an example, by performing dependency analysis on the target question "who i love" through the Stanford Core NLP algorithm, we can get "i" as the subject, "love" as the predicate, "who" as the object, and can get the dependency relationship between "i", "love", "who" and each other. For example, there is a dependency relationship between "i" and "love" in the target problem, there is a dependency relationship between "love" and "who", and there is a dependency relationship between "i" and "who", based on the above-mentioned dependency relationship, referring to fig. 6, the initial fourth graph network shown in fig. 6 can be obtained.
In some embodiments, the specific implementation of constructing the initial second graph network based on the association relationship between the initial fourth graph network and the target answer may include: and connecting the target node with a node in the initial fourth graph network by taking the word unit in the target answer as a target node based on the incidence relation between the word unit in the target answer and the word unit in the target question to obtain the initial second graph network.
That is, the word unit in the target answer may be used as the target node, and the target node may be connected to the node corresponding to the word unit of the target question in the initial fourth graph network, so that the initial second graph network representing the association relationship between the word unit of the target question and the word unit of the target answer may be obtained.
As an example, a target node corresponding to a word unit in the target answer may be connected to a node corresponding to each word unit in the target question. Or, as another example, a target node corresponding to the word unit in the target answer may be connected to a node in the initial fourth graph network, which has an association relationship with the target node.
Illustratively, taking the target question "who i love" as an example and assuming that the target answers are the two options "home country" and "hometown", the word units in the target answer may be used as target nodes: the "ancestor" in the target answer is connected to each node in the initial fourth graph network, the "country" in the target answer is connected to each node in the initial fourth graph network, the "home" in the target answer is connected to each node in the initial fourth graph network, and the "county" in the target answer is connected to each node in the initial fourth graph network, so that the initial second graph network shown in fig. 12 may be obtained, and the bold nodes in fig. 12 are the target nodes.
In the embodiment of the application, the association relation between the target text and the target answer and the association relation between the target question and the target answer can be fully utilized by the reading understanding model, so that the accuracy of the reading understanding task executed by the reading understanding model can be improved.
It should be noted that the above description takes the case where the text length of the target text is smaller than the length threshold as an example. If the target text is a chapter-level text, that is, the text length of the target text is greater than or equal to the length threshold, the reading understanding model may not be able to process the whole target text, so the target text may be segmented or split into sentences to obtain a plurality of target text segments, and an initial first graph network is then constructed for each target text segment and the target answer by the above method. For example, if the target text is divided into 3 target text segments, 3 initial first graph networks may be constructed.
Wherein, the feature extraction layer can be used for extracting features of the input text.
As an example, the first feature vector group is a feature vector group obtained after the target text passes through the feature extraction layer, the second feature vector group is a feature vector group obtained after the target question passes through the feature extraction layer, and the third feature vector group is a feature vector group obtained after the target answer passes through the feature extraction layer. The first feature vector group comprises a plurality of first feature vectors, and each first feature vector corresponds to one word unit in the target text; the second feature vector group comprises a plurality of second feature vectors, and each second feature vector corresponds to one word unit in the target question; the third feature vector group comprises a plurality of third feature vectors, and each third feature vector corresponds to one word unit in the target answer.
For example, referring to fig. 10, a target text, a target question and a target answer may be input into a feature extraction layer of a reading understanding model to determine a first feature vector group, a second feature vector group and a third feature vector group, respectively.
In implementation, if the text length of the target text is smaller than the length threshold, the specific implementation of this step may include: performing word segmentation processing on the target text, the target question and the target answer to respectively obtain a first word unit group, a second word unit group and a third word unit group; performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively; and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
In the embodiment of the present application, the feature extraction layer may include a word embedding processing function and an encoding function. As one example, the feature extraction layer may include a word embedding processing module and an encoding module.
Illustratively, the feature extraction layer may employ the structure of the BERT model. Because the feature vectors obtained by the BERT model are combined with full-text semantic information, the feature vectors of the word units in the target text, the target question and the target answer can be more fully utilized, and the accuracy of the reading understanding model can be improved.
As an example, taking the target text as an example, if the target text is a Chinese text, each character may be divided into a word unit and each punctuation mark may be divided into a word unit; if the target text is a foreign-language text, each word may be divided into a word unit and a phrase may also be divided into a word unit; if the target text contains numbers, each number may be separately divided into a word unit.
Exemplarily, assuming that the target text is "Li Bai wrote countless poems in his lifetime and is called the Poet Immortal", the plurality of first word units "Li", "Bai", "one", "life", "wrote", "poems", "count", "less", "is", "called", "as", "poem" and "immortal" can be obtained, one word unit per character of the original text.
As an example, word embedding processing may be performed on each first word unit in the first word unit group in a one-hot encoding manner to obtain a word vector of each first word unit, on each second word unit in the second word unit group to obtain a word vector of each second word unit, and on each third word unit in the third word unit group to obtain a word vector of each third word unit.
As another example, word embedding processing may be performed on each first word unit in the first word unit group in a word2vec coding manner to obtain a word vector of each first word unit, on each second word unit in the second word unit group to obtain a word vector of each second word unit, and on each third word unit in the third word unit group to obtain a word vector of each third word unit.
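As a minimal sketch of the one-hot variant (the vocabulary below is an assumption built from the word units themselves; a word2vec model would replace the lookup with learned dense vectors):

```python
import numpy as np

def one_hot_embed(word_units):
    """One-hot word embedding: each distinct word unit owns one vocabulary index."""
    vocab = {unit: idx for idx, unit in enumerate(sorted(set(word_units)))}
    vectors = np.zeros((len(word_units), len(vocab)), dtype=np.float32)
    for row, unit in enumerate(word_units):
        vectors[row, vocab[unit]] = 1.0
    return vectors

# Translated glosses of the character-level word units of the example target text.
first_word_vectors = one_hot_embed(["i", "love", "my", "of", "ancestor", "country"])
```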
As an example, each first word vector, each second word vector and each third word vector are encoded. For each first word unit, a vector representation fused with the full-text semantic information of the target text, namely a first feature vector, can be obtained; for each second word unit, a vector representation fused with the full-text semantic information of the target question, namely a second feature vector, can be obtained; and for each third word unit, a vector representation fused with the full-text semantic information of the target answer, namely a third feature vector, can be obtained. The first feature vector group, the second feature vector group and the third feature vector group can thus be obtained.
Illustratively, taking the target question "who i love" as an example, "who i love" is input into the feature extraction layer and segmented into the word units "i", "love" and "who"; word embedding processing is performed on "i", "love" and "who" respectively to obtain a word vector of "i", a word vector of "love" and a word vector of "who"; the three word vectors are then encoded, so that a second feature vector of "i" combined with the word vectors of "love" and "who", a second feature vector of "love" combined with the word vectors of "i" and "who", and a second feature vector of "who" combined with the word vectors of "i" and "love" can be obtained. Similarly, the target text "i love my country" is input into the feature extraction layer, and a first feature vector of each word unit in the target text can be output; the target answer is input into the feature extraction layer, and a third feature vector of each word unit in the target answer can be output.
In the embodiment of the application, the feature extraction layer may adopt the structure of a BERT model that is pre-trained and then fine-tuned with a reading understanding task, so that the obtained first feature vector group, second feature vector group and third feature vector group can respectively reflect the semantics of the target text, the semantics of the target question and the semantics of the target answer more accurately, and the training speed and the accuracy of the model in use can be improved.
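A hedged sketch of such a feature extraction layer using the Hugging Face transformers library follows; the checkpoint name "bert-base-chinese" and the use of the last hidden state as the per-token feature vectors are illustrative assumptions rather than requirements of this application.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def extract_feature_vectors(text):
    """Return one contextual feature vector per token of the input text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)
    # Shape [sequence_length, hidden_size]; the sequence includes the special
    # [CLS] and [SEP] tokens, which would be stripped before aligning the
    # vectors with the word units.
    return outputs.last_hidden_state.squeeze(0)

first_feature_vectors = extract_feature_vectors("I love my country")      # target text
second_feature_vectors = extract_feature_vectors("Who do I love")         # target question
third_feature_vectors = extract_feature_vectors("home country hometown")  # target answer options
```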
In the above description, the text length of the target text is smaller than the length threshold, and when the text length of the target text is smaller than the length threshold, the reading understanding model can process the target text, so that the target text can be directly subjected to word segmentation. In other embodiments, if the target text is a chapter-level text, that is, the text length of the target text is greater than or equal to the length threshold, the reading understanding model may not be able to process the target text, and therefore, the target text may be segmented or sentence-wise processed to obtain a plurality of target text segments, and then the first feature vector group of each target text segment is extracted through the feature extraction layer. For example, if the target text is divided into 3 target text segments, 3 first feature vector groups may be extracted, where the 3 first feature vector groups are respectively used to represent the semantics of the 3 target text segments. Moreover, the method for extracting the first feature vector group of the target text segment is the same as the above-mentioned method for extracting the first feature vector group of the target text, and the details are not repeated herein in this embodiment.
Wherein the first graph network is an initial first graph network that includes attention values of nodes and attention values of edges. The second graph network is an initial second graph network that includes attention values for nodes and attention values for edges.
As an example, the attention layer may employ the structure of the attention layer of the BERT model. Alternatively, the attention layer may adopt any other structure including a model of an attention mechanism, which is not limited in this embodiment of the present application.
As an example, in this step, the first feature vector group, the second feature vector group, the third feature vector group, the initial first graph network and the initial second graph network may be input into the attention layer of the reading understanding model; referring to fig. 10, attention values are added to the nodes and edges included in the initial first graph network based on the first feature vector group and the third feature vector group to obtain the first graph network, and attention values are added to the nodes and edges included in the initial second graph network based on the second feature vector group and the third feature vector group to obtain the second graph network.
Or, as another example, in this step, the first feature vector group, the second feature vector group and the third feature vector group may be input into the attention layer of the reading understanding model; attention values of the nodes and edges included in the initial first graph network are obtained based on the first feature vector group and the third feature vector group and added to the initial first graph network to obtain the first graph network; and attention values of the nodes and edges included in the initial second graph network are obtained based on the second feature vector group and the third feature vector group and added to the initial second graph network to obtain the second graph network.
In implementation, if the text length of the target text is smaller than the length threshold, the specific implementation of this step may include: adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors; adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
As an example, the initial first graph network represents the association relationship between the target text and the target answer, the first feature vector group is the feature representation of the target text, and the third feature vector group is the feature representation of the target answer, so that the attention values can be added to the nodes and edges of the initial first graph network according to the first feature vector group and the third feature vector group. Similarly, the initial second graph network represents the incidence relation between the target question and the target answer, the second feature vector group is the feature representation of the target question, and the third feature vector group is the feature representation of the target answer, so that the attention values can be added to the nodes and edges of the initial second graph network according to the second feature vector group and the third feature vector group.
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors may include: taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the target text in the initial first graph network; taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the target answer in the initial first graph network; determining an attention value between two first nodes of an edge in the initial first graph network based on the first feature vector group and serving as the attention value of the edge; based on the first set of feature vectors and the third set of feature vectors, determining and using an attention value between a first node and a second node of the initial first graph network where an edge exists as an attention value of the edge.
That is, the first feature vector in the first feature vector group may be used as the attention value of the node corresponding to the word unit of the target text in the initial first graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the target answer in the initial first graph network. And determining an attention value of an edge between word units of the target text in the initial first graph network according to the first feature vector group, and determining an attention value of an edge between word units of the target text and word units of the target answer in the initial first graph network according to the first feature vector group and the third feature vector group.
As an example, for two first nodes where an edge exists, attention calculation may be performed on the first feature vectors of the word units corresponding to the two first nodes to obtain the attention value of the edge. Specifically, the attention calculation on the two first feature vectors is to multiply the two first feature vectors and normalize the result to obtain the attention value. Illustratively, referring to fig. 11, there is an edge between "i" and "love" in fig. 11, and "i" and "love" are word units in the target text; a first feature vector of the word unit "i" and a first feature vector of "love" may be obtained from the first feature vector group, the first feature vector of "i" and the first feature vector of "love" may be multiplied, normalization processing may be performed on the product, and the attention value of the edge between "i" and "love" may be obtained.
As an example, for a first node and a second node where an edge exists, attention calculation may be performed on a first feature vector of a word unit corresponding to the first node and a third feature vector of the word unit corresponding to the second node, and an attention value of the edge may be obtained. Specifically, the attention calculation on the first feature vector and the third feature vector is to multiply the first feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 11, there is an edge between "i" and "home" in fig. 11, and "i" is a word unit in the target text, and "home" is a word unit in the target answer, a first feature vector of the word unit "i" may be obtained from a first feature vector group, and a third feature vector of "home" may be obtained from a third feature vector group, and the first feature vector of "i" and the third feature vector of "home" may be multiplied, and the product is normalized, so that the attention value of the edge between "i" and "home" may be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 11 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
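The attention calculation can be pictured with the following sketch; the dot product followed by a softmax normalization over a node's edges is only one plausible reading of "multiply and normalize", and the exact computation is governed by formula (1) of the original description.

```python
import numpy as np

def edge_score(vec_a, vec_b):
    """Raw attention score of an edge: multiply the feature vectors of its two nodes."""
    return float(np.dot(vec_a, vec_b))

def edge_attention_values(center_vec, neighbor_vecs):
    """Normalize the raw scores of all edges incident to one node with softmax."""
    scores = np.array([edge_score(center_vec, v) for v in neighbor_vecs])
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()
```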
In some embodiments, the specific implementation of adding, by the attention layer, attention values to the nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors may include: taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the target question in the initial second graph network; taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the target answer in the initial second graph network; determining an attention value between two third nodes of the initial second graph network where an edge exists as the attention value of the edge based on the second feature vector group; and, based on the second feature vector group and the third feature vector group, determining an attention value between a third node and a fourth node of the initial second graph network where an edge exists and using it as the attention value of the edge.
That is, the second feature vector in the second feature vector group may be used as the attention value of the node corresponding to the word unit of the target question in the initial second graph network, and the third feature vector in the third feature vector group may be used as the attention value of the node corresponding to the word unit of the target answer in the initial second graph network. And determining an attention value of an edge between word units of the target question in the initial second graph network according to the second feature vector group, and determining an attention value of an edge between word units of the target question and word units of the target answer in the initial second graph network according to the second feature vector group and the third feature vector group.
As an example, for two third nodes where an edge exists, attention calculation may be performed on the second feature vectors of the word units corresponding to the two third nodes, and the attention value of the edge may be obtained. Specifically, the attention calculation on the two second feature vectors is to multiply the two second feature vectors and normalize the result to obtain the attention value. Illustratively, referring to fig. 12, an edge exists between "i" and "who" in fig. 12, and "i" and "who" are word units in the target question, a second feature vector of the word unit "i" may be obtained from the second feature vector group, and a second feature vector of "who" may be obtained from the second feature vector group, and the second feature vector of "i" and the second feature vector of "who" may be multiplied, and normalization processing may be performed on the products, and a value of attention of the edge between "i" and "who" may be obtained.
As an example, for a third node and a fourth node where an edge exists, attention calculation may be performed on a second feature vector of a word unit corresponding to the third node and a third feature vector of a word unit corresponding to the fourth node, and an attention value of the edge may be obtained. Specifically, the attention calculation for the second feature vector and the third feature vector is to multiply the second feature vector and the third feature vector and normalize the result to obtain the attention value. Illustratively, referring to fig. 12, there is an edge between "who" and "home" in fig. 12, and "who" is a word unit in the target question, "home" is a word unit in the target answer, a second feature vector of the word unit "who" can be obtained from a second feature vector group, and a third feature vector of "home" can be obtained from a third feature vector group, the second feature vector of "who" and the third feature vector of "home" can be multiplied, the product is normalized, and the attention value of the edge between "who" and "home" can be obtained.
In the above manner, the attention value of each edge and the attention value of each node in fig. 12 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
In the embodiment of the present application, attention calculation may be performed on the two feature vectors by using the above formula (1), and for specific implementation, reference may be made to the relevant description of step 206, which is not described herein again.
It should be noted that, the determination of the first graph network is described as an example in the above, where the text length of the target text is smaller than the length threshold, that is, the first feature vector group corresponds to the target text. In other embodiments, for a target text, if the target text is split into a plurality of target text segments, and the first feature vector group is a feature vector group of the target text segment, attention values may be added to nodes and edges of the initial first graph network corresponding to the target text segment based on the first feature vector group of each target text segment and the third feature vector group of the target answer.
For example, if the target text is divided into 3 target text segments, 3 first feature vector groups may be extracted, generating 3 initial first graph networks. For the reference initial first graph network, which is generated based on the reference target text segment and the target answer, attention values may be added to nodes and edges of the reference initial first graph network according to the first feature vector group of the reference target text segment and the third feature vector group of the target answer, so as to obtain the reference first graph network. The reference target text segment is any one of a plurality of text segments, a reference initial first graph network corresponds to the reference target text segment, and the reference first graph network corresponds to the reference target text segment. Similarly, 3 first graph networks can be obtained in the above manner. In addition, the implementation process of adding the attention values to the nodes and edges of the initial first graph network corresponding to the target text segment is the same as the implementation process of adding the attention values to the nodes and edges of the initial first graph network, and reference may be specifically made to the related description of the foregoing embodiment in this step, which is not described herein again.
It should be noted that, steps 904 to 906 are steps of inputting the target text, the target question, and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a specific implementation of the first graph network and the second graph network.
As an example, the graph convolution network layer may be a GCN (Graph Convolutional Network) model.
Illustratively, referring to fig. 10, the first graph network and the second graph network may be input into the graph convolution network layer of the reading understanding model to obtain the answer.
In implementation, if the text length of the target text is smaller than the length threshold, the first graph network is a graph network that reflects an association relationship between the target text and the target answer, and the specific implementation of inputting the first graph network and the second graph network into the graph convolution network layer of the reading understanding model to obtain the answer may include: determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network; carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector; and determining the answer based on the target hidden layer feature vector.
The first hidden layer feature vector is a vector representation of the first graph network obtained after the convolution processing is carried out on the first graph network through the graph convolution network layer. The second hidden layer feature vector is a vector representation of the second graph network obtained after the second graph network is subjected to convolution processing through the graph convolution network layer.
As an example, a first graph network may be input into a graph convolution network layer for convolution processing to obtain a first hidden layer feature vector, and a second graph network may be input into the graph convolution network layer for convolution processing to obtain a second hidden layer feature vector.
It should be noted that, in the graph convolution network layer, the graph network may be subjected to convolution processing through the above formula (2), and specific implementation may refer to the relevant description of step 208, which is not described herein again in this embodiment of the present application.
As an example, when the first hidden layer feature vector and the second hidden layer feature vector are subjected to weighted summation, a weight of the first hidden layer feature vector and a weight of the second hidden layer feature vector may be the same or different, and may be set by a user according to actual needs or may be set by a computing device as a default, which is not limited in this embodiment of the present application.
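A compact numpy sketch of one graph convolution step and of the weighted summation is given below; it follows the standard GCN propagation rule, which may differ in detail from formula (2) of the original description, and the equal weights are an assumption.

```python
import numpy as np

def gcn_layer(adjacency, node_features, weight):
    """One graph convolution step: normalized adjacency (with self-loops) times
    node features times a learned weight matrix, followed by ReLU."""
    a_hat = adjacency + np.eye(adjacency.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))  # D^(-1/2)
    return np.maximum(0.0, d_inv_sqrt @ a_hat @ d_inv_sqrt @ node_features @ weight)

def fuse_hidden_vectors(first_hidden, second_hidden, w1=0.5, w2=0.5):
    """Weighted summation of the two hidden layer feature vectors (equal lengths assumed)."""
    return w1 * first_hidden + w2 * second_hidden
```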
In some embodiments, determining a specific implementation of the answer based on the target hidden-layer feature vector may include: converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label; determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension; and determining the answer based on the label of the word unit corresponding to each dimension.
As an example, the sequence annotation function is a function used in performing sequence annotation, and can map an input vector into at least one-dimensional probabilities, that is, at least one probability can be obtained for each vector.
For example, the target hidden layer feature vector may be used as an input of a sequence labeling function, and through calculation of the sequence labeling function, a probability corresponding to each dimension of the target hidden layer feature vector may be obtained.
As an example, the label may be B, I, O. Wherein, B represents the initial word of the answer, namely the first word of the answer; i represents an ending word in the middle of the answer, namely the second character to the last character of the answer; o denotes a non-answer word, i.e., a word that is not an answer.
It should be noted that the length of the target hidden layer feature vector is the same as the length of the target text.
Illustratively, taking the target text "i love my country" as an example, the target hidden layer feature vector is a 6-dimensional vector, and the 6 dimensions correspond to the 6 word units of the target text respectively. The value of each dimension of the target hidden layer feature vector is converted into 3 probabilities, which correspond to the labels "B", "I" and "O" respectively. For example, for the word unit "love", assuming that the calculated probabilities are 0.2, 0.3 and 0.5 respectively, the probability of the label "O" is the largest, so the label corresponding to "love" is "O". Similarly, assume that the labels corresponding to the 6 word units are "O", "O", "O", "O", "B" and "I", respectively. Since the label "B" represents the initial word of the answer and the label "I" represents an intermediate or ending word of the answer, "ancestor" and "country" can be considered to be the answer.
As an example, the at least one tag includes an answer beginning word, an answer middle ending word, and a non-answer word, and determining a specific implementation of the answer based on the tag of the word unit corresponding to each dimension may include: and taking the word unit corresponding to the head word of the answer and the word unit corresponding to the middle ending word of the answer as the answers.
That is to say, the initial word of the answer and the final word in the middle of the answer can be spliced to obtain the answer.
Continuing the above example, "home" may be determined as the answer.
It should be noted that, the above description is given by taking an example in which the text length of the target text is smaller than the length threshold, that is, the first graph network corresponds to the entire target text. In other embodiments, for a target text, if the target text is split into a plurality of target text segments, the first graph network corresponds to the target text segments, and the answer obtained by inputting the first graph network and the second graph network into the graph convolution network layer corresponds to the target text segments, but the answer is not necessarily a correct answer to the target question. Thus, in this case, each target text segment may get one answer, and then multiple answers may be obtained, and then the correct answer to the target question may be determined from the multiple answers.
As an example, the answer with the highest frequency of occurrence among the plurality of answers may be taken as the answer to the target question. For example, assuming that the target text is divided into 10 target text segments, each first graph network and the second graph network are input into the graph convolution network layer for processing, so that 10 answers may be obtained; the answer that occurs the largest number of times among the 10 answers may be used as the answer to the target question.
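This voting step can be sketched as follows; the candidate strings are illustrative placeholders.

```python
from collections import Counter

def select_answer(segment_answers):
    """Return the answer that occurs most frequently among the per-segment answers."""
    return Counter(segment_answers).most_common(1)[0][0]

final_answer = select_answer(["home country"] * 6 + ["hometown"] * 4)  # -> "home country"
```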
By the method, the association relation among the target text, the target question and the target answer can be effectively extracted by utilizing the feature vectors of the target text, the target question and the target answer; the answer to the target question is determined through the reading understanding model by combining the association relation among the target text, the target question and the target answer, so that the accuracy of the reading understanding task executed by the reading understanding model can be improved.
The following further describes the reading understanding method provided by the present application by taking its application to a choice question as an example with reference to fig. 13. Fig. 13 shows a processing flow chart of a reading understanding method applied to a choice question according to an embodiment of the present application, which may specifically include the following steps:
step 1302: and acquiring a target text, a target question and a target answer.
In the embodiment of the present application, the form of the target question and the text length of the target text are not limited, and the reading and understanding method is described in the embodiment by taking the target question as a choice question and taking the text length of the target text smaller than the length threshold as an example.
For example, the target text is "i love my country", the target question is "who i love", and the target answers are the two options "home country" and "hometown".
Step 1304: and inputting the target text, the target question and the target answer into a graph construction network layer of the reading understanding model, and constructing an initial third graph network based on the dependency relationship among word units in the target text.
For example, taking the target text "i love my country" as an example, performing dependency analysis on it through the Stanford Core NLP algorithm can obtain "i" as the subject, "love" as the predicate and "my country" as the object, and can obtain the dependency relationships among the word units "i", "love", "i", "ancestor" and "country". For example, "i" and "love" have a dependency relationship in the target text, "i" and "ancestor" have a dependency relationship, and so do "love" and "ancestor" as well as "ancestor" and "country"; based on these dependency relationships, the initial third graph network shown in fig. 4 is obtained.
Step 1306: and connecting the target node with a node in the initial third graph network by taking the word unit in the target answer as the target node based on the incidence relation between the word unit in the target answer and the word unit in the target text to obtain the initial first graph network.
Continuing with the above example, the word units in the target answer may be used as target nodes: the "ancestor" in the target answer is respectively connected to each node in the initial third graph network, the "country" in the target answer is respectively connected to each node in the initial third graph network, the "home" in the target answer is respectively connected to each node in the initial third graph network, and the "county" in the target answer is respectively connected to each node in the initial third graph network, so that the initial first graph network shown in fig. 11 may be obtained, where the bold nodes in fig. 11 are the target nodes.
Step 1308: and inputting the target text, the target question and the target answer into a graph construction network layer of the reading understanding model, and constructing an initial fourth graph network based on the dependency relationship among word units in the target question.
Continuing with the above example, by performing dependency analysis on the target problem "who i love" through the Stanford Core NLP algorithm, we can obtain that "i" is the subject, "love" is the predicate, and "who" is the object, and can obtain the dependency relationship between "i", "love", and "who" each other. For example, there is a dependency relationship between "i" and "love" in the target problem, there is a dependency relationship between "love" and "who", and there is a dependency relationship between "i" and "who", based on the above-mentioned dependency relationship, referring to fig. 6, the initial fourth graph network shown in fig. 6 can be obtained.
Step 1310: and connecting the target node with a node in the initial fourth graph network by taking the word unit in the target answer as the target node based on the incidence relation between the word unit in the target answer and the word unit in the target question to obtain the initial second graph network.
Continuing with the above example, the word units in the target answer may be used as target nodes, the "ancestors" in the target answer are respectively connected to each node in the initial fourth graph network, the "countries" in the target answer are respectively connected to each node in the initial fourth graph network, the "homes" in the target answer are respectively connected to each node in the initial fourth graph network, and the "counties" in the target answer are respectively connected to each node in the initial fourth graph network, so that the initial second graph network shown in fig. 12 may be obtained, where the bold nodes in fig. 12 are the target nodes.
Step 1312: inputting a target text, a target question and a target answer into a feature extraction layer of a reading understanding model, performing word segmentation processing on the target text to obtain a first word unit group, performing word segmentation processing on the target question to obtain a second word unit group, and performing word segmentation processing on the target answer to obtain a third word unit group.
Continuing with the above example, word segmentation of the target text yields the first word unit group, namely "me", "love", "me", "of", "ancestor" and "country" (one word unit per character). Similarly, word segmentation of the target question yields the second word unit group, namely "me", "love" and "who", and word segmentation of the target answer yields the third word unit group, namely "ancestor", "country", "home" and "county".
Step 1314: performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to respectively obtain a first word vector group, a second word vector group and a third word vector group.
Continuing the above example, performing word embedding processing on each first word unit in the first word unit group may obtain first word vectors of "me", "love", "me", "of", "ancestor" and "country" respectively. Similarly, performing word embedding processing on each second word unit in the second word unit group may obtain a second word vector of "me", a second word vector of "love" and a second word vector of "who". Performing word embedding processing on each third word unit in the third word unit group may obtain a third word vector of "ancestor", a third word vector of "country", a third word vector of "home" and a third word vector of "county".
Step 1316: encoding the first word vector group, the second word vector group and the third word vector group to obtain a first feature vector group, a second feature vector group and a third feature vector group, respectively.
Continuing with the above example, encoding the word vectors of "me", "love" and "who" may result in a second feature vector of "me" combined with the word vectors of "love" and "who", a second feature vector of "love" combined with the word vectors of "me" and "who", and a second feature vector of "who" combined with the word vectors of "me" and "love". Similarly, a first feature vector of each word unit in the target text and a third feature vector of each word unit in the target answer can be obtained.
Step 1318: adding attention values to the nodes and edges of the initial first graph network through the attention layer based on the first feature vector group and the third feature vector group to obtain a first graph network.
Continuing the above example, the feature vector of each node in fig. 11 may be taken as the attention value of the node. An edge exists between "i" and "love" in fig. 11, and "i" and "love" are word units in the target text, so a first feature vector of the word unit "i" and a first feature vector of "love" may be obtained from the first feature vector group, the two first feature vectors may be multiplied, normalization processing may be performed on the product, and the attention value of the edge between "i" and "love" may be obtained. An edge exists between "i" and "home", where "i" is a word unit in the target text and "home" is a word unit in the target answer, so a first feature vector of the word unit "i" may be obtained from the first feature vector group and a third feature vector of "home" may be obtained from the third feature vector group, the first feature vector of "i" and the third feature vector of "home" may be multiplied, normalization processing is performed on the product, and the attention value of the edge between "i" and "home" may be obtained.
By the above manner, the attention value of each edge and the attention value of each node in fig. 11 can be determined, and the first graph network can be obtained by adding the attention values of the nodes and the edges to the initial first graph network.
Step 1320: adding attention values to the nodes and edges of the initial second graph network through the attention layer based on the second feature vector group and the third feature vector group to obtain a second graph network.
Continuing the above example, the feature vector of each node in fig. 12 may be taken as the attention value of the node. An edge exists between "me" and "who" in fig. 12, and "me" and "who" are word units in the target question, so a second feature vector of the word unit "me" and a second feature vector of "who" may be obtained from the second feature vector group, the two second feature vectors may be multiplied, normalization processing may be performed on the product, and the attention value of the edge between "me" and "who" may be obtained. An edge exists between "who" and "home", where "who" is a word unit in the target question and "home" is a word unit in the target answer, so a second feature vector of the word unit "who" may be obtained from the second feature vector group and a third feature vector of "home" may be obtained from the third feature vector group, the second feature vector of "who" and the third feature vector of "home" may be multiplied, normalization processing is performed on the product, and the attention value of the edge between "who" and "home" may be obtained.
In the above manner, the attention value of each edge and the attention value of each node in fig. 12 can be determined, and the attention values of the nodes and the edges are added to the initial second graph network, so that the second graph network can be obtained.
Step 1322: inputting the first graph network and the second graph network into the graph convolution network layer of the reading understanding model, and determining a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network.
As an example, the first graph network may be input into the graph convolution network layer for convolution processing to obtain the first hidden layer feature vector, and the second graph network may be input into the graph convolution network layer for convolution processing to obtain the second hidden layer feature vector.
Step 1324: performing weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector.
Step 1326: converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function.
As an example, the sequence labeling function is a function used in performing sequence labeling, and can map an input vector into at least one-dimensional probabilities, that is, at least one probability can be obtained for each dimension.
For example, the target hidden layer feature vector may be used as an input of a sequence labeling function, and through calculation of the sequence labeling function, a probability corresponding to each dimension of the target hidden layer feature vector may be obtained.
Continuing with the above example, if the target text is "i love my country" and includes 6 word units, the target hidden layer feature vector is a 6-dimensional vector, and the 6 dimensions correspond to the 6 word units of the target text respectively. The value of each dimension in the target hidden layer feature vector is converted into 3 prediction probabilities, which correspond to the labels "B", "I" and "O" respectively. For example, for the word unit "ancestor", it is assumed that the calculated prediction probabilities are 0.5, 0.3 and 0.2, that is, 0.5 is the probability that the label of the word unit "ancestor" is "B", 0.3 is the probability that the label is "I", and 0.2 is the probability that the label is "O"; for the word unit "country", it is assumed that the calculated prediction probabilities are 0.3, 0.6 and 0.1, that is, 0.3 is the probability that the label of the word unit "country" is "B", 0.6 is the probability that the label is "I", and 0.1 is the probability that the label is "O".
Step 1328: determining a prediction label of the word unit corresponding to each dimension based on the at least one prediction probability corresponding to each dimension.
Continuing with the above example, since 0.5 is the largest among the prediction probabilities corresponding to the word unit "ancestor" and 0.5 is the probability that the label of the word unit "ancestor" is "B", the prediction label corresponding to the word unit "ancestor" can be determined to be "B"; since 0.6 is the largest among the prediction probabilities corresponding to the word unit "country" and 0.6 is the probability that the label of the word unit "country" is "I", the prediction label corresponding to "country" can be determined to be "I".
Continuing the above example, assume that the labels corresponding to the 6 word units of the target text are "O", "O", "O", "O", "B" and "I", respectively. Since the label "B" represents an answer beginning word and the label "I" represents an answer intermediate ending word, it can be determined that the answer to the target question is "home country", obtained by splicing "ancestor" and "country".
By the method, the association relation among the target text, the target question and the target answer can be effectively extracted by utilizing the characteristic vectors of the target text, the target question and the target answer, the answer of the target question is determined through the reading understanding model by combining the association relation among the target text, the target question and the target answer, and the accuracy of the reading understanding task executed by the reading understanding model can be improved.
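The flow of fig. 13 can be summarized by the following sketch; every attribute name on the model object is a hypothetical placeholder for the corresponding layer described above, not an API defined by this application, and the equal fusion weights are likewise an assumption.

```python
def answer_choice_question(target_text, target_question, target_answer, model):
    """High-level flow of fig. 13: graph construction, feature extraction,
    attention, graph convolution and sequence labeling."""
    g1_init, g2_init = model.graph_construction(target_text, target_question, target_answer)
    f1, f2, f3 = model.feature_extraction(target_text, target_question, target_answer)
    g1 = model.attention(g1_init, f1, f3)          # first graph network with attention values
    g2 = model.attention(g2_init, f2, f3)          # second graph network with attention values
    h1, h2 = model.graph_convolution(g1), model.graph_convolution(g2)
    hidden = 0.5 * h1 + 0.5 * h2                   # weighted summation (equal weights assumed)
    return model.sequence_labeling(hidden)         # BIO decoding to the final answer
```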
Corresponding to the above method embodiment, the present application further provides an embodiment of a training apparatus for a reading understanding model, and fig. 14 shows a schematic structural diagram of a training apparatus for a reading understanding model according to an embodiment of the present application. As shown in fig. 14, the apparatus may include:
a first graph network construction module 1402 configured to construct an initial first graph network of sample text fragments and sample answers by reading graph construction network layers of the understanding model, and construct an initial second graph network of sample questions and the sample answers;
a first text processing module 1404 configured to input the sample text segment, the sample question, and the sample answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a prediction module 1406 configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model to obtain a predicted answer;
a training module 1408 configured to train the reading understanding model based on a difference between the predicted answer and the sample answer until a training stop condition is reached.
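For the training module, one parameter update could look like the following sketch; the cross-entropy loss over BIO labels and the callable `reading_model` are illustrative assumptions, since the embodiment only requires training on the difference between the predicted answer and the sample answer until a training stop condition is reached.

```python
import torch
import torch.nn.functional as F

def training_step(reading_model, optimizer, batch_inputs, gold_bio_labels):
    """One parameter update of the training module."""
    logits = reading_model(*batch_inputs)            # [num_word_units, num_labels]
    loss = F.cross_entropy(logits, gold_bio_labels)  # difference between prediction and sample answer
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```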
Optionally, the first text processing module 1404 configured to:
inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group;
inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
Optionally, the first text processing module 1404 configured to:
performing word segmentation processing on the sample text segment, the sample question and the sample answer to respectively obtain a first word unit group, a second word unit group and a third word unit group;
performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively;
and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
Optionally, a first graph network building module 1402 configured to:
constructing an initial third graph network based on the dependency relationship among the word units in the sample text segment, and constructing an initial fourth graph network based on the dependency relationship among the word units in the sample question;
and constructing the initial first graph network based on the incidence relation between the initial third graph network and the sample answer, and constructing the initial second graph network based on the incidence relation between the initial fourth graph network and the sample answer.
Optionally, the first graph network building module 1402 is configured to:
taking word units in the sample text fragment as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample text fragment to obtain the initial third graph network.
Optionally, the first graph network building module 1402 is configured to:
and connecting the target node with a node in the initial third graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample text segment to obtain the initial first graph network.
Optionally, the first graph network building module 1402 is configured to:
taking word units in the sample problem as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample problem to obtain the initial fourth graph network.
Optionally, the first graph network building module 1402 is configured to:
and connecting the target node with a node in the initial fourth graph network by taking the word unit in the sample answer as the target node based on the incidence relation between the word unit in the sample answer and the word unit in the sample question to obtain the initial second graph network.
Optionally, the first text processing module 1404 configured to:
adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors;
adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
Optionally, the first text processing module 1404 is configured to:
taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network;
taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network;
determining, based on the first feature vector group, an attention value between two first nodes connected by an edge in the initial first graph network, and using it as the attention value of the edge;
determining, based on the first feature vector group and the third feature vector group, an attention value between a first node and a second node connected by an edge in the initial first graph network, and using it as the attention value of the edge.
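As a hedged illustration of this attention step, node attention values might simply be the feature vectors themselves and edge attention values a dot product between the feature vectors of the two endpoints; the function below makes that assumption and is not the only way the attention layer could be realized.

```python
# Sketch of adding attention values to a graph network built with networkx:
# node attention = the node's feature vector, edge attention = dot product of the
# two endpoint feature vectors (an assumed, illustrative choice of attention).
import numpy as np

def add_attention(graph, node_features):
    # node_features: dict mapping node id -> feature vector
    for node, vec in node_features.items():
        graph.nodes[node]["attention"] = vec
    for u, v in graph.edges:
        graph.edges[u, v]["attention"] = float(np.dot(node_features[u], node_features[v]))
    return graph
```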
Optionally, the first text processing module 1404 is configured to:
taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the sample question in the initial second graph network;
taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network;
determining, based on the second feature vector group, an attention value between two third nodes connected by an edge in the initial second graph network, and using it as the attention value of the edge;
determining, based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node connected by an edge in the initial second graph network, and using it as the attention value of the edge.
Optionally, the prediction module 1406 is configured to:
determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network;
carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector;
determining the predicted answer based on the target hidden layer feature vector.
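A toy version of this prediction step is sketched below, assuming the two graph networks are represented by attention-weighted adjacency matrices and that both hidden representations have been aligned to the same word-unit dimensions so that the weighted summation is well defined; all names are illustrative.

```python
# Sketch of the prediction step: one toy graph convolution per graph network, then a
# weighted summation of the two hidden layer feature vectors.
import numpy as np

def graph_conv(adjacency, features, weight):
    # Aggregate attention-weighted neighbours, then apply a ReLU nonlinearity.
    return np.maximum(adjacency @ features @ weight, 0.0)

def target_hidden(adj1, feats1, adj2, feats2, weight, alpha=0.5):
    h1 = graph_conv(adj1, feats1, weight)   # first hidden layer feature vector
    h2 = graph_conv(adj2, feats2, weight)   # second hidden layer feature vector
    return alpha * h1 + (1.0 - alpha) * h2  # target hidden layer feature vector
```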
Optionally, the prediction module 1406 is configured to:
converting the value of each dimension of the target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label;
determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
and determining the predicted answer based on the predicted label of the word unit corresponding to each dimension.
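The sequence labeling step can be pictured as a per-word-unit softmax over a small label set; the sketch below assumes a three-label set and a single linear projection, which is an illustrative simplification rather than the exact labeling function.

```python
# Sketch of the sequence labeling function: a linear projection followed by a softmax
# gives, for each word unit, a probability for each label.
import numpy as np

LABELS = ["B", "I", "O"]  # answer beginning / answer middle-or-ending / non-answer word

def label_word_units(hidden, label_weights):
    # hidden: (num_word_units, hidden_dim); label_weights: (hidden_dim, len(LABELS))
    logits = hidden @ label_weights
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)         # prediction probabilities per word unit
    return [LABELS[i] for i in probs.argmax(axis=1)]  # predicted label per word unit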
Optionally, the prediction module 1406 is configured to:
the at least one label comprises an answer beginning word, an answer middle/ending word and a non-answer word, and the word unit corresponding to the answer beginning word and the word units corresponding to the answer middle/ending words are used as the predicted answer.
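Recovering the predicted answer from such labels can then be as simple as collecting the word units whose predicted label is an answer label, as in the following sketch (labels and word units are illustrative).

```python
# Sketch of assembling the predicted answer from per-word-unit labels:
# keep the word units labeled as answer beginning ("B") or answer middle/ending ("I").
def extract_answer(word_units, labels):
    return " ".join(w for w, tag in zip(word_units, labels) if tag in ("B", "I"))

print(extract_answer(["the", "cat", "sat", "on", "the", "mat"],
                     ["O", "O", "O", "B", "I", "I"]))  # -> "on the mat"
```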
Optionally, the training module 1408 is configured to:
if the difference value is smaller than a preset threshold value, stopping training the reading understanding model;
and if the difference is larger than or equal to the preset threshold, continuing to train the reading understanding model.
Optionally, the training module 1408 is configured to:
recording one pass of iterative training each time the predicted answer is obtained;
and counting the number of iterative training passes, and if the number of passes is greater than a preset times threshold, determining that the training stop condition is reached.
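The two stopping conditions described above (a difference threshold and an iteration budget) can be combined as in the sketch below; the threshold values are illustrative and would be hyperparameters in practice.

```python
# Sketch combining the two training stop conditions.
def should_stop(difference, iteration, diff_threshold=1e-3, max_iterations=10000):
    if difference < diff_threshold:
        return True   # difference below the preset threshold: stop training
    if iteration > max_iterations:
        return True   # number of passes exceeds the times threshold: stop training
    return False      # otherwise continue training
```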
In the embodiment of the application, an initial first graph network of a sample text segment and a sample answer and an initial second graph network of a sample question and the sample answer are constructed through the graph construction network layer of the reading understanding model; the sample text segment, the sample question and the sample answer are input into the text processing layer of the reading understanding model, and attention values are added to the nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network; the first graph network and the second graph network are input into the graph convolution network layer of the reading understanding model to obtain a predicted answer; and the reading understanding model is trained based on the difference between the predicted answer and the sample answer until a training stop condition is reached. In this way, the associations among the sample text segment, the sample question and the sample answer can be effectively utilized, the reading understanding model is trained in combination with these associations, and the accuracy with which the reading understanding model performs reading understanding tasks can be improved.
The above is an illustrative scheme of the training device for reading and understanding the model in the embodiment. It should be noted that the technical solution of the training apparatus for reading and understanding the model and the technical solution of the training method for reading and understanding the model belong to the same concept, and details that are not described in detail in the technical solution of the training apparatus for reading and understanding the model can be referred to the description of the technical solution of the training method for reading and understanding the model.
Corresponding to the above method embodiment, the present application further provides an embodiment of a reading and understanding apparatus, and fig. 15 shows a schematic structural diagram of a reading and understanding apparatus provided in an embodiment of the present application. As shown in fig. 15, the apparatus may include:
a second graph network construction module 1502 configured to construct an initial first graph network of the target text and the target answer and an initial second graph network of the target question and the target answer through a graph construction network layer of the reading understanding model;
a second text processing module 1504, configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a determining module 1506 configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine an answer to the target question.
Optionally, the second text processing module 1504 is configured to:
inputting the target text, the target question and the target answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group;
inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
Optionally, the second text processing module 1504 is configured to:
performing word segmentation processing on the target text, the target question and the target answer to respectively obtain a first word unit group, a second word unit group and a third word unit group;
performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively;
and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
Optionally, the second graph network construction module 1502 is configured to:
constructing an initial third graph network based on the dependency relationship among the word units in the target text, and constructing an initial fourth graph network based on the dependency relationship among the word units in the target question;
and constructing the initial first graph network based on the association between the initial third graph network and the target answer, and constructing the initial second graph network based on the association between the initial fourth graph network and the target answer.
Optionally, the second graph network construction module 1502 is configured to:
taking word units in the target text as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target text to obtain the initial third graph network.
Optionally, the second graph network construction module 1502 is configured to:
and taking each word unit in the target answer as a target node, and connecting the target node with a node in the initial third graph network based on the association between the word unit in the target answer and the word units in the target text, to obtain the initial first graph network.
Optionally, the second graph network construction module 1502 is configured to:
taking word units in the target question as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the target question to obtain the initial fourth graph network.
Optionally, the second graph network construction module 1502 is configured to:
and taking each word unit in the target answer as a target node, and connecting the target node with a node in the initial fourth graph network based on the association between the word unit in the target answer and the word units in the target question, to obtain the initial second graph network.
Optionally, the second text processing module 1504 is configured to:
adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors;
adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
Optionally, the second text processing module 1504 is configured to:
taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the target text in the initial first graph network;
taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the target answer in the initial first graph network;
determining, based on the first feature vector group, an attention value between two first nodes connected by an edge in the initial first graph network, and using it as the attention value of the edge;
determining, based on the first feature vector group and the third feature vector group, an attention value between a first node and a second node connected by an edge in the initial first graph network, and using it as the attention value of the edge.
Optionally, the second text processing module 1504 is configured to:
taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the target question in the initial second graph network;
taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the target answer in the initial second graph network;
determining, based on the second feature vector group, an attention value between two third nodes connected by an edge in the initial second graph network, and using it as the attention value of the edge;
determining, based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node connected by an edge in the initial second graph network, and using it as the attention value of the edge.
Optionally, the determining module 1506 is configured to:
determining, by the graph convolution network layer, a first hidden layer feature vector of the first graph network and a second hidden layer feature vector of the second graph network;
carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain a target hidden layer feature vector;
determining the answer based on the target hidden layer feature vector.
Optionally, the determining module 1506 is configured to:
converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label;
determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
and determining the answer based on the label of the word unit corresponding to each dimension.
Optionally, the determining module 1506 is configured to:
the at least one label comprises an answer beginning word, an answer middle/ending word and a non-answer word, and the word unit corresponding to the answer beginning word and the word units corresponding to the answer middle/ending words are used as the answer.
In the embodiment of the application, an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer are constructed through the graph construction network layer of the reading understanding model; the target text, the target question and the target answer are input into the text processing layer of the reading understanding model, and attention values are added to the nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network; and the first graph network and the second graph network are input into the graph convolution network layer of the reading understanding model to obtain the answer to the target question. In this way, the feature vectors of the target text, the target question and the target answer can be effectively utilized, the associations among the target text, the target question and the target answer are extracted, the answer to the target question is determined by the reading understanding model in combination with these associations, and the accuracy with which the reading understanding model performs reading understanding tasks can be improved.
The above is a schematic scheme of a reading and understanding device of the embodiment. It should be noted that the technical solution of the reading and understanding apparatus and the technical solution of the reading and understanding method belong to the same concept, and details that are not described in detail in the technical solution of the reading and understanding apparatus can be referred to the description of the technical solution of the reading and understanding method.
It should be noted that the components of the device claims should be understood as the functional modules necessary to implement the steps of the program flow or of the method; each functional module is a logical division rather than an actual physical unit. A device claim defined by such a set of functional modules should therefore be understood as a functional-module framework that implements the solution mainly by means of the computer program described in the specification, rather than as a physical device that implements the solution mainly by means of hardware.
An embodiment of the present application further provides a computing device, which includes a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor executes the instructions to implement the steps of the reading understanding model training method, or implement the steps of the reading understanding method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above-mentioned reading and understanding model training method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the above-mentioned reading and understanding model training method. Alternatively, the technical solution of the computing device and the technical solution of the reading and understanding method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the reading and understanding method.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the reading understanding model training method described above, or implement the steps of the reading understanding method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above-mentioned reading and understanding model training method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned reading and understanding model training method. Alternatively, the technical solution of the storage medium and the technical solution of the reading and understanding method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the reading and understanding method.
The embodiment of the application discloses a chip, which stores computer instructions, and the instructions are executed by a processor to implement the steps of the reading understanding model training method as described above, or implement the steps of the reading understanding method as described above.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate in accordance with the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunication signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.
Claims (33)
1. A method for determining a predicted answer, the method comprising:
converting the value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that the prediction label of the word unit corresponding to each dimension is at least one label;
determining a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
and determining a predicted answer based on the predicted label of the word unit corresponding to each dimension.
2. The method of claim 1, wherein prior to converting the values of each dimension of the target hidden layer feature vector into at least one prediction probability by a sequence labeling function, further comprising:
determining a first hidden layer feature vector of a first graph network and a second hidden layer feature vector of a second graph network through a graph convolution network layer;
and carrying out weighted summation on the first hidden layer feature vector and the second hidden layer feature vector to obtain the target hidden layer feature vector.
3. The method of claim 2, wherein the graph convolution network layer is a GCN model.
4. A method as claimed in claim 2 or 3, wherein said graph convolution network layer convolves said first graph network by:
$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in N_i} C_{ij}\, W_j^{(l)}\, h_j^{(l)} + b_j^{(l)}\Big)$$

wherein $i$ represents the $i$-th node in the first graph network, $j$ represents the $j$-th node in the first graph network, $h_i^{(l+1)}$ represents the feature vector of the $i$-th node input to the $(l+1)$-th convolutional layer, $\sigma(\cdot)$ represents a nonlinear transfer function, which is a ReLU activation function, $N_i$ represents node $i$ and all nodes connected to node $i$, $h_j^{(l)}$ represents the feature vector of the $j$-th node input to the $l$-th convolutional layer, $C_{ij}$ represents the attention value of the edge between the $i$-th node and the $j$-th node, $W_j^{(l)}$ represents the weight of the $j$-th node at the $l$-th convolutional layer, and $b_j^{(l)}$ represents the intercept of the $j$-th node at the $l$-th convolutional layer.
5. The method of claim 2 or 3, wherein the graph convolution network layer comprises a plurality of convolutional layers, wherein the convolutional layers comprise a preset weight parameter matrix, and the weight of each node in each convolutional layer is an initial weight in the weight parameter matrix; or the convolutional layers comprise a preset intercept parameter matrix, and the intercept of each node in each convolutional layer is an initial intercept in the intercept parameter matrix.
6. The method of claim 2, wherein prior to determining, by the graph convolution network layer, the first hidden layer feature vector of the first graph network and the second hidden layer feature vector of the second graph network, further comprising:
constructing an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and the sample answers through a graph construction network layer of a reading understanding model;
inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
after the predicted answer is determined based on the predicted label of the word unit corresponding to each dimension, the method further comprises the following steps:
training the reading understanding model based on the difference between the predicted answer and the sample answer until a training stop condition is reached.
7. The method of claim 6, wherein the text processing layer comprises a feature extraction layer and an attention layer; inputting the sample text segment, the sample question and the sample answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network, including:
inputting the sample text segment, the sample question and the sample answer into a feature extraction layer of the reading understanding model to respectively obtain a first feature vector group, a second feature vector group and a third feature vector group;
inputting the first feature vector group, the second feature vector group and the third feature vector group into an attention layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network.
8. The method of claim 7, wherein the feature extraction layer adopts the structure of a BERT model.
9. The method of claim 7, wherein the attention layer adopts the structure of an attention layer of a BERT model.
10. The method of claim 7, wherein inputting the sample text passage, the sample question, and the sample answer into a feature extraction layer of the reading understanding model to obtain a first feature vector group, a second feature vector group, and a third feature vector group, respectively, comprises:
performing word segmentation processing on the sample text segment, the sample question and the sample answer to respectively obtain a first word unit group, a second word unit group and a third word unit group;
performing word embedding processing on the first word unit group, the second word unit group and the third word unit group to obtain a first word vector group, a second word vector group and a third word vector group respectively;
and coding the first word vector group, the second word vector group and the third word vector group to respectively obtain the first feature vector group, the second feature vector group and the third feature vector group.
11. The method of claim 10, wherein performing word segmentation on the sample text segment, the sample question, and the sample answer to obtain a first word unit group, a second word unit group, and a third word unit group, respectively, comprises:
if the sample text segment is a Chinese text, dividing each character, punctuation mark and number into a separate word unit, the word units obtained by dividing the sample text segment forming the first word unit group; or, if the sample text segment is a foreign-language text, dividing each word or phrase into a word unit, the word units obtained by dividing the sample text segment forming the first word unit group;
if the sample question is a Chinese text, dividing each character, punctuation mark and number into a separate word unit, the word units obtained by dividing the sample question forming the second word unit group; or, if the sample question is a foreign-language text, dividing each word or phrase into a word unit, the word units obtained by dividing the sample question forming the second word unit group;
if the sample answer is a Chinese text, dividing each character, punctuation mark and number into a separate word unit, the word units obtained by dividing the sample answer forming the third word unit group; or, if the sample answer is a foreign-language text, dividing each word or phrase into a word unit, the word units obtained by dividing the sample answer forming the third word unit group.
12. The method of claim 10, wherein performing word embedding processing on the first word unit group, the second word unit group, and the third word unit group to obtain a first word vector group, a second word vector group, and a third word vector group, respectively, comprises:
performing word embedding processing on each first word unit in the first word unit group by means of one-hot encoding or word2vec encoding to obtain the first word vector group;
performing word embedding processing on each second word unit in the second word unit group by means of one-hot encoding or word2vec encoding to obtain the second word vector group;
and performing word embedding processing on each third word unit in the third word unit group by means of one-hot encoding or word2vec encoding to obtain the third word vector group.
13. The method of claim 10, wherein encoding the first set of word vectors, the second set of word vectors, and the third set of word vectors to obtain the first set of feature vectors, the second set of feature vectors, and the third set of feature vectors, respectively, comprises:
and encoding each first word vector, each second word vector and each third word vector to obtain a first feature vector of each first word unit, a second feature vector of each second word unit and a third feature vector of each third word unit respectively, wherein the first feature vector of each first word unit is a representation of the first word unit fused with full-text semantic information of the sample text segment, the second feature vector of each second word unit is a representation of the second word unit fused with full-text semantic information of the sample question, and the third feature vector of each third word unit is a representation of the third word unit fused with full-text semantic information of the sample answer.
14. The method of claim 6, wherein constructing an initial first graph network of sample text fragments and sample answers and an initial second graph network of sample questions and sample answers through a graph construction network layer of a reading understanding model comprises:
constructing an initial third graph network based on the dependency relationship among the word units in the sample text segment, and constructing an initial fourth graph network based on the dependency relationship among the word units in the sample question;
and constructing the initial first graph network based on the association between the initial third graph network and the sample answer, and constructing the initial second graph network based on the association between the initial fourth graph network and the sample answer.
15. The method of claim 14, wherein constructing an initial third graph network based on dependencies between word units in the sample text segments comprises:
taking word units in the sample text fragment as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample text fragment to obtain the initial third graph network.
16. The method of claim 14 or 15, wherein constructing the initial first graph network based on the association between the initial third graph network and the sample answer comprises:
taking each word unit in the sample answer as a target node, and connecting the target node with a node in the initial third graph network based on the association between the word unit in the sample answer and the word units in the sample text segment, to obtain the initial first graph network.
17. The method of claim 14, wherein constructing an initial fourth graph network based on dependencies between word units in the sample problem comprises:
taking word units in the sample question as nodes to obtain a plurality of nodes;
and connecting the nodes with the dependency relationship based on the dependency relationship among the word units in the sample question to obtain the initial fourth graph network.
18. The method according to claim 14, 15 or 17, wherein the dependency relationship is calculated by the Stanford CoreNLP algorithm.
19. The method of claim 14 or 17, wherein constructing the initial second graph network based on the association between the initial fourth graph network and the sample answer comprises:
taking each word unit in the sample answer as a target node, and connecting the target node with a node in the initial fourth graph network based on the association between the word unit in the sample answer and the word units in the sample question, to obtain the initial second graph network.
20. The method of claim 7, wherein inputting the first set of feature vectors, the second set of feature vectors, and the third set of feature vectors into an attention layer of the reading understanding model, adding attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network, comprises:
adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors;
adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors.
21. The method of claim 20, wherein adding, by the attention layer, attention values for nodes and edges of the initial first graph network based on the first set of feature vectors and the third set of feature vectors comprises:
taking a first feature vector in the first feature vector group as an attention value of a first node in the initial first graph network, wherein the first node is a node corresponding to a word unit of the sample text segment in the initial first graph network;
taking a third feature vector in the third feature vector group as an attention value of a second node in the initial first graph network, wherein the second node is a node corresponding to a word unit of the sample answer in the initial first graph network;
determining, based on the first feature vector group, an attention value between two first nodes connected by an edge in the initial first graph network, and using it as the attention value of the edge;
determining, based on the first feature vector group and the third feature vector group, an attention value between a first node and a second node connected by an edge in the initial first graph network, and using it as the attention value of the edge.
22. The method of claim 21, wherein the method of calculating the attention value comprises:
for two first nodes with edges, performing attention calculation on first feature vectors of word units corresponding to the two first nodes to obtain attention values of the edges; or,
and for a first node and a second node with edges, performing attention calculation on a first feature vector of a word unit corresponding to the first node and a third feature vector of the word unit corresponding to the second node to obtain the attention value of the edges.
23. The method of claim 20, wherein adding, by the attention layer, attention values for nodes and edges of the initial second graph network based on the second set of feature vectors and the third set of feature vectors comprises:
taking a second feature vector in the second feature vector group as an attention value of a third node in the initial second graph network, wherein the third node is a node corresponding to a word unit of the sample question in the initial second graph network;
taking a third feature vector in the third feature vector group as an attention value of a fourth node in the initial second graph network, wherein the fourth node is a node corresponding to a word unit of the sample answer in the initial second graph network;
determining, based on the second feature vector group, an attention value between two third nodes connected by an edge in the initial second graph network, and using it as the attention value of the edge;
determining, based on the second feature vector group and the third feature vector group, an attention value between a third node and a fourth node connected by an edge in the initial second graph network, and using it as the attention value of the edge.
24. The method of claim 23, wherein the method of calculating the attention value comprises:
for two third nodes with edges, performing attention calculation on second feature vectors of word units corresponding to the two third nodes to obtain attention values of the edges; or,
and for a third node and a fourth node with edges, performing attention calculation on a second feature vector of the word unit corresponding to the third node and a third feature vector of the word unit corresponding to the fourth node to obtain the attention value of the edges.
26. The method of claim 1, wherein the at least one label comprises an answer beginning word, an answer middle/ending word and a non-answer word; determining the predicted answer based on the predicted label of the word unit corresponding to each dimension comprises:
taking the word unit corresponding to the answer beginning word and the word units corresponding to the answer middle/ending words as the predicted answer.
27. The method of claim 6, wherein training the reading understanding model based on a difference between the predicted answer and the sample answer until a training stop condition is reached comprises:
if the difference value is smaller than a preset threshold value, stopping training the reading understanding model;
and if the difference is larger than or equal to the preset threshold, continuing to train the reading understanding model.
28. The method of claim 6, wherein reaching a training stop condition comprises:
recording one pass of iterative training each time the predicted answer is obtained;
and counting the number of iterative training passes, and if the number of passes is greater than a preset times threshold, determining that the training stop condition is reached.
29. A method of reading comprehension, the method comprising:
constructing an initial first graph network of a target text and a target answer and an initial second graph network of a target question and the target answer through a graph construction network layer of a reading understanding model, wherein the reading understanding model is trained by the method of any one of claims 6 to 28;
inputting the target text, the target question and the target answer into a text processing layer of the reading understanding model, and adding attention values to nodes and edges included in the initial first graph network and the initial second graph network respectively to obtain a first graph network and a second graph network;
inputting the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determining a target hidden layer feature vector;
converting the value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and at least one probability corresponding to each dimension represents the probability that the label of the word unit corresponding to each dimension is at least one label;
determining a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
and determining an answer of the target question based on the label of the word unit corresponding to each dimension.
30. An apparatus for determining a predicted answer, the apparatus comprising:
the first conversion module is configured to convert a value of each dimension of a target hidden layer feature vector into at least one prediction probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one prediction probability corresponding to each dimension represents the probability that a prediction label of the word unit corresponding to each dimension is at least one label;
a first determining module configured to determine a prediction label of a word unit corresponding to each dimension based on at least one prediction probability corresponding to each dimension;
a second determining module configured to determine a predicted answer based on the predicted label of the word unit corresponding to each dimension.
31. A reading and understanding apparatus, comprising:
a graph network construction module configured to construct an initial first graph network of the target text and the target answer and an initial second graph network of the target question and the target answer by reading a graph construction network layer of an understanding model, wherein the reading understanding model is trained by the method of any one of claims 6 to 28;
a text processing module configured to input the target text, the target question and the target answer into a text processing layer of the reading understanding model, and add attention values to nodes and edges included in the initial first graph network and the initial second graph network, respectively, to obtain a first graph network and a second graph network;
a third determination module configured to input the first graph network and the second graph network into a graph convolution network layer of the reading understanding model, and determine a target hidden layer feature vector;
a second conversion module configured to convert a value of each dimension of the target hidden layer feature vector into at least one probability through a sequence labeling function, wherein each dimension of the target hidden layer feature vector corresponds to a word unit, and the at least one probability corresponding to each dimension represents a probability that a label of the word unit corresponding to each dimension is at least one label;
a fourth determining module configured to determine a label of a word unit corresponding to each dimension based on at least one probability corresponding to each dimension;
a fifth determining module configured to determine an answer to the target question based on the label of the word unit corresponding to each dimension.
32. A computing device comprising a memory, a processor and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method for determining a predicted answer of any one of claims 1 to 28, or the steps of the method of reading comprehension of claim 29, or the functions of the apparatus of claim 30 or 31.
33. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for determining a predicted answer of any one of claims 1 to 28, or the steps of the method of reading comprehension of claim 29, or the functions of the apparatus of claim 30 or 31.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111110989.8A CN113792550B (en) | 2021-04-08 | 2021-04-08 | Method and device for determining predicted answers, reading and understanding method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110375810.5A CN112800186B (en) | 2021-04-08 | 2021-04-08 | Reading understanding model training method and device and reading understanding method and device |
CN202111110989.8A CN113792550B (en) | 2021-04-08 | 2021-04-08 | Method and device for determining predicted answers, reading and understanding method and device |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110375810.5A Division CN112800186B (en) | 2021-04-08 | 2021-04-08 | Reading understanding model training method and device and reading understanding method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113792550A true CN113792550A (en) | 2021-12-14 |
CN113792550B CN113792550B (en) | 2024-09-24 |
Family
ID=75816480
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111110988.3A Active CN113792120B (en) | 2021-04-08 | 2021-04-08 | Graph network construction method and device, reading and understanding method and device |
CN202111110989.8A Active CN113792550B (en) | 2021-04-08 | 2021-04-08 | Method and device for determining predicted answers, reading and understanding method and device |
CN202110375810.5A Active CN112800186B (en) | 2021-04-08 | 2021-04-08 | Reading understanding model training method and device and reading understanding method and device |
CN202111111031.0A Active CN113792121B (en) | 2021-04-08 | 2021-04-08 | Training method and device of reading and understanding model, reading and understanding method and device |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111110988.3A Active CN113792120B (en) | 2021-04-08 | 2021-04-08 | Graph network construction method and device, reading and understanding method and device |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110375810.5A Active CN112800186B (en) | 2021-04-08 | 2021-04-08 | Reading understanding model training method and device and reading understanding method and device |
CN202111111031.0A Active CN113792121B (en) | 2021-04-08 | 2021-04-08 | Training method and device of reading and understanding model, reading and understanding method and device |
Country Status (1)
Country | Link |
---|---|
CN (4) | CN113792120B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688207B (en) * | 2021-08-24 | 2023-11-17 | 思必驰科技股份有限公司 | Modeling processing method and device based on structural reading understanding of network |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180307969A1 (en) * | 2017-04-20 | 2018-10-25 | Hitachi, Ltd. | Data analysis apparatus, data analysis method, and recording medium |
CN109002519A (en) * | 2018-07-09 | 2018-12-14 | 北京慧闻科技发展有限公司 | Answer selection method, device and electronic equipment based on convolution loop neural network |
CN110309305A (en) * | 2019-06-14 | 2019-10-08 | 中国电子科技集团公司第二十八研究所 | Machine based on multitask joint training reads understanding method and computer storage medium |
CN110309283A (en) * | 2019-06-28 | 2019-10-08 | 阿里巴巴集团控股有限公司 | A kind of answer of intelligent answer determines method and device |
CN110647629A (en) * | 2019-09-20 | 2020-01-03 | 北京理工大学 | Multi-document machine reading understanding method for multi-granularity answer sorting |
CN111046661A (en) * | 2019-12-13 | 2020-04-21 | 浙江大学 | Reading understanding method based on graph convolution network |
US20200151613A1 (en) * | 2018-11-09 | 2020-05-14 | Lunit Inc. | Method and apparatus for machine learning |
CN111460092A (en) * | 2020-03-11 | 2020-07-28 | 中国电子科技集团公司第二十八研究所 | Multi-document-based automatic complex problem solving method |
CN112434142A (en) * | 2020-11-20 | 2021-03-02 | 海信电子科技(武汉)有限公司 | Method for marking training sample, server, computing equipment and storage medium |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11853903B2 (en) * | 2017-09-28 | 2023-12-26 | Siemens Aktiengesellschaft | SGCNN: structural graph convolutional neural network |
US20190122111A1 (en) * | 2017-10-24 | 2019-04-25 | Nec Laboratories America, Inc. | Adaptive Convolutional Neural Knowledge Graph Learning System Leveraging Entity Descriptions |
CN108959396B (en) * | 2018-06-04 | 2021-08-17 | 众安信息技术服务有限公司 | Machine reading model training method and device and question and answer method and device |
CN111445020B (en) * | 2019-01-16 | 2023-05-23 | 阿里巴巴集团控股有限公司 | Graph-based convolutional network training method, device and system |
CN114298310A (en) * | 2019-01-29 | 2022-04-08 | 北京金山数字娱乐科技有限公司 | Length loss determination method and device |
US11461619B2 (en) * | 2019-02-18 | 2022-10-04 | Nec Corporation | Spatio temporal gated recurrent unit |
US10861437B2 (en) * | 2019-03-28 | 2020-12-08 | Wipro Limited | Method and device for extracting factoid associated words from natural language sentences |
CN110210021B (en) * | 2019-05-22 | 2021-05-28 | 北京百度网讯科技有限公司 | Reading understanding method and device |
CN110457450B (en) * | 2019-07-05 | 2023-12-22 | 平安科技(深圳)有限公司 | Answer generation method based on neural network model and related equipment |
CN110598573B (en) * | 2019-08-21 | 2022-11-25 | 中山大学 | Visual problem common sense reasoning model and method based on multi-domain heterogeneous graph guidance |
US11593672B2 (en) * | 2019-08-22 | 2023-02-28 | International Business Machines Corporation | Conversation history within conversational machine reading comprehension |
EP3783531B1 (en) * | 2019-08-23 | 2024-11-13 | Tata Consultancy Services Limited | Automated conversion of text based privacy policy to video |
CN110619123B (en) * | 2019-09-19 | 2021-01-26 | 电子科技大学 | Machine reading understanding method |
CN110750630A (en) * | 2019-09-25 | 2020-02-04 | 北京捷通华声科技股份有限公司 | Generating type machine reading understanding method, device, equipment and storage medium |
CN110781663B (en) * | 2019-10-28 | 2023-08-29 | 北京金山数字娱乐科技有限公司 | Training method and device of text analysis model, text analysis method and device |
CN111274800B (en) * | 2020-01-19 | 2022-03-18 | 浙江大学 | Inference type reading understanding method based on relational graph convolution network |
CN111310848B (en) * | 2020-02-28 | 2022-06-28 | 支付宝(杭州)信息技术有限公司 | Training method and device for multi-task model |
CN111626044B (en) * | 2020-05-14 | 2023-06-30 | 北京字节跳动网络技术有限公司 | Text generation method, text generation device, electronic equipment and computer readable storage medium |
CN112380835B (en) * | 2020-10-10 | 2024-02-20 | 中国科学院信息工程研究所 | Question answer extraction method integrating entity and sentence reasoning information and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN112800186B (en) | 2021-10-12 |
CN112800186A (en) | 2021-05-14 |
CN113792121A (en) | 2021-12-14 |
CN113792121B (en) | 2023-09-22 |
CN113792120B (en) | 2023-09-15 |
CN113792120A (en) | 2021-12-14 |
CN113792550B (en) | 2024-09-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||