
CN112231450B - Question-answer searching method, question-answer searching device, question-answer searching apparatus and medium - Google Patents

Info

Publication number
CN112231450B
CN112231450B CN201910579670.6A CN201910579670A CN112231450B CN 112231450 B CN112231450 B CN 112231450B CN 201910579670 A CN201910579670 A CN 201910579670A CN 112231450 B CN112231450 B CN 112231450B
Authority
CN
China
Prior art keywords
question
preset
syntax
similarity
syntactic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910579670.6A
Other languages
Chinese (zh)
Other versions
CN112231450A (en
Inventor
张振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN201910579670.6A
Publication of CN112231450A
Application granted
Publication of CN112231450B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a question-answer search method, a question-answer search device, a question-answer search apparatus, and a medium. The question-answer search method comprises: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result. By jointly considering the syntactic structure and the syntactic content of the input question, the accuracy of the retrieval result is improved.

Description

Question-answer searching method, question-answer searching device, question-answer searching apparatus and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a question-answer search method, a question-answer search apparatus, a question-answer search device, and a medium.
Background
With the widespread use of artificial intelligence in civil and commercial fields, demand for natural language processing is increasing. In particular, there are higher requirements for retrieving the corresponding answer to an input question in a professional field (e.g., the medical field).
At present, when a user enters a question on the Internet, the user must spend a great deal of time acquiring and browsing information among massive data resources, and must further screen that information to obtain the required search result. Particularly in a professional field (e.g., the medical field), users have many difficulties in finding, acquiring, and understanding information, so answer retrieval for an input question takes a long time and the obtained answer has poor accuracy.
Therefore, a question-answer search method is needed that not only retrieves answers to input questions but does so with higher accuracy.
Disclosure of Invention
In view of the above problems, the present disclosure provides a question-answer retrieval method, a question-answer retrieval apparatus, a question-answer retrieval device, and a medium. While retrieving answers to input questions, the provided method can effectively improve both the retrieval speed and the accuracy of the retrieval result, achieving real-time, high-precision retrieval with good robustness.
According to an aspect of the present disclosure, a question-answer retrieval method is provided, including: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.
In some embodiments, comparing the input question with the preset questions in the preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain the retrieval result includes: calculating a syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library; calculating a syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in the preset question-answer library; determining a question similarity between the input question and each preset question in the preset question-answer library according to the syntactic structure similarity and the syntactic content similarity; and outputting the retrieval result according to the question similarity.
In some embodiments, outputting the retrieval result according to the question similarity includes: determining, based on the question similarity determined for each preset question, the maximum question similarity in the preset question-answer library, and acquiring the preset question-answer pair corresponding to the maximum question similarity; comparing the maximum question similarity with a preset threshold; and outputting the answer of that preset question-answer pair when the maximum question similarity is greater than or equal to the preset threshold, or outputting a null value when the maximum question similarity is less than the preset threshold.
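The selection and threshold steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the text does not say how the syntactic structure similarity and the syntactic content similarity are combined, so the weighted sum and its weight `alpha` are assumptions, as are the function and parameter names.

```python
from typing import Optional, Sequence


def retrieve(struct_sims: Sequence[float],
             content_sims: Sequence[float],
             answers: Sequence[str],
             threshold: float,
             alpha: float = 0.5) -> Optional[str]:
    """Return the answer of the best-matching preset question, or None.

    struct_sims[i] / content_sims[i] are the similarities between the input
    question and preset question i; answers[i] is that question's answer.
    """
    best_idx, best_sim = None, float("-inf")
    for i, (s, c) in enumerate(zip(struct_sims, content_sims)):
        # Assumed combination rule: a weighted sum of the two similarities.
        question_sim = alpha * s + (1.0 - alpha) * c
        if question_sim > best_sim:
            best_idx, best_sim = i, question_sim
    # Output the answer only if the maximum similarity reaches the threshold;
    # otherwise output a null value.
    if best_idx is None or best_sim < threshold:
        return None
    return answers[best_idx]
```

Returning `None` stands in for the "null value" of the text; a caller might translate it into a "no answer found" message.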
In some embodiments, the input question is a medical question, the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
In some embodiments, calculating the syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library includes: for each sub-element in the syntactic structure vector of the input question, obtaining the syntax subtree corresponding to the sub-element; comparing that syntax subtree with the syntax subtree corresponding to each sub-element in the syntactic structure vector of the preset question, and obtaining, based on the comparison result, the sub-element similarity of the sub-element in the syntactic structure vector of the input question; and summing the sub-element similarities of all sub-elements in the syntactic structure vector of the input question to obtain the syntactic structure similarity between the input question and the preset question.
In some embodiments, comparing the syntax subtree with the syntax subtree corresponding to each sub-element in the syntactic structure vector of the preset question includes: judging whether the values of the current sub-element in the syntactic structure vector of the input question and of the current sub-element in the syntactic structure vector of the preset question are both non-zero; if either or both of these values is zero, outputting a preset first comparison result; and if both values are non-zero, comparing the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the input question with the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the preset question.
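The zero check and the summation can be sketched as below. This is an illustrative reading only: it assumes the sub-elements are compared position by position, that a zero value means the corresponding syntax subtree is absent (i.e. the initial value of the vector is 0), and that the preset first comparison result is 0. The subtree comparison itself is passed in as a hypothetical callback `compare`.

```python
from typing import Any, Callable, Sequence

FIRST_RESULT = 0.0  # assumed value of the "preset first comparison result"


def structure_similarity(input_vec: Sequence[float],
                         preset_vec: Sequence[float],
                         input_subtrees: Sequence[Any],
                         preset_subtrees: Sequence[Any],
                         compare: Callable[[Any, Any], float]) -> float:
    """Sum the sub-element similarities of two syntactic structure vectors."""
    total = 0.0
    for i, (a, b) in enumerate(zip(input_vec, preset_vec)):
        if a == 0 or b == 0:
            # At least one question lacks this subtree: first comparison result.
            total += FIRST_RESULT
        else:
            # Both values are non-zero: compare the corresponding subtrees.
            total += compare(input_subtrees[i], preset_subtrees[i])
    return total
```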
In some embodiments, comparing the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the input question with the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the preset question includes: taking the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the input question as a first syntax subtree, and taking the syntax subtree corresponding to the current sub-element in the syntactic structure vector of the preset question as a second syntax subtree; and comparing the first syntax subtree with the second syntax subtree based on a preset rule to obtain the similarity of the first syntax subtree and the second syntax subtree.
In some embodiments, comparing the first syntax subtree with the second syntax subtree based on the preset rule includes: judging whether the production at the initial node of the first syntax subtree is the same as the production at the initial node of the second syntax subtree; outputting a preset first comparison result when the two productions differ; when the two productions are the same, judging whether the children of the initial node of the first syntax subtree and the children of the initial node of the second syntax subtree are all leaf nodes; if they are all leaf nodes, outputting a preset second comparison result; and if the children of the initial node of the first syntax subtree and/or the children of the initial node of the second syntax subtree include non-leaf nodes, calculating the similarity of the first syntax subtree and the second syntax subtree using a preset algorithm.
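The preset rule can be sketched as a recursive function. The text does not name the "preset algorithm" used when non-leaf children are present; the sketch below substitutes a recursion in the style of the Collins-Duffy convolution tree kernel, which is an assumption, as are the values 0 and 1 for the first and second comparison results and the nested-tuple tree encoding `(label, child, ...)` with plain strings as leaves.

```python
FIRST_RESULT = 0.0   # assumed: returned when the productions differ
SECOND_RESULT = 1.0  # assumed: returned when productions match and all children are leaves


def label(node):
    """Label of a node: the string itself for a leaf, else the first tuple item."""
    return node if isinstance(node, str) else node[0]


def children(node):
    """Children of a node as a tuple (empty for a leaf)."""
    return () if isinstance(node, str) else node[1:]


def production(node):
    """Production at a node: its label plus the labels of its children."""
    return (label(node), tuple(label(c) for c in children(node)))


def compare(t1, t2):
    """Similarity of two syntax subtrees under the rule sketched above."""
    if production(t1) != production(t2):
        return FIRST_RESULT
    kids1, kids2 = children(t1), children(t2)
    if all(isinstance(c, str) for c in kids1 + kids2):
        return SECOND_RESULT
    # Assumed "preset algorithm": multiply (1 + similarity) over aligned
    # non-leaf children, as in a Collins-Duffy style tree kernel.
    score = SECOND_RESULT
    for c1, c2 in zip(kids1, kids2):
        if not isinstance(c1, str) and not isinstance(c2, str):
            score *= 1.0 + compare(c1, c2)
    return score
```

For example, a tree such as `("S", ("NP", "I"), ("VP", ("V", "am"), ("ADJP", "late")))` compared with itself scores higher than when compared with a tree whose subject differs, because every matching child subtree contributes a factor greater than 1.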
According to another aspect of the present disclosure, there is provided a question-answer retrieval apparatus including: a syntactic structure analysis module configured to perform syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; a syntactic content analysis module configured to process the input question to obtain a syntactic content vector of the input question; and a retrieval result generation module configured to compare the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.
In some embodiments, the retrieval result generation module includes: a structure similarity generation module configured to calculate a syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library; a content similarity generation module configured to calculate a syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in the preset question-answer library; a question similarity generation module configured to determine a question similarity between the input question and each preset question in the preset question-answer library according to the syntactic structure similarity and the syntactic content similarity; and a result module configured to output the retrieval result according to the question similarity.
In some embodiments, the result module includes: a maximum question similarity determination module configured to determine, based on the question similarity determined for each preset question, the maximum question similarity in the preset question-answer library, and to acquire the preset question-answer pair corresponding to the maximum question similarity; a comparison module configured to compare the maximum question similarity with a preset threshold; and an output module configured to output the answer of that preset question-answer pair when the maximum question similarity is greater than or equal to the preset threshold, and to output a null value when the maximum question similarity is less than the preset threshold.
In some embodiments, the input question is a medical question, the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
According to another aspect of the present disclosure, there is provided a question-answer retrieval device, wherein the device comprises a processor and a memory containing a set of instructions which, when executed by the processor, cause the question-answer retrieval device to perform operations comprising: performing syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question; and comparing the input question with preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.
In some embodiments, comparing the input question with the preset questions in the preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain the retrieval result includes: calculating a syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library; calculating a syntactic content similarity between the syntactic content vector of the input question and the syntactic content vector of each preset question in the preset question-answer library; determining a question similarity between the input question and each preset question in the preset question-answer library according to the syntactic structure similarity and the syntactic content similarity; and outputting the retrieval result according to the question similarity.
In some embodiments, outputting the retrieval result according to the question similarity includes: determining, based on the question similarity determined for each preset question, the maximum question similarity in the preset question-answer library, and acquiring the preset question-answer pair corresponding to the maximum question similarity; comparing the maximum question similarity with a preset threshold; and outputting the answer of that preset question-answer pair when the maximum question similarity is greater than or equal to the preset threshold, or outputting a null value when the maximum question similarity is less than the preset threshold.
In some embodiments, the input question is a medical question, the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a computer, perform the method as described above.
With the retrieval method provided by the present disclosure, answer retrieval for an input question can be completed well; in particular, the method offers higher retrieval accuracy and higher retrieval speed, and has good robustness.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure; one of ordinary skill in the art may obtain other drawings from them without creative effort. The drawings are not necessarily drawn to scale; emphasis is instead placed on illustrating the principles of the disclosure.
FIG. 1 illustrates an exemplary flow chart of a question-answer retrieval method according to an embodiment of the disclosure;
FIG. 2A illustrates an exemplary flow chart of a process 200 for obtaining a syntactic structure vector through syntactic analysis according to an embodiment of the disclosure;
FIG. 2B shows a schematic diagram of a preset initial vector according to an embodiment of the present disclosure;
FIG. 2C illustrates a schematic diagram of syntactic analysis of an input problem according to an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary flow chart of a process 300 for comparing the input question with preset questions in a preset question-and-answer library to obtain a search result based on the syntactic structure vector and syntactic content vector, according to an embodiment of the disclosure;
FIG. 4A illustrates an exemplary flowchart of a process 400 for calculating the syntactic structure similarity between the syntactic structure vector and the syntactic structure vector of each preset question in a preset question-answer library, according to an embodiment of the present disclosure;
FIG. 4B illustrates an exemplary flow chart of a process 410 for discriminating between values of subelements to be compared according to an embodiment of the present disclosure;
FIG. 4C illustrates an exemplary flowchart of a process of comparing a first syntactic subtree and a second syntactic subtree according to an embodiment of the disclosure;
FIG. 4D is a diagram illustrating calculating the similarity of a child element in an input question to a child element in the preset question according to an embodiment of the present disclosure;
FIG. 5 illustrates an exemplary flowchart of a process of calculating a syntactic content similarity of a syntactic content vector of the input question to a syntactic content vector of each of the preset questions in the preset question-answering library, according to an embodiment of the present disclosure;
FIG. 6 illustrates an exemplary flowchart of a process of outputting a search result according to the problem similarity according to an embodiment of the present disclosure;
FIG. 7 shows an exemplary block diagram of a question-answer retrieval device according to an embodiment of the present disclosure;
FIG. 8 shows an exemplary block diagram of a question-answer retrieval device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the disclosure. All other embodiments that one of ordinary skill in the art can obtain from the embodiments of the present disclosure without creative effort also fall within the scope of the present disclosure.
As used in the specification and in the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that these operations are not necessarily performed exactly in the order shown. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Other operations may also be added to these processes, or one or more operations may be removed from them.
Fig. 1 illustrates an exemplary flow chart of a question and answer retrieval method 100 according to an embodiment of the disclosure.
First, in step S101, syntactic structure analysis is performed on an input question to obtain a syntactic structure vector of the input question.
Syntactic structure analysis, i.e. syntactic analysis (parsing), analyzes the grammatical function of the words in a sentence; through it, the syntactic structure of the sentence and the dependency relationships among its constituents can be found. For example, syntactic structure analysis of the sentence "I am late" may determine that "I" is the subject, "am" is the predicate, and "late" is the complement.
The syntactic structure analysis may be, for example, parsing based on a probabilistic context-free grammar or head-driven parsing; embodiments of the present disclosure are not limited by the particular algorithm employed. The analysis may be implemented, for example, by a parsing tool such as the Stanford parser or the Berkeley parser. Embodiments of the present disclosure are not limited by the particular tool employed in the syntactic structure analysis.
The input question may be, for example, a question entered directly by the user, or a question that the computer system determines itself in response to input information or control information from the user. For example, the question may be one the user types into a web search field, or one generated by a computer from the user's input information. Embodiments of the present disclosure are not limited by the source of the input question or the manner in which it is input.
Next, in step S102, the input question is processed to obtain a syntactic content vector of the input question.
Processing the input question to obtain the syntactic content vector may be implemented, for example, by a neural network. The neural network may be, for example, a convolutional neural network, a fully connected neural network, or a long short-term memory (LSTM) neural network, chosen to meet different practical application requirements; the disclosure is not limited by the type of neural network selected.
The syntactic content vector characterizes the content information of the input question. When the input question is processed by a neural network to obtain the syntactic content vector, the vector may be, for example, a 1024-dimensional or a 2048-dimensional vector, depending on the parameter settings of the neural network. Embodiments of the present disclosure are not limited by the particular dimension of the syntactic content vector.
After the syntactic content vector and the syntactic structure vector are obtained, in step S103, the input question is compared with a preset question in a preset question-and-answer library based on the syntactic structure vector and the syntactic content vector, so as to obtain a retrieval result.
The preset question-answer library comprises a plurality of preset question-answer pairs. Each preset question-answer pair comprises a class of questions and their corresponding answers.
The preset question-answer library may be, for example, a general-knowledge question-answer library. Alternatively, it may be a question-answer library for a particular area of expertise, such as a medical question-answer library or a financial question-answer library. Embodiments of the present disclosure are not limited by the type of question-answer pairs in the preset question-answer library.
For example, a preset question-answer pair may include only one question, such as the question "how to treat a burn" and its answer; or a preset question-answer pair may include a plurality of medical questions of the same type, such as a pair named "cardiovascular and cerebrovascular diseases" that includes the question "how is cerebral thrombosis treated" and its answer, the question "how is cerebral apoplexy treated" and its answer, and the question "how is heart disease treated" and its answer. Embodiments of the present disclosure are not limited by the number of questions included in each preset question-answer pair.
The comparison may consist, for example, of comparing the syntactic content vector of the input question with the syntactic content vector of each preset question in the preset question-answer library, and comparing the syntactic structure vector of the input question with the syntactic structure vector of each preset question in the preset question-answer library, to obtain the syntactic content similarity and the syntactic structure similarity of the input question relative to each preset question, and then obtaining the retrieval result based on these two similarities. Other comparison methods may also be employed; embodiments of the present disclosure are not limited by the particular comparison method selected.
In summary, the method acquires the syntactic content vector and the syntactic structure vector of the input question and, based on both, compares the input question with the preset questions in the preset question-answer library to obtain a retrieval result. Because both the syntactic structure features and the syntactic content features of the input question are taken into account, the retrieved answer is more accurate. Moreover, compared with a user browsing information on the network and manually screening answers, the method markedly reduces the time cost of retrieval and achieves higher retrieval efficiency and speed.
Fig. 2A illustrates an exemplary flow chart of a process 200 for obtaining a syntactic structure vector through syntactic analysis according to an embodiment of the disclosure.
Referring to FIG. 2A, in some embodiments, the process 200 of obtaining the syntactic structure vector through syntactic analysis may be described in more detail as follows. First, in step S201, the syntax tree of the input question and all syntax subtrees it contains are obtained through syntactic analysis. Next, in step S202, for each syntax subtree of the input question, the identical preset syntax subtree is looked up in a preset syntax subtree library. In step S203, for each preset syntax subtree so obtained, a corresponding accumulated value is added to the sub-element of a preset initial vector that corresponds to that preset syntax subtree, yielding the syntactic structure vector of the input question.
The preset initial vector is a vector with a first preset dimension, wherein the first preset dimension is the number of preset syntax subtrees in a preset syntax subtree library, each dimension in the preset initial vector corresponds to one preset syntax subtree in the preset syntax subtree library, and each subelement in the preset initial vector has the same initial value.
The preset sentence treebar library and the preset initial vector can be obtained, for example, through the following processes: firstly, carrying out sentence analysis on a professional library in a certain professional field and carrying out duplication removal operation on analysis results to obtain a syntax subtree library in the professional field, and taking the syntax subtree library as a preset sentence subtree library. The preset syntax subtree library includes a plurality of preset syntax subtrees different from each other, covering most of the syntax subtrees that may occur in the professional field. And secondly, determining a first preset dimension of a preset initial vector based on the number of preset syntax subtrees included in the preset sentence subtree library, namely enabling the first preset dimension to be equal to the number of the preset syntax subtrees in the preset sentence subtree library, and enabling each subelement in the preset initial vector to correspond to one preset syntax subtree in the preset sentence subtree library. However, it should be appreciated that embodiments of the present disclosure are not limited to determining a first preset dimension of a preset initial vector and a specific manner of obtaining a preset syntax sub-tree corresponding thereto.
For example, when the syntax subtree in the input question is other syntax subtrees than the preset syntax subtree in the current syntax subtree library, the syntax subtree is regarded as a sentence which does not belong to the professional field, the syntax subtree is distinguished as error data, and the syntax subtree in the input question is discarded.
The first preset dimension of the preset initial vector represents the number of the corresponding syntactic subtrees. The first predetermined dimension may be 512, or it may be 1028. Embodiments of the present disclosure are not limited by the first preset dimension of the preset initial vector.
The preset syntax subtrees corresponding to each dimension in the preset initial vector may be, for example, universal syntax subtrees; or it may be a special syntax subtree, for example it includes a special syntax subtree in a certain professional field, for example medical field, legal field, etc. Embodiments of the present disclosure are not limited by the type of the preset syntax subtree corresponding to each dimension in the preset initial vector.
The initial value of the preset initial vector is intended to make each subelement in the preset initial vector have the same initial value so as to facilitate the subsequent accumulation process, so that the initial value can be, for example, 0 or can be 10 or any other value according to actual needs. Embodiments of the present disclosure are not limited by the set initial value.
Assigning the accumulated value to a sub-element in the preset initial vector indicates that, among all the syntax subtrees of the input question, there is a syntax subtree identical to the preset syntax subtree corresponding to that sub-element. The accumulated value may be, for example, 1; that is, for each syntax subtree of the input question, if an identical preset syntax subtree exists in the preset syntax subtree library, that preset syntax subtree is obtained, and 1 is added to the value of the sub-element corresponding to it in the preset initial vector. However, embodiments of the present disclosure are not limited by the value of the accumulated value.
Fig. 2B shows a schematic diagram of a preset initial vector according to an embodiment of the present disclosure.
Referring to fig. 2B, the process 200 illustrated in fig. 2A may be described in more detail. The preset syntax subtree library includes 10 preset syntax subtrees c_1 to c_10, and the preset initial vector M_0 correspondingly includes 10 sub-elements m_1 to m_10. The correspondence between the sub-elements of the preset initial vector M_0 and the preset syntax subtrees is shown by the arrows in fig. 2B: the sub-element m_1 corresponds to the preset syntax subtree c_1, the sub-element m_2 corresponds to the preset syntax subtree c_2, and so on, up to the sub-element m_10, which corresponds to the preset syntax subtree c_10. The initial value of each sub-element in the preset initial vector M_0 is 0.
Fig. 2C illustrates a schematic diagram of syntactic analysis of an input problem according to an embodiment of the present disclosure.
Referring to fig. 2C and 2B, for the input question "cold causes fever", a syntax structure analysis is first performed on it, resulting in its syntax tree J_0, where S represents a sentence, NP represents a noun phrase, VP represents a verb phrase, N represents a noun, and V represents a verb. The resulting syntax tree may then be decomposed into, for example, the 5 syntax subtrees j_1 to j_5 shown in fig. 2C, thereby yielding the syntax tree and the syntax subtrees of the input question.
Further, for the above syntax subtrees j_1 to j_5, the identical preset syntax subtrees c_2, c_3, c_4, c_5 and c_8 are found in the preset syntax subtree library, and accordingly an accumulated value, for example 1, is added to the sub-elements m_2, m_3, m_4, m_5 and m_8 in the preset initial vector M_0. The syntactic analysis vector I of the input question is thereby obtained: (0,1,1,1,1,0,0,1,0,0).
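The accumulation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the present disclosure: the function name `build_structure_vector` and the string identifiers standing in for the subtrees of fig. 2B and 2C are assumptions, and syntax subtrees are assumed to be hashable so that they can be looked up in the library.

```python
from typing import Hashable, Iterable, List, Sequence


def build_structure_vector(
    library: Sequence[Hashable],
    question_subtrees: Iterable[Hashable],
    initial_value: int = 0,
    increment: int = 1,
) -> List[int]:
    """Return a vector with one sub-element per preset syntax subtree."""
    # Map each preset subtree to the position of its sub-element.
    index_of = {subtree: i for i, subtree in enumerate(library)}
    vector = [initial_value] * len(library)
    for subtree in question_subtrees:
        position = index_of.get(subtree)
        if position is None:
            # Not in the preset library: treated as erroneous data and discarded.
            continue
        vector[position] += increment
    return vector


# Hypothetical identifiers standing in for the preset subtrees c_1 to c_10.
library = [f"c{k}" for k in range(1, 11)]
# The subtrees j_1 to j_5 are assumed to match c_2, c_3, c_4, c_5 and c_8.
question_subtrees = ["c2", "c3", "c4", "c5", "c8"]
vector = build_structure_vector(library, question_subtrees)
print(vector)  # [0, 1, 1, 1, 1, 0, 0, 1, 0, 0]
```

The dictionary lookup keeps the cost proportional to the number of subtrees in the input question rather than to the size of the library.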
Based on the above, one method of obtaining a syntax analysis vector through syntax analysis and an example thereof are shown, however, it should be understood that embodiments of the present disclosure are not limited to the above method, and other methods may be selected to obtain the syntax analysis vector.
Based on the above, a syntactic analysis vector is obtained by syntactic analysis of the input question, and the syntactic analysis vector characterizes the syntactic structure and the composition of the input question, so that the input question can be conveniently compared with the question and answer pairs in a preset question library.
In some embodiments, the process of processing an input question to obtain a syntactic content vector may be described in more detail. For example, in the case where the input question is processed by a neural network and the selected neural network is a long short-term memory (LSTM) network, the input question is first fed to the input terminal of the LSTM network; the input question is then processed by a forward layer and a reverse layer of the LSTM network; the calculation result may be further processed, for example, via a conditional random field algorithm layer; finally, the processing result of the neural network is obtained at its output terminal. The processing result is a vector having a second preset dimension, and this vector is the syntactic content vector.
It should be appreciated that in the case of processing the input problem through a neural network to obtain a syntactic content vector, the second preset dimension is determined according to the selected neural network and the set parameters. The second preset dimension may be, for example, the same as the first preset dimension or may be different from the first preset dimension. Embodiments of the present disclosure are not limited by the relationship of the first preset dimension and the second preset dimension and the specific value of the second preset dimension.
Based on the above, the syntactic content vector of the input question is obtained by processing the input question, so that the syntactic content feature of the input question can be obtained, and subsequent answer retrieval for the input question based on the syntactic content feature is facilitated.
Fig. 3 illustrates an exemplary flow chart of a process 300 for comparing the input question with preset questions in a preset question-and-answer library to obtain a search result based on the syntax structure vector and the syntax content vector according to an embodiment of the disclosure.
The process of obtaining the search result may be described in more detail with reference to fig. 3. In some embodiments, first, in step S301, a syntax structure similarity of the syntax structure vector and a syntax structure vector of each preset question in a preset question-answering library is calculated.
Wherein the preset question-and-answer library comprises at least one preset question, and wherein for each preset question in the preset question-and-answer library: carrying out syntactic structure analysis on the preset problem to obtain a syntactic structure vector of the preset problem; and processing the preset problem to obtain a syntactic content vector of the preset problem. Accordingly, the preset question has a syntax structure vector and a syntax content vector corresponding to the preset question, and the syntax structure vector and the syntax content vector have the same dimension as the syntax structure vector and the syntax content vector of the input question respectively.
The syntactic structural similarity is intended to characterize how similar the input question is to the preset question in syntactic structural features. Which may be obtained, for example, by comparing the syntactic structure feature vectors possessed by the input problem and the preset problem. Embodiments of the present disclosure are not limited by the manner in which the specific comparisons are made.
Next, in step S302, the syntactic content similarity of the syntactic content vector to the syntactic content vector of each preset question in the preset question-answering library is calculated.
The syntactic content similarity is intended to characterize how similar the input question is to the preset question in syntactic content characteristics. It may be obtained, for example, by solving for cosine similarity of the syntactic content vector of the input question and the syntactic content vector of the preset question, or it may be obtained by other means. Embodiments of the present disclosure are not limited by the particular method employed to find the syntactic content similarity.
Thereafter, in step S303, a question similarity between the input question and each preset question in the preset question-answering library is determined according to the syntax structure similarity and the syntax content similarity.
The problem similarity may be obtained by directly adding the syntax structure similarity and the syntax content similarity, or it may also be obtained by assigning different weights to the syntax structure similarity and the syntax content similarity, and further multiplying the obtained syntax structure similarity and syntax content similarity by their corresponding weights, and then adding, where the weights of the syntax structure similarity and the syntax content similarity are, for example, 0.4, 0.6, or may also be 0.3, 0.7, respectively. The embodiments of the present disclosure are not limited by the specific manner and specific weight settings to obtain the preset problem similarity.
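The weighted combination described above can be sketched as follows. The function name `question_similarity` and its parameter names are assumptions made for illustration; the default weights 0.4 and 0.6 are taken from the example in the text, and plain addition corresponds to setting both weights to 1.

```python
def question_similarity(
    structure_sim: float,
    content_sim: float,
    w_structure: float = 0.4,
    w_content: float = 0.6,
) -> float:
    """Combine the two similarities as a weighted sum."""
    return w_structure * structure_sim + w_content * content_sim


# Weighted combination with the example weights 0.4 and 0.6.
weighted = question_similarity(10.0, 0.5)
# Direct addition, i.e. both weights equal to 1.
added = question_similarity(10.0, 0.5, w_structure=1.0, w_content=1.0)
```

Because the syntax structure similarity is a sum of comparison results while a cosine similarity is bounded by 1, the two terms may lie on different scales; the weights are one way to balance them.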
After obtaining the problem similarity, in step S304, a search result is output according to the problem similarity.
For example, the maximum question similarity in the preset question-answer library may be determined based on the question similarity determined for each preset question, and a preset question and answer pair corresponding to the maximum question similarity is obtained, and an answer in the answer pair is directly output; or a preset threshold value can be set, the obtained question similarity is compared with the preset threshold value, the preset question and answer pair corresponding to one or more question similarities larger than the preset threshold value is obtained, and the answers in the preset question and answer pair are sequentially output. Embodiments of the present disclosure are not limited by the particular method of determining search results based on problem similarities.
Based on the above, the method obtains the syntactic structure similarity and the syntactic content similarity between the input question and each preset question in the preset question-answering library, determines the question similarity between the input question and the preset question from these two similarities, and obtains the search result based on the question similarity. Since both the structural features and the content features of the input question are taken into account when calculating the search result, the accuracy and reliability of the search result are improved.
Fig. 4A illustrates an exemplary flowchart of a process 400 for computing the syntactic similarity of the syntactic structure vector to the syntactic structure vector of each preset question in the preset question-answering library, according to an embodiment of the present disclosure.
Referring to fig. 4A, first, in step S401, for each sub-element in a syntax structure vector of an input question, a syntax sub-tree corresponding thereto is obtained. Thereafter, in step S402, the syntax subtree is compared with the syntax subtree corresponding to each sub-element in the syntax structure vector of the preset question, and based on the comparison result, the similarity of the sub-element in the syntax structure vector of the input question is obtained. In step S403, the sub-element similarities of all the sub-elements in the syntax structure vector of the input question are added to obtain the syntax structure similarity of the input question and the preset question.
Specifically, the above process can be described by the following formula:
$$\mathrm{sim\_T}\big(H(q_0), H(q_L)\big) = \sum_{i=1}^{n} \sum_{j=1}^{n} C\big(H(q_0)_i, H(q_L)_j\big)$$
Wherein q_0 is the input question and q_L is the L-th preset question in the preset question library, where L is a positive integer greater than 0 and less than or equal to the total number of questions in the preset question library. H(q_0) is the syntax structure vector of the input question q_0, and H(q_L) is the syntax structure vector of the preset question q_L. sim_T is the resulting similarity between the syntax structure vector of the input question and the syntax structure vector of the preset question, and n is the first preset dimension. H(q_0)_i represents the i-th sub-element in the syntax structure vector of the input question, and H(q_L)_j represents the j-th sub-element in the syntax structure vector of the preset question. C(H(q_0)_i, H(q_L)_j) represents the similarity between the i-th sub-element of the input question and the j-th sub-element of the preset question, where i and j are each a positive integer greater than or equal to 1 and less than or equal to n.
The process of obtaining the similarity of the sub-element in the syntax structure vector of the input problem based on the comparison result may be, for example: based on the comparison result obtained by comparing the syntax subtree of the sub-element in the syntax structure vector of the input problem with the syntax subtree corresponding to each sub-element in the syntax structure vector of the preset problem, adding all the comparison results to obtain the sub-element similarity of the sub-element in the syntax structure vector of the input problem. The comparison result represents the similarity between the syntax subtree corresponding to the sub-element in the syntax structure vector of the input problem and the syntax subtree corresponding to the sub-element in the preset problem.
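The two levels of summation described above can be sketched as follows. The function name `structure_similarity` and the callable `compare` are assumptions; `toy_compare` is only a placeholder used to make the example runnable and is not the comparison method of the present disclosure.

```python
from typing import Callable, Sequence


def structure_similarity(
    h_input: Sequence[int],
    h_preset: Sequence[int],
    compare: Callable[[int, int], float],
) -> float:
    """Sum compare(i, j) over every pair of sub-element positions."""
    n = len(h_input)
    if len(h_preset) != n:
        raise ValueError("both vectors must have the first preset dimension")
    total = 0.0
    for i in range(n):
        # Sub-element similarity of the i-th sub-element of the input question:
        # the sum of its comparison results against every preset sub-element.
        sub_element_similarity = sum(compare(i, j) for j in range(n))
        total += sub_element_similarity
    return total


h_input = [0, 1, 1, 1, 1, 0, 0, 1, 0, 0]
h_preset = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]


def toy_compare(i: int, j: int) -> float:
    # Placeholder only: 1 when both sub-elements are non-zero and refer to
    # the same preset subtree, otherwise 0.
    return 1.0 if h_input[i] and h_preset[j] and i == j else 0.0


print(structure_similarity(h_input, h_preset, toy_compare))  # 2.0
```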
A method for comparing the syntax subtree of the sub-element in the syntax structure vector of the input question with the syntax subtree corresponding to the sub-element in the syntax structure vector of the preset question to obtain the comparison result is given below. It should be understood that embodiments of the present disclosure are not limited by the particular manner of comparison employed.
In some embodiments, before comparing the syntax subtree of the sub-element in the syntax structure vector of the input question with the syntax subtree corresponding to the sub-element in the syntax structure vector of the preset question, the method further includes a process of discriminating the value of the sub-element to be compared.
FIG. 4B illustrates an exemplary flow chart of a process 410 for discriminating between values of subelements to be compared according to an embodiment of the present disclosure.
Referring to fig. 4B, in some embodiments, the initial value of all the sub-elements in the predetermined initial vector is 0, and the accumulated value is 1. Based on this, when discriminating the values of the subelements to be compared: first, in step S411, it is determined whether the element values of the current sub-element in the syntax structure vector of the input question and the current sub-element in the syntax structure vector of the preset question are both non-zero values.
The current sub-element in the syntax structure vector of the input problem refers to the sub-element in the syntax structure vector of the input problem, which needs to calculate the similarity currently; the current sub-element in the syntax structure vector of the preset problem refers to the sub-element in the syntax structure vector of the preset problem that the similarity needs to be calculated currently.
When there is a zero value in the value of the current sub-element in the syntax structure vector of the input question and/or the current sub-element in the syntax structure vector of the preset question, a preset first comparison result is output in step S412.
The preset first comparison result is intended to represent that the similarity between the current sub-element in the syntax structure vector of the input question and the current sub-element in the syntax structure vector of the preset question is 0%. Based on this, the preset first comparison result may be, for example, 0, or may be a null value; embodiments of the present disclosure are not limited by the specific value of the preset first comparison result.
The above-described process can be more specifically described as three cases. In the first case, when the value of the current sub-element in the syntax structure vector H(q_0) of the input question q_0 is 0 and the value of the current sub-element in the syntax structure vector H(q_L) of the preset question q_L is not 0, the similarity between the current sub-element of H(q_0) and the current sub-element of H(q_L) is 0.
In the second case, when the value of the current sub-element in the syntax structure vector H(q_0) of the input question is not 0 and the value of the current sub-element in the syntax structure vector H(q_L) of the preset question q_L is 0, the similarity between the current sub-element of H(q_0) and the current sub-element of H(q_L) is 0.
In the third case, when the value of the current sub-element in the syntax structure vector H(q_0) of the input question is 0 and the value of the current sub-element in the syntax structure vector H(q_L) of the preset question q_L is also 0, the similarity between the current sub-element of H(q_0) and the current sub-element of H(q_L) is 0.
When neither the value of the current sub-element in the syntax structure vector H(q_0) of the input question nor the value of the current sub-element in the syntax structure vector H(q_L) of the preset question q_L is zero, in step S413, the syntax subtree corresponding to the current sub-element in the syntax structure vector of the input question is compared with the syntax subtree corresponding to the current sub-element in the syntax structure vector of the preset question.
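The discrimination of steps S411 to S413 can be sketched as follows, assuming an initial value of 0 and an accumulated value of 1. The names `compare_sub_elements`, `FIRST_COMPARISON_RESULT` and `compare_subtrees` are assumptions for illustration; the subtree comparison itself is passed in as a callable and is not defined here.

```python
from typing import Any, Callable

# Preset first comparison result: the two sub-elements have 0% similarity.
FIRST_COMPARISON_RESULT = 0.0


def compare_sub_elements(
    value_input: int,
    value_preset: int,
    subtree_input: Any,
    subtree_preset: Any,
    compare_subtrees: Callable[[Any, Any], float],
) -> float:
    """Return 0 if either sub-element is zero, else compare the subtrees."""
    # S411/S412: covers all three cases in which at least one value is zero.
    if value_input == 0 or value_preset == 0:
        return FIRST_COMPARISON_RESULT
    # S413: both values are non-zero, so the subtrees themselves are compared.
    return compare_subtrees(subtree_input, subtree_preset)


calls = []


def recording_compare(first: Any, second: Any) -> float:
    # Stand-in comparison that records its arguments and returns a fixed value.
    calls.append((first, second))
    return 5.0


skipped = compare_sub_elements(0, 1, "c1", "c1", recording_compare)
compared = compare_sub_elements(1, 1, "c2", "c1", recording_compare)
```

Checking the values first means the subtree comparison, which may be recursive, is only performed for pairs in which both questions actually contain the corresponding subtrees.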
In some embodiments, the comparison process in step S413 described above may be described in more detail. First, in step S4131, a syntax subtree corresponding to a current sub-element in a syntax structure vector of an input question is used as a first syntax subtree, and a syntax subtree corresponding to a current sub-element in a syntax structure vector of the preset question is used as a second syntax subtree. In step S4132, the first syntax subtree and the second syntax subtree are compared to obtain the similarity between the first syntax subtree and the second syntax subtree.
The first syntax subtree and the second syntax subtree are only used for distinguishing the syntax subtree corresponding to the current sub-element in the syntax structure vector of the input problem from the syntax subtree corresponding to the current sub-element in the syntax structure vector of the preset problem, and are not used for limiting the type or the content of the syntax subtree. It should be appreciated that the first and second syntax sub-trees may be the same type of syntax sub-tree.
The above-mentioned comparison of the first syntax subtree and the second syntax subtree may be performed, for example, by comparing each node included therein and the generation formula of the node, or other comparison manners may be adopted, and embodiments of the present disclosure are not limited by a specific method of comparing the first syntax subtree and the second syntax subtree to obtain the similarity.
Fig. 4C illustrates an exemplary flowchart of a process 420 of comparing a first syntactic subtree and a second syntactic subtree according to an embodiment of the disclosure.
In some embodiments, the first syntax subtree and the second syntax subtree may be compared using the method shown in fig. 4C. First, in step S421, it is determined whether the generation formula (that is, the production rule) on the initial node of the first syntax subtree is the same as the generation formula on the initial node of the second syntax subtree.
The generation formula characterizes the manner in which a non-leaf node expands into the one or more child nodes directly connected to it, and may be represented as the one or more branches leading from the non-leaf node in the syntax subtree. It should be appreciated that the number of generation formulas does not depend on the number of branches: each node has only one generation formula, to which all of the branches leading from that node belong. For example, the node S of the syntax subtree c_6 in fig. 2B has a generation formula with two branches, by which the child nodes NP and VP directly connected to it are expanded.
When the generating formula on the initial node of the first syntax subtree and the generating formula on the initial node of the second syntax subtree are different, in step S422, a preset first comparison result is output.
As described above, the preset first comparison result is intended to characterize that the similarity between the current sub-element of the syntax structure vector of the input question and the current sub-element of the syntax structure vector of the preset question is 0%. The preset first comparison result may be, for example, 0, or may also be a null value, and embodiments of the present disclosure are not limited by a specific numerical value of the preset first comparison result.
The initial node is the node at the uppermost layer of the syntactic subtree, namely the main node of the syntactic subtree.
When the generation formulas of the initial node of the first syntax subtree and the initial node of the second syntax subtree are the same, further, in step S423, it is determined whether only leaf nodes exist in the offspring of the initial node of the first syntax subtree and the offspring of the initial node of the second syntax subtree.
Wherein the descendants of the initial node refer to all child nodes that are brought out by the initial node. The leaf nodes characterize the lowest level nodes in the syntactic subtree, i.e. the nodes in the syntactic subtree that cannot divide the branches any further.
If only leaf nodes exist in the offspring of the initial node of the first syntax subtree and the offspring of the initial node of the second syntax subtree, in step S424, a preset second comparison result is output.
The preset second comparison result is intended to characterize that the similarity of the current sub-element of the syntax structure vector of the input question to the current sub-element of the syntax structure vector of the preset question is 100%. The value may be, for example, 1, or other values may be used. Embodiments of the present disclosure are not limited by the specific values of the preset second comparison result.
It should be understood that, in the present application, the preset first comparison result and the preset second comparison result are intended to distinguish different comparison result values and their representation meanings, and are not intended to limit the preset first comparison result and the preset second comparison result.
When a non-leaf node is included in the descendant of the initial node of the first syntax subtree and/or the descendant of the initial node of the second syntax subtree, a preset algorithm is employed to calculate the similarity of the first syntax subtree and the second syntax subtree in step S425.
The preset algorithm can be selected based on the precision requirement and the actual calculation requirement, for example, a recursive algorithm is selected, or a compound algorithm comprising the recursive operation is selected to realize the calculation. Embodiments of the present disclosure are not limited by the specific algorithm content and type of the preset algorithm.
In some embodiments, the preset algorithm may be, for example, a recursive algorithm, which may be specifically expressed by the following formula:
Based on the above formula, the sub-element similarity can be obtained in a recursive manner, where J1 represents the first syntax subtree and J2 represents the second syntax subtree. S(J1) represents the number of all child nodes generated under the initial node in the first syntax subtree, S(J2) represents the number of all child nodes generated under the initial node in the second syntax subtree, and min[S(J1), S(J2)] represents the smaller of these two numbers. J1(s) represents the s-th node generated under the initial node in the first syntax subtree, and J2(s) represents the s-th node generated under the initial node in the second syntax subtree, where s is a positive integer greater than or equal to 1 and less than or equal to min[S(J1), S(J2)].
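The flow of fig. 4C can be sketched as follows. Since the recursive formula is not reproduced in this text, the recursion used here is an assumption: the similarity is taken as the product, over the first min[S(J1), S(J2)] child positions, of one plus the similarity of the paired child nodes. The tuple-based tree representation, the inclusion of leaf words in the generation formula, and all function names are likewise assumptions, and the sketch is not claimed to reproduce the values of the worked example.

```python
from typing import Tuple, Union

# A non-leaf node is (label, children); a leaf is a plain string (a word).
Node = Union[str, Tuple[str, tuple]]


def label_of(node: Node) -> str:
    return node if isinstance(node, str) else node[0]


def children_of(node: Node) -> tuple:
    return () if isinstance(node, str) else node[1]


def production(node: Node) -> tuple:
    # Generation formula: the node label and the labels of its direct children.
    return (label_of(node), tuple(label_of(c) for c in children_of(node)))


def only_leaf_children(node: Node) -> bool:
    return all(isinstance(c, str) for c in children_of(node))


def compare_subtrees(j1: Node, j2: Node) -> int:
    # S421/S422: different generation formulas give the first comparison result.
    if production(j1) != production(j2):
        return 0
    # S423/S424: only leaf nodes below both initial nodes gives the second result.
    if only_leaf_children(j1) and only_leaf_children(j2):
        return 1
    # S425 (assumed recursion): product over paired children of 1 + similarity.
    result = 1
    for s in range(min(len(children_of(j1)), len(children_of(j2)))):
        result *= 1 + compare_subtrees(children_of(j1)[s], children_of(j2)[s])
    return result


np_cold = ("NP", (("N", ("cold",)),))
np_pneumonia = ("NP", (("N", ("pneumonia",)),))
vp = ("VP", (("V", ("causes",)), ("N", ("fever",))))
s_cold = ("S", (np_cold, vp))
s_pneumonia = ("S", (np_pneumonia, vp))

print(compare_subtrees(vp, vp))                  # 4
print(compare_subtrees(s_cold, s_pneumonia))     # 10
```

Under these assumptions, the two sentences share the generation formula on S and the whole VP subtree, while the NP branches differ only in their leaf word, which is why the second value is lower than the value obtained when comparing `s_cold` with itself.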
Fig. 4D shows a schematic diagram of computing sub-element similarity of sub-elements in an input problem according to an embodiment of the present disclosure.
The above process may be described in more detail with reference to fig. 4D and 2B. When calculating the similarity between the sub-elements in the syntax structure vector of the input question and the sub-elements of a preset question, suppose the input question is "cold causes fever". As described above, syntax structure analysis is first performed to obtain the syntax tree J_0 and the plurality of syntax subtrees contained therein, and, based on the correspondence between the syntax subtrees and the sub-elements in the preset initial vector, the syntax structure vector H(q_0) is obtained: (0,1,1,1,1,0,0,1,0,0).
Next, the process of comparing the syntax structure vector H(q_0) of the input question with the preset questions in the preset question-answering library is described, taking as an example the similarity between the syntax structure vector of the input question and the syntax structure vector of the first preset question, "pneumonia causes fever", in the preset question-answering library. For the first preset question "pneumonia causes fever", the syntax tree is J_1 and the syntax structure vector H(q_1) is: (1,0,0,1,0,1,1,1,0,0).
Then, based on the above procedure, the similarity to the preset question is found for each sub-element in the syntax structure vector H(q_0) of the input question. First, as is known from the above method, for the sub-elements H(q_0)_1, H(q_0)_6, H(q_0)_7, H(q_0)_9 and H(q_0)_10 in the syntax structure vector H(q_0) of the input question, since their own values are 0, the sub-element similarity calculated for each of them against the syntax structure vector H(q_1) of the first preset question is 0. Next, the sub-element similarity is obtained for the other sub-elements, and the sub-element similarities of all the sub-elements in the syntax structure vector H(q_0) of the input question are added to obtain the syntax structure similarity between the syntax structure vector H(q_0) of the input question and the syntax structure vector of the first preset question. Now, taking H(q_0)_2 as an example, the sub-element similarity corresponding to the sub-element H(q_0)_2 is shown in the following formula:
Based on the above, the sub-element similarity corresponding to the sub-element H(q_0)_2 is obtained, which has a value of 18. For the similarity C(H(q_0)_2, H(q_L)_1) within this sub-element similarity, the initial nodes of the syntax subtrees corresponding to the sub-elements H(q_0)_2 and H(q_L)_1 are both S and both generate NP and VP, so the generation formulas on the initial nodes are the same; however, since non-leaf nodes (N, V, N) exist among the descendants of the initial nodes, the similarity is obtained through step S425 in fig. 4C, using the recursive formula described above. In the recursion process, for each child node of the initial node, the generation formulas of the child nodes are compared to obtain the corresponding child node similarity, and the child node similarities are multiplied to obtain the similarity of the two sub-elements, so that the final result of the similarity C(H(q_0)_2, H(q_L)_1) of the two sub-elements is (1+1) ×1+1+1+1, i.e. 8.
Through the above process, the similarity of each sub-element in the syntax structure vector H (q 0) of the input problem relative to the sub-element of the syntax structure vector of the preset problem is obtained, and the similarity of all sub-elements in the syntax structure vector of the input problem is added to obtain the similarity of the syntax structure of the input problem and the preset problem.
Based on the above, the syntactic structure similarity of the input question relative to each preset question is obtained by comparing the syntactic structure vector of the input question with the syntactic structure vector of each preset question in the preset question library, which is beneficial to further calculation based on the syntactic structure similarity.
Fig. 5 illustrates an exemplary flowchart of a process of calculating a syntactic content similarity of a syntactic content vector of the input question to a syntactic content vector of each preset question in a preset question-answering library according to an embodiment of the present disclosure.
Referring to fig. 5, first, in step S501, cosine similarity of a syntax content vector of an input question and a syntax content vector of the preset question is calculated. Specifically, it can be calculated by, for example, the following formula:
$$\mathrm{sim\_V}\big(\mathrm{LSTM}(q_0), \mathrm{LSTM}(q_L)\big) = \cos\big(\mathrm{LSTM}(q_0), \mathrm{LSTM}(q_L)\big) \qquad (4)$$
Wherein q_0 is the input question and q_L is the L-th preset question in the preset question library, where L is a positive integer greater than 0 and less than or equal to the total number of questions in the preset question library. LSTM(q_0) is the syntactic content vector of the input question q_0 output via neural network processing, and LSTM(q_L) is the syntactic content vector of the preset question q_L in the preset question library output via neural network processing. sim_V is the resulting cosine similarity between the syntactic content vector of the input question and the syntactic content vector of the preset question.
After the cosine similarity is obtained, in step S502, the cosine similarity is used as a syntax content similarity.
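The cosine similarity of steps S501 and S502 can be sketched as follows, using only the Python standard library. The function name `cosine_similarity` is an assumption, the vectors are assumed to be plain sequences of numbers with the second preset dimension, and returning 0 for a zero-length vector is a choice made for this sketch that the text does not specify.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("both vectors must have the second preset dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative content vectors only; real ones would come from the network.
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```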
Based on the above, by obtaining the cosine similarity between the syntax content vector of the input problem and the syntax content vector of the preset problem, the syntax content similarity between the input problem and the preset problem can be obtained in a simple and quick manner, so that the subsequent retrieval based on the syntax content similarity is facilitated.
Fig. 6 illustrates an exemplary flowchart of a process of outputting a search result according to the problem similarity according to an embodiment of the present disclosure.
Referring to fig. 6, in some embodiments, when obtaining a search result based on the obtained question similarity, first, in step S601, based on the question similarity determined for each preset question, a maximum question similarity in the preset question-answering library is determined, and a preset question and answer pair corresponding to the maximum question similarity is obtained.
For example, if there are 10 preset questions q_1 to q_10 in the current preset question library, and the question similarities between q_1 to q_10 and the input question are, respectively, 87, 90, 101, 20, 32, 12, 91, 82, 9 and 10, then the maximum question similarity is 101, and the question and answer pair corresponding to the maximum question similarity is obtained, i.e. the preset question q_3 and its answer.
Based on the obtained maximum problem similarity, in step S602, the maximum problem similarity is compared with the preset threshold.
The preset threshold is used for representing the minimum similarity value between the preset problem corresponding to the search result to be output and the input problem. Which may be set, for example, based on the actual desired accuracy of the search, embodiments of the present disclosure are not limited by the particular values that the preset threshold has. For example, it may be set to 50, or it may be set to 100.
Further, in step S603, when the maximum question similarity is greater than or equal to a preset threshold, a corresponding answer in the preset question and answer pair is output, and when the maximum question similarity is less than the preset threshold, a null value is output.
Specifically, suppose for example that the preset threshold is set to 100, meaning that the preset question corresponding to an output search result must have a question similarity of at least 100 to the input question. If the maximum question similarity in the current preset question-answer library is 90, it is smaller than the preset threshold, so the answer to the corresponding preset question is not output and a null value is output instead. If the maximum question similarity is 103, it is larger than the preset threshold, so the answer to the corresponding preset question is output.
Based on the above, the maximum question similarity is determined from the question similarities, the preset question and answer pair corresponding to it is obtained, and the maximum question similarity is then checked against the preset threshold. The corresponding answer is output only when the maximum question similarity is greater than or equal to the preset threshold, which ensures that the output answer corresponds to a preset question highly similar to the input question and improves the accuracy of the search result.
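Steps S601 to S603 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `retrieve_answer`, the representation of question and answer pairs as tuples, and the use of Python's `None` as the "null value" are all assumptions. The similarity values are the ten example values given above.

```python
from typing import Optional, Sequence, Tuple


def retrieve_answer(similarities: Sequence[float],
                    qa_pairs: Sequence[Tuple[str, str]],
                    threshold: float) -> Optional[str]:
    """Return the answer of the most similar preset question, or None."""
    if not similarities or len(similarities) != len(qa_pairs):
        raise ValueError("need one similarity per question and answer pair")
    # Step S601: find the maximum question similarity and its pair.
    best_index = max(range(len(similarities)), key=lambda i: similarities[i])
    best_similarity = similarities[best_index]
    _question, answer = qa_pairs[best_index]
    # Steps S602 and S603: compare with the threshold and output accordingly.
    if best_similarity >= threshold:
        return answer
    return None  # assumption: None plays the role of the null value


# The ten example similarities for q_1 to q_10.
scores = [87, 90, 101, 20, 32, 12, 91, 82, 9, 10]
pairs = [(f"q{i}", f"answer to q{i}") for i in range(1, 11)]

result_at_100 = retrieve_answer(scores, pairs, threshold=100)
result_at_102 = retrieve_answer(scores, pairs, threshold=102)
```

With a threshold of 100 the maximum similarity 101 (question q3) passes the check and its answer is returned; with a threshold of 102 nothing passes and the sketch returns `None`.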
In some embodiments, the input question is a medical question, the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
The medical question and answer pairs may be obtained, for example, by using a crawling algorithm to collect patients' questions and doctors' answers from medical communities, official hospital websites and electronic medical records, and forming them into question and answer pairs. The question and answer pairs may further be reviewed, supplemented and corrected, for example by medical professionals, so that the accuracy of the answers is further improved. Embodiments of the present disclosure are not limited by the particular method of obtaining the medical question and answer pairs or by the subsequent processing of the obtained pairs.
By providing a question-answer library for the medical field, when the input question is a medical question, it can be conveniently searched against the preset question-answer library and its answer obtained, so that the speed of retrieving answers to medical questions can be significantly improved; at the same time, compared with the mass of consultation information on the Internet, this helps provide users with more professional answers to medical questions.
Fig. 7 shows an exemplary block diagram of a question-answer retrieving apparatus according to an embodiment of the disclosure.
The question-answer retrieving apparatus 900 shown in fig. 7 includes a syntax structure analysis module 910, a syntax content analysis module 920, and a retrieval result generation module 930.
The syntax structure analysis module 910 is configured to perform syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question. The syntactic content analysis module 920 is configured to process the input question to obtain a syntactic content vector of the input question. The search result generation module 930 is configured to compare the input question with the preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector, and obtain a search result.
Syntactic structure analysis, i.e. syntactic parsing, analyzes the grammatical function of the words in a sentence; through it, the syntactic structure of the sentence and the dependency relationships among its constituents can be found. For example, for the sentence "I am late", syntactic structure analysis may identify "I" as the subject, "am" as the predicate and "late" as the complement.
The syntactic structure analysis may be, for example, parsing based on a probabilistic context-free grammar, head-driven parsing, etc.; embodiments of the present disclosure are not limited by the particular algorithm employed. The analysis may be implemented, for example, with a parsing tool such as the Stanford Parser or the Berkeley Parser. Embodiments of the present disclosure are not limited by the particular tools employed in the syntactic structure analysis.
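To make the output of syntactic structure analysis concrete, the sketch below represents a parse tree for "I am late" as nested tuples and lists the grammar rules (productions) it contains. The tree is written by hand with Penn Treebank-style labels and is an assumption for illustration; it is not the output of any particular parser, and the helper name `productions` is hypothetical.

```python
# A hand-written parse tree: each node is (label, child, child, ...),
# and a plain string is a word (leaf).
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBP", "am"),
               ("ADJP", ("JJ", "late"))))


def productions(node):
    """List the rules 'parent -> children' of a tree in pre-order."""
    label, *children = node
    # The right-hand side uses a child's label, or the word itself for a leaf.
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    rules = [f"{label} -> {' '.join(rhs)}"]
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules


rules = productions(tree)
```

Here `rules` begins with `S -> NP VP`, reflecting that the sentence splits into a subject noun phrase and a verb phrase, and ends with `JJ -> late`.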
The input question may be, for example, a question directly input by the user, or may be a question that the computer system determines itself in response to input information or control information of the user. The embodiments of the present disclosure are not limited by the source of the input problem or the manner in which it is input. For example, the question may be a question input by the user in the web search field, or may be a question generated by a computer based on input information of the user.
The preset question-answer library comprises a plurality of preset question and answer pairs. Each preset question and answer pair comprises a class of questions and corresponding answers.
The preset question-answer library may be, for example, a general-purpose question-answer library, such as a common-knowledge question-answer library. Alternatively, it may be a question-answer library for a particular area of expertise, such as a medical question-answer library or a financial question-answer library. Embodiments of the present disclosure are not limited by the type of question and answer pairs in the preset question-answer library.
With this question-answer retrieving apparatus, the syntactic content vector and the syntactic structure vector of the input question are obtained, and the input question is compared with the preset questions in the preset question-answer library based on both vectors to obtain a retrieval result. The syntactic structure features and the syntactic content features of the input question are thus both taken into account when retrieving an answer, making the retrieved answer more accurate. In addition, compared with a user browsing information on the network and manually screening answers, this significantly reduces the time cost of retrieval and improves retrieval efficiency and speed.
In some embodiments, the search result generation module 930 may further include: a structural similarity generation module 931, a content similarity generation module 932, a question similarity generation module 933, and a result module 934. The search result generation module 930 may execute the flow shown in fig. 3, comparing the input question with the preset questions in the preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain the retrieval result.
The structural similarity generation module 931 is configured to perform the operation of step S301 in fig. 3, calculating the syntactic structure similarity between the syntactic structure vector of the input question and the syntactic structure vector of each preset question in the preset question-answer library.
The preset question-answer library comprises at least one preset question. For each preset question in the library, syntactic structure analysis is performed on the preset question to obtain its syntactic structure vector, and the preset question is processed to obtain its syntactic content vector. Accordingly, each preset question has its own syntactic structure vector and syntactic content vector, which have the same dimensions as the syntactic structure vector and the syntactic content vector of the input question, respectively.
The syntactic structure similarity characterizes how similar the input question and the preset question are in their syntactic structure features. It may be obtained, for example, by comparing the syntactic structure vectors of the input question and the preset question. Embodiments of the present disclosure are not limited by the specific manner of comparison.
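One possible comparison of syntactic structure vectors, following the sub-element procedure of the method (a zero-valued sub-element yields a preset first comparison result; otherwise the corresponding syntax subtrees are compared, and the per-element results are summed), might look like the sketch below. Several points are assumptions made only for illustration: each sub-element is modeled as either `None` (a zero value) or a subtree of nested tuples; the first and second preset comparison results are taken to be 0 and 1; and the "preset algorithm" for subtrees with non-leaf offspring is filled in with a tree-kernel-style recursion over matching children.

```python
FIRST_RESULT = 0.0   # assumed value of the preset first comparison result
SECOND_RESULT = 1.0  # assumed value of the preset second comparison result


def production(node):
    """Rule at a node: its label plus the labels (or words) of its children."""
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))


def only_leaf_children(node):
    return all(isinstance(c, str) for c in node[1:])


def compare_subtrees(a, b):
    if production(a) != production(b):
        return FIRST_RESULT
    if only_leaf_children(a) and only_leaf_children(b):
        return SECOND_RESULT
    # Assumed "preset algorithm": multiply (1 + similarity) over child pairs.
    score = 1.0
    for child_a, child_b in zip(a[1:], b[1:]):
        if isinstance(child_a, str) or isinstance(child_b, str):
            continue  # words carry no further structure
        score *= 1.0 + compare_subtrees(child_a, child_b)
    return score


def structure_similarity(input_vec, preset_vec):
    """Sum the sub-element similarities of two equal-length vectors."""
    if len(input_vec) != len(preset_vec):
        raise ValueError("vectors must have the same dimension")
    total = 0.0
    for a, b in zip(input_vec, preset_vec):
        if a is None or b is None:  # a zero-valued sub-element
            total += FIRST_RESULT
        else:
            total += compare_subtrees(a, b)
    return total


np_tree = ("NP", ("PRP", "I"))
vp_tree = ("VP", ("VBP", "am"), ("ADJP", ("JJ", "late")))
input_vec = [np_tree, vp_tree, None]
preset_vec = [np_tree, vp_tree, ("PP", ("IN", "at"))]
sim_structure = structure_similarity(input_vec, preset_vec)
```

In this example the first two sub-elements contribute 2 and 6 respectively and the third contributes nothing because it is zero-valued in the input vector, so `sim_structure` is 8.0. Unlike the cosine similarity, such a score is not bounded by 1.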
The content similarity generation module 932 is configured to perform the operation as in step S302 in fig. 3, and calculate the syntactic content similarity of the syntactic content vector to the syntactic content vector of each preset question in the preset question-and-answer library.
The syntactic content similarity is intended to characterize how similar the input question is to the preset question in syntactic content characteristics. Embodiments of the present disclosure are not limited by the particular method employed to find the syntactic content similarity.
The question similarity generating module 933 is configured to perform the operation as shown in step S303 in fig. 3, and determine, according to the syntactic structure similarity and the syntactic content similarity, a question similarity of the input question to each preset question in a preset question-answering library.
The question similarity may be obtained by directly adding the syntactic structure similarity and the syntactic content similarity, or by assigning them different weights and then adding them. Embodiments of the present disclosure are not limited by the specific manner of obtaining the question similarity or by the specific weight settings.
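The two options for step S303, direct addition and weighted addition, can be illustrated with a short sketch. The function names, the parameter names `w_structure` and `w_content`, and all numeric values are assumptions chosen for illustration only.

```python
def question_similarity(structure_sim, content_sim,
                        w_structure=1.0, w_content=1.0):
    """Weighted sum; the default weights of 1 give the direct sum."""
    return w_structure * structure_sim + w_content * content_sim


def score_library(structure_sims, content_sims,
                  w_structure=1.0, w_content=1.0):
    """Question similarity of the input question to every preset question."""
    if len(structure_sims) != len(content_sims):
        raise ValueError("need one structure and one content score per question")
    return [question_similarity(s, c, w_structure, w_content)
            for s, c in zip(structure_sims, content_sims)]


# Hypothetical per-question similarities for a library of three questions.
structure_sims = [8.0, 2.0, 0.0]
content_sims = [0.5, 0.75, 0.25]

direct = score_library(structure_sims, content_sims)
weighted = score_library(structure_sims, content_sims,
                         w_structure=0.5, w_content=2.0)
```

The weights let the two similarities be balanced when they live on different scales, for example when the structural score is an unbounded sum while the content score is a cosine similarity.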
The result module 934 is configured to perform the operation of step S304 in fig. 3, outputting a search result according to the question similarity.
For example, the maximum question similarity in the preset question-answer library may be determined based on the question similarity determined for each preset question, the preset question and answer pair corresponding to the maximum question similarity may be obtained, and the answer in that pair may be output directly; the search result may also be determined by other methods. Embodiments of the present disclosure are not limited by the particular method of determining the search result based on the question similarity.
Based on the above, the question-answer retrieving apparatus obtains the syntactic structure similarity and the syntactic content similarity between the input question and each preset question in the preset question-answer library, and combines the two to obtain the question similarity between the input question and each preset question, from which the search result is obtained. Because both the structural features and the content features of the input question are considered in computing the search result, the accuracy and reliability of the search result are improved.
In some embodiments, the result module 934 includes a maximum question similarity determination module 9341, a comparison module 9342, and an output module 9343. The result module 934 may perform the flow shown in fig. 6, outputting the search result according to the question similarity.
The maximum question similarity determination module 9341 is configured to perform step S601 in fig. 6, determining the maximum question similarity in the preset question-answer library based on the question similarity determined for each preset question, and obtaining the preset question and answer pair corresponding to the maximum question similarity.
The comparison module 9342 is configured to perform step S602 in fig. 6, comparing the maximum question similarity with a preset threshold.
The output module 9343 is configured to execute the step shown in step S603 in fig. 6, output the corresponding answer in the preset question and answer pair when the maximum question similarity is greater than or equal to the preset threshold, and output a null value when the maximum question similarity is less than the preset threshold.
The preset threshold represents the minimum similarity that a preset question must have to the input question for its answer to be output as the search result. It may be set, for example, according to the retrieval accuracy actually required; embodiments of the present disclosure are not limited by the particular value of the preset threshold.
Based on the above, with the question-answer retrieving apparatus, the maximum question similarity is determined from the question similarities, the preset question and answer pair corresponding to it is obtained, and the maximum question similarity is then checked against the preset threshold. The corresponding answer is output only when the maximum question similarity is greater than or equal to the preset threshold, which ensures that the output answer corresponds to a preset question highly similar to the input question and improves the accuracy of the search result.
In some embodiments, the input question is a medical question, the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
The medical question and answer pairs may be obtained, for example, by using a crawling algorithm to collect patients' questions and doctors' answers from medical communities, official hospital websites and electronic medical records, and forming them into question and answer pairs. Embodiments of the present disclosure are not limited by the particular method of obtaining the medical question and answer pairs or by the subsequent processing of the obtained pairs.
By providing a question-answer library for the medical field, when the input question is a medical question, it can be conveniently searched against the preset question-answer library and its answer obtained, so that the speed of retrieving answers to medical questions can be significantly improved; at the same time, compared with the mass of consultation information on the Internet, this helps provide users with more professional answers to medical questions.
In some embodiments, the question and answer retrieval device is capable of performing the method as described above and has corresponding functions.
Fig. 8 shows an exemplary block diagram of a question-answer retrieval device according to an embodiment of the present disclosure.
The question and answer retrieval device 950 shown in fig. 8 may be implemented as one or more special-purpose or general-purpose computer system modules or components, such as a personal computer, notebook computer, tablet computer, cell phone, personal digital assistant (PDA) or any smart portable device. The question and answer retrieval device 950 may include at least one processor 960 and a memory 970.
The at least one processor is configured to execute program instructions. The memory 970 may exist in the question and answer retrieval device 950 as program storage units and data storage units of different forms, such as a hard disk, read-only memory (ROM) or random access memory (RAM), which can be used to store various data files used by the processor in processing and/or performing the retrieval, as well as program instructions executed by the processor. Although not shown in the figures, the question and answer retrieval device 950 may also include an input/output component that supports input/output data flow between the question and answer retrieval device 950 and other components. The question and answer retrieval device 950 may also send and receive information and data over a network through a communication port.
In some embodiments, the set of instructions stored in the memory 970, when executed by the processor 960, causes the question and answer retrieval device 950 to perform operations comprising: performing syntactic structure analysis on the input question to obtain a syntactic structure vector of the input question; processing the input question to obtain a syntactic content vector of the input question; and comparing the input question with the preset questions in a preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result.
In some embodiments, comparing the input question with the preset questions in the preset question-answer library based on the syntactic structure vector and the syntactic content vector to obtain the retrieval result comprises: calculating the syntactic structure similarity between the syntactic structure vector and the syntactic structure vector of each preset question in the preset question-answer library; calculating the syntactic content similarity between the syntactic content vector and the syntactic content vector of each preset question in the preset question-answer library; determining the question similarity between the input question and each preset question in the preset question-answer library according to the syntactic structure similarity and the syntactic content similarity; and outputting a search result according to the question similarity.
In some embodiments, outputting the search result according to the question similarity includes: determining the maximum question similarity in the preset question-answer library based on the question similarity determined for each preset question, and obtaining the preset question and answer pair corresponding to the maximum question similarity; comparing the maximum question similarity with a preset threshold; and outputting the corresponding answer in the preset question and answer pair when the maximum question similarity is greater than or equal to the preset threshold, and outputting a null value when the maximum question similarity is less than the preset threshold.
In some embodiments, the input question is a medical question, the preset question-answer library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
In some embodiments, the question and answer retrieval device 950 may receive user input questions collected from an input device external to the question and answer retrieval device 950, and perform the above-described question and answer retrieval method, implementing the above-described functions of the question and answer retrieval means, on the received input questions.
Although in fig. 8, processor 960 and memory 970 are presented as separate modules, those skilled in the art will appreciate that the above-described device modules may be implemented as separate hardware devices or may be integrated as one or more hardware devices. The specific implementation of the different hardware devices should not be taken as a factor limiting the scope of protection of the present disclosure, as long as the principles described in this disclosure can be implemented.
In embodiments of the present disclosure, the processor may be a central processing unit (CPU), a field-programmable gate array (FPGA), a microcontroller unit (MCU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or another device having data processing capability and/or program execution capability. The memory includes, but is not limited to, volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
According to another aspect of the present disclosure, there is also provided a non-volatile computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a computer, can perform the method as described above.
Program portions of the technology may be considered to be "products" or "articles of manufacture" in the form of executable code and/or associated data, embodied in or carried by a computer-readable medium. A tangible, persistent storage medium may include any memory or storage used by a computer, processor, or similar device or related module, such as various semiconductor memories, tape drives, disk drives, or similar devices capable of providing storage for software.
All or a portion of the software may at times be communicated over a network, such as the Internet or another communication network. Such communication may load software from one computer device or processor into another, for example from a server or host computer of the question and answer retrieval device onto the hardware platform of a computer environment, onto another computer environment implementing the system, or onto a system of similar function related to providing the information required for retrieval. Thus, another kind of medium capable of carrying software elements, such as light waves, electric waves or electromagnetic waves propagating through cables, optical cables, air and the like, may also serve as a physical connection between local devices. The physical media used for such carrier waves, such as electrical cables, wireless links or optical cables, may likewise be considered media bearing the software. Unless limited to a tangible "storage" medium, other terms used herein to refer to a computer or machine "readable medium" mean any medium that participates in the execution of any instructions by a processor.
The application uses specific words to describe embodiments of the application. Reference to "a first/second embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the application. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the application may be combined as suitable.
Furthermore, those skilled in the art will appreciate that the various aspects of the application may be illustrated and described in the context of a number of patentable categories or circumstances, including any novel and useful procedures, machines, products, or materials, or any novel and useful modifications thereof. Accordingly, aspects of the application may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.) or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. Although a few exemplary embodiments of this invention have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention as defined in the following claims. It is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.

Claims (9)

1. A question-answer retrieval method comprising:
carrying out syntactic structure analysis on the input problem to obtain a syntactic structure vector of the input problem;
Processing the input problem to obtain a syntactic content vector of the input problem;
comparing the input problem with a preset problem in a preset question-answering library based on the syntactic structure vector and the syntactic content vector to obtain a retrieval result;
Wherein comparing the input question with a preset question in a preset question-and-answer library based on the syntax structure vector and the syntax content vector to obtain a search result comprises: calculating the syntactic structure similarity of the syntactic structure vector and the syntactic structure vector of each preset question in a preset question-answering library; calculating the syntactic content similarity of the syntactic content vector and the syntactic content vector of each preset problem in a preset question-answering library; determining the problem similarity of the input problem and each preset problem in a preset question-answering library according to the syntactic structure similarity and the syntactic content similarity; outputting a search result according to the problem similarity;
The calculating the similarity of the syntax structure vector and the syntax structure vector of each preset question in the preset question-answering library comprises: for each sub-element in the syntactic structure vector of the input problem, obtaining a syntactic subtree corresponding to the sub-element; comparing the syntax subtree with the syntax subtree corresponding to each sub-element in the syntax structure vector of the preset problem, and obtaining the sub-element similarity of the sub-element in the syntax structure vector of the input problem based on the comparison result; adding the sub-element similarity of all sub-elements in the syntactic structure vector of the input problem to obtain the syntactic structure similarity of the input problem and the preset problem;
And comparing the syntax subtree with the syntax subtree corresponding to each sub-element in the syntax structure vector of the preset question comprises: judging whether the element values of the current sub-element in the syntax structure vector of the input problem and the current sub-element in the syntax structure vector of the preset problem are non-zero values or not; if zero values exist in the values of the current sub-element in the syntax structure vector of the input problem and/or the current sub-element in the syntax structure vector of the preset problem, outputting a preset first comparison result; if the current sub-element in the syntax structure vector of the input problem and the current sub-element in the syntax structure vector of the preset problem are both non-zero values, comparing the syntax subtree corresponding to the current sub-element in the syntax structure vector of the input problem with the syntax subtree corresponding to the current sub-element in the syntax structure vector of the preset problem.
2. The question-answering retrieval method according to claim 1, wherein outputting a retrieval result according to the degree of similarity of the questions comprises:
based on the question similarity determined for each preset question, determining the maximum question similarity in the preset question-answering library, and acquiring the preset question and answer pair corresponding to the maximum question similarity;
comparing the maximum question similarity with a preset threshold;
outputting the corresponding answer in the preset question and answer pair when the maximum question similarity is greater than or equal to the preset threshold, and outputting a null value when the maximum question similarity is less than the preset threshold.
3. The question-answering retrieval method according to claim 1, wherein the input question is a medical question, the preset question-answering library includes a plurality of preset medical question and answer pairs, wherein each preset medical question and answer pair includes: a class of medical questions and their corresponding answers.
4. The question-answering retrieval method according to claim 1, wherein comparing a syntax subtree corresponding to a current sub-element in a syntax structure vector of an input question with a syntax subtree corresponding to a current sub-element in a syntax structure vector of a preset question comprises:
Taking a syntax subtree corresponding to a current sub-element in a syntax structure vector of an input problem as a first syntax subtree, and taking a syntax subtree corresponding to the current sub-element in the syntax structure vector of the preset problem as a second syntax subtree;
And comparing the first syntax subtree with the second syntax subtree based on a preset rule to obtain the similarity of the first syntax subtree and the second syntax subtree.
5. The question-answering retrieval method according to claim 4, wherein comparing the first syntax subtree with the second syntax subtree based on a preset rule comprises:
determining whether the production rule at the initial node of the first syntax subtree is the same as the production rule at the initial node of the second syntax subtree;
outputting a preset first comparison result when the two production rules are different;
when the two production rules are the same, determining whether the descendants of the initial node of the first syntax subtree and the descendants of the initial node of the second syntax subtree consist only of leaf nodes;
if the descendants of both initial nodes consist only of leaf nodes, outputting a preset second comparison result; and
if the descendants of the initial node of the first syntax subtree and/or the descendants of the initial node of the second syntax subtree include non-leaf nodes, calculating the similarity between the first syntax subtree and the second syntax subtree using a preset algorithm.
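The three-way comparison recited in claim 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Node` class, the constants `FIRST_RESULT` and `SECOND_RESULT`, and the recursive product used as the "preset algorithm" (borrowed from the familiar Collins-Duffy tree-kernel recursion) are hypothetical choices. The claim itself does not fix any of these values or name the algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A syntax-tree node; a node without children is a leaf (a word)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        # The rule expanded at this node: its label plus the ordered child labels.
        return (self.label, tuple(c.label for c in self.children))


FIRST_RESULT = 0.0   # assumed value of the "preset first comparison result"
SECOND_RESULT = 1.0  # assumed value of the "preset second comparison result"


def compare_subtrees(a: Node, b: Node) -> float:
    """Compare two syntax subtrees starting from their initial (root) nodes."""
    # Step 1: different rules at the initial nodes give the first preset result.
    if a.production() != b.production():
        return FIRST_RESULT
    # Step 2: same rule and only leaf nodes below give the second preset result.
    if all(c.is_leaf() for c in a.children) and all(c.is_leaf() for c in b.children):
        return SECOND_RESULT
    # Step 3: otherwise apply a "preset algorithm". Assumed here: a
    # Collins-Duffy style product over the aligned children.
    score = SECOND_RESULT
    for child_a, child_b in zip(a.children, b.children):
        score *= 1.0 + compare_subtrees(child_a, child_b)
    return score
```

With these assumed constants, two noun phrases that share the rule NP → DT NN but differ in the determiner word still receive a non-zero score, because the matching NN subtree contributes to the product while the mismatched DT subtree contributes only the neutral factor 1.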
6. The question-answering retrieval method according to claim 1, wherein calculating the syntactic content similarity between the syntactic content vector and the syntactic content vector of each preset question in the preset question-answering library comprises:
calculating the cosine similarity between the syntactic content vector of the input question and the syntactic content vector of the preset question; and
taking the cosine similarity as the syntactic content similarity.
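The cosine similarity used in claim 6 is the dot product of the two vectors divided by the product of their Euclidean norms. A minimal Python sketch follows. The function name and the convention of returning 0.0 for an all-zero vector are assumptions made for illustration, not details taken from the claims.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

Because the measure depends only on direction, a syntactic content vector and any positive multiple of it score 1, while orthogonal vectors score 0.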
7. A question-answer retrieval device comprising:
a syntactic structure analysis module configured to perform syntactic structure analysis on an input question to obtain a syntactic structure vector of the input question;
a syntactic content analysis module configured to process the input question to obtain a syntactic content vector of the input question; and
a search result generation module configured to compare the input question with preset questions in a preset question-answering library based on the syntactic structure vector and the syntactic content vector to obtain a search result;
wherein the question-answer retrieval device is configured to perform the question-answering retrieval method according to any one of claims 1-6.
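One way to picture how the search result generation module of claim 7 might use the two similarities is sketched below in Python. Everything here is hypothetical: the `PresetPair` class, the `retrieve` function, the two similarity callables, and in particular the weighted sum controlled by `alpha` are assumptions made for illustration. The claims in this section do not state how the structural and content similarities are combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PresetPair:
    """One entry of the preset question-answering library."""
    question: str
    answer: str


# A similarity function maps (input question, preset question) to a score.
Similarity = Callable[[str, str], float]


def retrieve(
    query: str,
    library: List[PresetPair],
    structure_sim: Similarity,
    content_sim: Similarity,
    alpha: float = 0.5,  # assumed weight of the structural similarity
) -> Optional[Tuple[PresetPair, float]]:
    """Return the best-scoring preset pair and its score, or None if the library is empty."""
    best: Optional[Tuple[PresetPair, float]] = None
    for pair in library:
        # Assumed combination rule: a weighted sum of the two similarities.
        score = (alpha * structure_sim(query, pair.question)
                 + (1.0 - alpha) * content_sim(query, pair.question))
        if best is None or score > best[1]:
            best = (pair, score)
    return best
```

Passing the similarity functions in as parameters keeps the ranking loop independent of how the syntactic structure and syntactic content vectors are actually computed.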
8. A question-answer retrieval apparatus, comprising a processor and a memory, the memory containing a set of instructions which, when executed by the processor, cause the question-answer retrieval apparatus to perform the question-answering retrieval method according to any one of claims 1-6.
9. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a computer, perform the question-answering retrieval method according to any one of claims 1-6.
CN201910579670.6A 2019-06-28 2019-06-28 Question-answer searching method, question-answer searching device, question-answer searching apparatus and medium Active CN112231450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910579670.6A CN112231450B (en) 2019-06-28 2019-06-28 Question-answer searching method, question-answer searching device, question-answer searching apparatus and medium

Publications (2)

Publication Number Publication Date
CN112231450A (en) 2021-01-15
CN112231450B (en) 2024-06-11

Family

ID=74110905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910579670.6A Active CN112231450B (en) 2019-06-28 2019-06-28 Question-answer searching method, question-answer searching device, question-answer searching apparatus and medium

Country Status (1)

Country Link
CN (1) CN112231450B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
CN104699695A (en) * 2013-12-05 2015-06-10 中国科学院软件研究所 Relation extraction method based on multi-feature semantic tree kernel and information retrieving method
CN107153639A (en) * 2016-03-04 2017-09-12 北大方正集团有限公司 Intelligent answer method and system
CN107491534A (en) * 2017-08-22 2017-12-19 北京百度网讯科技有限公司 Information processing method and device

Also Published As

Publication number Publication date
CN112231450A (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN108959246B (en) Answer selection method and device based on improved attention mechanism and electronic equipment
CN112528672B (en) Aspect-level emotion analysis method and device based on graph convolution neural network
CN109101620B (en) Similarity calculation method, clustering method, device, storage medium and electronic equipment
US9773053B2 (en) Method and apparatus for processing electronic data
CN111382255B (en) Method, apparatus, device and medium for question-answering processing
CN111563192B (en) Entity alignment method, device, electronic equipment and storage medium
CN111382260A (en) Method, device and storage medium for correcting retrieved text
CN111832312A (en) Text processing method, device, equipment and storage medium
CN109145083B (en) Candidate answer selecting method based on deep learning
CN117973544A (en) Text unit reasoning method device based on semantic distance, storage medium and terminal
CN113505196B (en) Text retrieval method and device based on parts of speech, electronic equipment and storage medium
CN113779190B (en) Event causal relationship identification method, device, electronic equipment and storage medium
CN113590811B (en) Text abstract generation method and device, electronic equipment and storage medium
CN117744785B (en) Space-time knowledge graph intelligent construction method and system based on network acquisition data
CN112231450B (en) Question-answer searching method, question-answer searching device, question-answer searching apparatus and medium
CN112541069A (en) Text matching method, system, terminal and storage medium combined with keywords
CN117131176A (en) Interactive question-answering processing method and device, electronic equipment and storage medium
CN116484829A (en) Method and apparatus for information processing
CN112685574B (en) Method and device for determining hierarchical relationship of domain terms
CN115757844A (en) Medical image retrieval network training method, application method and electronic equipment
CN111813846A (en) Data analysis processing system and data processing method
CN116127053B (en) Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and devices
CN111259126B (en) Similarity calculation method, device, equipment and storage medium based on word characteristics
CN114385902B (en) Content recommendation method, device and storage medium
CN111859928B (en) Feature processing method, device, medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant