WO2024055582A1

WO2024055582A1 - Optimization method and apparatus for question-and-answer knowledge base

Info

Publication number: WO2024055582A1
Application number: PCT/CN2023/088448
Authority: WO
Inventors: 李鹏; 徐超; 熊超; 包勇军; 颜伟鹏
Original assignee: 北京沃东天骏信息技术有限公司; 北京京东世纪贸易有限公司
Priority date: 2022-09-13
Filing date: 2023-04-14
Publication date: 2024-03-21
Also published as: CN117743519A

Abstract

Provided are an optimization method and apparatus for a question-and-answer knowledge base, and an electronic device, a readable storage medium, a computer program product and a computer program. The optimization method for a question-and-answer knowledge base comprises: selecting a target question sentence from question sets included in a question-and-answer knowledge base, and acquiring, according to the question sets, a target obfuscated question sentence corresponding to the target question sentence; and determining an associated knowledge point corresponding to the target obfuscated question sentence, and classifying the target obfuscated question sentence into a question set which corresponds to the associated knowledge point.

Description

Optimization method and device for question and answer knowledge base

Cross-references to related applications

This application claims priority from Chinese Patent Application No. 2022111100385 filed in China on September 13, 2022, the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of natural language processing, and specifically relates to an optimization method for a question and answer knowledge base and its device, electronic equipment, readable storage media, computer program products and computer programs.

Background technique

At present, the question and answer knowledge base is mostly expanded by mining similar question sentences that are synonymous with the standard question sentence but have different words under a knowledge point. The standard question sentence and its similar question sentences that are synonymous but have different words correspond to the same knowledge point. By mining similar question sentences, Question sentences enable the Frequently-asked Questions (FAQ) system to respond to questions raised by users based on the Q&A knowledge base by combining the questions raised by users with question sentences under each knowledge point (such as standard Questions and similar questions that are synonymous with standard questions but have different words) are used for similarity matching to accurately match the knowledge points corresponding to the question sentences raised by the user, so that FQA is not affected by synonyms. Confused question sentences mean that the question sentences raised by the user are relatively similar to the question sentences under multiple knowledge points in the knowledge base. At this time, it will be difficult to correctly match the corresponding knowledge points based on the current question and answer knowledge base, making it difficult to correctly match the question sentences based on the question and answer knowledge base. The response accuracy is low.

Contents of the invention

The present disclosure aims to solve one of the technical problems in the related art, at least to a certain extent.

To this end, the first purpose of the present disclosure is to propose an optimization method for a question and answer knowledge base.

The second purpose of the present disclosure is to provide an optimization device for a question and answer knowledge base.

The third object of the present disclosure is to provide an electronic device.

A fourth object of the present disclosure is to provide a non-transitory computer-readable storage medium.

A fifth object of the present disclosure is to provide a computer program product.

A sixth object of the present disclosure is to provide a computer program.

In order to achieve the above purpose, the first embodiment of the present disclosure proposes an optimization method for a question and answer knowledge base, including: determining a question and answer knowledge base, which includes knowledge points and question sets corresponding to the knowledge points; from the Select the target question sentence from the question set included in the question and answer knowledge base, and obtain the target question sentence corresponding to the target question sentence according to the question set. Confuse the question sentence; determine the relevant knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the relevant knowledge point.

This disclosure selects the target question sentence from the question set included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question set; determines the associated knowledge points corresponding to the target confusion question sentence, and attributes the target confusion question sentence to Questions corresponding to related knowledge points are concentrated. By mining the confusing question sentences of each knowledge point and attributing the confused question sentences to the corresponding related knowledge points, the question and answer knowledge base is expanded and the coverage of the question and answer knowledge base for the question sentences that may be raised by the user is enhanced. Based on the question and answer knowledge, When the library implements common question answering or retrieval functions, it can correctly match the knowledge points corresponding to the question sentences raised by users, thereby enhancing the response effect.

In order to achieve the above purpose, the second embodiment of the present disclosure proposes an optimization device for a question and answer knowledge base, including: a first determination module for determining a question and answer knowledge base, where the question and answer knowledge base includes knowledge points and corresponding knowledge points. The question set; the acquisition module is used to select the target question sentence from the question set included in the question and answer knowledge base, and obtain the target confusion question sentence corresponding to the target question sentence according to the question set; secondly determine the target Confusing the associated knowledge points corresponding to the question sentences, and attributing the target confusing question sentences to the question set corresponding to the associated knowledge points.

To achieve the above object, a third embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores information that can be used by the Instructions executed by at least one processor, the instructions being executed by the at least one processor, to implement the optimization method of the question and answer knowledge base as described in any embodiment of the first aspect of the present disclosure.

To achieve the above object, the fourth embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to implement any of the embodiments of the first aspect of the present disclosure. Optimization method of question and answer knowledge base.

To achieve the above object, the fifth embodiment of the present disclosure provides a computer program product, including a computer program, which when executed by a processor is used to implement the method described in any embodiment of the first aspect of the present disclosure. Optimization method for question and answer knowledge base.

To achieve the above object, the sixth embodiment of the present disclosure provides a computer program, which includes computer program code. When the computer program code is run on a computer, the computer executes any of the embodiments of the first aspect of the present disclosure. the method described.

Description of drawings

Figure 1 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by an embodiment of the present disclosure;

Figure 2 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by another embodiment of the present disclosure;

Figure 3 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by another embodiment of the present disclosure;

Figure 4 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by another embodiment of the present disclosure;

Figure 5 is a schematic flowchart of determining whether a statement is legal in the optimization method of the question and answer knowledge base provided by an embodiment of the present disclosure;

Figure 6 is a block diagram of an optimization device for a question and answer knowledge base proposed by the present disclosure;

Figure 7 is a block diagram of an electronic device provided by the present disclosure.

Detailed ways

Embodiments of the present disclosure are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals throughout represent the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present disclosure and are not to be construed as limitations of the present disclosure.

Figure 1 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by an embodiment of the present disclosure. The optimization method of a question and answer knowledge base according to an embodiment of the present disclosure can be executed by an optimization device for a question and answer knowledge base provided by an embodiment of the present disclosure. The question and answer knowledge base The optimization device of the library can be set in electronic equipment such as terminals and servers. As shown in Figure 1, the optimization method of the question and answer knowledge base according to the embodiment of the present disclosure includes steps S101 to S103.

S101. Determine the question and answer knowledge base. The question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points.

In the embodiment of the present disclosure, the question and answer knowledge base to be optimized is determined. The knowledge base includes multiple knowledge points and question sets corresponding to the knowledge points. The question set may include standard question sentences and similar question sentences corresponding to the knowledge points, and may also include Answers to standard and similar questions. As shown in Table 1:

Table 1 Example table of question and answer knowledge base

S102: Select the target question sentence from the question set included in the question and answer knowledge base, and obtain the target confusion question sentence corresponding to the target question sentence according to the question set.

In the embodiment of the present disclosure, any question sentence in the question and answer knowledge base can be used as a target question sentence, that is, any standard question sentence or any similar question sentence under any knowledge point can be used as a target question sentence, so as to The target question sentence executes the method of the present disclosure to obtain the target confusion question sentence corresponding to the target question sentence. Among them, a confusing question sentence can be understood as a question sentence that is similar to question sentences under multiple knowledge points, and it is difficult to clarify its corresponding knowledge point. For example, "insufficient resources" can be regarded as a confusing question sentence because "resources" "Insufficient" has a high degree of similarity with "Insufficient resources, how to view existing resources" under knowledge point 1 and "Insufficient resources, how to apply" under knowledge point 2 have a high degree of similarity. When the confusion problem is not expanded under knowledge point In the case of sentences, it is difficult for the FQA system to accurately determine the corresponding knowledge points, resulting in inaccurate responses and difficulty in meeting user needs.

S103: Determine the relevant knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the relevant knowledge point.

In the embodiments of the present disclosure, the actual application scenarios or business content can be analyzed to determine the knowledge points associated with the target confusion question. For example, based on the historical question sentences raised by users, the analysis shows that when the user raises the target confusion question, most of the questions are about confusion. Which knowledge point to query, manually determine the associated knowledge points of the target confusion problem, and attribute it to the question set under the associated knowledge point. For example, the target confusion question sentence is stored in the question set as a similar question sentence under the associated knowledge point. .

Therefore, when the question sentence raised by the user is the target confusion question sentence or is highly similar to the target confusion question sentence, the corresponding knowledge point can be accurately matched.

For example, for the confusion question "insufficient resources" raised by users, the similarity of "insufficient resources" calculated by the FAQ system is basically close to the two knowledge points, and the similarity of knowledge point 2 is higher, so knowledge point 2 is used. responded. However, in actual business scenarios, most users who raise this confusion question query knowledge point 1. It is better to use the first knowledge point to reply to the user's question "insufficient resources". Through the disclosed question and answer knowledge base optimization method, confusing question sentences of each question sentence can be mined, and the confusing question sentences are manually assigned to its associated knowledge points based on business knowledge to enhance the response effect.

The embodiment of the present disclosure proposes an optimization method for the question and answer knowledge base, which selects the target question sentence from the question set included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question set; determines the target confusion question sentence corresponding to associated knowledge points, and attribute the target confusing question sentences to the question set corresponding to the associated knowledge points. By mining the confusing question sentences of each knowledge point and attributing the confused question sentences to the corresponding related knowledge points, the question and answer knowledge base is expanded and the coverage of the question and answer knowledge base for the question sentences that may be raised by the user is enhanced. Based on the question and answer knowledge, When the library implements common question answering or retrieval functions, it can correctly match the knowledge points corresponding to the question sentences raised by users, thereby enhancing the response effect.

On the basis of the above embodiment, as shown in Figure 2, in the above step S101, "obtaining the target confusion question sentence corresponding to the target question sentence according to the question set" may include steps S201 to S204.

S201: Determine the reference question sentence for generating the target confusion question sentence.

In the embodiment of the present disclosure, the reference question sentence for generating the target confusion question sentence corresponding to the target question sentence is determined from the question sentences under the question and answer knowledge base. The reference question sentence may be a question sentence other than the target question sentence in the question and answer knowledge base, or a question sentence in other question sets in the question and answer knowledge base except the question set in which the target question sentence is located.

S202: Determine the similarity between the target question sentence and the reference question sentence.

In the embodiment of the present disclosure, the similarity between the target question sentence and any reference question sentence is calculated respectively.

Among them, the cosine value between the word vector of the target question sentence and the word vector of any reference question sentence can be calculated, and the cosine value of the word vector represents the similarity between the question sentences. In some embodiments, the similarity between two question sentences can also be calculated based on a neural network.

S203: Determine a set of similar question sentences of the target question sentence based on the similarity.

In the embodiment of the present disclosure, based on the similarity between the target question sentence and the reference question sentence, it can be determined whether the reference question sentence can be used as a similar question sentence of the target question sentence, thereby obtaining a set of similar question sentences of the target question sentence.

In some embodiments, whether the reference question sentences are similar question sentences can be screened by setting a similarity threshold.

In some embodiments, the following simplification can also be done to the set of similar question sentences: group the similar question sentences in the set of similar question sentences according to knowledge points, and select one from multiple similar question sentences that can belong to the same knowledge point. Representative similar question sentences are used to simplify the set of similar question sentences and reduce the amount of calculation.

S204: Obtain the target confusion question sentence based on the target question sentence and the set of similar question sentences.

In the embodiment of the present disclosure, based on the target question sentence and a set of similar question sentences corresponding to the target question sentence, the target confusing question sentence corresponding to the target question sentence is mined. There can be one or more target confusion question sentences, and the disclosure does not limit the specific number.

On the basis of the above embodiment, as shown in Figure 3, the above step S204 of "obtaining the target confusion question sentence based on the target question sentence and the set of similar question sentences" may include steps S301 to S303.

S301: Combine the target question sentence and any similar question sentence in the set of similar question sentences to form a question sentence pair.

S302: Based on the respective word segmentation sequences of the two question sentences in the question sentence pair, obtain the common word segmentation sequence corresponding to the question sentences.

In the embodiment of the present disclosure, word segmentation processing is performed on the target question sentence and similar question sentences in the question sentence pair, and the respective word segmentation sequences of the two question sentences are obtained, as shown in Table 2:

Table 2: Word segmentation results of question sentences

Based on the word segmentation sequence of the target question sentence and the word segmentation sequence of the candidate question sentence, obtain the synonymous word segmentations in the two word segmentation sequences (that is, find synonyms or synonymous phrases in the two word segmentation sequences) and classify multiple word segments that are synonymous with each other. Unify it into a unified target participle representation. For example, "hint" in the target question sentence and "display" in similar question sentences in Table 2 are synonyms for each other. These two participles are represented by a unified target participle. This goal The participle can be either "prompt" or "display". Use the target word segmentation to replace the synonymous word segmentation in the two word segmentation sequences to obtain two replaced word segmentation sequences, thereby completing the normalization of the two word segmentation sequences and achieving word alignment processing between the two word segmentation sequences, as shown in Table 3 Show:

Table 3 A representation of the normalized results of the word segmentation sequence corresponding to the question sentence

Among them, finding synonyms or synonymous phrases can be mined using a predefined synonym dictionary or a neural network model. This disclosure is not limited.

Among them, when using the neural network model calculation, it can be achieved through the following process:

Step 1: Use BERT to calculate the semantic vector of each word segment in each question sentence or encode each word segment with BERT to obtain the word vector.

Step 2: Calculate the similarity between the participles of the target question sentence and the participles of similar question sentences in the question sentence pair, and determine the word participles whose similarity is greater than a certain threshold as synonymous participles. Assume that question sentence 1 contains M participles and question sentence 2 contains N participles. The obtained synonymous participles can be obtained using the following formula:

represent the participles in question sentence 1 and question sentence 2 respectively, the superscript represents the question sentence number, and the subscript represents the participle number. Represents the set of synonymous participles for the i-th participle in question sentence 1, Represents the similarity between word segments. You can use word vectors to calculate the cosine value to represent the similarity, and δ represents the threshold.

Compare the two word segmentation sequences that have been normalized by synonymous word segmentation, select the common word segmentation in the two word segmentation sequences, and obtain the common word segmentation sequence of the question sentence pair. The common participle sequence can be the maximum common participle sequence. For example, the maximum common participle sequence of the question sentence pair in Table 3 is: prompt, resource, deficiency, how. The common participle sequence can also be a sequence including participles common to two question sentences and participles unique to each question sentence. The formation method of the common participle sequence can be set as needed.

As a feasible implementation, the word segmentation processing of the question sentences can also be performed before this step, and all the question sentences in the question and answer knowledge base are directly segmented to obtain the word segmentation sequences corresponding to each question sentence.

S303. Generate a target confusion question sentence based on the common word segmentation sequence.

In some embodiments, the target confusing question sentence of the target question sentence is obtained according to the common word segmentation sequence corresponding to the question sentence.

On the basis of the above embodiment, as shown in Figure 4, the above step S303 of "generating a target confusion question sentence based on a common word segmentation sequence" may include steps S401 to S403.

S401. Obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence.

In the embodiment of the present disclosure, as shown in Figure 5, it is determined whether the sentence corresponding to the common word segmentation sequence is legal, wherein the sentence corresponding to the common word segmentation sequence is a sentence spliced into each segmentation in the common word segmentation sequence, for example, the above-mentioned public word segmentation sequence "prompt" , resources, insufficient, "how" corresponds to the statement indicating insufficient resources.

In response to the statement corresponding to the public word segmentation sequence being illegal, the public word segmentation sequence is edited to generate a new public word segmentation sequence, and for the new public word segmentation sequence, the steps of determining whether the statement corresponding to the public word segmentation sequence are legal are performed; in response to the public word segmentation sequence The corresponding statement is legal, and the legal statement is determined as a candidate confusing question sentence.

Among them, you can also add a preset condition for exiting the loop before editing the common word segmentation sequence. The preset condition can be that the total number of edits of a common word segmentation sequence is greater than the preset number of times, or the number of word segmentations of the common word segmentation sequence is less than the preset number. Set the number of participles. In response to the common word segmentation sequence meeting the preset condition, the common word segmentation sequence is discarded; in response to the common word segmentation sequence not meeting the preset condition, the step of editing the common word segmentation sequence is performed.

Among them, the editing process of the public word segmentation sequence can be implemented by deleting the first word segmentation or the last word segmentation in the public word segmentation sequence, so that the public word segmentation sequence after the word segmentation is deleted is used as a new public word segmentation sequence. In addition, the public word segmentation sequence can be edited through the model.

As a feasible implementation method, whether a statement is legal can be determined based on the completeness of the statement and/or the probability of the statement appearing in the question and answer scenario.

As another feasible implementation method, you can also judge whether a statement is legal by building a model and training the model. For example, judging whether a statement is legal based on a trained target classification model, where the target classification model can be based on the user log. The question sentences are used as positive samples, and the positive samples after character deletion are used as negative samples, which are obtained through training.

For example: The sentence corresponding to the common word segmentation sequence [prompt, resource, insufficient, how] is judged, and the output is illegal because "prompt resource insufficient, how" is a incomplete sentence. Assume that the preset exit condition is: the number of word segments in the word segmentation sequence is less than 3. Determine whether the common word segmentation sequence meets the exit condition. If the output is not satisfied, the last word segmentation will be deleted and the output will be [prompt, resource, insufficient] as a Use the new public word segmentation sequence to determine whether the sentence corresponding to the new public word segmentation sequence is legal, output "prompt for insufficient resources" as a legal sentence, and determine the legal sentence as a candidate confusion question sentence.

S402. Determine the confusion score of the candidate confusing question sentence.

In the embodiment of the present disclosure, the confusion score of each candidate confusing question sentence is calculated. The calculation method of the confusion score can be the following process:

Calculate the similarity between the candidate confusing question sentence and any question sentence in the question and answer knowledge base; filter out a preset number of higher ranking similarities from these similarities, for example, in the embodiment of the present disclosure, according to the similarity Select two similarities from high to low, and determine the confusion score of the candidate confusing question sentence based on these two similarities. The calculation formula is as follows:

score(C)=-|sim(C,K ₁ )-sim(C,K ₂ )|

Among them, C is the candidate confusing question sentence, score(C) is the confusion score, sim(C,K ₁ ), sim(C,K ₂ ) are the two highest ranking similarities in the retrieval knowledge base, K ₁ and K ₂ These are the two question sentences in the question and answer knowledge base that are most similar to the candidate confusion question sentences. Among them, the greater the difference in similarity, the lower the degree of confusion; the smaller the difference in similarity, the higher the degree of confusion.

S403: Determine the target confusing question sentence from the candidate confusing question sentences based on the confusion score.

In the embodiment of the present disclosure, candidate confusing question sentences corresponding to the target question sentence are sorted according to the confusion score, and a preset number of target confusing question sentences can be selected from high to low as needed.

Figure 6 is a block diagram of an optimization device for a question and answer knowledge base proposed by the present disclosure. As shown in Figure 6, the optimization device 600 for the question and answer knowledge base includes: a first determination module 601, an acquisition module 602 and a second determination module 603 .

The first determination module 601 is used to determine the question and answer knowledge base. The question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points.

The acquisition module 602 is used to select the target question sentence from the question set included in the question and answer knowledge base, and obtain the target confusion question sentence corresponding to the target question sentence according to the question set.

The second determination module 603 is used to determine the relevant knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the relevant knowledge point.

According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: determine the reference question sentence for generating the target confusion question sentence; determine the similarity between the target question sentence and the reference question sentence; and determine similar questions of the target question sentence based on the similarity. Sentence set; obtain the target confusing question sentence based on the target question sentence and the similar question sentence set.

According to an embodiment of the present disclosure, the acquisition module 602 is also used to: perform word segmentation processing on the question sentences under the question and answer knowledge base, and obtain the word segmentation sequence corresponding to the question sentence.

According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: form a question sentence pair with the target question sentence and any similar question sentence in the set of similar question sentences; based on the word segmentation sequences of the two question sentences in the question sentence pair. , obtain the public word segmentation sequence corresponding to the question sentence; generate the target confusion question sentence based on the public word segmentation sequence.

According to an embodiment of the present disclosure, the acquisition module 602 can further be used to: obtain synonymous participles in two word segmentation sequences; normalize the synonymous participles to obtain the target participles of the synonymous participles; use the target participles to replace the two participles. For synonymous word segmentation in the word segmentation sequence, two replacement word segmentation sequences are obtained; the two replacement word segmentation sequences are compared to generate a common word segmentation sequence.

According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence; determine the confusion score of the candidate confusing question sentence; and obtain the candidate confusing question sentence from the candidate confusing question sentence according to the confusion score. Determine the target confusion question sentence.

According to an embodiment of the present disclosure, the acquisition module 602 can further be used to determine whether the sentence corresponding to the common word segmentation sequence is legal. In response to the statement corresponding to the public word segmentation sequence being illegal, the public word segmentation sequence is edited to generate a new public word segmentation sequence, and for the new public word segmentation sequence, the step of determining whether the statement corresponding to the public word segmentation sequence is legal is performed. In response to the statement corresponding to the common word segmentation sequence being legal, the legal statement is determined as a candidate confusion question sentence.

According to an embodiment of the present disclosure, the acquisition module 602 is also configured to: for the new public word segmentation sequence, before executing the step of judging whether the statement corresponding to the public word segmentation sequence is legal, in response to the new public word segmentation sequence meeting the preset conditions, discard the Public word segmentation sequence; in response to the new public word segmentation sequence not meeting the preset conditions, perform the step of judging whether the statement corresponding to the public word segmentation sequence is legal for the new public word segmentation sequence.

According to one embodiment of the present disclosure, the preset condition is that the number of edits of the common word segmentation sequence is greater than the preset number of times, or the number of word segments of the new public word segmentation sequence is less than the preset number of word segmentations.

According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to determine whether a statement is legal based on the completeness of the statement and/or the probability of the statement appearing in the question and answer scenario.

According to an embodiment of the present disclosure, the acquisition module 602 can further be used to: determine whether a statement is legal based on a trained target classification model, where the target classification model is based on question sentences in user logs as positive samples, and will perform character The pruned positive samples are used as negative samples and are obtained through training.

According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: delete the first word segmentation or the last word segmentation in the public word segmentation sequence, and use the public word segmentation sequence after the word segmentation is deleted as a new public word segmentation sequence.

According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to: determine the similarity between the candidate confusing question sentence and any question sentence in the question and answer knowledge base; determine the confusion score of the candidate confusing question sentence based on the similarity. .

It should be noted that the above explanation of the embodiment of the optimization method for the question and answer knowledge base is also applicable to the optimization device of the question and answer knowledge base in this embodiment, and the specific process will not be described again here.

The embodiment of the present disclosure proposes an optimization device for the question and answer knowledge base, which selects the target question sentence from the question set included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question set; determines the target confusion question sentence corresponding to associated knowledge points, and attribute the target confusing question sentences to the question set corresponding to the associated knowledge points. By mining the confusing question sentences of each knowledge point and assigning the confused question sentences to the corresponding associated knowledge points, the question and answer knowledge base is expanded and the coverage of the question and answer knowledge base that may be raised by the user is enhanced. Based on the question and answer knowledge, When the library implements common question answering or retrieval functions, it can correctly match the knowledge points corresponding to the question sentences raised by users, thereby enhancing the response effect.

In order to implement the above embodiments, an embodiment of the present disclosure also proposes an electronic device 700. As shown in Figure 7, the electronic device 700 includes: a processor 701 and a memory 702 communicatively connected to the processor. The memory 702 stores information that can be used by at least one The instructions executed by the processor are executed by at least one processor 701 to implement the optimization method of the question and answer knowledge base as shown in any embodiment of the present disclosure.

In order to implement the above embodiments, embodiments of the present disclosure also provide a non-transient computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to implement the question and answer knowledge base as shown in any embodiment of the present disclosure. Optimization.

In order to implement the above embodiments, embodiments of the present disclosure also provide a computer program product, which includes a computer program. When executed by a processor, the computer program implements the optimization method of the question and answer knowledge base as shown in any embodiment of the present disclosure.

In order to implement the above embodiments, an embodiment of the present disclosure also proposes a computer program. The computer program includes a computer program code. When the computer program code is run on a computer, it causes the computer to execute the question and answer knowledge shown in any embodiment of the present disclosure. Library optimization methods.

It should be noted that the foregoing explanations of the method and device embodiments also apply to the electronic equipment, computer-readable storage media, computer program products and computer programs of the above-mentioned embodiments, and will not be described again here.

In the description of the present disclosure, it should be understood that the terms "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", " "Back", "Left", "Right", "Vertical", "Horizontal", "Top", "Bottom", "Inside", "Outside", "Clockwise", "Counterclockwise", "Axis", The orientation or positional relationship indicated by "radial direction", "circumferential direction", etc. are based on the orientation or positional relationship shown in the drawings, and are only for convenience of describing the present disclosure and simplifying the description, and do not indicate or imply the device or element to which they are referred. Must have a specific orientation, be constructed and operate in a specific orientation and therefore should not be construed as limiting the disclosure.

In addition, the terms “first” and “second” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the present disclosure, "plurality" means two or more than two, unless otherwise expressly and specifically limited.

In the description of this specification, reference to the terms "one embodiment,""someembodiments,""anexample,""specificexamples," or "some examples" or the like means that specific features are described in connection with the embodiment or example. , structures, materials, or features are included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms are not necessarily The same embodiment or example is intended. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may combine and combine different embodiments or examples and features of different embodiments or examples described in this specification unless they are inconsistent with each other.

Although the embodiments of the present disclosure have been shown and described above, it can be understood that the above-mentioned embodiments are illustrative and should not be construed as limitations of the present disclosure. Those of ordinary skill in the art can make modifications to the above-mentioned embodiments within the scope of the present disclosure. The embodiments are subject to changes, modifications, substitutions and variations.

All embodiments of the present disclosure can be executed alone or in combination with other embodiments, which are considered to be within the scope of protection claimed by the present disclosure.

Claims

An optimization method for question and answer knowledge base, which is characterized by including:

Determine a question and answer knowledge base, which includes knowledge points and question sets corresponding to the knowledge points;

Select a target question sentence from the question set included in the question and answer knowledge base, and obtain the target confusion question sentence corresponding to the target question sentence according to the question set;

Determine the associated knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the associated knowledge point.
The optimization method according to claim 1, characterized in that, according to the question set, obtaining the target confusion question sentence corresponding to the target question sentence includes:

Determine the reference question sentence that generates the target confusion question sentence;

Determine the similarity between the target question sentence and the reference question sentence;

According to the similarity, determine a set of similar question sentences of the target question sentence;

The target confusing question sentence is obtained according to the target question sentence and the set of similar question sentences.
The optimization method according to claim 2, characterized in that, before obtaining the target confusion question sentence according to the target question sentence and the set of similar question sentences, it further includes:

Perform word segmentation processing on the question sentences under the question and answer knowledge base to obtain the word segmentation sequence corresponding to the question sentence.
The optimization method according to claim 2 or 3, characterized in that, obtaining the target confusing question sentence based on the target question sentence and the set of similar question sentences includes:

Form a question sentence pair by forming the target question sentence and any similar question sentence in the set of similar question sentences;

Based on the respective word segmentation sequences of the two question sentences in the question sentence pair, obtain the common word segmentation sequence corresponding to the question sentences;

The target confusion question sentence is generated according to the common word segmentation sequence.
The optimization method according to claim 4, characterized in that, based on the respective word segmentation sequences of two question sentences in the question sentence pair, obtaining the common word segmentation sequence corresponding to the question sentence includes:

Obtain synonymous participles in the two word participle sequences;

Normalize the synonymous participles to obtain the target participles of the synonymous participles;

Use the target word segmentation to replace the synonymous word segmentation in the two word segmentation sequences to obtain two replaced word segmentation sequences;

The two replaced word segmentation sequences are compared to generate the common word segmentation sequence.
The optimization method according to claim 4 or 5, characterized in that generating the target confusion question sentence according to the common word segmentation sequence includes:

Obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence;

Determine the confusion score of the candidate confusing question sentence;

According to the confusion score, the target confusion question sentence is determined from the candidate confusion question sentences.
The optimization method according to claim 6, characterized in that said obtaining the candidate confusing question sentence of the target question sentence based on the common word segmentation sequence includes:

Determine whether the statement corresponding to the public word segmentation sequence is legal;

In response to the statement corresponding to the public word segmentation sequence being illegal, the public word segmentation sequence is edited to generate a new public word segmentation sequence, and for the new public word segmentation sequence, the judgment is performed that the public word segmentation sequence corresponds to Steps to determine whether the statement is legal;

In response to the statement corresponding to the common word segmentation sequence being legal, the legal statement is determined as the candidate confusion question sentence.
The optimization method according to claim 7, characterized in that before editing the common word segmentation sequence, it further includes:

In response to the common word segmentation sequence meeting the preset condition, discard the common word segmentation sequence;

In response to the common word segmentation sequence not satisfying the preset condition, the step of editing the common word segmentation sequence is performed.
The optimization method according to claim 8, wherein the preset condition is that the total number of edits of the common word segmentation sequence is greater than the preset number of times, or the number of word segments of the common word segmentation sequence is less than the preset number of word segmentations.
The optimization method according to any one of claims 7 to 9, characterized in that determining whether the statement corresponding to the common word segmentation sequence is legal includes:

Determine whether the statement is legal based on the completeness of the statement and/or the probability of the statement appearing in the question and answer scenario.
The optimization method according to any one of claims 7 to 9, characterized in that determining whether the statement corresponding to the common word segmentation sequence is legal includes:

Based on the trained target classification model, determine whether the statement is legal, wherein the target classification model is based on the question sentence in the user log as a positive sample, and the positive sample after character deletion is used as a negative sample, training owned.
The optimization method according to any one of claims 7 to 11, characterized in that, editing the common word segmentation sequence and generating a new common word segmentation sequence includes:

Delete the first word segmentation or the last word segmentation in the common word segmentation sequence, so that the public word segmentation sequence after the word segmentation is deleted is used as the new common word segmentation sequence.
The optimization method according to any one of claims 6 to 12, wherein determining the confusion score of the candidate confusing question sentence includes:

Determine the similarity between the candidate confusing question sentence and any question sentence in the question and answer knowledge base;

The confusion degree score of the candidate confusing question sentence is determined according to the similarity.
An optimization device for a question and answer knowledge base, which is characterized by including:

A first determination module is used to determine a question-answering knowledge base, wherein the question-answering knowledge base includes knowledge points and question sets corresponding to the knowledge points;

An acquisition module, configured to select a target question sentence from a question set included in the question and answer knowledge base, and obtain a target confusion question sentence corresponding to the target question sentence according to the question set;

The second determination module is used to determine the associated knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the associated knowledge point.
The optimization device according to claim 14, characterized in that the acquisition module is further used to:

Determine the reference question sentence that generates the target confusion question sentence;

Determine the similarity between the target question sentence and the reference question sentence;

According to the similarity, determine a set of similar question sentences of the target question sentence;

The target confusing question sentence is obtained according to the target question sentence and the set of similar question sentences.
The optimization device according to claim 14 or 15, characterized in that the acquisition module is also used to:

Perform word segmentation processing on the question sentences under the question and answer knowledge base to obtain the word segmentation sequence corresponding to the question sentence.
The optimization device according to claim 15 or 16, characterized in that the acquisition module is further used to:

Form a question sentence pair by forming the target question sentence and any similar question sentence in the set of similar question sentences;

Based on the respective word segmentation sequences of the two question sentences in the question sentence pair, obtain the common word segmentation sequence corresponding to the question sentences;

The target confusion question sentence is generated according to the common word segmentation sequence.
The optimization device according to claim 17, characterized in that the acquisition module is further used to:

Obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence;

Determine the confusion score of the candidate confusing question sentence;

According to the confusion score, the target confusion question sentence is determined from the candidate confusion question sentences.
An electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform any one of claims 1 to 13 Methods.
A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of claims 1 to 13.
A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 13.
A computer program, characterized in that the computer program includes computer program code, and when the computer program code is run on a computer, it causes the computer to perform the method according to any one of claims 1 to 13.