
WO2024055582A1 - Optimization method and apparatus for question-and-answer knowledge base - Google Patents

Optimization method and apparatus for question-and-answer knowledge base Download PDF

Info

Publication number
WO2024055582A1
WO2024055582A1 PCT/CN2023/088448 CN2023088448W WO2024055582A1 WO 2024055582 A1 WO2024055582 A1 WO 2024055582A1 CN 2023088448 W CN2023088448 W CN 2023088448W WO 2024055582 A1 WO2024055582 A1 WO 2024055582A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
word segmentation
question sentence
target
sentence
Prior art date
Application number
PCT/CN2023/088448
Other languages
French (fr)
Chinese (zh)
Inventor
李鹏
徐超
熊超
包勇军
颜伟鹏
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司, 北京京东世纪贸易有限公司
Publication of WO2024055582A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • the present disclosure relates to the field of natural language processing, and specifically relates to an optimization method for a question and answer knowledge base and its device, electronic equipment, readable storage media, computer program products and computer programs.
  • the question and answer knowledge base is mostly expanded by mining similar question sentences that are synonymous with the standard question sentence but have different words under a knowledge point.
  • the standard question sentence and its similar question sentences that are synonymous but have different words correspond to the same knowledge point.
  • These question sentences enable a Frequently Asked Questions (FAQ) system to respond to users based on the question and answer knowledge base: the question raised by a user is matched for similarity against the question sentences under each knowledge point (such as the standard question and the similar questions that are synonymous with it but worded differently), so that the knowledge point corresponding to the user's question is matched accurately and the FAQ system is not affected by synonyms.
  • A confusing question sentence is a question sentence raised by a user that is relatively similar to question sentences under multiple knowledge points in the knowledge base. In this case it is difficult to match the correct knowledge point based on the current question and answer knowledge base, so the response accuracy is low.
  • the present disclosure aims to solve one of the technical problems in the related art, at least to a certain extent.
  • the first purpose of the present disclosure is to propose an optimization method for a question and answer knowledge base.
  • the second purpose of the present disclosure is to provide an optimization device for a question and answer knowledge base.
  • the third object of the present disclosure is to provide an electronic device.
  • a fourth object of the present disclosure is to provide a non-transitory computer-readable storage medium.
  • a fifth object of the present disclosure is to provide a computer program product.
  • a sixth object of the present disclosure is to provide a computer program.
  • the first embodiment of the present disclosure proposes an optimization method for a question and answer knowledge base, including: determining a question and answer knowledge base, which includes knowledge points and question sets corresponding to the knowledge points; selecting a target question sentence from the question sets included in the question and answer knowledge base, and obtaining the target confusion question sentence corresponding to the target question sentence according to the question sets; and determining the associated knowledge point corresponding to the target confusion question sentence, and attributing the target confusion question sentence to the question set corresponding to the associated knowledge point.
  • The present disclosure selects the target question sentence from the question sets included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question sets; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to that knowledge point.
  • In this way, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced.
  • When the question and answer knowledge base is used for frequently asked question answering or retrieval, the knowledge points corresponding to the question sentences raised by users can be matched correctly, thereby improving the response effect.
  • the second embodiment of the present disclosure proposes an optimization device for a question and answer knowledge base, including: a first determination module for determining a question and answer knowledge base, where the question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points; an acquisition module for selecting the target question sentence from the question sets included in the question and answer knowledge base and obtaining the target confusion question sentence corresponding to the target question sentence according to the question sets; and a second determination module for determining the associated knowledge point corresponding to the target confusion question sentence and attributing the target confusion question sentence to the question set corresponding to the associated knowledge point.
  • The device selects the target question sentence from the question sets included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question sets; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to that knowledge point.
  • In this way, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced.
  • When the question and answer knowledge base is used for frequently asked question answering or retrieval, the knowledge points corresponding to the question sentences raised by users can be matched correctly, thereby improving the response effect.
  • a third embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to implement the optimization method of the question and answer knowledge base described in any embodiment of the first aspect of the present disclosure.
  • the fourth embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to implement the optimization method of the question and answer knowledge base described in any embodiment of the first aspect of the present disclosure.
  • the fifth embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the optimization method of the question and answer knowledge base described in any embodiment of the first aspect of the present disclosure.
  • the sixth embodiment of the present disclosure provides a computer program, which includes computer program code. When the computer program code is run on a computer, the computer executes the method described in any embodiment of the first aspect of the present disclosure.
  • Figure 1 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by another embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by another embodiment of the present disclosure
  • Figure 4 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by another embodiment of the present disclosure
  • Figure 5 is a schematic flowchart of determining whether a statement is legal in the optimization method of the question and answer knowledge base provided by an embodiment of the present disclosure
  • Figure 6 is a block diagram of an optimization device for a question and answer knowledge base proposed by the present disclosure
  • Figure 7 is a block diagram of an electronic device provided by the present disclosure.
  • Figure 1 is a schematic flowchart of a method for optimizing a question and answer knowledge base provided by an embodiment of the present disclosure.
  • the optimization method of a question and answer knowledge base according to an embodiment of the present disclosure can be executed by the optimization device for a question and answer knowledge base provided by an embodiment of the present disclosure. The optimization device can be set in electronic equipment such as terminals and servers.
  • the optimization method of the question and answer knowledge base according to the embodiment of the present disclosure includes steps S101 to S103.
  • S101 Determine the question and answer knowledge base, which includes knowledge points and question sets corresponding to the knowledge points.
  • In this step, the question and answer knowledge base to be optimized is determined.
  • The knowledge base includes multiple knowledge points and the question sets corresponding to them.
  • A question set may include the standard question sentence and the similar question sentences corresponding to the knowledge point, and may also include the answers to the standard and similar questions, as shown in Table 1:
  • Table 1 Example table of question and answer knowledge base
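
The layout described for S101 can be pictured with a small data structure. This is only an illustrative sketch: the class name `KnowledgePoint`, its field names, and the placeholder answer strings are assumptions made for this example and do not come from the disclosure. The two standard questions are the ones quoted in the description.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePoint:
    """One knowledge point with its answer and its question set (hypothetical layout)."""
    point_id: str
    standard_question: str
    answer: str
    similar_questions: list[str] = field(default_factory=list)

    def question_set(self) -> list[str]:
        # The standard question plus every similar question filed under this point.
        return [self.standard_question, *self.similar_questions]


def all_questions(kb: list[KnowledgePoint]) -> list[tuple[str, str]]:
    """Flatten the knowledge base into (point_id, question) pairs."""
    return [(kp.point_id, q) for kp in kb for q in kp.question_set()]


# Placeholder answers: the real answers are not given in the text.
kb = [
    KnowledgePoint("kp1", "Insufficient resources, how to view existing resources",
                   "(answer for knowledge point 1)"),
    KnowledgePoint("kp2", "Insufficient resources, how to apply",
                   "(answer for knowledge point 2)"),
]
```

Any entry returned by `all_questions(kb)` can serve as a target question sentence in the later steps.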
  • S102 Select the target question sentence from the question set included in the question and answer knowledge base, and obtain the target confusion question sentence corresponding to the target question sentence according to the question set.
  • any question sentence in the question and answer knowledge base can be used as a target question sentence, that is, any standard question sentence or any similar question sentence under any knowledge point can be used as a target question sentence, and the method of the present disclosure is executed on the target question sentence to obtain the target confusion question sentence corresponding to it.
  • a confusing question sentence can be understood as a question sentence that is similar to question sentences under multiple knowledge points, and it is difficult to clarify its corresponding knowledge point.
  • For example, "insufficient resources" can be regarded as a confusing question sentence, because it is highly similar both to "Insufficient resources, how to view existing resources" under knowledge point 1 and to "Insufficient resources, how to apply" under knowledge point 2.
  • If such confusing question sentences have not been added under a knowledge point, it is difficult for the FAQ system to determine the corresponding knowledge point accurately, resulting in inaccurate responses that fail to meet user needs.
  • S103 Determine the relevant knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the relevant knowledge point.
  • The actual application scenario or business content can be analyzed to determine the knowledge point associated with the target confusion question sentence. For example, the historical question sentences raised by users can be analyzed to find which knowledge point most users intend to query when they raise the target confusion question sentence; the associated knowledge point is then determined manually, and the target confusion question sentence is attributed to the question set under it, for example by storing it in that question set as a similar question sentence of the associated knowledge point.
  • When the question sentence raised by a user is the target confusion question sentence, or is highly similar to it, the corresponding knowledge point can then be matched accurately.
  • For example, the similarities that the FAQ system calculates between "insufficient resources" and the two knowledge points are close to each other, with knowledge point 2 slightly higher, so the system would respond with knowledge point 2.
  • However, most users who raise this confusing question are querying knowledge point 1, so it is better to reply to the user's question "insufficient resources" with knowledge point 1.
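
The disclosure describes the choice of associated knowledge point as a manual decision informed by historical user questions. The sketch below shows one way such an analysis could be supported: it assumes a hypothetical log that records which knowledge point each user actually wanted after raising the confusing question, and simply takes the majority. The function names and the log format are illustrative assumptions.

```python
from collections import Counter


def associated_knowledge_point(history: list[str]) -> str:
    """Return the knowledge point id that appears most often in the (assumed) history log."""
    if not history:
        raise ValueError("history must contain at least one knowledge point id")
    return Counter(history).most_common(1)[0][0]


def attribute(question_sets: dict[str, list[str]], point_id: str, question: str) -> None:
    """Store the confusing question as a similar question under the chosen knowledge point."""
    questions = question_sets.setdefault(point_id, [])
    if question not in questions:  # avoid filing the same sentence twice
        questions.append(question)


question_sets = {
    "kp1": ["Insufficient resources, how to view existing resources"],
    "kp2": ["Insufficient resources, how to apply"],
}
chosen = associated_knowledge_point(["kp1", "kp2", "kp1"])
attribute(question_sets, chosen, "insufficient resources")
```

In practice a reviewer could inspect the suggested knowledge point before the sentence is stored, which keeps the manual confirmation described above.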
  • the embodiment of the present disclosure proposes an optimization method for the question and answer knowledge base, which selects the target question sentence from the question sets included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question sets; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to that knowledge point.
  • In this way, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced.
  • When the question and answer knowledge base is used for frequently asked question answering or retrieval, the knowledge points corresponding to the question sentences raised by users can be matched correctly, thereby improving the response effect.
  • the "obtaining the target confusion question sentence corresponding to the target question sentence according to the question set" in the above step S102 may include steps S201 to S204.
  • S201 Determine the reference question sentence for generating the target confusion question sentence.
  • The reference question sentence for generating the target confusion question sentence corresponding to the target question sentence is determined from the question sentences in the question and answer knowledge base.
  • the reference question sentence may be a question sentence other than the target question sentence in the question and answer knowledge base, or a question sentence in other question sets in the question and answer knowledge base except the question set in which the target question sentence is located.
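  • S202 Determine the similarity between the target question sentence and the reference question sentence.
  • The similarity between the target question sentence and each reference question sentence is calculated separately.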
  • the cosine value between the word vector of the target question sentence and the word vector of any reference question sentence can be calculated, and the cosine value of the word vector represents the similarity between the question sentences.
  • the similarity between two question sentences can also be calculated based on a neural network.
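
The cosine measure mentioned above can be sketched as follows. This assumes each question sentence has already been mapped to a fixed-length numeric vector. How that vector is produced is not specified here, so the example uses small hand-written vectors.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for sentence embeddings.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0, which is why the value can be read as a similarity between question sentences.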
  • S203 Determine a set of similar question sentences of the target question sentence based on the similarity.
  • Based on the similarity between the target question sentence and each reference question sentence, it can be determined whether the reference question sentence can be used as a similar question sentence of the target question sentence, thereby obtaining the set of similar question sentences of the target question sentence.
  • Reference question sentences can be screened for similar question sentences by setting a similarity threshold.
  • The set of similar question sentences can also be simplified as follows: group the similar question sentences in the set according to knowledge points, and select one representative similar question sentence from the multiple similar question sentences that belong to the same knowledge point. This simplifies the set of similar question sentences and reduces the amount of calculation.
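
A minimal sketch of the screening and simplification in S203 is given below. The threshold value of 0.8 and the rule of keeping the highest-scoring sentence as the representative of each knowledge point are assumptions made for illustration, since the text only says that a threshold is set and one representative is selected.

```python
def similar_question_set(scored_refs, threshold=0.8):
    """Keep one representative similar question per knowledge point.

    scored_refs: iterable of (point_id, question, similarity) tuples.
    Returns a dict mapping point_id -> representative question.
    """
    best = {}
    for point_id, question, similarity in scored_refs:
        if similarity < threshold:
            continue  # not similar enough to the target question sentence
        if point_id not in best or similarity > best[point_id][1]:
            best[point_id] = (question, similarity)
    return {point_id: question for point_id, (question, _) in best.items()}


# Hypothetical scores for four reference question sentences.
scored = [
    ("kp1", "q1", 0.90),
    ("kp1", "q2", 0.95),
    ("kp2", "q3", 0.85),
    ("kp3", "q4", 0.50),
]
representatives = similar_question_set(scored)
```

Each representative is later paired with the target question sentence, so reducing the set to one sentence per knowledge point directly reduces the number of pairs to process.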
  • S204 Obtain the target confusion question sentence based on the target question sentence and the set of similar question sentences.
  • the target confusing question sentence corresponding to the target question sentence is mined.
  • the above step S204 of "obtaining the target confusion question sentence based on the target question sentence and the set of similar question sentences" may include steps S301 to S303.
  • S301 Combine the target question sentence and any similar question sentence in the set of similar question sentences to form a question sentence pair.
  • S302 Based on the word segmentation sequences of the two question sentences in the question sentence pair, obtain the common word segmentation sequence corresponding to the question sentence pair.
  • Word segmentation is performed on the target question sentence and the similar question sentence in the question sentence pair to obtain the word segmentation sequence of each of the two question sentences, as shown in Table 2:
  • The synonymous word segments in the two word segmentation sequences are then obtained, that is, synonyms or synonymous phrases are found in the two sequences and unified into a single target word segment. For example, "prompt" in the target question sentence and "display" in the similar question sentence in Table 2 are synonyms of each other, so these two word segments are represented by one unified target word segment, which can be either "prompt" or "display".
  • Table 3 A representation of the normalized results of the word segmentation sequence corresponding to the question sentence
  • Synonyms or synonymous phrases can be mined using a predefined synonym dictionary or a neural network model; the present disclosure does not limit this.
  • Step 1 Use BERT to calculate the semantic vector of each word segment in each question sentence, that is, encode each word segment with BERT to obtain its word vector.
  • Step 2 Calculate the similarity between the word segments of the target question sentence and the word segments of the similar question sentence in the question sentence pair, and determine word segments whose similarity is greater than a certain threshold as synonymous word segments. Assume that question sentence 1 contains M word segments and question sentence 2 contains N word segments. The synonymous word segments can be obtained using the following formula:
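
Independently of the exact formula, Step 2 can be read as an exhaustive comparison of the M × N word segment pairs against a threshold. The sketch below illustrates that reading only and is not the formula of the disclosure: it uses cosine similarity, a threshold of 0.9, and two-dimensional toy vectors in place of BERT vectors, all of which are assumptions.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def synonymous_pairs(vectors_1, vectors_2, threshold=0.9):
    """Return (i, j) index pairs whose vectors are more similar than the threshold.

    vectors_1 holds the M word segment vectors of question sentence 1,
    vectors_2 holds the N word segment vectors of question sentence 2.
    """
    return [
        (i, j)
        for i, u in enumerate(vectors_1)
        for j, v in enumerate(vectors_2)
        if _cosine(u, v) > threshold
    ]


# Toy vectors: segment 0 of each sentence points in nearly the same direction.
pairs = synonymous_pairs([[1.0, 0.0], [0.0, 1.0]], [[0.99, 0.1], [1.0, 1.0]])
```

Note that identical word segments also exceed the threshold under this reading, so a real implementation might filter those out before normalization.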
  • The common word segmentation sequence can be the maximum common word segmentation sequence.
  • For example, the maximum common word segmentation sequence of the question sentence pair in Table 3 is: prompt, resource, insufficient, how.
  • The common word segmentation sequence can also be a sequence that includes the word segments common to the two question sentences and the word segments unique to each question sentence. The way the common word segmentation sequence is formed can be set as needed.
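
The normalization and comparison steps can be sketched as follows. Two things are assumed here: that the "maximum common" sequence can be computed as a longest common subsequence, and that the two token lists resemble the pair discussed in the text. The lists are hypothetical stand-ins for the word segmentation sequences, which are not reproduced in this document.

```python
def normalize(tokens, synonym_to_target):
    """Replace every synonymous word segment with its unified target word segment."""
    return [synonym_to_target.get(token, token) for token in tokens]


def max_common_sequence(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one longest common subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


synonyms = {"display": "prompt"}  # "prompt" chosen as the target word segment
target = normalize(["prompt", "resource", "insufficient", "how", "view"], synonyms)
similar = normalize(["display", "resource", "insufficient", "how", "apply"], synonyms)
common = max_common_sequence(target, similar)
```

With these inputs `common` is `["prompt", "resource", "insufficient", "how"]`, matching the sequence named in the text. Without the normalization step the first word segment would not match and would be dropped from the result.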
  • the word segmentation processing of the question sentences can also be performed before this step, and all the question sentences in the question and answer knowledge base are directly segmented to obtain the word segmentation sequences corresponding to each question sentence.
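  • S303 Generate the target confusion question sentence based on the common word segmentation sequence.
  • The target confusion question sentence of the target question sentence is obtained according to the common word segmentation sequence corresponding to the question sentence pair.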
  • the above step S303 of "generating a target confusion question sentence based on a common word segmentation sequence” may include steps S401 to S403.
  • S401 Obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence.
  • It is determined whether the sentence corresponding to the common word segmentation sequence is legal. In response to the sentence being illegal, the common word segmentation sequence is edited to generate a new common word segmentation sequence, and the step of determining whether the corresponding sentence is legal is performed on the new sequence; in response to the sentence being legal, the legal sentence is determined as a candidate confusing question sentence.
  • The preset condition can be that the total number of edits of the common word segmentation sequence is greater than a preset number of times, or that the number of word segments in the common word segmentation sequence is less than a preset number of word segments. In response to the common word segmentation sequence meeting the preset condition, the sequence is discarded; in response to the common word segmentation sequence not meeting the preset condition, the step of editing the common word segmentation sequence is performed.
  • The common word segmentation sequence can be edited by deleting its first or last word segment, and the sequence after the deletion is used as the new common word segmentation sequence.
  • The common word segmentation sequence can also be edited by a model.
  • Whether a sentence is legal can be determined based on the completeness of the sentence and/or the probability of the sentence appearing in the question and answer scenario.
  • For example, whether a sentence is legal can be determined by a trained target classification model, which is obtained by training with question sentences in user logs as positive samples and with positive samples after character deletion as negative samples.
  • For example, the sentence corresponding to the common word segmentation sequence [prompt, resource, insufficient, how] is judged, and the output is illegal, because "prompt resource insufficient, how" is an incomplete sentence.
  • Suppose the preset exit condition is that the number of word segments in the word segmentation sequence is less than 3. It is determined whether the common word segmentation sequence meets the exit condition; the output is that it does not, so the last word segment is deleted and [prompt, resource, insufficient] is output as the new common word segmentation sequence. It is then determined whether the sentence corresponding to the new sequence is legal; "prompt for insufficient resources" is output as a legal sentence, and the legal sentence is determined as a candidate confusing question sentence.
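
The judge-and-edit loop walked through in this example can be sketched as below. The legality check is passed in as a function, and the `toy_is_legal` rule used here is a hypothetical stand-in for the trained classification model. The edit budget of 5 and the choice to always delete the last word segment are likewise assumptions for illustration.

```python
def candidate_from_sequence(tokens, is_legal, min_tokens=3, max_edits=5):
    """Shrink a common word segmentation sequence until its sentence is legal.

    Returns the legal token list, or None if the sequence is discarded.
    """
    tokens = list(tokens)
    edits = 0
    while True:
        if is_legal(tokens):
            return tokens  # legal sentence: keep as a candidate confusing question
        # Exit condition: edit budget used up, or too few word segments remain.
        if edits >= max_edits or len(tokens) < min_tokens:
            return None
        tokens = tokens[:-1]  # edit: delete the last word segment
        edits += 1


def toy_is_legal(tokens):
    # Hypothetical rule: a sentence left hanging on "how" counts as incomplete.
    return bool(tokens) and tokens[-1] != "how"


candidate = candidate_from_sequence(
    ["prompt", "resource", "insufficient", "how"], toy_is_legal
)
```

Here `candidate` is `["prompt", "resource", "insufficient"]`, mirroring the example. Deleting the first word segment instead, or trying both deletions, would be an equally valid variant of the edit step.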
  • S402 Determine the confusion score of the candidate confusing question sentence.
  • The confusion score of each candidate confusing question sentence is calculated.
  • The confusion score can be calculated as follows:
  • C is the candidate confusing question sentence
  • score(C) is the confusion score
  • sim(C, K1) and sim(C, K2) are the two highest-ranking similarities obtained when retrieving the question and answer knowledge base
  • K1 and K2 are the two question sentences in the question and answer knowledge base that are most similar to the candidate confusing question sentence.
  • The greater the difference between the two similarities, the lower the degree of confusion; the smaller the difference, the higher the degree of confusion.
  • S403 Determine the target confusing question sentence from the candidate confusing question sentences based on the confusion score.
  • candidate confusing question sentences corresponding to the target question sentence are sorted according to the confusion score, and a preset number of target confusing question sentences can be selected from high to low as needed.
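
The scoring and selection in S402 and S403 can be illustrated with the sketch below. The exact scoring formula is not reproduced in this text, so the expression `1 - (s1 - s2)` is an assumption chosen only because it falls as the gap between the two highest similarities grows, as the description requires. The function names and sample similarities are also hypothetical.

```python
def confusion_score(similarities):
    """Assumed score: 1 minus the gap between the two highest similarities."""
    if len(similarities) < 2:
        raise ValueError("need similarities to at least two question sentences")
    s1, s2 = sorted(similarities, reverse=True)[:2]
    return 1.0 - (s1 - s2)


def select_targets(candidates, k):
    """Rank candidates by confusion score, highest first, and keep the top k.

    candidates: dict mapping a candidate sentence to its list of similarities
    against the question sentences in the knowledge base.
    """
    ranked = sorted(
        candidates, key=lambda c: confusion_score(candidates[c]), reverse=True
    )
    return ranked[:k]


# Hypothetical similarities for two candidate confusing question sentences.
candidates = {
    "prompt for insufficient resources": [0.90, 0.88, 0.10],
    "how to apply": [0.90, 0.30, 0.20],
}
targets = select_targets(candidates, k=1)
```

The first candidate is nearly equidistant from its two closest question sentences, so it receives the higher score and is selected.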
  • the embodiment of the present disclosure proposes an optimization method for the question and answer knowledge base, which selects the target question sentence from the question sets included in the question and answer knowledge base, and obtains the target confusion question sentence corresponding to the target question sentence according to the question sets; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to that knowledge point.
  • In this way, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced.
  • When the question and answer knowledge base is used for frequently asked question answering or retrieval, the knowledge points corresponding to the question sentences raised by users can be matched correctly, thereby improving the response effect.
  • Figure 6 is a block diagram of an optimization device for a question and answer knowledge base proposed by the present disclosure.
  • the optimization device 600 for the question and answer knowledge base includes: a first determination module 601, an acquisition module 602 and a second determination module 603.
  • the first determination module 601 is used to determine the question and answer knowledge base.
  • the question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points.
  • the acquisition module 602 is used to select the target question sentence from the question set included in the question and answer knowledge base, and obtain the target confusion question sentence corresponding to the target question sentence according to the question set.
  • the second determination module 603 is used to determine the relevant knowledge points corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the relevant knowledge point.
  • the acquisition module 602 is further configured to: determine the reference question sentence for generating the target confusion question sentence; determine the similarity between the target question sentence and the reference question sentence; determine the set of similar question sentences of the target question sentence based on the similarity; and obtain the target confusion question sentence based on the target question sentence and the set of similar question sentences.
  • the acquisition module 602 is also used to: perform word segmentation processing on the question sentences under the question and answer knowledge base, and obtain the word segmentation sequence corresponding to the question sentence.
  • the acquisition module 602 is further configured to: form a question sentence pair from the target question sentence and any similar question sentence in the set of similar question sentences; obtain the common word segmentation sequence corresponding to the question sentence pair based on the word segmentation sequences of the two question sentences in the pair; and generate the target confusion question sentence based on the common word segmentation sequence.
  • the acquisition module 602 can further be used to: obtain the synonymous word segments in the two word segmentation sequences; normalize the synonymous word segments to obtain their target word segment; replace the synonymous word segments in the two word segmentation sequences with the target word segment to obtain two replacement word segmentation sequences; and compare the two replacement word segmentation sequences to generate the common word segmentation sequence.
  • the acquisition module 602 is further configured to: obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence; determine the confusion score of each candidate confusing question sentence; and determine the target confusion question sentence from the candidate confusing question sentences according to the confusion score.
  • the acquisition module 602 can further be used to: determine whether the sentence corresponding to the common word segmentation sequence is legal; in response to the sentence being illegal, edit the common word segmentation sequence to generate a new common word segmentation sequence, and perform the step of determining whether the corresponding sentence is legal on the new common word segmentation sequence; and in response to the sentence being legal, determine the legal sentence as a candidate confusing question sentence.
  • the acquisition module 602 is also configured to: before performing the step of determining whether the corresponding sentence is legal on the new common word segmentation sequence, discard the new common word segmentation sequence in response to it meeting the preset condition; and in response to the new common word segmentation sequence not meeting the preset condition, perform the step of determining whether the corresponding sentence is legal on the new common word segmentation sequence.
  • the preset condition is that the number of edits of the common word segmentation sequence is greater than the preset number of times, or that the number of word segments in the new common word segmentation sequence is less than the preset number of word segments.
  • the acquisition module 602 may be further configured to determine whether a statement is legal based on the completeness of the statement and/or the probability of the statement appearing in the question and answer scenario.
  • the acquisition module 602 can further be used to: determine whether a sentence is legal based on a trained target classification model, where the target classification model is obtained by training with question sentences in user logs as positive samples and with positive samples after character deletion as negative samples.
  • the acquisition module 602 is further configured to: delete the first word segmentation or the last word segmentation in the public word segmentation sequence, and use the public word segmentation sequence after the word segmentation is deleted as a new public word segmentation sequence.
  • the acquisition module 602 may be further configured to: determine the similarity between the candidate confusing question sentence and any question sentence in the question and answer knowledge base; determine the confusion score of the candidate confusing question sentence based on the similarity. .
  • the embodiment of the present disclosure proposes an optimization device for the question and answer knowledge base, which selects the target question sentence from the question sets included in the question and answer knowledge base and obtains the target confusion question sentence corresponding to the target question sentence according to the question sets; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to the associated knowledge point.
  • by mining confusing question sentences and attributing them to their associated knowledge points, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced.
  • when FAQ answering or retrieval functions are implemented based on the knowledge base, the knowledge point corresponding to a question sentence raised by a user can be matched correctly, thereby enhancing the response effect.
  • an embodiment of the present disclosure also proposes an electronic device 700.
  • the electronic device 700 includes: a processor 701 and a memory 702 communicatively connected to the processor.
  • the memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 to implement the optimization method of the question and answer knowledge base as shown in any embodiment of the present disclosure.
  • embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to implement the optimization method of the question and answer knowledge base as shown in any embodiment of the present disclosure.
  • embodiments of the present disclosure also provide a computer program product, which includes a computer program.
  • when executed by a processor, the computer program implements the optimization method of the question and answer knowledge base as shown in any embodiment of the present disclosure.
  • an embodiment of the present disclosure also proposes a computer program.
  • the computer program includes a computer program code.
  • when the computer program code is run on a computer, it causes the computer to execute the optimization method of the question and answer knowledge base shown in any embodiment of the present disclosure.
  • the terms “first” and “second” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • “plurality” means two or more than two, unless otherwise expressly and specifically limited.
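The candidate-generation behaviour described above for the acquisition module 602 (check legality, otherwise delete the first or last word segment and retry, discarding sequences that exceed an edit budget or become too short) can be sketched as a small search. This is an illustrative sketch only: the function name `generate_candidates`, the parameters `max_edits` and `min_tokens`, and the toy `is_legal` predicate are assumptions made for the example; the disclosure itself judges legality with a trained classification model.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple


def generate_candidates(
    common_seq: Sequence[str],
    is_legal: Callable[[Tuple[str, ...]], bool],
    max_edits: int = 3,   # assumed "preset number of edits"
    min_tokens: int = 2,  # assumed "preset number of word segments"
) -> List[Tuple[str, ...]]:
    """Collect legal statements reachable by deleting first/last word segments."""
    start = tuple(common_seq)
    queue = deque([(start, 0)])  # (sequence, number of edits applied so far)
    seen = {start}
    candidates: List[Tuple[str, ...]] = []
    while queue:
        seq, edits = queue.popleft()
        if is_legal(seq):
            candidates.append(seq)  # a legal statement becomes a candidate
            continue
        # Not legal: edit by deleting the first or the last word segment.
        for new_seq in (seq[1:], seq[:-1]):
            # Discard when the edit budget is exceeded or the sequence is too short.
            if edits + 1 > max_edits or len(new_seq) < min_tokens:
                continue
            if new_seq not in seen:
                seen.add(new_seq)
                queue.append((new_seq, edits + 1))
    return candidates


# Toy legality check standing in for the trained classifier (hypothetical).
toy_legal = lambda s: s == ("resources", "insufficient")
result = generate_candidates(("prompt", "resources", "insufficient", "how"), toy_legal)
```

With the toy predicate, `result` contains the single candidate `("resources", "insufficient")`. Lowering `max_edits` to 1 yields no candidates, because two deletions are needed to reach it.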

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are an optimization method and apparatus for a question-and-answer knowledge base, and an electronic device, a readable storage medium, a computer program product and a computer program. The optimization method for a question-and-answer knowledge base comprises: selecting a target question sentence from question sets included in a question-and-answer knowledge base, and acquiring, according to the question sets, a target obfuscated question sentence corresponding to the target question sentence; and determining an associated knowledge point corresponding to the target obfuscated question sentence, and classifying the target obfuscated question sentence into a question set which corresponds to the associated knowledge point.

Description

Optimization method and device for question and answer knowledge base
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 2022111100385, filed in China on September 13, 2022, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of natural language processing, and specifically to an optimization method for a question and answer knowledge base and a device therefor, an electronic device, a readable storage medium, a computer program product, and a computer program.
Background
At present, a question and answer knowledge base is mostly expanded by mining, under a knowledge point, similar question sentences that are synonymous with the standard question sentence but worded differently. The standard question sentence and these similar question sentences correspond to the same knowledge point. Mining similar question sentences allows a frequently asked questions (FAQ) system, when answering a question sentence raised by a user based on the question and answer knowledge base, to perform similarity matching between the user's question sentence and the question sentences under each knowledge point (such as the standard question and the similar questions that are synonymous with it but worded differently). The system can then accurately match the knowledge point corresponding to the user's question sentence, so that the FAQ system is not affected by synonyms. A confusing question sentence, by contrast, is a question sentence raised by a user that is relatively similar to question sentences under multiple knowledge points in the knowledge base. In this case it is difficult to match the correct knowledge point based on the current question and answer knowledge base, so the response accuracy based on the knowledge base is low.
Summary
The present disclosure aims to solve, at least to a certain extent, one of the technical problems in the related art.
To this end, a first object of the present disclosure is to provide an optimization method for a question and answer knowledge base.
A second object of the present disclosure is to provide an optimization device for a question and answer knowledge base.
A third object of the present disclosure is to provide an electronic device.
A fourth object of the present disclosure is to provide a non-transitory computer-readable storage medium.
A fifth object of the present disclosure is to provide a computer program product.
A sixth object of the present disclosure is to provide a computer program.
To achieve the above objects, an embodiment of the first aspect of the present disclosure provides an optimization method for a question and answer knowledge base, including: determining a question and answer knowledge base, where the question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points; selecting a target question sentence from the question sets included in the question and answer knowledge base, and obtaining, according to the question sets, a target confusion question sentence corresponding to the target question sentence; and determining an associated knowledge point corresponding to the target confusion question sentence, and attributing the target confusion question sentence to the question set corresponding to the associated knowledge point.
The present disclosure selects a target question sentence from the question sets included in the question and answer knowledge base and obtains, according to the question sets, the target confusion question sentence corresponding to the target question sentence; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to the associated knowledge point. By mining the confusing question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced. When FAQ answering or retrieval functions are implemented based on this knowledge base, the knowledge point corresponding to a question sentence raised by a user can be matched correctly, which enhances the response effect.
To achieve the above objects, an embodiment of the second aspect of the present disclosure provides an optimization device for a question and answer knowledge base, including: a first determination module, configured to determine a question and answer knowledge base, where the question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points; an acquisition module, configured to select a target question sentence from the question sets included in the question and answer knowledge base and to obtain, according to the question sets, a target confusion question sentence corresponding to the target question sentence; and a second determination module, configured to determine an associated knowledge point corresponding to the target confusion question sentence and to attribute the target confusion question sentence to the question set corresponding to the associated knowledge point.
The present disclosure selects a target question sentence from the question sets included in the question and answer knowledge base and obtains, according to the question sets, the target confusion question sentence corresponding to the target question sentence; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to the associated knowledge point. By mining the confusing question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced. When FAQ answering or retrieval functions are implemented based on this knowledge base, the knowledge point corresponding to a question sentence raised by a user can be matched correctly, which enhances the response effect.
To achieve the above objects, an embodiment of the third aspect of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to implement the optimization method of the question and answer knowledge base described in any embodiment of the first aspect of the present disclosure.
To achieve the above objects, an embodiment of the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to implement the optimization method of the question and answer knowledge base described in any embodiment of the first aspect of the present disclosure.
To achieve the above objects, an embodiment of the fifth aspect of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the optimization method of the question and answer knowledge base described in any embodiment of the first aspect of the present disclosure.
To achieve the above objects, an embodiment of the sixth aspect of the present disclosure provides a computer program, including computer program code which, when run on a computer, causes the computer to execute the method described in any embodiment of the first aspect of the present disclosure.
Brief description of the drawings
Figure 1 is a schematic flowchart of an optimization method for a question and answer knowledge base provided by an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of an optimization method for a question and answer knowledge base provided by another embodiment of the present disclosure;
Figure 3 is a schematic flowchart of an optimization method for a question and answer knowledge base provided by another embodiment of the present disclosure;
Figure 4 is a schematic flowchart of an optimization method for a question and answer knowledge base provided by another embodiment of the present disclosure;
Figure 5 is a schematic flowchart of determining whether a statement is legal in the optimization method for a question and answer knowledge base provided by an embodiment of the present disclosure;
Figure 6 is a block diagram of an optimization device for a question and answer knowledge base proposed by the present disclosure;
Figure 7 is a block diagram of an electronic device provided by the present disclosure.
Detailed description
Embodiments of the present disclosure are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present disclosure and are not to be construed as limiting it.
Figure 1 is a schematic flowchart of an optimization method for a question and answer knowledge base provided by an embodiment of the present disclosure. The method can be executed by the optimization device for a question and answer knowledge base provided by the embodiments of the present disclosure, and the device can be provided in an electronic device such as a terminal or a server. As shown in Figure 1, the optimization method of this embodiment includes steps S101 to S103.
S101: determine a question and answer knowledge base, where the question and answer knowledge base includes knowledge points and question sets corresponding to the knowledge points.
In this embodiment, the question and answer knowledge base to be optimized is determined. The knowledge base includes multiple knowledge points and the question set corresponding to each knowledge point. A question set may include the standard question sentence and similar question sentences corresponding to the knowledge point, and may also include the answer used to respond to them, as shown in Table 1:
Table 1: Example of a question and answer knowledge base
S102: select a target question sentence from the question sets included in the question and answer knowledge base, and obtain, according to the question sets, the target confusion question sentence corresponding to the target question sentence.
In this embodiment, any question sentence in the question and answer knowledge base can serve as the target question sentence; that is, any standard question sentence or any similar question sentence under any knowledge point can be taken as the target question sentence, and the method of the present disclosure is executed on it to obtain the corresponding target confusion question sentence. A confusing question sentence can be understood as a question sentence that is similar to question sentences under multiple knowledge points, so that its corresponding knowledge point is hard to determine. For example, "resources are insufficient" can be regarded as a confusing question sentence, because it is highly similar both to "Prompt says resources are insufficient, how do I view existing resources" under knowledge point 1 and to "Display shows resources are insufficient, how do I apply" under knowledge point 2. If confusing question sentences have not been added under the knowledge points, it is difficult for the FAQ system to accurately determine the corresponding knowledge point, which leads to inaccurate responses that fail to meet user needs.
S103: determine the associated knowledge point corresponding to the target confusion question sentence, and attribute the target confusion question sentence to the question set corresponding to the associated knowledge point.
In this embodiment, the knowledge point associated with the target confusion question sentence can be determined by analyzing the actual application scenario or business content. For example, based on historical question sentences raised by users, it can be analyzed which knowledge point users are mostly querying when they raise the target confusion question; the associated knowledge point is then determined manually, and the target confusion question sentence is attributed to the question set under that knowledge point, for example by storing it in the question set as a similar question sentence under the associated knowledge point.
Therefore, when a question sentence raised by a user is the target confusion question sentence, or is highly similar to it, the corresponding knowledge point can be matched accurately.
For example, for the confusing question "resources are insufficient" raised by a user, the FAQ system typically calculates nearly equal similarities to the two knowledge points, with knowledge point 2 slightly higher, and therefore answers with knowledge point 2. In actual business scenarios, however, most users who raise this question are querying knowledge point 1, so the user question "resources are insufficient" is better answered with knowledge point 1. With the optimization method of the present disclosure, the confusing question sentences of each question sentence can be mined and then manually attributed, based on business knowledge, to their associated knowledge points, which enhances the response effect.
This embodiment of the present disclosure proposes an optimization method for a question and answer knowledge base, which selects a target question sentence from the question sets included in the question and answer knowledge base and obtains, according to the question sets, the target confusion question sentence corresponding to the target question sentence; it then determines the associated knowledge point corresponding to the target confusion question sentence and attributes the target confusion question sentence to the question set corresponding to the associated knowledge point. By mining the confusing question sentences of each knowledge point and attributing them to the corresponding associated knowledge points, the question and answer knowledge base is expanded and its coverage of the question sentences that users may raise is enhanced. When FAQ answering or retrieval functions are implemented based on this knowledge base, the knowledge point corresponding to a question sentence raised by a user can be matched correctly, which enhances the response effect.
On the basis of the above embodiment, as shown in Figure 2, "obtaining, according to the question sets, the target confusion question sentence corresponding to the target question sentence" in step S102 may include steps S201 to S204.
S201: determine the reference question sentences used to generate the target confusion question sentence.
In this embodiment, the reference question sentences used to generate the target confusion question sentence of the target question sentence are determined from the question sentences in the question and answer knowledge base. The reference question sentences may be all question sentences in the knowledge base other than the target question sentence, or the question sentences in the question sets other than the one containing the target question sentence.
S202: determine the similarity between the target question sentence and each reference question sentence.
In this embodiment, the similarity between the target question sentence and each reference question sentence is calculated separately.
The cosine between the word vector of the target question sentence and the word vector of a reference question sentence can be calculated, and this cosine value is used as the similarity between the two question sentences. In some embodiments, the similarity between two question sentences can also be calculated with a neural network.
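As a minimal sketch of the cosine measure mentioned above, the following function compares two vectors that are assumed to already represent the two question sentences. How those vectors are produced is not specified here; the function name and the zero-vector convention are choices made for this example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated direction
```

Identical directions give 1.0 and orthogonal vectors give 0.0, so a higher value indicates more similar question sentences.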
S203: determine the set of similar question sentences of the target question sentence based on the similarity.
In this embodiment, the similarity between the target question sentence and a reference question sentence is used to decide whether that reference question sentence qualifies as a similar question sentence of the target question sentence, thereby obtaining the set of similar question sentences of the target question sentence.
In some embodiments, a similarity threshold can be set to filter which reference question sentences count as similar question sentences.
In some embodiments, the set of similar question sentences can also be simplified as follows: group the similar question sentences in the set by knowledge point, and select one representative similar question sentence from the multiple similar question sentences that belong to the same knowledge point. This reduces the size of the set and the amount of computation.
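The threshold filter and the per-knowledge-point simplification can be sketched together. The tuple layout, the default threshold of 0.8, the use of `>=`, and the rule "keep the highest-scoring sentence per knowledge point" are all assumptions for illustration; the text only says that one representative sentence is selected.

```python
from typing import Dict, List, Tuple

# (question sentence, knowledge point id, similarity to the target question)
Scored = Tuple[str, str, float]


def similar_question_set(scored: List[Scored], threshold: float = 0.8) -> List[Scored]:
    """Keep sentences above the threshold, then one representative per knowledge point."""
    kept = [item for item in scored if item[2] >= threshold]
    best: Dict[str, Scored] = {}
    for item in kept:
        kp = item[1]
        # Assumed representative rule: the most similar sentence wins.
        if kp not in best or item[2] > best[kp][2]:
            best[kp] = item
    return list(best.values())


scored = [("a", "kp2", 0.9), ("b", "kp2", 0.85), ("c", "kp3", 0.5), ("d", "kp4", 0.81)]
reduced = similar_question_set(scored)
```

Here `reduced` keeps `("a", "kp2", 0.9)` and `("d", "kp4", 0.81)`: sentence "c" falls below the threshold, and "b" is dropped in favour of "a" under the same knowledge point.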
S204: obtain the target confusion question sentence based on the target question sentence and the set of similar question sentences.
In this embodiment, the target confusion question sentence corresponding to the target question sentence is mined based on the target question sentence and its set of similar question sentences. There may be one or more target confusion question sentences; the present disclosure does not limit the number.
On the basis of the above embodiment, as shown in Figure 3, "obtaining the target confusion question sentence based on the target question sentence and the set of similar question sentences" in step S204 may include steps S301 to S303.
S301: form a question sentence pair from the target question sentence and any similar question sentence in the set of similar question sentences.
S302: obtain the common word segmentation sequence of the question sentence pair based on the respective word segmentation sequences of the two question sentences in the pair.
In this embodiment, word segmentation is performed on the target question sentence and the similar question sentence in the question sentence pair to obtain the word segmentation sequence of each, as shown in Table 2:
Table 2: Word segmentation results of the question sentences
Based on the word segmentation sequence of the target question sentence and that of the similar question sentence, the synonymous word segments in the two sequences are obtained (that is, synonyms or synonymous phrases across the two sequences are found), and word segments that are synonyms of each other are normalized to a single unified target word segment. For example, "prompt" in the target question sentence and "display" in the similar question sentence in Table 2 are synonyms, so both are represented by one unified target word segment, which can be either "prompt" or "display". The synonymous word segments in the two sequences are then replaced with the target word segment to obtain two replaced word segmentation sequences. This completes the normalization of the two sequences and aligns the words between them, as shown in Table 3:
Table 3: Normalized word segmentation sequences of the question sentences
Synonyms or synonymous phrases can be found using a predefined synonym dictionary or mined using a neural network model; the present disclosure does not limit this.
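A dictionary-based version of this normalization can be sketched as follows. The token lists are an assumed English rendering of the segmentation (the table contents are not reproduced here), and the `canonical` mapping is a hypothetical one-entry synonym dictionary that maps "display" to the unified target word segment "prompt".

```python
from typing import Dict, List


def normalize(tokens: List[str], canonical: Dict[str, str]) -> List[str]:
    """Replace each word segment with its unified target word segment, if any."""
    return [canonical.get(tok, tok) for tok in tokens]


# Hypothetical synonym dictionary: "display" is normalized to "prompt".
canonical = {"display": "prompt"}

# Assumed segmentation of the two example question sentences.
target = ["prompt", "resources", "insufficient", "how", "view", "existing", "resources"]
similar = ["display", "resources", "insufficient", "how", "apply"]

target_norm = normalize(target, canonical)
similar_norm = normalize(similar, canonical)
```

After normalization both sequences begin with "prompt", so a later comparison of the two sequences treats the synonymous word segments as the same word.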
When a neural network model is used, the computation can proceed as follows:
Step 1: apply BERT to compute the semantic vector of each word segment in each question sentence, or encode each word segment with BERT to obtain its word vector.
Step 2: compute the similarity between the word segments of the target question sentence and the word segments of the similar question sentence in the question sentence pair, and determine word segment pairs whose similarity is greater than a certain threshold as synonymous word segments. Assuming question sentence 1 contains M word segments and question sentence 2 contains N word segments, the synonymous word segments can be obtained with the following formula:
$$S_i = \{\, w_j^{2} \mid \mathrm{sim}(w_i^{1}, w_j^{2}) > \delta,\ 1 \le j \le N \,\}, \qquad 1 \le i \le M$$
Here $w_i^{1}$ and $w_j^{2}$ denote word segments in question sentence 1 and question sentence 2 respectively, where the superscript is the question sentence number and the subscript is the word segment number; $S_i$ denotes the set of synonymous word segments of the i-th word segment in question sentence 1; $\mathrm{sim}(\cdot,\cdot)$ denotes the similarity between word segments, which can be computed as the cosine of their word vectors; and δ denotes the threshold.
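Step 2 can be illustrated with a small threshold test over word vectors. The two-dimensional vectors below are made-up toy values standing in for BERT outputs, and the function names are assumptions for this sketch; only the rule "similarity greater than the threshold δ" comes from the text.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def synonym_sets(
    vecs1: List[Sequence[float]],  # vectors of the M word segments in sentence 1
    words2: List[str],             # the N word segments in sentence 2
    vecs2: List[Sequence[float]],  # their vectors
    delta: float,                  # similarity threshold
) -> Dict[int, List[str]]:
    """For each word segment i of sentence 1, list the segments of sentence 2 above delta."""
    return {
        i: [words2[j] for j in range(len(words2)) if cosine(vecs1[i], vecs2[j]) > delta]
        for i in range(len(vecs1))
    }


# Toy vectors (not real embeddings): "prompt", "apply" vs. "display", "view".
vecs1 = [[1.0, 0.0], [0.0, 1.0]]
words2 = ["display", "view"]
vecs2 = [[0.9, 0.1], [-1.0, 0.0]]
sets = synonym_sets(vecs1, words2, vecs2, delta=0.9)
```

With these toy values, word segment 0 of sentence 1 is paired with "display" and word segment 1 has no synonym in sentence 2.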
The two word segmentation sequences, after synonym normalization, are compared, and the word segments they share are selected to obtain the common word segmentation sequence of the question sentence pair. The common word segmentation sequence may be the maximum common word segmentation sequence; for example, for the question sentence pair in Table 3 it is: "prompt" (提示), "resource" (资源), "insufficient" (不足), "how" (如何). Alternatively, the common word segmentation sequence may contain both the word segments shared by the two question sentences and those unique to each; how it is formed can be configured as needed.
As a feasible implementation, word segmentation of the question sentences may also be performed before this step, by directly segmenting all question sentences in the question-and-answer knowledge base to obtain the word segmentation sequence of each question sentence.
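The normalization and comparison described above can be sketched as follows. The sketch assumes that the "maximum common word segmentation sequence" is read as a longest common subsequence, which the source does not state explicitly. The two input sentences are hypothetical stand-ins, since the contents of Tables 2 and 3 are not reproduced here, and the function names are illustrative only.

```python
def normalize(tokens, synonym_to_target):
    """Replace every synonymous word segment with its target word segment."""
    return [synonym_to_target.get(t, t) for t in tokens]

def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical segmented sentences; "显示" is mapped to the target "提示".
target_tokens = ["提示", "资源", "不足", "如何", "处理"]
similar_tokens = ["显示", "资源", "不足", "如何", "解决"]
synonyms = {"显示": "提示"}

common = longest_common_subsequence(
    normalize(target_tokens, synonyms), normalize(similar_tokens, synonyms)
)
print(common)  # prints ['提示', '资源', '不足', '如何']
```

Without the normalization step the first word segments would not match, and the common sequence would lose "提示".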
S303: Generate a target confusing question sentence based on the common word segmentation sequence.
In some embodiments, the target confusing question sentence of the target question sentence is obtained according to the common word segmentation sequence corresponding to the question sentences.
On the basis of the above embodiment, as shown in Figure 4, "generating a target confusing question sentence based on the common word segmentation sequence" in step S303 may include steps S401 to S403.
S401: Obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence.
In the embodiment of the present disclosure, as shown in Figure 5, it is determined whether the sentence corresponding to the common word segmentation sequence is valid, where that sentence is formed by concatenating the word segments in the sequence. For example, the common word segmentation sequence above, "prompt, resource, insufficient, how" (提示、资源、不足、如何), corresponds to the sentence "提示资源不足如何".
In response to the sentence corresponding to the common word segmentation sequence being invalid, the common word segmentation sequence is edited to generate a new common word segmentation sequence, and the step of determining whether the corresponding sentence is valid is performed on the new sequence. In response to the sentence corresponding to the common word segmentation sequence being valid, the valid sentence is determined to be a candidate confusing question sentence.
A preset condition for exiting the loop may also be checked before the common word segmentation sequence is edited. The preset condition may be that the total number of edits applied to a common word segmentation sequence is greater than a preset number, or that the number of word segments in the sequence is less than a preset number of word segments. In response to the sequence meeting the preset condition, the sequence is discarded; in response to the sequence not meeting the preset condition, the step of editing the sequence is performed.
The common word segmentation sequence may be edited by deleting its first or last word segment, with the sequence remaining after the deletion used as the new common word segmentation sequence. The sequence may also be edited by a model.
As a feasible implementation, whether a sentence is valid may be determined according to the completeness of the sentence and/or the probability that the sentence appears in question-and-answer scenarios.
As another feasible implementation, validity may be determined by building and training a model, for example with a trained target classification model. The target classification model may be trained using question sentences from user logs as positive samples and those positive samples with characters deleted as negative samples.
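The construction of training samples for such a classifier might look like the sketch below. The source only says that negative samples are positive samples with characters deleted; how many characters are deleted and at which positions is an assumption made here for illustration (random positions, one character by default), and `make_negative` and `build_training_set` are hypothetical names. The classifier itself is out of scope.

```python
import random

def make_negative(sentence, n_delete, rng):
    """Delete up to n_delete characters at random positions,
    always leaving at least one character in place."""
    chars = list(sentence)
    for _ in range(min(n_delete, len(chars) - 1)):
        del chars[rng.randrange(len(chars))]
    return "".join(chars)

def build_training_set(log_questions, n_delete=1, seed=0):
    """Pair each logged question (label 1) with a truncated copy (label 0)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    samples = []
    for question in log_questions:
        samples.append((question, 1))
        samples.append((make_negative(question, n_delete, rng), 0))
    return samples

# Hypothetical user-log question.
samples = build_training_set(["如何提示资源不足"])
for text, label in samples:
    print(label, text)
```

A single-character question cannot be shortened, so its negative copy would equal the positive sample; a real pipeline would presumably filter such cases.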
For example, the sentence corresponding to the common word segmentation sequence [prompt, resource, insufficient, how] (【提示、资源、不足、如何】) is judged to be invalid, because "提示资源不足如何" is an incomplete sentence. Suppose the preset exit condition is that the number of word segments in the sequence is less than 3. The sequence is checked against the exit condition and does not meet it, so the last word segment is deleted, yielding [prompt, resource, insufficient] (【提示、资源、不足】) as a new common word segmentation sequence. The sentence corresponding to the new sequence, "提示资源不足" ("prompt: insufficient resources"), is judged to be valid and is determined to be a candidate confusing question sentence.
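The check, exit, and edit loop of this example can be sketched as follows. It is only an illustration under stated assumptions: `is_valid` stands in for whichever validity check is used (completeness, probability, or a trained classifier), the edit is fixed to deleting the last word segment, and `candidate_from_common_sequence`, `min_tokens`, and `max_edits` are hypothetical names.

```python
def candidate_from_common_sequence(tokens, is_valid, min_tokens=3, max_edits=5):
    """Return a valid candidate sentence, or None if the sequence is discarded."""
    tokens = list(tokens)
    edits = 0
    while True:
        sentence = "".join(tokens)
        if is_valid(sentence):
            return sentence
        # Exit condition, checked before editing: too many edits so far
        # (literal "greater than") or too few word segments left.
        if edits > max_edits or len(tokens) < min_tokens or not tokens:
            return None
        tokens.pop()  # edit: delete the last word segment
        edits += 1

# Stand-in validity check: a fixed set of sentences deemed complete.
valid_sentences = {"提示资源不足"}
result = candidate_from_common_sequence(
    ["提示", "资源", "不足", "如何"], lambda s: s in valid_sentences
)
print(result)  # prints 提示资源不足
```

With a check that rejects everything, the same call returns `None` once the sequence shrinks below three word segments.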
S402: Determine the confusion score of each candidate confusing question sentence.
In the embodiment of the present disclosure, a confusion score is calculated for each candidate confusing question sentence. The confusion score may be calculated as follows:
Calculate the similarity between the candidate confusing question sentence and each question sentence in the question-and-answer knowledge base, and select a preset number of the highest-ranked similarities. For example, in the embodiment of the present disclosure, the two highest similarities are selected, and the confusion score of the candidate confusing question sentence is determined from them. The calculation formula is as follows:
$$\mathrm{score}(C) = -\left|\mathrm{sim}(C, K_1) - \mathrm{sim}(C, K_2)\right|$$
where $C$ is the candidate confusing question sentence, $\mathrm{score}(C)$ is its confusion score, $K_1$ and $K_2$ are the two question sentences in the question-and-answer knowledge base most similar to $C$, and $\mathrm{sim}(C, K_1)$ and $\mathrm{sim}(C, K_2)$ are the two highest similarities obtained when retrieving against the knowledge base. The larger the difference between the two similarities, the lower the confusion; the smaller the difference, the higher the confusion.
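As a minimal sketch of this formula, the function below takes the similarities between one candidate and every question sentence in the knowledge base and returns the negated gap between the two largest. The name `confusion_score` and the sample similarity values are hypothetical; how the similarities themselves are computed is left open.

```python
def confusion_score(similarities):
    """Negated absolute gap between the two highest similarities.

    Expects at least two similarity values.
    """
    top1, top2 = sorted(similarities, reverse=True)[:2]
    return -abs(top1 - top2)

# Candidate A is almost equally close to two knowledge-base questions.
score_a = confusion_score([0.91, 0.40, 0.89])
# Candidate B has one clear best match.
score_b = confusion_score([0.95, 0.30, 0.10])
print(score_a > score_b)  # prints True
```

Candidate A scores higher, which matches the intent that a small gap between the top two matches signals a more confusing question.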
S403: Determine the target confusing question sentence from the candidate confusing question sentences according to the confusion scores.
In the embodiment of the present disclosure, the candidate confusing question sentences corresponding to the target question sentence are sorted by confusion score, and a preset number of target confusing question sentences may be selected in descending order of score as needed.
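The selection in S403 amounts to a sort followed by a cut-off. The sketch below assumes the scores have already been computed; `select_targets`, the candidate strings, and the score values are hypothetical.

```python
def select_targets(scored_candidates, k):
    """Return the k candidate sentences with the highest confusion scores."""
    ranked = sorted(scored_candidates, key=lambda item: item[1], reverse=True)
    return [sentence for sentence, _ in ranked[:k]]

# (candidate sentence, confusion score) pairs with made-up scores.
scored = [("candidate A", -0.30), ("candidate B", -0.02), ("candidate C", -0.10)]
print(select_targets(scored, k=2))  # prints ['candidate B', 'candidate C']
```

Because the scores are non-positive, the values closest to zero rank first.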
The embodiment of the present disclosure provides an optimization method for a question-and-answer knowledge base: a target question sentence is selected from the question sets included in the knowledge base, and the target confusing question sentence corresponding to the target question sentence is obtained according to the question sets; the associated knowledge point corresponding to the target confusing question sentence is determined, and the target confusing question sentence is assigned to the question set of that associated knowledge point. By mining the confusing question sentences of each knowledge point and assigning them to the corresponding associated knowledge points, the knowledge base is expanded and its coverage of the question sentences users may raise is increased. When frequently-asked-question answering or retrieval is implemented on top of this knowledge base, a question sentence raised by a user can be correctly matched to its knowledge point, which improves the quality of the responses.
Figure 6 is a block diagram of an optimization apparatus for a question-and-answer knowledge base provided by the present disclosure. As shown in Figure 6, the optimization apparatus 600 includes a first determination module 601, an acquisition module 602, and a second determination module 603.
The first determination module 601 is configured to determine a question-and-answer knowledge base, which includes knowledge points and the question sets corresponding to the knowledge points.
The acquisition module 602 is configured to select a target question sentence from the question sets included in the question-and-answer knowledge base and, according to the question sets, obtain the target confusing question sentence corresponding to the target question sentence.
The second determination module 603 is configured to determine the associated knowledge point corresponding to the target confusing question sentence and to assign the target confusing question sentence to the question set corresponding to the associated knowledge point.
According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: determine the reference question sentences used to generate the target confusing question sentence; determine the similarity between the target question sentence and each reference question sentence; determine a set of similar question sentences of the target question sentence according to the similarity; and obtain the target confusing question sentence according to the target question sentence and the set of similar question sentences.
According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: perform word segmentation on the question sentences in the question-and-answer knowledge base to obtain the word segmentation sequences corresponding to the question sentences.
According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: form a question sentence pair from the target question sentence and any similar question sentence in the set of similar question sentences; obtain the common word segmentation sequence corresponding to the question sentences based on the respective word segmentation sequences of the two question sentences in the pair; and generate the target confusing question sentence according to the common word segmentation sequence.
According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to: obtain the synonymous word segments in the two word segmentation sequences; normalize the synonymous word segments to obtain their target word segment; replace the synonymous word segments in the two sequences with the target word segment to obtain two replaced word segmentation sequences; and compare the two replaced sequences to generate the common word segmentation sequence.
According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence; determine the confusion score of each candidate confusing question sentence; and determine the target confusing question sentence from the candidate confusing question sentences according to the confusion scores.
According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to: determine whether the sentence corresponding to the common word segmentation sequence is valid; in response to the sentence being invalid, edit the common word segmentation sequence to generate a new common word segmentation sequence and perform, on the new sequence, the step of determining whether the corresponding sentence is valid; and in response to the sentence being valid, determine the valid sentence to be a candidate confusing question sentence.
According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: before performing, on the new common word segmentation sequence, the step of determining whether the corresponding sentence is valid, discard the common word segmentation sequence in response to the new sequence meeting a preset condition; and, in response to the new sequence not meeting the preset condition, perform on the new sequence the step of determining whether the corresponding sentence is valid.
According to an embodiment of the present disclosure, the preset condition is that the number of edits of the common word segmentation sequence is greater than a preset number, or that the number of word segments in the new common word segmentation sequence is less than a preset number of word segments.
According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to: determine whether a sentence is valid according to the completeness of the sentence and/or the probability that the sentence appears in question-and-answer scenarios.
According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to: determine whether a sentence is valid based on a trained target classification model, where the target classification model is trained using question sentences from user logs as positive samples and those positive samples with characters deleted as negative samples.
According to an embodiment of the present disclosure, the acquisition module 602 is further configured to: delete the first or the last word segment of the common word segmentation sequence, and use the sequence remaining after the deletion as the new common word segmentation sequence.
According to an embodiment of the present disclosure, the acquisition module 602 may be further configured to: determine the similarity between the candidate confusing question sentence and each question sentence in the question-and-answer knowledge base; and determine the confusion score of the candidate confusing question sentence according to the similarities.
It should be noted that the above explanation of the embodiments of the optimization method for a question-and-answer knowledge base also applies to the optimization apparatus of this embodiment, and the specific process is not repeated here.
The embodiment of the present disclosure provides an optimization apparatus for a question-and-answer knowledge base: a target question sentence is selected from the question sets included in the knowledge base, and the target confusing question sentence corresponding to the target question sentence is obtained according to the question sets; the associated knowledge point corresponding to the target confusing question sentence is determined, and the target confusing question sentence is assigned to the question set of that associated knowledge point. By mining the confusing question sentences of each knowledge point and assigning them to the corresponding associated knowledge points, the knowledge base is expanded and its coverage of the question sentences users may raise is increased. When frequently-asked-question answering or retrieval is implemented on top of this knowledge base, a question sentence raised by a user can be correctly matched to its knowledge point, which improves the quality of the responses.
To implement the above embodiments, an embodiment of the present disclosure further provides an electronic device 700. As shown in Figure 7, the electronic device 700 includes a processor 701 and a memory 702 communicatively connected to the processor. The memory 702 stores instructions executable by at least one processor, and the instructions are executed by the at least one processor 701 to implement the optimization method for a question-and-answer knowledge base according to any embodiment of the present disclosure.
To implement the above embodiments, an embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to implement the optimization method for a question-and-answer knowledge base according to any embodiment of the present disclosure.
To implement the above embodiments, an embodiment of the present disclosure further provides a computer program product including a computer program which, when executed by a processor, implements the optimization method for a question-and-answer knowledge base according to any embodiment of the present disclosure.
To implement the above embodiments, an embodiment of the present disclosure further provides a computer program including computer program code which, when run on a computer, causes the computer to execute the optimization method for a question-and-answer knowledge base according to any embodiment of the present disclosure.
It should be noted that the foregoing explanations of the method and apparatus embodiments also apply to the electronic device, computer-readable storage medium, computer program product, and computer program of the above embodiments, and are not repeated here.
In the description of the present disclosure, it should be understood that orientations or positional relationships indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present disclosure, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the present disclosure.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality of" means two or more, unless otherwise expressly and specifically defined.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present disclosure. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it should be understood that these embodiments are exemplary and should not be construed as limiting the present disclosure; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.
All embodiments of the present disclosure may be carried out alone or in combination with other embodiments, and all are regarded as falling within the scope of protection claimed by the present disclosure.

Claims (22)

  1. An optimization method for a question-and-answer knowledge base, characterized by comprising:
    determining a question-and-answer knowledge base, the question-and-answer knowledge base comprising knowledge points and question sets corresponding to the knowledge points;
    selecting a target question sentence from the question sets included in the question-and-answer knowledge base, and obtaining, according to the question sets, a target confusing question sentence corresponding to the target question sentence;
    determining an associated knowledge point corresponding to the target confusing question sentence, and assigning the target confusing question sentence to the question set corresponding to the associated knowledge point.
  2. The optimization method according to claim 1, characterized in that obtaining, according to the question sets, the target confusing question sentence corresponding to the target question sentence comprises:
    determining reference question sentences for generating the target confusing question sentence;
    determining the similarity between the target question sentence and the reference question sentences;
    determining, according to the similarity, a set of similar question sentences of the target question sentence;
    obtaining the target confusing question sentence according to the target question sentence and the set of similar question sentences.
  3. The optimization method according to claim 2, characterized in that, before obtaining the target confusing question sentence according to the target question sentence and the set of similar question sentences, the method further comprises:
    performing word segmentation on the question sentences in the question-and-answer knowledge base to obtain word segmentation sequences corresponding to the question sentences.
  4. The optimization method according to claim 2 or 3, characterized in that obtaining the target confusing question sentence according to the target question sentence and the set of similar question sentences comprises:
    forming a question sentence pair from the target question sentence and any one of the similar question sentences in the set of similar question sentences;
    obtaining, based on the respective word segmentation sequences of the two question sentences in the question sentence pair, a common word segmentation sequence corresponding to the question sentences;
    generating the target confusing question sentence according to the common word segmentation sequence.
  5. The optimization method according to claim 4, characterized in that obtaining, based on the respective word segmentation sequences of the two question sentences in the question sentence pair, the common word segmentation sequence corresponding to the question sentences comprises:
    obtaining synonymous word segments in the two word segmentation sequences;
    normalizing the synonymous word segments to obtain a target word segment of the synonymous word segments;
    replacing the synonymous word segments in the two word segmentation sequences with the target word segment to obtain two replaced word segmentation sequences;
    comparing the two replaced word segmentation sequences to generate the common word segmentation sequence.
  6. The optimization method according to claim 4 or 5, characterized in that generating the target confusing question sentence according to the common word segmentation sequence comprises:
    obtaining candidate confusing question sentences of the target question sentence based on the common word segmentation sequence;
    determining confusion scores of the candidate confusing question sentences;
    determining, according to the confusion scores, the target confusing question sentence from the candidate confusing question sentences.
  7. The optimization method according to claim 6, characterized in that obtaining the candidate confusing question sentences of the target question sentence based on the common word segmentation sequence comprises:
    determining whether the sentence corresponding to the common word segmentation sequence is valid;
    in response to the sentence corresponding to the common word segmentation sequence being invalid, editing the common word segmentation sequence to generate a new common word segmentation sequence, and performing, on the new common word segmentation sequence, the step of determining whether the sentence corresponding to the common word segmentation sequence is valid;
    in response to the sentence corresponding to the common word segmentation sequence being valid, determining the valid sentence to be a candidate confusing question sentence.
  8. The optimization method according to claim 7, characterized in that, before editing the common word segmentation sequence, the method further comprises:
    in response to the common word segmentation sequence meeting a preset condition, discarding the common word segmentation sequence;
    in response to the common word segmentation sequence not meeting the preset condition, performing the step of editing the common word segmentation sequence.
  9. The optimization method according to claim 8, characterized in that the preset condition is that the total number of edits of the common word segmentation sequence is greater than a preset number of edits, or that the number of word segments in the common word segmentation sequence is less than a preset number of word segments.
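The check, edit and discard loop described in claims 7 to 9 can be sketched as below. It is a sketch under stated assumptions: the validity check and the edit operation are passed in as callables, the defaults `max_edits=3` and `min_tokens=2` are invented for illustration, and the edit budget is interpreted as "never perform more than `max_edits` edits", which is only one reading of the preset condition.

```python
from typing import Callable, List, Optional, Sequence


def drop_last_token(tokens: List[str]) -> List[str]:
    """Hypothetical default edit: remove the last token."""
    return tokens[:-1]


def find_valid_candidate(
    tokens: Sequence[str],
    is_valid: Callable[[str], bool],
    edit: Callable[[List[str]], List[str]] = drop_last_token,
    max_edits: int = 3,   # assumed preset number of edits
    min_tokens: int = 2,  # assumed preset number of word segments
    joiner: str = "",
) -> Optional[str]:
    """Return a valid sentence derived from `tokens`, or None if discarded."""
    seq = list(tokens)
    edits = 0
    while True:
        sentence = joiner.join(seq)
        if is_valid(sentence):
            return sentence  # valid sentence becomes a candidate
        # Preset condition, checked before editing: discard the sequence
        # when the edit budget is used up or the sequence is too short.
        if edits >= max_edits or len(seq) < min_tokens:
            return None
        seq = edit(seq)
        edits += 1
```

The loop always terminates because every pass either returns or consumes one unit of the edit budget.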
  10. The optimization method according to any one of claims 7 to 9, characterized in that determining whether the sentence corresponding to the common word segmentation sequence is valid comprises:
    determining whether the sentence is valid according to the completeness of the sentence and/or the probability of the sentence appearing in a question-and-answer scenario.
  11. The optimization method according to any one of claims 7 to 9, characterized in that determining whether the sentence corresponding to the common word segmentation sequence is valid comprises:
    determining whether the sentence is valid based on a trained target classification model, wherein the target classification model is obtained by training with question sentences in user logs as positive samples and with the positive samples after character deletion as negative samples.
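The training data described in claim 11 could be assembled roughly as follows. This is an illustrative sketch: the claim only says that negatives are positives with characters deleted, so the number of deleted characters (`max_deleted`) and the random choice of positions are assumptions, and `build_training_samples` is a hypothetical helper. The classifier itself is not shown.

```python
import random
from typing import Iterable, List, Tuple


def build_training_samples(
    log_questions: Iterable[str],
    rng: random.Random,
    max_deleted: int = 2,  # assumed upper bound on deleted characters
) -> List[Tuple[str, int]]:
    """Build (sentence, label) pairs: 1 = log question, 0 = damaged copy."""
    samples: List[Tuple[str, int]] = []
    for question in log_questions:
        samples.append((question, 1))  # positive sample from the user log
        if len(question) < 2:
            continue  # too short to delete from and still keep text
        # Delete between 1 and max_deleted characters at random positions,
        # always leaving at least one character.
        k = rng.randint(1, min(max_deleted, len(question) - 1))
        drop = set(rng.sample(range(len(question)), k))
        damaged = "".join(ch for i, ch in enumerate(question) if i not in drop)
        samples.append((damaged, 0))  # negative sample
    return samples
```

Passing a seeded `random.Random` keeps the generated negatives reproducible across runs.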
  12. The optimization method according to any one of claims 7 to 11, characterized in that editing the common word segmentation sequence to generate a new common word segmentation sequence comprises:
    deleting the first word segment or the last word segment in the common word segmentation sequence, and taking the common word segmentation sequence after the deletion as the new common word segmentation sequence.
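The edit operation of claim 12 is small enough to show directly. The sketch below returns both possible edits (first token removed, last token removed) so that a caller can try either; returning both, and the name `edit_variants`, are illustrative choices rather than something the claim prescribes.

```python
from typing import List, Sequence


def edit_variants(tokens: Sequence[str]) -> List[List[str]]:
    """Return the sequences obtained by deleting the first or the last token."""
    if not tokens:
        return []  # nothing to delete
    head_removed = list(tokens[1:])
    tail_removed = list(tokens[:-1])
    # Both deletions can give the same result (e.g. one token, or equal
    # first and last tokens in a two-token sequence); report it once.
    if head_removed == tail_removed:
        return [head_removed]
    return [head_removed, tail_removed]
```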
  13. The optimization method according to any one of claims 6 to 12, characterized in that determining the confusion score of the candidate confusing question sentence comprises:
    separately determining the similarity between the candidate confusing question sentence and any question sentence in the question-and-answer knowledge base;
    determining the confusion score of the candidate confusing question sentence according to the similarity.
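One way to turn per-question similarities into a confusion score, as in claim 13, is sketched below. Neither the similarity measure nor the aggregation is specified in the claim, so Jaccard overlap of token sets and `max` as the default aggregate are both assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of two token sequences (assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def confusion_score(
    candidate_tokens: Sequence[str],
    kb_question_tokens: Iterable[Sequence[str]],
    aggregate: Callable[[List[float]], float] = max,  # assumed aggregation
) -> float:
    """Aggregate the candidate's similarity to each knowledge-base question."""
    sims = [jaccard(candidate_tokens, q) for q in kb_question_tokens]
    return aggregate(sims) if sims else 0.0
```

Swapping `aggregate` for a mean or a top-k average changes how strongly a single near-duplicate question influences the score.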
  14. An apparatus for optimizing a question-and-answer knowledge base, characterized by comprising:
    a first determination module, configured to determine a question-and-answer knowledge base, wherein the question-and-answer knowledge base includes knowledge points and question sets corresponding to the knowledge points;
    an acquisition module, configured to select a target question sentence from the question sets included in the question-and-answer knowledge base, and to obtain, according to the question sets, a target confusing question sentence corresponding to the target question sentence;
    a second determination module, configured to determine an associated knowledge point corresponding to the target confusing question sentence, and to assign the target confusing question sentence to the question set corresponding to the associated knowledge point.
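The data handled by the modules of claim 14 can be pictured with a small container type. This is a hypothetical sketch: the class name `QAKnowledgeBase`, its fields and its methods are invented for illustration, and how the associated knowledge point is chosen is left to the caller because this section of the claims does not specify it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAKnowledgeBase:
    """Maps each knowledge point to its question set (hypothetical layout)."""

    question_sets: Dict[str, List[str]] = field(default_factory=dict)

    def all_questions(self) -> List[str]:
        """Flatten every question set into one list of question sentences."""
        return [q for qs in self.question_sets.values() for q in qs]

    def attach(self, knowledge_point: str, question: str) -> None:
        """Assign a question to a knowledge point's set, skipping duplicates."""
        questions = self.question_sets.setdefault(knowledge_point, [])
        if question not in questions:
            questions.append(question)
```

A generated confusing question would be stored with `kb.attach(associated_point, confusing_question)` once the associated knowledge point is known.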
  15. The optimization apparatus according to claim 14, characterized in that the acquisition module is further configured to:
    determine reference question sentences used to generate the target confusing question sentence;
    determine the similarity between the target question sentence and the reference question sentences;
    determine, according to the similarity, a set of similar question sentences of the target question sentence;
    obtain the target confusing question sentence according to the target question sentence and the set of similar question sentences.
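Building the set of similar question sentences in claim 15 might look like the sketch below. The similarity function and the threshold are not given in the claim; `difflib.SequenceMatcher` from the Python standard library and a threshold of 0.5 are stand-ins, and skipping a reference identical to the target is also an assumption.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def similar_questions(
    target: str,
    references: Iterable[str],
    threshold: float = 0.5,  # assumed similarity cut-off
) -> List[str]:
    """Return the reference questions whose similarity to `target` is high."""
    result: List[str] = []
    for reference in references:
        if reference == target:
            continue  # assumption: the target is not its own similar question
        similarity = SequenceMatcher(None, target, reference).ratio()
        if similarity >= threshold:
            result.append(reference)
    return result
```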
  16. The optimization apparatus according to claim 14 or 15, characterized in that the acquisition module is further configured to:
    perform word segmentation on the question sentences in the question-and-answer knowledge base to obtain the word segmentation sequences corresponding to the question sentences.
  17. The optimization apparatus according to claim 15 or 16, characterized in that the acquisition module is further configured to:
    form a question sentence pair from the target question sentence and any one of the similar question sentences in the set of similar question sentences;
    obtain, based on the respective word segmentation sequences of the two question sentences in the question sentence pair, the common word segmentation sequence corresponding to the question sentences;
    generate the target confusing question sentence according to the common word segmentation sequence.
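The common word segmentation sequence of a question sentence pair can be illustrated with a longest-common-subsequence computation over tokens. Treating "common sequence" as the longest common subsequence is an assumption (this section of the claims does not define it), and `common_token_sequence` is a hypothetical name.

```python
from typing import List, Sequence


def common_token_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two token sequences (assumed definition)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one LCS in original order.
    common: List[str] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return common
```

For the token sequences of "how to return goods" and "how can I return the goods", the function keeps the shared tokens in order and drops the rest.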
  18. The optimization apparatus according to claim 17, characterized in that the acquisition module is further configured to:
    obtain candidate confusing question sentences of the target question sentence based on the common word segmentation sequence;
    determine a confusion score of the candidate confusing question sentences;
    determine, according to the confusion score, the target confusing question sentence from among the candidate confusing question sentences.
  19. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1 to 13.
  20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 13.
  21. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 13.
  22. A computer program, characterized in that the computer program comprises computer program code which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 13.
PCT/CN2023/088448 2022-09-13 2023-04-14 Optimization method and apparatus for question-and-answer knowledge base WO2024055582A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211110038.5A CN117743519A (en) 2022-09-13 2022-09-13 Question-answering knowledge base optimizing method and device
CN202211110038.5 2022-09-13

Publications (1)

Publication Number Publication Date
WO2024055582A1 (en) 2024-03-21

Family

ID=90257759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/088448 WO2024055582A1 (en) 2022-09-13 2023-04-14 Optimization method and apparatus for question-and-answer knowledge base

Country Status (2)

Country Link
CN (1) CN117743519A (en)
WO (1) WO2024055582A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063799A1 (en) * 2003-06-12 2010-03-11 Patrick William Jamieson Process for Constructing a Semantic Knowledge Base Using a Document Corpus
CN110019305A (en) * 2017-12-18 2019-07-16 上海智臻智能网络科技股份有限公司 Knowledge base extended method and storage medium, terminal
CN111125379A (en) * 2019-12-26 2020-05-08 科大讯飞股份有限公司 Knowledge base expansion method and device, electronic equipment and storage medium
CN113158688A (en) * 2021-05-11 2021-07-23 科大讯飞股份有限公司 Domain knowledge base construction method, device, equipment and storage medium
CN113536776A (en) * 2021-06-22 2021-10-22 深圳价值在线信息科技股份有限公司 Confusion statement generation method, terminal device and computer-readable storage medium

Also Published As

Publication number Publication date
CN117743519A (en) 2024-03-22

Similar Documents

Publication Publication Date Title
US10956464B2 (en) Natural language question answering method and apparatus
CN109145099B (en) Question-answering method and device based on artificial intelligence
US10878009B2 (en) Translating natural language utterances to keyword search queries
US11544459B2 (en) Method and apparatus for determining feature words and server
WO2021093755A1 (en) Matching method and apparatus for questions, and reply method and apparatus for questions
CN110019732B (en) Intelligent question answering method and related device
US10437868B2 (en) Providing images for search queries
CN103914494B (en) Method and system for identifying identity of microblog user
CN110188168A (en) Semantic relationship recognition method and device
CN109508458B (en) Legal entity identification method and device
CN106874441A (en) Intelligent answer method and apparatus
US10664755B2 (en) Searching method and system based on multi-round inputs, and terminal
CN111090771B (en) Song searching method, device and computer storage medium
CN110019729B (en) Intelligent question-answering method, storage medium and terminal
CN110019728B (en) Automatic interaction method, storage medium and terminal
CN112528022A (en) Method for extracting characteristic words corresponding to theme categories and identifying text theme categories
CN118377783B (en) SQL sentence generation method and device
US11379527B2 (en) Sibling search queries
CN112907358A (en) Loan user credit scoring method, loan user credit scoring device, computer equipment and storage medium
CN111492364A (en) Data labeling method and device and storage medium
CN110297897A (en) Question and answer processing method and Related product
TW202001621A (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN112579729A (en) Training method and device for document quality evaluation model, electronic equipment and medium
CN111062832A (en) Auxiliary analysis method and device for intelligently providing patent answer and debate opinions
CN114780589A (en) Multi-table connection query method, device, device and storage medium

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23864323; Country of ref document: EP; Kind code of ref document: A1)