
WO2019153607A1 - Intelligent response method, electronic device and storage medium - Google Patents


Info

Publication number
WO2019153607A1
Authority
WO
WIPO (PCT)
Prior art keywords
question
consulting
candidate
similarity
vocabulary
Prior art date
Application number
PCT/CN2018/089882
Other languages
French (fr)
Chinese (zh)
Inventor
于凤英
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019153607A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems

Definitions

  • the present application relates to the field of computer technologies, and in particular, to an intelligent response method, an electronic device, and a storage medium.
  • The intelligent answering method used in intelligent question answering usually extracts keywords from a customer's question, then finds an answer matching those keywords in a question-and-answer knowledge base and outputs it to the customer.
  • However, current intelligent response methods are insufficiently accurate and usually require manual responses as compensation, which to some extent consumes human resources and reduces service efficiency.
  • The present application provides an intelligent response method comprising the following steps. An obtaining step: obtain an input consulting question and pre-process it, the pre-processing including word segmentation to obtain terms, part-of-speech tagging and named entity recognition of each term, keyword extraction from the terms, and sentence error correction of the consulting question. A constructing step: perform the same pre-processing on each question and answer in a question-and-answer knowledge base, map each pre-processed question and answer into an inverted record table so as to build an inverted index for the knowledge base, and query the knowledge base via the inverted index for a set of candidate questions related to the consulting question; the knowledge base contains a plurality of questions collated in advance and one or more answers associated with each question. A calculating step: for each candidate question in the set, respectively calculate the similarity between the consulting question and that candidate question.
  • The method for calculating the text similarity between the consulting question and a candidate question comprises: counting a plurality of specified features between the consulting question and the candidate question, and linearly weighting the specified features to obtain the text similarity between the consulting question and the candidate question; the specified features include the number a1 of keywords common to the consulting question and the candidate question, among other features of the common keywords.
  • The present application further provides an electronic device including a memory and a processor, wherein the memory stores an intelligent response program that, when executed by the processor, performs the following steps. The obtaining step: acquire the input consulting question and pre-process it; the pre-processing includes word segmentation to obtain terms, part-of-speech tagging and named entity recognition of each term, keyword extraction from the terms, and sentence error correction of the consulting question. The constructing step: perform the pre-processing on each question and answer in the question-and-answer knowledge base, map each pre-processed question and answer into an inverted record table, thereby constructing an inverted index for the knowledge base, and query the knowledge base via the inverted index for the set of candidate questions related to the consulting question; the knowledge base includes a plurality of preset questions and one or more answers associated with each question. The calculating step: for each candidate question in the set of candidate questions, respectively calculate the question similarity between the consulting question and the candidate question.
  • The present application further provides a computer-readable storage medium including an intelligent response program which, when executed by a processor, implements any step of the intelligent response method described above.
  • In the intelligent response method proposed by the present application, after the consulting question is obtained it is first pre-processed; an inverted index is then constructed for the question-and-answer knowledge base, and the set of candidate questions related to the consulting question is queried from the knowledge base via the inverted index; for each candidate question in the candidate question set, the question similarity between the consulting question and the candidate question is calculated, the question similarity being obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the candidate question; finally, the candidate question corresponding to the highest calculated similarity is selected, its one or more associated answers are queried in the question-and-answer knowledge base, and the associated answer with the highest output frequency in a preset time period is output as the target answer. This can improve the accuracy and efficiency of the intelligent response, save human resources, and improve service quality.
  • FIG. 1 is a schematic diagram of an operating environment of a preferred embodiment of an electronic device of the present application.
  • FIG. 2 is a schematic diagram of interaction between an electronic device and a client according to a preferred embodiment of the present application.
  • FIG. 3 is a flow chart of a preferred embodiment of the intelligent response method of the present application.
  • FIG. 4 is a program block diagram of the intelligent response program of FIG. 1.
  • Embodiments of the present application can be implemented as a method, an apparatus, a device, a system, or a computer program product. Accordingly, the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software.
  • an intelligent response method, an electronic device, and a storage medium are proposed.
  • FIG. 1 is a schematic diagram of an operating environment of a preferred embodiment of an electronic device of the present application.
  • the electronic device 1 may be a terminal device having a storage and computing function such as a server, a portable computer, or a desktop computer.
  • the electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14.
  • the network interface 13 can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the communication bus 14 is used to implement connection communication between the above components.
  • the memory 11 includes at least one type of readable storage medium.
  • the at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory, or the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • The readable storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the smart response program 10, the question and answer knowledge base 4, and the like installed in the electronic device 1.
  • the memory 11 can also be used to temporarily store data that has been output or is about to be output.
  • The processor 12, in some embodiments, may be a Central Processing Unit (CPU), microprocessor, or other data processing chip for running program code or processing data stored in the memory 11, such as executing the intelligent response program 10.
  • FIG. 1 shows only the electronic device 1 having the components 11-14 and the intelligent response program 10, but it should be understood that not all illustrated components may be implemented, and more or fewer components may be implemented instead.
  • The electronic device 1 may further include a user interface.
  • The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or headphones.
  • The user interface may also include a standard wired interface and a wireless interface.
  • The electronic device 1 may further include a display, which may also be referred to as a display screen or a display unit.
  • The display may be an LED display, a liquid crystal display, a touch-enabled liquid crystal display, or an Organic Light-Emitting Diode (OLED) display.
  • The display is used to display information processed in the electronic device 1 and to present a visualized user interface.
  • the electronic device 1 further comprises a touch sensor.
  • the area provided by the touch sensor for the user to perform a touch operation is referred to as a touch area.
  • the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor but also a proximity type touch sensor or the like.
  • the touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array. The user can activate the smart answering program 10 by touching the touch area.
  • the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor.
  • A display stacked with the touch sensor forms a touch display. The device detects user-triggered touch operations via the touch display.
  • the electronic device 1 may further include a radio frequency (RF) circuit, a sensor, an audio circuit, and the like, and details are not described herein.
  • FIG. 2 is a schematic diagram of interaction between the electronic device 1 and the client 2 according to a preferred embodiment of the present application.
  • the intelligent response program 10 operates in the electronic device 1.
  • the preferred embodiment of the electronic device 1 is a server.
  • the electronic device 1 is communicatively coupled to the client 2 via a network 3.
  • the client 2 can run in various types of terminal devices, such as smart phones, portable computers, and the like.
  • The consulting question can be input to the intelligent response program 10, which processes it using the intelligent response method, finds the corresponding target answer in the question-and-answer knowledge base 4, and returns the target answer to the client 2 via the network 3.
  • FIG. 3 is a flowchart of a preferred embodiment of the intelligent response method of the present application. The processor 12 of the electronic device 1 implements the following steps of the intelligent response method when executing the intelligent response program 10 stored in the memory 11:
  • Step S1: acquire an input consultation question and pre-process it; the pre-processing includes word segmentation to obtain terms, part-of-speech tagging and named entity recognition of each term, keyword extraction from the terms, and sentence error correction of the consultation question.
  • The consultation question may be, for example, "whether the office carbon crystal geothermal pad in the New Year gift package can be quickly started"; after word segmentation of the consultation question, individual terms such as "New Year" are obtained.
  • Step S1 may treat a term whose length after word segmentation is greater than a first threshold (for example, four characters) as a long cut word, and perform part-of-speech tagging only on the long cut words.
  • The result of part-of-speech tagging a long cut word is, for example, "New Year spree / adjective + noun" or "quick start / adverb + verb".
  • Step S1 may perform named entity recognition on the long cut words with a hidden Markov model, thereby identifying entity proper nouns having a specific meaning in the consultation question, such as "Microsoft Technology Group".
  • the entity proper noun includes, for example, a person's name, a place name, an institution name, and the like.
  • Step S1 may extract keywords from the long cut words by using the TF-IDF (term frequency - inverse document frequency) algorithm.
  • The main idea of the TF-IDF algorithm is that if a word or phrase appears frequently in one document but rarely in other documents, it is considered to have good class-distinguishing ability and can be used as a keyword.
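As an illustration of this keyword-extraction idea, the following sketch scores the terms of one segmented document by TF-IDF against a small corpus. The corpus and term lists are hypothetical placeholders; a production system would operate over the segmented Q&A knowledge base.

```python
import math
from collections import Counter

def tf_idf_keywords(doc_terms, corpus, top_k=3):
    """Score the terms of one segmented document by TF-IDF against a corpus.

    doc_terms: list of terms from the target document (after word segmentation)
    corpus:    list of term lists, one per document in the collection
    """
    n_docs = len(corpus)
    tf = Counter(doc_terms)
    scores = {}
    for term, count in tf.items():
        # term frequency, normalized by document length
        tf_score = count / len(doc_terms)
        # inverse document frequency: rarer across the corpus -> higher weight
        df = sum(1 for d in corpus if term in d)
        idf_score = math.log(n_docs / (1 + df))
        scores[term] = tf_score * idf_score
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```

A term that occurs in every document gets an IDF near zero, so only terms that discriminate this document from the rest survive as keywords.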
  • Step S1 may perform sentence error correction on the consulting question by using an N-gram language model and edit distance.
  • the N-gram language model can use the collocation information between adjacent words in the context to calculate the phrase collocation with the greatest probability, thereby identifying the wrong collocation in the sentence, and providing several possible correct alternative collocation schemes.
  • An alternative scheme with the least editing cost can then be determined and adopted, thereby implementing error correction of the consulting question. For example, for "I want to find the song of Hetang Moonlight as background music", the error correction processing can identify that the mis-written "Hetang" should be replaced with its correct homophone.
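The "least editing cost" selection can be sketched with the classic Levenshtein distance; the candidate strings below are illustrative stand-ins for the alternative collocations proposed by the N-gram model.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions, deletions,
    and substitutions needed to turn string a into string b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def cheapest_correction(wrong, candidates):
    """Pick the alternative collocation with the least editing cost."""
    return min(candidates, key=lambda c: edit_distance(wrong, c))
```

The same logic applies character-by-character to Chinese strings, since Python strings index by code point.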
  • Step S2: perform the pre-processing on each question and answer in the question-and-answer knowledge base 4, and map each pre-processed question and answer into an inverted record table, thereby constructing an inverted index for the Q&A knowledge base 4; then query the Q&A knowledge base 4 via the inverted index for the set of candidate questions related to the consulting question. The Q&A knowledge base 4 includes a plurality of pre-organized questions and one or more answers associated with each question.
  • Performing the pre-processing on each question and answer in the Q&A knowledge base 4 yields text feature information such as the terms, part-of-speech tags, named entities, and keywords obtained from each question and answer; according to this text feature information, each question and answer is mapped into a preset inverted record table, with all questions and answers containing the same entry mapped to that entry, thereby constructing the inverted index for the Q&A knowledge base 4.
  • The set of candidate questions related to the consulting question may then be queried from the Q&A knowledge base by means of an inverted index query. The candidate question set includes at least one candidate question, and, because of the inverted index query, each candidate question has a certain degree of association with the consulting question.
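A minimal sketch of the inverted record table and candidate lookup, assuming the knowledge base has already been segmented into terms (the question ids and terms here are hypothetical):

```python
from collections import defaultdict

def build_inverted_index(qa_base):
    """Map each entry (term) to the set of question ids containing it,
    i.e. an inverted record table over the segmented Q&A knowledge base."""
    index = defaultdict(set)
    for qid, terms in qa_base.items():
        for term in terms:
            index[term].add(qid)
    return index

def candidate_questions(index, query_terms):
    """Union of all questions sharing at least one term with the query,
    so every returned candidate has some association with the query."""
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())
    return candidates
```

Lookup is then proportional to the number of query terms rather than the size of the knowledge base, which is the point of the inverted index.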
  • Step S3: for each candidate question in the candidate question set, calculate the question similarity between the consulting question and that candidate question; the question similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the candidate question. Because matching the consulting question against candidate questions usually depends more on textual and semantic closeness, the weights of the text similarity and semantic similarity are set greater than the weights of the topic similarity and syntactic similarity.
  • the method for calculating the text similarity between the consulting question and the corresponding candidate question may include the following steps:
  • a plurality of specified features between the consulting question and the candidate question are counted, and the plurality of specified features are linearly weighted to obtain a text similarity between the consulting question and the corresponding candidate question.
  • the length of the candidate question is a6.
  • The linear weighting of the specified features may be implemented with a multiple logistic regression model. Specifically, the weight of each specified feature is first calculated using an inverse document frequency algorithm, whose idea is to measure the importance of each specified feature in a preset large-scale corpus and thereby determine its weight.
  • The multiple logistic regression model then performs a weighted regression fit over the specified features to obtain the text similarity g(z) between the consulting question and the candidate question, with the standard logistic form g(z) = 1 / (1 + e^(-z)) and z = x1·a1 + x2·a2 + ... + x6·a6, where x1, x2, ..., x6 are the weights of a1, a2, ..., a6, respectively.
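Assuming the standard logistic function for g(z), the weighted scoring can be sketched as follows; the feature values a1..a6 and weights x1..x6 are supplied externally (here as plain lists):

```python
import math

def text_similarity(features, weights):
    """Multiple-logistic-regression style text similarity:
    z is the weighted sum of the specified features a1..a6 and
    g(z) = 1 / (1 + e^-z) squashes it into (0, 1)."""
    z = sum(x * a for x, a in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

With all features zero the score is exactly 0.5, and larger weighted feature sums push the similarity toward 1.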
  • The semantic similarity between the consulting question and a candidate question may be calculated as follows: use the word2vec algorithm to represent each term of the segmented consulting question as a word vector, and average these word vectors to obtain the sentence vector of the consulting question; likewise represent each term of the segmented candidate question as a word vector and average them to obtain the sentence vector of the candidate question; then compute the cosine similarity between the two sentence vectors to obtain the semantic similarity between the consulting question and the candidate question.
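The averaged-word-vector and cosine steps can be sketched directly; real word vectors would come from a trained word2vec model, so the two-dimensional vectors used below are purely illustrative:

```python
import math

def sentence_vector(term_vectors):
    """Average the word vectors of a segmented sentence (word2vec-style)
    to obtain a single sentence vector."""
    dim = len(term_vectors[0])
    return [sum(v[i] for v in term_vectors) / len(term_vectors) for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

The same cosine helper serves the topic and syntactic similarities described below, since they too compare a pair of vectors.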
  • The topic similarity between the consulting question and a candidate question may be calculated as follows: use the LDA topic expression method to construct the topic vector of the consulting question and the topic vector of the candidate question, then compute the cosine similarity between the two topic vectors to obtain the topic similarity between the consulting question and the candidate question.
  • The syntactic similarity between the consulting question and a candidate question may be calculated as follows: use the LTP language technology platform to analyze the syntax of the consulting question and of the candidate question, obtaining their syntax vectors, then compute the cosine similarity between the two syntax vectors to obtain the syntactic similarity between the consulting question and the candidate question.
  • Step S3 thus calculates the question similarity between the consultation question and each candidate question by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between them.
  • The weights of the text similarity and the semantic similarity are set greater than the weights of the topic similarity and the syntactic similarity; for example, the weights of the text similarity and the semantic similarity may each be set to 3 times the weight of the topic similarity and of the syntactic similarity.
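Under the 3:3:1:1 example weighting, normalized so the weights sum to 1, the combined question similarity can be sketched as:

```python
def question_similarity(text_sim, semantic_sim, topic_sim, syntax_sim):
    """Linear weighting of the four component similarities. Text and
    semantic weights are each 3x the topic and syntactic weights,
    normalized to sum to 1 (3 + 3 + 1 + 1 = 8)."""
    weights = (3 / 8, 3 / 8, 1 / 8, 1 / 8)
    sims = (text_sim, semantic_sim, topic_sim, syntax_sim)
    return sum(w * s for w, s in zip(weights, sims))
```

Normalizing keeps the combined score in the same [0, 1] range as its components, so candidate questions remain directly comparable.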
  • Step S4: select the candidate question corresponding to the highest calculated question similarity, query the question-and-answer knowledge base 4 for the one or more answers associated with the selected candidate question, and output, as the target answer, the associated answer with the highest output frequency in a preset time period.
  • The candidate question with the highest question similarity can be regarded as the candidate question most similar to the consulting question; therefore, the answer associated with it in the Q&A knowledge base 4 can be used as the answer to the consulting question.
  • When the selected candidate question has a plurality of associated answers, one of them may be selected as the target answer; for example, the associated answer with the highest output frequency in the preset time period may be selected.
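The frequency-based selection among associated answers might look like the following sketch; the output log is assumed to be already restricted to the preset time period:

```python
from collections import Counter

def select_target_answer(associated_answers, output_log):
    """Among the answers associated with the best candidate question,
    pick the one output most frequently in the preset time period.

    associated_answers: answers linked to the selected candidate question
    output_log: answers output during the time window (pre-filtered)
    """
    freq = Counter(a for a in output_log if a in associated_answers)
    if not freq:
        # nothing output recently: fall back to the first associated answer
        return associated_answers[0]
    return freq.most_common(1)[0][0]
```

Counting only entries that belong to `associated_answers` keeps unrelated outputs in the log from influencing the choice.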
  • Step S4 may also perform humanized retouching of the target answer before outputting it.
  • In this case, the pre-processing of the consulting question in step S1 further includes:
  • Each term obtained by segmenting the consultation question is compared with a preset positive vocabulary and a preset negative vocabulary to determine whether the consulting question contains positive or negative vocabulary.
  • the positive vocabulary is, for example, "happy wife", and the negative vocabulary such as "I want to complain”.
  • The humanized retouching of the target answer in step S4 includes:
  • if the consultation question includes only positive vocabulary, the preset greeting corresponding to positive vocabulary is obtained, for example "I wish you happy", and is combined with the target answer;
  • if the consultation question includes only negative vocabulary, the preset greeting corresponding to negative vocabulary is obtained, for example "very sorry", and is combined with the target answer;
  • if the consultation question includes both positive and negative vocabulary, or includes neither, the preset greeting corresponding to neutral vocabulary is obtained, for example "thanks for support", and is combined with the target answer.
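The three-way greeting rule can be sketched as follows; the vocabularies and greeting strings are illustrative placeholders for the preset ones:

```python
def retouch_answer(query_terms, target_answer,
                   positive_vocab=frozenset({"happy"}),
                   negative_vocab=frozenset({"complain"})):
    """Prepend a preset greeting based on the sentiment vocabulary found
    in the consulting question's terms. Vocabularies and greetings here
    are illustrative, not the patent's actual presets."""
    has_pos = any(t in positive_vocab for t in query_terms)
    has_neg = any(t in negative_vocab for t in query_terms)
    if has_pos and not has_neg:
        greeting = "I wish you happy. "
    elif has_neg and not has_pos:
        greeting = "Very sorry. "
    else:  # both present, or neither: use the neutral greeting
        greeting = "Thanks for your support. "
    return greeting + target_answer
```

Treating the mixed and empty cases identically matches the rule above, where both fall back to the neutral greeting.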
  • In summary, after the consulting question is obtained it is pre-processed; an inverted index is constructed for the question-and-answer knowledge base 4, and the set of candidate questions related to the consulting question is queried from the knowledge base via the inverted index; for each candidate question in the set, the question similarity between the consulting question and the candidate question is calculated by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between them; finally, the candidate question with the highest calculated similarity is selected, its one or more associated answers are queried in the knowledge base 4, and the associated answer with the highest output frequency in the preset time period is output as the target answer. This can improve the accuracy and response efficiency of the intelligent response, save human resources, and improve service quality.
  • FIG. 4 is a program module diagram of the intelligent response program 10 of FIG. 1.
  • The intelligent response program 10 is divided into a plurality of modules, which are stored in the memory 11 and executed by the processor 12 to implement the present application.
  • a module as referred to in this application refers to a series of computer program instructions that are capable of performing a particular function.
  • the intelligent response program 10 can be divided into: an acquisition module 110, a construction module 120, a calculation module 130, and a selection module 140.
  • The obtaining module 110 is configured to obtain an input consulting question and pre-process it; the pre-processing includes word segmentation to obtain terms, part-of-speech tagging and named entity recognition of each term, keyword extraction from the terms, and sentence error correction of the consulting question.
  • The pre-processing by the obtaining module 110 may include: segmenting the consulting question into terms; treating terms whose segmented length exceeds the first threshold as long cut words and performing part-of-speech tagging on them; performing named entity recognition on the long cut words with a hidden Markov model to identify proper nouns; extracting keywords from the long cut words with the TF-IDF algorithm; and correcting the consulting question using an N-gram language model and edit distance.
  • The construction module 120 is configured to perform the pre-processing on each question and answer in the Q&A knowledge base 4 and to map each pre-processed question and answer into an inverted record table, thereby constructing an inverted index for the Q&A knowledge base 4; it then queries the knowledge base via the inverted index for the set of candidate questions related to the consulting question. The Q&A knowledge base 4 includes a plurality of pre-arranged questions and one or more answers associated with each question.
  • Pre-processing each question and answer in the Q&A knowledge base 4 yields text feature information such as the terms, part-of-speech tags, named entities, and keywords obtained from each question and answer; according to this information, each question and answer is mapped into the preset inverted record table, with all questions and answers containing the same entry mapped to that entry, thereby constructing the inverted index for the Q&A knowledge base 4.
  • the candidate question set related to the consulting question may be queried from the question-and-answer knowledge base by means of an inverted index query.
  • The calculating module 130 is configured to calculate, for each candidate question in the candidate question set, the question similarity between the consulting question and that candidate question; the question similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the candidate question, wherein the weights of the text similarity and the semantic similarity are greater than the weights of the topic similarity and the syntactic similarity.
  • The calculating module 130 may count a plurality of specified features between the consulting question and the candidate question and linearly weight them to obtain the text similarity between the consulting question and the candidate question.
  • the plurality of specified features includes:
  • the length of the candidate question is a6.
  • the linear weighting calculation on the plurality of specified features may be implemented by using a multiple logistic regression model.
  • The calculation module 130 may calculate the weight of each specified feature by using an inverse document frequency algorithm, whose idea is to measure the importance of each specified feature in the preset large-scale corpus and thereby determine the weight of each specified feature.
  • The calculation module 130 performs a weighted regression fit over the specified features with the multiple logistic regression model to obtain the text similarity g(z) between the consultation question and the candidate question, with the standard logistic form g(z) = 1 / (1 + e^(-z)) and z = x1·a1 + x2·a2 + ... + x6·a6, where x1, x2, ..., x6 are the weights of a1, a2, ..., a6, respectively.
  • The calculation module 130 may calculate the semantic similarity between the consulting question and a candidate question as follows: use the word2vec algorithm to represent each term of the segmented consulting question as a word vector and average these word vectors to obtain the sentence vector of the consulting question; do the same for the candidate question; then compute the cosine similarity between the two sentence vectors to obtain the semantic similarity between the consulting question and the candidate question.
  • The calculating module 130 may calculate the topic similarity between the consulting question and a candidate question as follows: construct the topic vector of the consulting question and the topic vector of the candidate question using the LDA topic expression method, then compute the cosine similarity between the two topic vectors to obtain the topic similarity between the consulting question and the candidate question.
  • the calculation module 130 may calculate the syntactic similarity between the consulting question and a candidate question as follows: parse the syntax of the consulting question and of the candidate question using the LTP (Language Technology Platform) to obtain their syntax vectors; then calculate the cosine similarity between the two syntax vectors as the syntactic similarity between the consulting question and the candidate question.
  • the calculation module 130 calculates the problem similarity between the consulting question and a candidate question by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and that candidate question.
  • since a consulting question and a candidate question usually match more on text and semantics, the weights of the text similarity and the semantic similarity are set greater than the weights of the topic similarity and the syntactic similarity.
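The fusion step above can be sketched as a simple weighted sum. The weight values below are illustrative only; the patent requires merely that the text and semantic weights exceed the topic and syntax weights:

```python
def problem_similarity(text_sim, semantic_sim, topic_sim, syntax_sim,
                       weights=(0.35, 0.35, 0.15, 0.15)):
    """Linearly weight the four component similarities.

    The text and semantic weights are set larger than the topic and
    syntax weights, as the patent specifies; the exact values are
    an assumption of this sketch.
    """
    w_text, w_sem, w_topic, w_syn = weights
    assert w_text > w_topic and w_sem > w_syn
    return (w_text * text_sim + w_sem * semantic_sim
            + w_topic * topic_sim + w_syn * syntax_sim)

score = problem_similarity(0.9, 0.8, 0.5, 0.6)
```

Because the weights sum to 1 and each component similarity lies in [0, 1], the fused score also stays in [0, 1], which makes candidate questions directly comparable.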
  • the selection module 140 is configured to select the candidate question with the highest calculated problem similarity, query the question and answer knowledge base for the one or more associated answers of the selected candidate question, and output, as the target answer, the associated answer among them with the highest output frequency within a preset time period.
  • the selection module 140 may first apply humanized retouching to the target answer before outputting it.
  • the pre-processing of the consulting question by the obtaining module 110 further includes: comparing each term obtained by segmenting the consulting question with a preset positive vocabulary and a preset negative vocabulary to determine whether the consulting question contains positive or negative words.
  • if the consulting question includes only positive vocabulary, the selection module 140 obtains the preset greeting corresponding to the positive vocabulary and combines it with the target answer;
  • if the consulting question includes only negative vocabulary, the selection module 140 obtains the preset greeting corresponding to the negative vocabulary and combines it with the target answer;
  • if the consulting question includes both positive and negative vocabulary, or includes neither, the selection module 140 obtains the preset greeting corresponding to neutral vocabulary and combines it with the target answer.
  • the readable storage medium included in the memory 11 may store an operating system, the intelligent response program 10, and the question and answer knowledge base 4.
  • when the processor 12 executes the intelligent response program 10 stored in the memory 11, the following steps are implemented:
  • Obtaining step: obtain an input consulting question and pre-process it; the pre-processing includes segmenting the question into terms, performing part-of-speech tagging and named entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
  • the question and answer knowledge base 4 includes a plurality of pre-organized questions and one or more answers associated with each question;
  • Calculating step: for each candidate question in the candidate question set, calculate the problem similarity between the consulting question and that candidate question; the problem similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are greater than the weights of the topic similarity and the syntactic similarity;
  • Selecting step: select the candidate question with the highest calculated problem similarity, query the question and answer knowledge base 4 for the one or more associated answers of the selected candidate question, and output, as the target answer, the associated answer among them with the highest output frequency within a preset time period.
  • the pre-processing includes: segmenting the question into terms; treating the segmented terms that exceed a first threshold as long-cut words; performing part-of-speech tagging on the long-cut words; performing named entity recognition on the long-cut words with a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
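For the error-correction step, a plain Levenshtein edit distance can rank correction candidates; the sketch below omits the N-gram language-model scoring, and the function names and example words are ours:

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def best_correction(typo, candidates):
    """Pick the candidate vocabulary word closest to the typo."""
    return min(candidates, key=lambda c: edit_distance(typo, c))

# "insurance" is the nearest candidate (edit distance 1)
fixed = best_correction("insurence", ["insurance", "assurance", "endurance"])
```

A full system would combine this distance with N-gram language-model probabilities so that corrections are also plausible in context, as the patent describes.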
  • the method for calculating the text similarity between the consulting question and the corresponding candidate question includes: counting a plurality of specified features between the consulting question and the candidate question, the specified features including: the number of keywords common to the consulting question and the candidate question, a1; the length of those common keywords, a2; the number of common terms, a3; the length of those common terms, a4; the length of the consulting question, a5; and the length of the candidate question, a6.
  • Performing a linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question includes:
  • an inverse document frequency (IDF) algorithm is used to calculate the weight of each specified feature;
  • a multiple logistic regression model is used to perform a weighted regression fitting calculation on the plurality of specified features to obtain the text similarity g(z) between the consulting question and the candidate question, where:
  • x1, x2, ..., x6 are the weights of a1, a2, ..., a6, respectively.
  • the method for calculating the semantic similarity between the consulting question and the corresponding candidate question includes:
  • the word2vec algorithm is used to represent each term obtained by segmenting the consulting question as a word vector, and the word vectors are averaged to obtain the sentence vector of the consulting question;
  • each term obtained by segmenting the candidate question is likewise represented as a word vector, and the word vectors are averaged to obtain the sentence vector of the candidate question; the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question is then calculated as the semantic similarity.
  • the method for calculating the topic similarity between the consulting question and the corresponding candidate question includes: constructing the topic vector of the consulting question and the topic vector of the candidate question using an LDA topic representation, and calculating the cosine similarity between the two topic vectors as the topic similarity.
  • the method for calculating the syntactic similarity between the consulting question and the corresponding candidate question includes: parsing the syntax of the consulting question and of the candidate question with the LTP language technology platform to obtain their syntax vectors;
  • the cosine similarity between the syntax vector of the consulting question and the syntax vector of the candidate question is calculated as the syntactic similarity between the two questions.
  • the pre-processing further includes: comparing each term obtained by segmenting the consulting question with a preset positive vocabulary and a preset negative vocabulary to determine whether the consulting question contains positive or negative words;
  • if the consulting question includes only positive vocabulary, the preset greeting corresponding to the positive vocabulary is obtained and combined with the target answer;
  • if the consulting question includes only negative vocabulary, the preset greeting corresponding to the negative vocabulary is obtained and combined with the target answer;
  • if the consulting question includes both positive and negative vocabulary, or includes neither, the preset greeting corresponding to neutral vocabulary is obtained and combined with the target answer.
  • an embodiment of the present application further provides a computer readable storage medium, which may be any one of, or any combination of, a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read only memory (ROM), an erasable programmable read only memory (EPROM), a portable compact disk read only memory (CD-ROM), a USB memory, and the like.
  • the computer readable storage medium includes the Q&A knowledge base 4, the intelligent response program 10, and the like. When the intelligent response program 10 is executed by the processor 12, the following operations are implemented:
  • Obtaining step: obtain an input consulting question and pre-process it; the pre-processing includes segmenting the question into terms, performing part-of-speech tagging and named entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
  • the question and answer knowledge base 4 includes a plurality of pre-organized questions and one or more answers associated with each question;
  • Calculating step: for each candidate question in the candidate question set, calculate the problem similarity between the consulting question and that candidate question; the problem similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are greater than the weights of the topic similarity and the syntactic similarity;
  • Selecting step: select the candidate question with the highest calculated problem similarity, query the question and answer knowledge base 4 for the one or more associated answers of the selected candidate question, and output, as the target answer, the associated answer among them with the highest output frequency within a preset time period.
  • the pre-processing includes: segmenting the question into terms; treating the segmented terms that exceed a first threshold as long-cut words; performing part-of-speech tagging on the long-cut words; performing named entity recognition on the long-cut words with a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
  • the method for calculating the text similarity between the consulting question and the corresponding candidate question includes: counting a plurality of specified features between the consulting question and the candidate question, the specified features including: the number of keywords common to the consulting question and the candidate question, a1; the length of those common keywords, a2; the number of common terms, a3; the length of those common terms, a4; the length of the consulting question, a5; and the length of the candidate question, a6.
  • Performing a linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question includes:
  • an inverse document frequency (IDF) algorithm is used to calculate the weight of each specified feature;
  • a multiple logistic regression model is used to perform a weighted regression fitting calculation on the plurality of specified features to obtain the text similarity g(z) between the consulting question and the candidate question, where:
  • x1, x2, ..., x6 are the weights of a1, a2, ..., a6, respectively.
  • the method for calculating the semantic similarity between the consulting question and the corresponding candidate question includes:
  • the word2vec algorithm is used to represent each term obtained by segmenting the consulting question as a word vector, and the word vectors are averaged to obtain the sentence vector of the consulting question;
  • each term obtained by segmenting the candidate question is likewise represented as a word vector, and the word vectors are averaged to obtain the sentence vector of the candidate question; the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question is then calculated as the semantic similarity.
  • the method for calculating the topic similarity between the consulting question and the corresponding candidate question includes: constructing the topic vector of the consulting question and the topic vector of the candidate question using an LDA topic representation, and calculating the cosine similarity between the two topic vectors as the topic similarity.
  • the method for calculating the syntactic similarity between the consulting question and the corresponding candidate question includes: parsing the syntax of the consulting question and of the candidate question with the LTP language technology platform to obtain their syntax vectors;
  • the cosine similarity between the syntax vector of the consulting question and the syntax vector of the candidate question is calculated as the syntactic similarity between the two questions.
  • the pre-processing further includes: comparing each term obtained by segmenting the consulting question with a preset positive vocabulary and a preset negative vocabulary to determine whether the consulting question contains positive or negative words;
  • if the consulting question includes only positive vocabulary, the preset greeting corresponding to the positive vocabulary is obtained and combined with the target answer;
  • if the consulting question includes only negative vocabulary, the preset greeting corresponding to the negative vocabulary is obtained and combined with the target answer;
  • if the consulting question includes both positive and negative vocabulary, or includes neither, the preset greeting corresponding to neutral vocabulary is obtained and combined with the target answer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent response method, an electronic device and a storage medium. The method comprises: after acquiring a consultation question, preprocessing it and constructing an inverted index for a question and answer knowledge base; querying, via the inverted index, a set of candidate questions related to the consultation question from the knowledge base; calculating a question similarity between the consultation question and each candidate question in the set, the question similarity being obtained by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the consultation question and the candidate question; and finally selecting the candidate question with the highest calculated question similarity and querying the knowledge base for an answer associated with the selected candidate question to output as the target answer. Thus, the accuracy and efficiency of intelligent responses may be increased, and the quality of service improved.

Description

Intelligent response method, electronic device and storage medium
This application claims priority to Chinese Patent Application No. 201810134579.9, entitled "Intelligent Response Method, Electronic Device, and Storage Medium", filed with the Chinese Patent Office on February 9, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to an intelligent response method, an electronic device, and a storage medium.
Background
With the development of technology, Artificial Intelligence (AI) is gradually changing the way we live, and intelligent question answering is one example. When a customer consults online via text or voice, an online intelligent customer-service agent can respond automatically. Intelligent question answering can effectively relieve customer-service waiting times and improve service quality, and therefore has very broad prospects.
At present, intelligent responses in intelligent question answering are usually produced by extracting keywords from the customer's question and then retrieving an answer matching those keywords from a question and answer knowledge base. However, because natural language is rich and highly variable, it is difficult to accurately identify the true intent behind a customer's question from keywords alone. The current approach therefore falls short in accuracy and usually requires manual responses as a backup, which to some extent wastes human resources and lowers service efficiency.
Summary of the Invention
In view of the above, it is necessary to provide an intelligent response method, an electronic device and a storage medium that can improve the accuracy and efficiency of intelligent responses, save human resources, and improve service quality.
To achieve the above objective, the present application provides an intelligent response method comprising the following steps. Obtaining step: obtain an input consulting question and pre-process it, the pre-processing including segmenting the question into terms, performing part-of-speech tagging and named entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question. Constructing step: apply the same pre-processing to each question and answer in a question and answer knowledge base, map each pre-processed question and answer into an inverted record table so as to build an inverted index for the knowledge base, and query the knowledge base through the inverted index for a set of candidate questions related to the consulting question; the knowledge base includes a plurality of pre-organized questions and one or more answers associated with each question. Calculating step: for each candidate question in the set, calculate the problem similarity between the consulting question and that candidate question; the problem similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are greater than the weights of the topic similarity and the syntactic similarity. Selecting step: select the candidate question with the highest calculated problem similarity, query the knowledge base for the one or more associated answers of the selected candidate question, and output, as the target answer, the associated answer among them with the highest output frequency within a preset time period.
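The inverted-index construction and candidate lookup described above can be sketched as follows (a toy in-memory version with invented question texts; a production system would use a search engine and real word segmentation):

```python
from collections import defaultdict

def build_inverted_index(questions):
    """Map each term to the ids of the knowledge-base questions containing it."""
    index = defaultdict(set)
    for qid, text in questions.items():
        for term in text.split():  # stand-in for real word segmentation
            index[term].add(qid)
    return index

def candidate_questions(index, query_terms):
    """Union of all questions sharing at least one term with the query."""
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())
    return candidates

kb = {
    1: "how to file an insurance claim",
    2: "how to renew a policy",
    3: "claim processing time",
}
index = build_inverted_index(kb)
cands = candidate_questions(index, ["insurance", "claim"])
# questions 1 and 3 share a term with the query; question 2 is filtered out
```

The point of the inverted index is that the expensive four-way similarity calculation only has to run over this small candidate set rather than the whole knowledge base.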
Optionally, the method for calculating the text similarity between the consulting question and the corresponding candidate question includes: counting a plurality of specified features between the consulting question and the candidate question, and performing a linear weighting calculation on these features to obtain the text similarity. The specified features include: the number of keywords common to the consulting question and the candidate question, a1; the length of those common keywords, a2; the number of common terms, a3; the length of those common terms, a4; the length of the consulting question, a5; and the length of the candidate question, a6.
To achieve the above objective, the present application further provides an electronic device comprising a memory and a processor, the memory storing an intelligent response program which, when executed by the processor, implements the following steps. Obtaining step: obtain an input consulting question and pre-process it, the pre-processing including segmenting the question into terms, performing part-of-speech tagging and named entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question. Constructing step: apply the same pre-processing to each question and answer in a question and answer knowledge base, map each pre-processed question and answer into an inverted record table so as to build an inverted index for the knowledge base, and query the knowledge base through the inverted index for a set of candidate questions related to the consulting question; the knowledge base includes a plurality of pre-organized questions and one or more answers associated with each question. Calculating step: for each candidate question in the set, calculate the problem similarity between the consulting question and that candidate question; the problem similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are greater than the weights of the topic similarity and the syntactic similarity. Selecting step: select the candidate question with the highest calculated problem similarity, query the knowledge base for the one or more associated answers of the selected candidate question, and output, as the target answer, the associated answer among them with the highest output frequency within a preset time period.
In addition, to achieve the above objective, the present application further provides a computer readable storage medium storing an intelligent response program which, when executed by a processor, implements any step of the intelligent response method described above.
With the intelligent response method proposed by the present application, after a consulting question is obtained it is first pre-processed; an inverted index is then built for the question and answer knowledge base, and a set of candidate questions related to the consulting question is retrieved from the knowledge base through the inverted index. For each candidate question in the set, the problem similarity between the consulting question and that candidate question is calculated by linearly weighting the text similarity, semantic similarity, topic similarity and syntactic similarity between the two. Finally, the candidate question with the highest calculated problem similarity is selected, the knowledge base is queried for its one or more associated answers, and the associated answer with the highest output frequency within a preset time period is output as the target answer. This can improve the accuracy and efficiency of intelligent responses, save human resources, and improve service quality.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the operating environment of a preferred embodiment of the electronic device of the present application;
FIG. 2 is a schematic diagram of the interaction between the electronic device and a client according to a preferred embodiment of the present application;
FIG. 3 is a flowchart of a preferred embodiment of the intelligent response method of the present application;
FIG. 4 is a program module diagram of the intelligent response program of FIG. 1.
The implementation, functional features and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description of the Embodiments
The principles and spirit of the present application are described below with reference to several specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it.
Those skilled in the art will appreciate that embodiments of the present application can be implemented as a method, apparatus, device, system, or computer program product. Accordingly, the application can be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software.
According to embodiments of the present application, an intelligent response method, an electronic device, and a storage medium are proposed.
FIG. 1 is a schematic diagram of the operating environment of a preferred embodiment of the electronic device of the present application.
The electronic device 1 may be a terminal device having storage and computing functions, such as a server, a portable computer, or a desktop computer.
The electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14. The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 14 is used to implement connection and communication between the above components.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, the readable storage medium may be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is generally used to store the intelligent response program 10, the question and answer knowledge base 4, and the like installed in the electronic device 1. The memory 11 can also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip for running program code or processing data stored in the memory 11, for example, executing the intelligent response program 10.
FIG. 1 shows only the electronic device 1 with the components 11-14 and the intelligent response program 10, but it should be understood that not all illustrated components are required; more or fewer components may be implemented instead.
Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device with a voice recognition function such as a microphone, and a voice output device such as a speaker or headphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may further include a display, which may also be referred to as a display screen or display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch liquid crystal display, an Organic Light-Emitting Diode (OLED) display, or the like. The display is used to present information processed in the electronic device 1 and to show a visual user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area provided by the touch sensor for the user's touch operations is referred to as the touch area. The touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like, and may be of the contact type or the proximity type. The touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array. The user can start the intelligent response program 10 by touching the touch area.
此外,该电子装置1的显示器的面积可以与所述触摸传感器的面积相同,也可以不同。可选地,将显示器与所述触摸传感器层叠设置,以形成触摸显示屏。该装置基于触摸显示屏侦测用户触发的触控操作。In addition, the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, a display is stacked with the touch sensor to form a touch display. The device detects a user-triggered touch operation based on a touch screen display.
该电子装置1还可以包括射频(Radio Frequency,RF)电路、传感器和音频电路等等,在此不再赘述。The electronic device 1 may further include a radio frequency (RF) circuit, a sensor, an audio circuit, and the like, and details are not described herein.
Referring to FIG. 2, which is a schematic diagram of the interaction between the electronic device 1 and a client 2 in a preferred embodiment of the present application. The intelligent response program 10 runs in the electronic device 1; in FIG. 2 the preferred embodiment of the electronic device 1 is a server. The electronic device 1 is communicatively connected to the client 2 via a network 3. The client 2 may run on various types of terminal devices, such as smartphones and portable computers. After logging in to the electronic device 1 through the client 2, the user can input a consulting question to the intelligent response program 10. The intelligent response program 10 processes the consulting question using the intelligent response method, finds the corresponding target answer in the question-and-answer knowledge base 4, and returns the target answer to the client 2 via the network 3.
Referring to FIG. 3, which is a flowchart of a preferred embodiment of the intelligent response method of the present application. When the processor 12 of the electronic device 1 executes the intelligent response program 10 stored in the memory 11, the following steps of the intelligent response method are implemented:

Step S1: obtain an input consulting question and preprocess it. The preprocessing includes word segmentation to obtain individual terms, part-of-speech tagging and named entity recognition for each term, keyword extraction from the terms, and sentence error correction of the consulting question.
Specifically, the consulting question may be, for example, "新春大礼包中的办公室款碳晶地热垫是否能快速启动" ("Can the office-style carbon crystal heating mat in the New Year gift package start up quickly?"). After word segmentation, the resulting terms are "新春" (New Year), "大礼包" (gift package), "中" (in), "的" (of), "办公室" (office), "款" (style), "碳晶" (carbon crystal), "地热垫" (heating mat), "是否" (whether), "能" (can), "快速" (quickly), and "启动" (start). Longer terms, such as "大礼包" and "地热垫", express the meaning of the consulting question better than shorter ones such as "中" and "的". Therefore, step S1 may treat terms whose length after segmentation exceeds a first threshold (for example, 4 characters) as long-cut words, and perform part-of-speech tagging only on the long-cut words. Taking the above consulting question as an example, the part-of-speech tagging results for the long-cut words are, for example, "新春大礼包/adjective + noun" and "快速启动/adverb + verb".
Step S1 may perform named entity recognition on the long-cut words using a hidden Markov model, thereby identifying entity proper nouns with specific meaning in the consulting question, such as "微软科技集团" (Microsoft Technology Group). Entity proper nouns include, for example, person names, place names, and organization names. In addition, step S1 may extract keywords from the long-cut words using the TF-IDF (term frequency-inverse document frequency) algorithm. The main idea of TF-IDF is that if a word or phrase appears frequently in one document but rarely in other documents, it is considered to have good category-discriminating power and can be used as a keyword.
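The TF-IDF idea described above can be sketched as follows. This is an illustrative, self-contained implementation rather than the patent's actual code; documents are assumed to be given as lists of already-segmented terms:

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Score `term` in `doc` against `corpus` (a list of term lists).

    TF: relative frequency of the term within the document.
    IDF: log of total documents over documents containing the term,
    so terms common across the whole corpus score low.
    """
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / (1 + df))  # +1 guards against df == 0
    return tf * idf

def extract_keywords(doc, corpus, top_k=3):
    """Return the top_k highest-scoring terms of `doc` as keywords."""
    scores = {t: tf_idf(t, doc, corpus) for t in set(doc)}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A term like "的" that occurs in nearly every document gets a low (even negative) IDF and is never selected, which matches the intuition stated above.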
Step S1 may use an N-gram language model together with edit distance to perform sentence error correction on the consulting question. The N-gram language model uses collocation information between adjacent words in context to compute the word collocation with the highest probability, thereby identifying incorrect collocations in the sentence and providing several possible correct replacement collocations. By computing the editing cost of each candidate replacement with an edit distance algorithm, the replacement with the lowest editing cost can be determined and adopted, achieving sentence error correction for the consulting question. For example, for "想找菏塘月色这首歌作为背景音乐" ("I want to find the song Hetang Yuese as background music"), the error correction identifies that "菏塘" should be corrected to "荷塘".
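The edit-distance part of this correction step can be illustrated with the standard Levenshtein dynamic program; among the replacement collocations proposed by the language model, the one with the smallest edit cost relative to the original text is preferred. This is a generic sketch, not the patent's implementation:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    m, n = len(a), len(b)
    # dp[j] holds the distance between a[:i] and b[:j] for the current row i
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # delete a[i-1]
                        dp[j - 1] + 1,  # insert b[j-1]
                        prev + cost)    # substitute (free if equal)
            prev = cur
    return dp[n]

def pick_correction(original: str, candidates: list) -> str:
    """Choose the candidate replacement with the lowest edit cost."""
    return min(candidates, key=lambda c: edit_distance(original, c))
```

For the example above, `pick_correction("菏塘", ["荷塘", "池塘月色"])` selects "荷塘", since it differs from "菏塘" by a single substitution.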
Step S2: perform the preprocessing on each question and answer in the question-and-answer knowledge base 4, and map each preprocessed question and answer into an inverted record table, thereby building an inverted index for the question-and-answer knowledge base 4. A set of candidate questions related to the consulting question is then retrieved from the question-and-answer knowledge base 4 by inverted index lookup. The question-and-answer knowledge base 4 includes a plurality of pre-compiled questions and one or more answers associated with each question.

By applying the same preprocessing to each question and answer in the question-and-answer knowledge base 4, step S2 obtains text feature information for each of them, such as the terms produced by word segmentation, part-of-speech tags, named entities, and keywords. Based on this text feature information, each question and answer is mapped into a preset inverted record table: all questions and answers containing the same term are mapped to that term, thereby building the inverted index for the question-and-answer knowledge base 4. Using the text feature information obtained by preprocessing the consulting question, the set of candidate questions related to the consulting question can then be retrieved from the question-and-answer knowledge base by inverted index lookup. The candidate question set contains at least one candidate question, and because an inverted index lookup is used, every candidate question is related to the consulting question to some degree.
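The inverted record table and candidate lookup described above can be sketched minimally as follows; the question ids and term lists are illustrative, and the terms are assumed to come from the preprocessing step:

```python
from collections import defaultdict

def build_inverted_index(kb):
    """kb maps a question id to its list of terms; the inverted record
    table maps each term to the set of question ids containing it."""
    index = defaultdict(set)
    for qid, terms in kb.items():
        for term in terms:
            index[term].add(qid)
    return index

def candidate_questions(index, query_terms):
    """Union of all questions sharing at least one term with the query,
    so every returned candidate is related to the query to some degree."""
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())
    return candidates
```

Lookup cost then depends on the number of query terms rather than the size of the knowledge base, which is what makes the inverted index suitable as a first-stage filter before the similarity computation in step S3.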
Step S3: for each candidate question in the candidate question set, compute the question similarity between the consulting question and that candidate question. The question similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the corresponding candidate question. Because the comparison between a consulting question and a candidate question usually emphasizes textual and semantic similarity, the weights of the text similarity and the semantic similarity may both be set greater than the weights of the topic similarity and the syntactic similarity.
The text similarity between the consulting question and a candidate question may be computed as follows: count a plurality of specified features between the consulting question and the candidate question, and perform a linear weighted calculation over these features to obtain the text similarity. The specified features include:

the number of keywords shared by the consulting question and the candidate question, a1;

the length of the keywords shared by the consulting question and the candidate question, a2;

the number of terms shared by the consulting question and the candidate question, a3;

the length of the terms shared by the consulting question and the candidate question, a4;

the length of the consulting question, a5;

the length of the candidate question, a6.
The linear weighted calculation over the specified features may be implemented with a multiple logistic regression model. Specifically, the weight of each specified feature is first computed with an inverse document frequency algorithm, whose idea is to measure how important each specified feature is within a preset large-scale corpus. A weighted regression fit over the specified features with the multiple logistic regression model then yields the text similarity g(z) between the consulting question and the candidate question, as follows:

g(z) = 1/(1+e^(-z)), where e is the natural constant;

z = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5 + a6*x6;

where x1, x2, ..., x6 are the weights of a1, a2, ..., a6, respectively.
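Assuming the standard logistic form g(z) = 1/(1 + e^(-z)), the text-similarity computation reduces to a weighted sum of the six features passed through the sigmoid. The feature values and weights in the test below are made up for illustration; in practice the weights would come from the inverse-document-frequency step and the regression fit:

```python
import math

def text_similarity(features, weights):
    """features: [a1..a6], weights: [x1..x6].
    Returns g(z) = 1 / (1 + e^(-z)) with z = sum(ai * xi),
    so the result always lies in (0, 1)."""
    z = sum(a * x for a, x in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))
```

Note that with this form, more shared keywords and terms (larger z under positive weights) push the similarity toward 1, as the method requires.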
In addition, the semantic similarity between the consulting question and a candidate question may be computed as follows: represent each term of the segmented consulting question as a word vector using the word2vec algorithm, and average the word vectors to obtain the consulting question's sentence vector; likewise, represent each term of the segmented candidate question as a word vector using word2vec and average them to obtain the candidate question's sentence vector; then compute the cosine similarity between the two sentence vectors to obtain the semantic similarity between the consulting question and the candidate question.
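The sentence-vector averaging and cosine comparison can be sketched as follows. Real word vectors would come from a trained word2vec model; the two-dimensional toy vectors in the test are hand-picked for illustration:

```python
import math

def sentence_vector(terms, word_vectors):
    """Average the word vectors of the terms (word2vec-style embeddings).
    Out-of-vocabulary terms are skipped; an all-zero vector is returned
    if no term is known."""
    vecs = [word_vectors[t] for t in terms if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Averaging is the simplest composition choice; the method as described uses exactly this mean of word vectors rather than, say, an IDF-weighted mean.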
The topic similarity between the consulting question and a candidate question may be computed as follows: construct the topic vector of the consulting question and the topic vector of the candidate question using an LDA (Latent Dirichlet Allocation) topic representation; then compute the cosine similarity between the two topic vectors to obtain the topic similarity between the consulting question and the candidate question.
The syntactic similarity between the consulting question and a candidate question may be computed as follows: analyze the syntax of the consulting question and of the candidate question using the LTP (Language Technology Platform), obtaining a syntax vector for each; then compute the cosine similarity between the two syntax vectors to obtain the syntactic similarity between the consulting question and the candidate question.
Step S3 computes the question similarity between the consulting question and a candidate question by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between them. Because the comparison between a consulting question and a candidate question usually emphasizes textual and semantic similarity, when the four similarities are linearly weighted, the weights of the text similarity and the semantic similarity may both be set greater than the weights of the topic similarity and the syntactic similarity; for example, the weights of the text similarity and the semantic similarity may each be set to three times the weights of the topic similarity and the syntactic similarity.
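The final linear weighting, with text and semantic similarity weighted three times as heavily as topic and syntactic similarity, can be written as follows. Normalizing by the weight sum so the result stays in [0, 1] is one possible design choice for illustration, not something the method mandates:

```python
def question_similarity(text_sim, sem_sim, topic_sim, syn_sim,
                        weights=(3.0, 3.0, 1.0, 1.0)):
    """Linearly weight the four similarities; by default text and
    semantic similarity count 3x as much as topic and syntax,
    and the result is normalized into [0, 1]."""
    sims = (text_sim, sem_sim, topic_sim, syn_sim)
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

With these default weights, two questions that match textually and semantically but diverge in topic and syntax still score 0.75, reflecting the stated emphasis on text and semantics.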
Step S4: select the candidate question with the highest computed question similarity, look up the one or more answers associated with the selected candidate question in the question-and-answer knowledge base 4, and output, as the target answer, the associated answer that has been output most frequently within a preset time period.

The candidate question with the highest question similarity can be regarded as the candidate question most similar to the consulting question, so the answers associated with it in the question-and-answer knowledge base 4 can serve as the answers to the consulting question. If the selected candidate question has multiple associated answers, one of them is chosen as the target answer. Specifically, the associated answer output most frequently within a preset time period, for example the most recent week, may be selected as the target answer.
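The selection of the best candidate and of its most frequently output answer can be sketched as follows; the question ids, answers, and output log are illustrative, and falling back to the first associated answer when the log is empty is an assumption of this sketch:

```python
from collections import Counter

def pick_answer(scored_candidates, answers_by_question, recent_outputs):
    """scored_candidates: {qid: similarity};
    answers_by_question: {qid: [associated answers]};
    recent_outputs: answers output within the preset time period.
    Returns the most frequently output associated answer."""
    best_qid = max(scored_candidates, key=scored_candidates.get)
    associated = answers_by_question[best_qid]
    counts = Counter(a for a in recent_outputs if a in associated)
    if counts:
        return counts.most_common(1)[0][0]
    return associated[0]  # assumed fallback when no output history exists
```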
To make the output target answer more personable and give the customer a better experience, step S4 may also polish the target answer before outputting it. Specifically, the preprocessing performed on the consulting question in step S1 further includes:

comparing each term obtained by segmenting the consulting question against a preset positive vocabulary and a preset negative vocabulary, to determine whether the consulting question contains positive words or negative words. A positive word is, for example, "哄老婆开心" ("make my wife happy"); a negative word is, for example, "我要投诉" ("I want to file a complaint").
The polishing of the target answer in step S4 includes:

if the consulting question contains only positive words, obtaining the preset greeting corresponding to positive words, for example "祝您开心" ("wish you happiness"), and combining it with the target answer;

if the consulting question contains only negative words, obtaining the preset greeting corresponding to negative words, for example "非常抱歉" ("we are very sorry"), and combining it with the target answer;

if the consulting question contains both positive and negative words, or contains neither, obtaining the preset greeting corresponding to neutral words, for example "感谢支持" ("thank you for your support"), and combining it with the target answer.
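The greeting selection above amounts to a three-way case analysis on the presence of positive and negative words. The vocabularies and the greeting table below are illustrative placeholders, not the method's actual presets:

```python
def polish_answer(terms, answer, positive_vocab, negative_vocab, greetings):
    """greetings is assumed to map 'positive' / 'negative' / 'neutral'
    to a preset greeting string, which is prepended to the answer."""
    has_pos = any(t in positive_vocab for t in terms)
    has_neg = any(t in negative_vocab for t in terms)
    if has_pos and not has_neg:
        tone = "positive"
    elif has_neg and not has_pos:
        tone = "negative"
    else:  # both present, or neither present
        tone = "neutral"
    return greetings[tone] + " " + answer
```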
According to the intelligent response method provided by this embodiment, after a consulting question is obtained, it is first preprocessed; an inverted index is then built for the question-and-answer knowledge base 4, and the set of candidate questions related to the consulting question is retrieved from the question-and-answer knowledge base 4 by inverted index lookup; for each candidate question in the set, the question similarity between the consulting question and that candidate question is computed, the question similarity being a linear weighting of the text similarity, semantic similarity, topic similarity, and syntactic similarity between them; finally, the candidate question with the highest computed question similarity is selected, its one or more associated answers are looked up in the question-and-answer knowledge base 4, and the associated answer output most frequently within a preset time period is output as the target answer. This improves the accuracy and efficiency of intelligent responses, saves human resources, and improves service quality.
Referring to FIG. 4, which is a program module diagram of the intelligent response program 10 in FIG. 1. In this embodiment, the intelligent response program 10 is divided into a plurality of modules, which are stored in the memory 11 and executed by the processor 12 to implement the present application. A module in this application refers to a series of computer program instruction segments capable of performing a specific function.

The intelligent response program 10 may be divided into an acquisition module 110, a construction module 120, a calculation module 130, and a selection module 140.
The acquisition module 110 is configured to obtain an input consulting question and preprocess it. The preprocessing includes word segmentation to obtain individual terms, part-of-speech tagging and named entity recognition for each term, keyword extraction from the terms, and sentence error correction of the consulting question.

Specifically, the preprocessing performed by the acquisition module 110 may include: segmenting the consulting question into terms; treating terms whose length after segmentation exceeds the first threshold as long-cut words; performing part-of-speech tagging on the long-cut words; performing named entity recognition on the long-cut words with a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
The construction module 120 is configured to perform the preprocessing on each question and answer in the question-and-answer knowledge base 4 and map each preprocessed question and answer into an inverted record table, thereby building an inverted index for the question-and-answer knowledge base 4, and to retrieve the set of candidate questions related to the consulting question from the question-and-answer knowledge base 4 by inverted index lookup. The question-and-answer knowledge base 4 includes a plurality of pre-compiled questions and one or more answers associated with each question.

Specifically, by applying the same preprocessing to each question and answer in the question-and-answer knowledge base 4, the construction module 120 obtains text feature information for each of them, such as the terms produced by word segmentation, part-of-speech tags, named entities, and keywords. Based on this text feature information, each question and answer is mapped into a preset inverted record table: all questions and answers containing the same term are mapped to that term, building the inverted index for the question-and-answer knowledge base 4. Using the text feature information obtained by preprocessing the consulting question, the set of candidate questions related to the consulting question can then be retrieved from the question-and-answer knowledge base by inverted index lookup.
The calculation module 130 is configured to compute, for each candidate question in the candidate question set, the question similarity between the consulting question and that candidate question. The question similarity is obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the corresponding candidate question, where the weights of the text similarity and the semantic similarity are both greater than the weights of the topic similarity and the syntactic similarity.
The calculation module 130 may count a plurality of specified features between the consulting question and the candidate question and perform a linear weighted calculation over them to obtain the text similarity between the consulting question and the corresponding candidate question. The specified features include:

the number of keywords shared by the consulting question and the candidate question, a1;

the length of the keywords shared by the consulting question and the candidate question, a2;

the number of terms shared by the consulting question and the candidate question, a3;

the length of the terms shared by the consulting question and the candidate question, a4;

the length of the consulting question, a5;

the length of the candidate question, a6.
The linear weighted calculation over the specified features may be implemented with a multiple logistic regression model. Specifically, the calculation module 130 may compute the weight of each specified feature with an inverse document frequency algorithm, whose idea is to measure how important each specified feature is within a preset large-scale corpus. The calculation module 130 then performs a weighted regression fit over the specified features with the multiple logistic regression model to obtain the text similarity g(z) between the consulting question and the candidate question, as follows:

g(z) = 1/(1+e^(-z)), where e is the natural constant;

z = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5 + a6*x6;

where x1, x2, ..., x6 are the weights of a1, a2, ..., a6, respectively.
In addition, the calculation module 130 may compute the semantic similarity between the consulting question and a candidate question as follows: represent each term of the segmented consulting question as a word vector using the word2vec algorithm, and average the word vectors to obtain the consulting question's sentence vector; represent each term of the segmented candidate question as a word vector using word2vec and average them to obtain the candidate question's sentence vector; then compute the cosine similarity between the two sentence vectors to obtain the semantic similarity.
The calculation module 130 may compute the topic similarity between the consulting question and a candidate question as follows: construct the topic vector of the consulting question and the topic vector of the candidate question using an LDA (Latent Dirichlet Allocation) topic representation; then compute the cosine similarity between the two topic vectors to obtain the topic similarity.
The calculation module 130 may compute the syntactic similarity between the consulting question and a candidate question as follows: analyze the syntax of the consulting question and of the candidate question using the LTP language technology platform, obtaining a syntax vector for each; then compute the cosine similarity between the two syntax vectors to obtain the syntactic similarity.
The calculation module 130 computes the question similarity between the consulting question and a candidate question by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between them. Because the comparison usually emphasizes textual and semantic similarity, when the four similarities are linearly weighted, the weights of the text similarity and the semantic similarity may both be set greater than the weights of the topic similarity and the syntactic similarity.
The selection module 140 is configured to select the candidate question with the highest computed question similarity, look up the one or more answers associated with the selected candidate question in the question-and-answer knowledge base, and output, as the target answer, the associated answer that has been output most frequently within a preset time period.
To make the output target answer more personable and give the customer a better experience, the selection module 140 may also polish the target answer before outputting it. Specifically, the preprocessing performed by the acquisition module 110 on the consulting question further includes: comparing each term obtained by segmenting the consulting question against a preset positive vocabulary and a preset negative vocabulary, to determine whether the consulting question contains positive words or negative words.
Then, before outputting the target answer: if the consulting question contains only positive words, the selection module 140 obtains the preset greeting corresponding to positive words and combines it with the target answer;

if the consulting question contains only negative words, the selection module 140 obtains the preset greeting corresponding to negative words and combines it with the target answer;

if the consulting question contains both positive and negative words, or contains neither, the selection module 140 obtains the preset greeting corresponding to neutral words and combines it with the target answer.
在图1所示的电子装置1较佳实施例的运行环境示意图中,包含可读存储介质的存储器11中可以包括操作系统、智能应答程序10及问答知识库4。处理器12执行存储器11中存储的智能应答程序10时实现如下步骤:In the operating environment diagram of the preferred embodiment of the electronic device 1 shown in FIG. 1, the memory 11 including the readable storage medium may include an operating system, an intelligent response program 10, and a question and answer knowledge base 4. When the processor 12 executes the smart response program 10 stored in the memory 11, the following steps are implemented:
获取步骤:获取输入的咨询问题,对所述咨询问题进行预处理,所述预处理包括分词得到各词条、对每个词条进行词性标注和命名实体识别、从各词条中提取关键词,以及对所述咨询问题进行语句纠错;Obtaining step: obtain an input consulting question and pre-process it, the pre-processing including segmenting the question into terms, performing part-of-speech tagging and named-entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
构建步骤:对问答知识库4中的每个问题和答案进行所述预处理,将经所述预处理后的每个问题和答案映射到倒排记录表中,从而为所述问答知识库4构建倒排索引,通过倒排索引查询的方式从所述问答知识库4中查询出与所述咨询问题相关的候选问题集合,所述问答知识库4包括预先整理的多个问题以及每个问题关联的一个或多个答案;Building step: perform the pre-processing on every question and answer in the question-and-answer knowledge base 4, and map each pre-processed question and answer into an inverted record table, thereby building an inverted index for the knowledge base 4; query the knowledge base 4 via the inverted index for the set of candidate questions related to the consulting question, the knowledge base 4 comprising a plurality of pre-compiled questions and one or more answers associated with each question;
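The inverted-index construction and candidate lookup in the building step can be sketched as follows (questions are assumed to be already segmented into term lists; all identifiers are illustrative):

```python
from collections import defaultdict

def build_inverted_index(kb_questions):
    # kb_questions: {question_id: [term, ...]} after pre-processing
    index = defaultdict(set)
    for qid, terms in kb_questions.items():
        for term in terms:
            index[term].add(qid)   # postings list: term -> ids of questions containing it
    return index

def candidate_questions(index, query_terms):
    # union of postings: every KB question sharing at least one term with the query
    hits = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return hits
```

Only the questions returned by `candidate_questions` need the full similarity computation, which is the point of the inverted index.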
计算步骤:针对所述候选问题集合中的每个候选问题,分别计算所述咨询问题与该候选问题的问题相似度,所述问题相似度由咨询问题和相应的候选问题之间的文本相似度、语义相似度、主题相似度和句法相似度经线性加权得到,其中所述文本相似度和语义相似度的权重均大于所述主题相似度和句法相似度的权重;Computing step: for each candidate question in the candidate question set, compute the question similarity between the consulting question and that candidate question, the question similarity being obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are both greater than those of the topic similarity and the syntactic similarity;
选择步骤:选择计算得到的最高问题相似度对应的候选问题,在问答知识库4中查询所选择候选问题的一个或多个关联答案,将所述一个或多个关联答案中在预设时间段内输出频率最高的关联答案作为目标答案输出。Selecting step: select the candidate question corresponding to the highest computed question similarity, query the knowledge base 4 for the one or more answers associated with the selected candidate question, and output, as the target answer, the associated answer that has been output most frequently within a preset time period.
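A minimal sketch of the selecting step, assuming the per-answer output counts within the preset time window are tracked elsewhere (the data shapes and names are illustrative):

```python
def pick_target_answer(scored_candidates, kb_answers, output_counts):
    # scored_candidates: {candidate_id: question similarity}
    # kb_answers:        {candidate_id: [associated answer, ...]}
    # output_counts:     {answer: times output within the preset time window}
    best = max(scored_candidates, key=scored_candidates.get)   # highest similarity
    answers = kb_answers[best]
    # among that candidate's associated answers, take the most frequently output one
    return max(answers, key=lambda a: output_counts.get(a, 0))
```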
其中,所述预处理包括分词得到各词条、将分词后词条长度大于第一阈值的词条作为长切词、为所述长切词进行词性标注、通过隐马尔可夫模型对所述长切词进行命名实体识别从而识别出专有名词、采用TF-IDF算法从所述长切词中提取关键词、采用N-gram语言模型和编辑距离为所述咨询问题进行语句纠错处理。Here, the pre-processing includes: segmenting the question into terms; taking terms whose length after segmentation exceeds a first threshold as long-cut words; part-of-speech tagging the long-cut words; performing named-entity recognition on the long-cut words through a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
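The edit-distance half of the error-correction step can be sketched with the classic Levenshtein dynamic program (the N-gram language model is omitted here, and the candidate vocabulary is a hypothetical stand-in):

```python
def edit_distance(s, t):
    # single-row dynamic-programming Levenshtein distance
    m, n = len(s), len(t)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (s[i - 1] != t[j - 1]))   # substitution (cost 0 if equal)
            prev = cur
    return dp[n]

def nearest_correction(word, vocabulary, max_dist=1):
    # replace an out-of-vocabulary word with its closest in-vocabulary neighbour,
    # but only if that neighbour is within max_dist edits
    best = min(vocabulary, key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_dist else word
```

In a full system the N-gram model would then rank the corrected sentence candidates by fluency; that ranking is not shown.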
其中,所述咨询问题和相应的候选问题之间的文本相似度的计算方法包括:The method for calculating the text similarity between the consulting question and the corresponding candidate question includes:
统计所述咨询问题与该候选问题之间的多个指定特征,对所述多个指定特征进行线性加权计算,得到咨询问题和相应的候选问题之间的文本相似度;Counting a plurality of specified features between the consulting question and the candidate question, performing linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question;
其中,所述多个指定特征包括:Wherein the plurality of specified features include:
咨询问题和该候选问题的共同关键词数量a1;the number of keywords common to the consulting question and the candidate question, a1;
咨询问题和该候选问题的共同关键词长度a2;the total length of the keywords common to the consulting question and the candidate question, a2;
咨询问题和该候选问题的共同词条的数量a3;the number of terms common to the consulting question and the candidate question, a3;
咨询问题和该候选问题的共同词条的长度a4;the total length of the terms common to the consulting question and the candidate question, a4;
咨询问题的长度a5;the length of the consulting question, a5;
该候选问题的长度a6。the length of the candidate question, a6.
所述对所述多个指定特征进行线性加权计算,得到咨询问题和相应的候选问题之间的文本相似度包括:Performing a linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question includes:
采用逆文档率算法计算每个指定特征的权重,对所述多个指定特征采用多元逻辑回归模型进行加权回归拟合计算,得到咨询问题与该候选问题的文本相似度g(z),公式如下:Compute the weight of each specified feature with an inverse document frequency algorithm, and perform a weighted regression fit over the specified features with a multiple logistic regression model, yielding the text similarity g(z) between the consulting question and the candidate question, as follows:
g(z)=1/(1+e^(-z)),e为自然常数;g(z) = 1/(1 + e^(-z)), where e is the natural constant;
z=a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6;z=a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6;
其中,x1、x2...x6分别为所述a1、a2...a6的权重。Wherein x1, x2, ..., x6 are the weights of the a1, a2, ..., a6, respectively.
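Given the fitted weights x1...x6, the text-similarity formula above reduces to a logistic function of the weighted feature sum. A sketch (the IDF-based weight estimation and the regression fit are assumed to have been done elsewhere):

```python
import math

def text_similarity(features, weights):
    # features = (a1, ..., a6), weights = (x1, ..., x6)
    z = sum(a * x for a, x in zip(features, weights))   # z = a1*x1 + ... + a6*x6
    return 1.0 / (1.0 + math.exp(-z))                   # logistic function g(z)
```

With z = 0 the similarity is exactly 0.5, and larger weighted feature sums push it toward 1.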
此外,所述咨询问题和相应的候选问题之间的语义相似度的计算方法包括:Furthermore, the method for calculating the semantic similarity between the consulting question and the corresponding candidate question includes:
采用word2vec算法将咨询问题分词后的各词条表示为词向量,将咨询问题中各词向量取平均值得到咨询问题的句子向量;The word2vec algorithm is used to represent each term after the question segmentation as a word vector, and the word vectors in the consultation question are averaged to obtain the sentence vector of the consultation question;
采用word2vec算法将该候选问题分词后的各词条表示为词向量,将该候选问题中各词向量取平均值得到该候选问题的句子向量;Using the word2vec algorithm, each term after the candidate problem segmentation is represented as a word vector, and the word vectors in the candidate question are averaged to obtain a sentence vector of the candidate question;
计算咨询问题的句子向量与该候选问题的句子向量之间的余弦相似度,得到咨询问题与该候选问题的语义相似度;Calculating the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question, and obtaining the semantic similarity between the consulting question and the candidate question;
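The sentence-vector averaging and cosine similarity described above can be sketched as follows (the word vectors are assumed to come from a trained word2vec model, which is not shown):

```python
import math

def sentence_vector(word_vectors):
    # average the word vectors of a sentence, component by component
    dim = len(word_vectors[0])
    return [sum(v[i] for v in word_vectors) / len(word_vectors) for i in range(dim)]

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (|u| * |v|); 0.0 if either vector is all zeros
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

The same `cosine_similarity` also serves the topic-similarity and syntactic-similarity computations below, which compare topic vectors and syntax vectors in the same way.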
所述咨询问题和相应的候选问题之间的主题相似度的计算方法包括:The method for calculating the topic similarity between the consulting question and the corresponding candidate question includes:
采用LDA(潜在狄利克雷分配)主题模型的主题表达法,构建咨询问题的主题向量,以及该候选问题的主题向量;Using the topic representation of the LDA (Latent Dirichlet Allocation) topic model, construct the topic vector of the consulting question and the topic vector of the candidate question;
计算咨询问题的主题向量与该候选问题的主题向量之间的余弦相似度,得到咨询问题与该候选问题的主题相似度;Calculating a cosine similarity between the subject vector of the consulting question and the subject vector of the candidate question, and obtaining a topic similarity between the consulting question and the candidate question;
所述咨询问题和相应的候选问题之间的句法相似度的计算方法包括:The method for calculating the syntactic similarity between the consulting question and the corresponding candidate question includes:
采用LTP语言技术平台分析咨询问题和该候选问题的句法,得到咨询问题和该候选问题的句法向量;Using the LTP (Language Technology Platform), analyze the syntax of the consulting question and of the candidate question to obtain their respective syntax vectors;
计算咨询问题的句法向量与该候选问题的句法向量之间的余弦相似度,得到咨询问题与该候选问题的句法相似度。The cosine similarity between the syntax vector of the consulting problem and the syntax vector of the candidate problem is calculated, and the syntactic similarity between the consulting problem and the candidate problem is obtained.
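Combining the four similarities into the overall question similarity of the computing step might look like the sketch below. The concrete weight values are illustrative; the patent only requires the text and semantic weights to exceed the topic and syntactic weights:

```python
def question_similarity(text_sim, sem_sim, topic_sim, syn_sim,
                        w_text=0.35, w_sem=0.35, w_topic=0.15, w_syn=0.15):
    # linear weighting of the four component similarities;
    # w_text and w_sem are chosen larger than w_topic and w_syn
    return (w_text * text_sim + w_sem * sem_sim
            + w_topic * topic_sim + w_syn * syn_sim)
```

The candidate with the largest `question_similarity` is then passed to the selecting step.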
在一个实施例中,所述预处理还包括:In an embodiment, the preprocessing further includes:
将所述咨询问题分词得到的各词条分别与预设的积极词汇库和消极词汇库进行对比,判断所述咨询问题中是否包含积极词汇或消极词汇;Comparing each term obtained by the consulting problem segmentation with a preset positive vocabulary and a negative vocabulary to determine whether the consulting question includes positive vocabulary or negative vocabulary;
将所述目标答案输出之前还包括:Before outputting the target answer, it also includes:
若所述咨询问题中仅包含积极词汇,则获取积极词汇对应的预设问候语,将所述积极词汇对应的预设问候语与所述目标答案结合;If the consulting question includes only the positive vocabulary, the preset greeting corresponding to the positive vocabulary is obtained, and the preset greeting corresponding to the positive vocabulary is combined with the target answer;
若所述咨询问题中仅包含消极词汇,则获取消极词汇对应的预设问候语,将所述消极词汇对应的预设问候语与所述目标答案结合;If the consultation question includes only the negative vocabulary, the preset greeting corresponding to the negative vocabulary is obtained, and the preset greeting corresponding to the negative vocabulary is combined with the target answer;
若所述咨询问题中包含积极词汇和消极词汇,或者所述咨询问题中不包含积极词汇和消极词汇,则获取中性词汇对应的预设问候语,将所述中性词汇对应的预设问候语与所述目标答案结合。If the consulting question contains both positive and negative words, or contains neither, the preset greeting corresponding to neutral vocabulary is obtained and combined with the target answer.
具体原理请参照上述图4关于智能应答程序10的程序模块图及图3关于智能应答方法较佳实施例的流程图的介绍。For the specific principle, please refer to the program module diagram of the intelligent response program 10 in FIG. 4 and the flowchart of the preferred embodiment of the intelligent response method in FIG.
此外,本申请实施例还提出一种计算机可读存储介质,所述计算机可读存储介质可以是硬盘、多媒体卡、SD卡、闪存卡、SMC、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器等等中的任意一种或者几种的任意组合。所述计算机可读存储介质中包括所述问答知识库4及智能应答程序10等,所述智能应答程序10被所述处理器12执行时实现如下操作:In addition, an embodiment of the present application further provides a computer-readable storage medium, which may be any one, or any combination, of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes the question-and-answer knowledge base 4, the intelligent response program 10, and so on; when the intelligent response program 10 is executed by the processor 12, the following operations are implemented:
获取步骤:获取输入的咨询问题,对所述咨询问题进行预处理,所述预处理包括分词得到各词条、对每个词条进行词性标注和命名实体识别、从各词条中提取关键词,以及对所述咨询问题进行语句纠错;Obtaining step: obtain an input consulting question and pre-process it, the pre-processing including segmenting the question into terms, performing part-of-speech tagging and named-entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
构建步骤:对问答知识库4中的每个问题和答案进行所述预处理,将经所述预处理后的每个问题和答案映射到倒排记录表中,从而为所述问答知识库4构建倒排索引,通过倒排索引查询的方式从所述问答知识库4中查询出与所述咨询问题相关的候选问题集合,所述问答知识库4包括预先整理的多个问题以及每个问题关联的一个或多个答案;Building step: perform the pre-processing on every question and answer in the question-and-answer knowledge base 4, and map each pre-processed question and answer into an inverted record table, thereby building an inverted index for the knowledge base 4; query the knowledge base 4 via the inverted index for the set of candidate questions related to the consulting question, the knowledge base 4 comprising a plurality of pre-compiled questions and one or more answers associated with each question;
计算步骤:针对所述候选问题集合中的每个候选问题,分别计算所述咨询问题与该候选问题的问题相似度,所述问题相似度由咨询问题和相应的候选问题之间的文本相似度、语义相似度、主题相似度和句法相似度经线性加权得到,其中所述文本相似度和语义相似度的权重均大于所述主题相似度和句法相似度的权重;Computing step: for each candidate question in the candidate question set, compute the question similarity between the consulting question and that candidate question, the question similarity being obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are both greater than those of the topic similarity and the syntactic similarity;
选择步骤:选择计算得到的最高问题相似度对应的候选问题,在问答知识库4中查询所选择候选问题的一个或多个关联答案,将所述一个或多个关联答案中在预设时间段内输出频率最高的关联答案作为目标答案输出。Selecting step: select the candidate question corresponding to the highest computed question similarity, query the knowledge base 4 for the one or more answers associated with the selected candidate question, and output, as the target answer, the associated answer that has been output most frequently within a preset time period.
其中,所述预处理包括分词得到各词条、将分词后词条长度大于第一阈值的词条作为长切词、为所述长切词进行词性标注、通过隐马尔可夫模型对所述长切词进行命名实体识别从而识别出专有名词、采用TF-IDF算法从所述长切词中提取关键词、采用N-gram语言模型和编辑距离为所述咨询问题进行语句纠错处理。Here, the pre-processing includes: segmenting the question into terms; taking terms whose length after segmentation exceeds a first threshold as long-cut words; part-of-speech tagging the long-cut words; performing named-entity recognition on the long-cut words through a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
其中,所述咨询问题和相应的候选问题之间的文本相似度的计算方法包括:The method for calculating the text similarity between the consulting question and the corresponding candidate question includes:
统计所述咨询问题与该候选问题之间的多个指定特征,对所述多个指定特征进行线性加权计算,得到咨询问题和相应的候选问题之间的文本相似度;Counting a plurality of specified features between the consulting question and the candidate question, performing linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question;
其中,所述多个指定特征包括:Wherein the plurality of specified features include:
咨询问题和该候选问题的共同关键词数量a1;the number of keywords common to the consulting question and the candidate question, a1;
咨询问题和该候选问题的共同关键词长度a2;the total length of the keywords common to the consulting question and the candidate question, a2;
咨询问题和该候选问题的共同词条的数量a3;the number of terms common to the consulting question and the candidate question, a3;
咨询问题和该候选问题的共同词条的长度a4;the total length of the terms common to the consulting question and the candidate question, a4;
咨询问题的长度a5;the length of the consulting question, a5;
该候选问题的长度a6。the length of the candidate question, a6.
所述对所述多个指定特征进行线性加权计算,得到咨询问题和相应的候选问题之间的文本相似度包括:Performing a linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question includes:
采用逆文档率算法计算每个指定特征的权重,对所述多个指定特征采用多元逻辑回归模型进行加权回归拟合计算,得到咨询问题与该候选问题的文本相似度g(z),公式如下:Compute the weight of each specified feature with an inverse document frequency algorithm, and perform a weighted regression fit over the specified features with a multiple logistic regression model, yielding the text similarity g(z) between the consulting question and the candidate question, as follows:
g(z)=1/(1+e^(-z)),e为自然常数;g(z) = 1/(1 + e^(-z)), where e is the natural constant;
z=a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6;z=a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6;
其中,x1、x2...x6分别为所述a1、a2...a6的权重。Wherein x1, x2, ..., x6 are the weights of the a1, a2, ..., a6, respectively.
此外,所述咨询问题和相应的候选问题之间的语义相似度的计算方法包括:Furthermore, the method for calculating the semantic similarity between the consulting question and the corresponding candidate question includes:
采用word2vec算法将咨询问题分词后的各词条表示为词向量,将咨询问题中各词向量取平均值得到咨询问题的句子向量;The word2vec algorithm is used to represent each term after the question segmentation as a word vector, and the word vectors in the consultation question are averaged to obtain the sentence vector of the consultation question;
采用word2vec算法将该候选问题分词后的各词条表示为词向量,将该候选问题中各词向量取平均值得到该候选问题的句子向量;Using the word2vec algorithm, each term after the candidate problem segmentation is represented as a word vector, and the word vectors in the candidate question are averaged to obtain a sentence vector of the candidate question;
计算咨询问题的句子向量与该候选问题的句子向量之间的余弦相似度,得到咨询问题与该候选问题的语义相似度;Calculating the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question, and obtaining the semantic similarity between the consulting question and the candidate question;
所述咨询问题和相应的候选问题之间的主题相似度的计算方法包括:The method for calculating the topic similarity between the consulting question and the corresponding candidate question includes:
采用LDA(潜在狄利克雷分配)主题模型的主题表达法,构建咨询问题的主题向量,以及该候选问题的主题向量;Using the topic representation of the LDA (Latent Dirichlet Allocation) topic model, construct the topic vector of the consulting question and the topic vector of the candidate question;
计算咨询问题的主题向量与该候选问题的主题向量之间的余弦相似度,得到咨询问题与该候选问题的主题相似度;Calculating a cosine similarity between the subject vector of the consulting question and the subject vector of the candidate question, and obtaining a topic similarity between the consulting question and the candidate question;
所述咨询问题和相应的候选问题之间的句法相似度的计算方法包括:The method for calculating the syntactic similarity between the consulting question and the corresponding candidate question includes:
采用LTP语言技术平台分析咨询问题和该候选问题的句法,得到咨询问题和该候选问题的句法向量;Using the LTP (Language Technology Platform), analyze the syntax of the consulting question and of the candidate question to obtain their respective syntax vectors;
计算咨询问题的句法向量与该候选问题的句法向量之间的余弦相似度,得到咨询问题与该候选问题的句法相似度。The cosine similarity between the syntax vector of the consulting problem and the syntax vector of the candidate problem is calculated, and the syntactic similarity between the consulting problem and the candidate problem is obtained.
在一个实施例中,所述预处理还包括:In an embodiment, the preprocessing further includes:
将所述咨询问题分词得到的各词条分别与预设的积极词汇库和消极词汇库进行对比,判断所述咨询问题中是否包含积极词汇或消极词汇;Comparing each term obtained by the consulting problem segmentation with a preset positive vocabulary and a negative vocabulary to determine whether the consulting question includes positive vocabulary or negative vocabulary;
将所述目标答案输出之前还包括:Before outputting the target answer, it also includes:
若所述咨询问题中仅包含积极词汇,则获取积极词汇对应的预设问候语,将所述积极词汇对应的预设问候语与所述目标答案结合;If the consulting question includes only the positive vocabulary, the preset greeting corresponding to the positive vocabulary is obtained, and the preset greeting corresponding to the positive vocabulary is combined with the target answer;
若所述咨询问题中仅包含消极词汇,则获取消极词汇对应的预设问候语,将所述消极词汇对应的预设问候语与所述目标答案结合;If the consultation question includes only the negative vocabulary, the preset greeting corresponding to the negative vocabulary is obtained, and the preset greeting corresponding to the negative vocabulary is combined with the target answer;
若所述咨询问题中包含积极词汇和消极词汇,或者所述咨询问题中不包含积极词汇和消极词汇,则获取中性词汇对应的预设问候语,将所述中性词汇对应的预设问候语与所述目标答案结合。If the consulting question contains both positive and negative words, or contains neither, the preset greeting corresponding to neutral vocabulary is obtained and combined with the target answer.
本申请之计算机可读存储介质的具体实施方式与上述智能应答方法以及电子装置1的具体实施方式大致相同,在此不再赘述。The specific implementation manner of the computer readable storage medium of the present application is substantially the same as the above-mentioned intelligent response method and the specific embodiment of the electronic device 1, and details are not described herein again.
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、装置、物品或者方法所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、装置、物品或者方法中还存在另外的相同要素。It should be noted that, as used herein, the terms "comprises", "comprising", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that comprises that element.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在如上所述的一个存储介质中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present application that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium as described above, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only preferred embodiments of the present application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. 一种智能应答方法,其特征在于,该方法包括以下步骤:An intelligent response method, characterized in that the method comprises the following steps:
    获取步骤:获取输入的咨询问题,对所述咨询问题进行预处理,所述预处理包括分词得到各词条、对每个词条进行词性标注和命名实体识别、从各词条中提取关键词,以及对所述咨询问题进行语句纠错;Obtaining step: obtain an input consulting question and pre-process it, the pre-processing including segmenting the question into terms, performing part-of-speech tagging and named-entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
    构建步骤:对问答知识库中的每个问题和答案进行所述预处理,将经所述预处理后的每个问题和答案映射到倒排记录表中,从而为所述问答知识库构建倒排索引,通过倒排索引查询的方式从所述问答知识库中查询出与所述咨询问题相关的候选问题集合,所述问答知识库包括预先整理的多个问题以及每个问题关联的一个或多个答案;Building step: perform the pre-processing on every question and answer in the question-and-answer knowledge base, and map each pre-processed question and answer into an inverted record table, thereby building an inverted index for the knowledge base; query the knowledge base via the inverted index for the set of candidate questions related to the consulting question, the knowledge base comprising a plurality of pre-compiled questions and one or more answers associated with each question;
    计算步骤:针对所述候选问题集合中的每个候选问题,分别计算所述咨询问题与该候选问题的问题相似度,所述问题相似度由咨询问题和相应的候选问题之间的文本相似度、语义相似度、主题相似度和句法相似度经线性加权得到,其中所述文本相似度和语义相似度的权重均大于所述主题相似度和句法相似度的权重;Computing step: for each candidate question in the candidate question set, compute the question similarity between the consulting question and that candidate question, the question similarity being obtained by linearly weighting the text similarity, semantic similarity, topic similarity, and syntactic similarity between the consulting question and the candidate question, where the weights of the text similarity and the semantic similarity are both greater than those of the topic similarity and the syntactic similarity;
    选择步骤:选择计算得到的最高问题相似度对应的候选问题,在问答知识库中查询所选择候选问题的一个或多个关联答案,将所述一个或多个关联答案中在预设时间段内输出频率最高的关联答案作为目标答案输出。Selecting step: select the candidate question corresponding to the highest computed question similarity, query the knowledge base for the one or more answers associated with the selected candidate question, and output, as the target answer, the associated answer that has been output most frequently within a preset time period.
  2. 如权利要求1所述的智能应答方法,其特征在于,所述咨询问题和相应的候选问题之间的文本相似度的计算方法包括:The intelligent response method according to claim 1, wherein the method for calculating the text similarity between the consultation question and the corresponding candidate question comprises:
    统计所述咨询问题与该候选问题之间的多个指定特征,对所述多个指定特征进行线性加权计算,得到咨询问题和相应的候选问题之间的文本相似度;Counting a plurality of specified features between the consulting question and the candidate question, performing linear weighting calculation on the plurality of specified features to obtain a text similarity between the consulting question and the corresponding candidate question;
    其中,所述多个指定特征包括:Wherein the plurality of specified features include:
    咨询问题和该候选问题的共同关键词数量a1;the number of keywords common to the consulting question and the candidate question, a1;
    咨询问题和该候选问题的共同关键词长度a2;the total length of the keywords common to the consulting question and the candidate question, a2;
    咨询问题和该候选问题的共同词条的数量a3;the number of terms common to the consulting question and the candidate question, a3;
    咨询问题和该候选问题的共同词条的长度a4;the total length of the terms common to the consulting question and the candidate question, a4;
    咨询问题的长度a5;the length of the consulting question, a5;
    该候选问题的长度a6。the length of the candidate question, a6.
  3. 如权利要求2所述的智能应答方法,其特征在于,所述对所述多个指定特征进行线性加权计算,得到咨询问题和相应的候选问题之间的文本相似度包括:The intelligent response method according to claim 2, wherein said performing linear weighting calculation on said plurality of specified features to obtain text similarity between the consulting question and the corresponding candidate question comprises:
    采用逆文档率算法计算每个指定特征的权重,对所述多个指定特征采用多元逻辑回归模型进行加权回归拟合计算,得到咨询问题与该候选问题的文本相似度g(z),公式如下:Compute the weight of each specified feature with an inverse document frequency algorithm, and perform a weighted regression fit over the specified features with a multiple logistic regression model, yielding the text similarity g(z) between the consulting question and the candidate question, as follows:
    g(z)=1/(1+e^(-z)),e为自然常数;g(z) = 1/(1 + e^(-z)), where e is the natural constant;
    z=a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6;z=a1*x1+a2*x2+a3*x3+a4*x4+a5*x5+a6*x6;
    其中,x1、x2...x6分别为所述a1、a2...a6的权重。Wherein x1, x2, ..., x6 are the weights of the a1, a2, ..., a6, respectively.
  4. 如权利要求1所述的智能应答方法,其特征在于,所述咨询问题和相应的候选问题之间的语义相似度的计算方法包括:The intelligent response method according to claim 1, wherein the method for calculating the semantic similarity between the consultation question and the corresponding candidate question comprises:
    采用word2vec算法将咨询问题分词后的各词条表示为词向量,将咨询问题中各词向量取平均值得到咨询问题的句子向量;The word2vec algorithm is used to represent each term after the question segmentation as a word vector, and the word vectors in the consultation question are averaged to obtain the sentence vector of the consultation question;
    采用word2vec算法将该候选问题分词后的各词条表示为词向量,将该候选问题中各词向量取平均值得到该候选问题的句子向量;Using the word2vec algorithm, each term after the candidate problem segmentation is represented as a word vector, and the word vectors in the candidate question are averaged to obtain a sentence vector of the candidate question;
    计算咨询问题的句子向量与该候选问题的句子向量之间的余弦相似度,得到咨询问题与该候选问题的语义相似度;Calculating the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question, and obtaining the semantic similarity between the consulting question and the candidate question;
    所述咨询问题和相应的候选问题之间的主题相似度的计算方法包括:The method for calculating the topic similarity between the consulting question and the corresponding candidate question includes:
    采用LDA(潜在狄利克雷分配)主题模型的主题表达法,构建咨询问题的主题向量,以及该候选问题的主题向量;Using the topic representation of the LDA (Latent Dirichlet Allocation) topic model, construct the topic vector of the consulting question and the topic vector of the candidate question;
    计算咨询问题的主题向量与该候选问题的主题向量之间的余弦相似度,得到咨询问题与该候选问题的主题相似度;Calculating a cosine similarity between the subject vector of the consulting question and the subject vector of the candidate question, and obtaining a topic similarity between the consulting question and the candidate question;
    所述咨询问题和相应的候选问题之间的句法相似度的计算方法包括:The method for calculating the syntactic similarity between the consulting question and the corresponding candidate question includes:
    采用LTP语言技术平台分析咨询问题和该候选问题的句法,得到咨询问题和该候选问题的句法向量;Using the LTP (Language Technology Platform), analyze the syntax of the consulting question and of the candidate question to obtain their respective syntax vectors;
    计算咨询问题的句法向量与该候选问题的句法向量之间的余弦相似度,得到咨询问题与该候选问题的句法相似度。The cosine similarity between the syntax vector of the consulting problem and the syntax vector of the candidate problem is calculated, and the syntactic similarity between the consulting problem and the candidate problem is obtained.
  5. 如权利要求1所述的智能应答方法,其特征在于,所述预处理包括分词得到各词条、将分词后词条长度大于第一阈值的词条作为长切词、为所述长切词进行词性标注、通过隐马尔可夫模型对所述长切词进行命名实体识别从而识别出专有名词、采用TF-IDF算法从所述长切词中提取关键词、采用N-gram语言模型和编辑距离为所述咨询问题进行语句纠错处理。The intelligent response method according to claim 1, wherein the pre-processing includes: segmenting the question into terms; taking terms whose length after segmentation exceeds a first threshold as long-cut words; part-of-speech tagging the long-cut words; performing named-entity recognition on the long-cut words through a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
  6. 如权利要求1所述的智能应答方法,其特征在于,所述预处理还包括:The intelligent response method according to claim 1, wherein the preprocessing further comprises:
    将所述咨询问题分词得到的各词条分别与预设的积极词汇库和消极词汇库进行对比,判断所述咨询问题中是否包含积极词汇或消极词汇;Comparing each term obtained by the consulting problem segmentation with a preset positive vocabulary and a negative vocabulary to determine whether the consulting question includes positive vocabulary or negative vocabulary;
    将所述目标答案输出之前还包括:Before outputting the target answer, it also includes:
    若所述咨询问题中仅包含积极词汇,则获取积极词汇对应的预设问候语,将所述积极词汇对应的预设问候语与所述目标答案结合;If the consulting question includes only the positive vocabulary, the preset greeting corresponding to the positive vocabulary is obtained, and the preset greeting corresponding to the positive vocabulary is combined with the target answer;
    若所述咨询问题中仅包含消极词汇,则获取消极词汇对应的预设问候语,将所述消极词汇对应的预设问候语与所述目标答案结合;If the consultation question includes only the negative vocabulary, the preset greeting corresponding to the negative vocabulary is obtained, and the preset greeting corresponding to the negative vocabulary is combined with the target answer;
    若所述咨询问题中包含积极词汇和消极词汇,或者所述咨询问题中不包含积极词汇和消极词汇,则获取中性词汇对应的预设问候语,将所述中性词汇对应的预设问候语与所述目标答案结合。If the consulting question contains both positive and negative words, or contains neither, the preset greeting corresponding to neutral vocabulary is obtained and combined with the target answer.
  7. The intelligent response method according to any one of claims 2 to 5, wherein the preprocessing further comprises:
    comparing each term obtained by segmenting the consulting question against a preset positive vocabulary library and a preset negative vocabulary library, to determine whether the consulting question contains positive words or negative words;
    and wherein, before the target answer is output, the method further comprises:
    if the consulting question contains only positive words, obtaining the preset greeting corresponding to positive words and combining it with the target answer;
    if the consulting question contains only negative words, obtaining the preset greeting corresponding to negative words and combining it with the target answer;
    if the consulting question contains both positive and negative words, or contains neither, obtaining the preset greeting corresponding to neutral words and combining it with the target answer.
  8. An electronic device comprising a memory and a processor, wherein the memory stores an intelligent response program which, when executed by the processor, implements the following steps:
    an obtaining step: obtaining an input consulting question and preprocessing it, the preprocessing comprising segmenting the question into terms, performing part-of-speech tagging and named entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
    a building step: applying the preprocessing to every question and answer in a question-and-answer knowledge base and mapping each preprocessed question and answer into an inverted record table, thereby building an inverted index for the knowledge base; then retrieving, by querying the inverted index, a set of candidate questions related to the consulting question, the knowledge base comprising a plurality of pre-compiled questions and one or more answers associated with each question;
    a calculating step: for each candidate question in the set, calculating the question similarity between the consulting question and that candidate question, the question similarity being a linear weighted combination of the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and the candidate question, wherein the weights of the text similarity and the semantic similarity are both greater than the weights of the topic similarity and the syntactic similarity;
    a selecting step: selecting the candidate question with the highest calculated question similarity, querying the knowledge base for the one or more answers associated with the selected candidate question, and outputting, as the target answer, the associated answer that has been output most frequently within a preset time period.
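The calculating and selecting steps above amount to a weighted sum over four scores followed by an argmax. The sketch below assumes the four component similarities are already computed and normalised to [0, 1]; the specific weight values are illustrative assumptions, chosen only so that text and semantic weights exceed topic and syntax weights as the claim requires.

```python
# Weights are placeholder assumptions satisfying the claim's constraint that
# text and semantic similarity outweigh topic and syntactic similarity.
WEIGHTS = {"text": 0.35, "semantic": 0.35, "topic": 0.15, "syntax": 0.15}

def question_similarity(scores):
    # scores: dict with keys text/semantic/topic/syntax, each in [0, 1]
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def best_candidate(candidates):
    # candidates: list of (candidate_question, scores) pairs;
    # returns the question with the highest combined similarity
    return max(candidates, key=lambda c: question_similarity(c[1]))[0]
```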
  9. The electronic device according to claim 8, wherein the text similarity between the consulting question and a corresponding candidate question is calculated by:
    counting a plurality of specified features between the consulting question and the candidate question, and performing a linear weighted calculation on the specified features to obtain the text similarity;
    wherein the plurality of specified features comprise:
    the number of keywords shared by the consulting question and the candidate question, a1;
    the length of the keywords shared by the consulting question and the candidate question, a2;
    the number of terms shared by the consulting question and the candidate question, a3;
    the length of the terms shared by the consulting question and the candidate question, a4;
    the length of the consulting question, a5;
    the length of the candidate question, a6.
  10. The electronic device according to claim 9, wherein performing the linear weighted calculation on the plurality of specified features to obtain the text similarity between the consulting question and the corresponding candidate question comprises:
    calculating the weight of each specified feature with an inverse document frequency algorithm, and performing a weighted regression fit on the specified features with a multinomial logistic regression model to obtain the text similarity g(z) between the consulting question and the candidate question, according to:
    g(z) = 1/(1 + e^(-z)), where e is the natural constant;
    z = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5 + a6*x6;
    where x1, x2, ..., x6 are the weights of a1, a2, ..., a6 respectively.
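The formula in this claim is the logistic (sigmoid) function applied to a weighted sum of the six features, and can be sketched directly. The feature and weight values passed in here are made-up illustrations; in the patent the weights come from an inverse document frequency calculation and a regression fit.

```python
import math

# g(z) = 1 / (1 + e^(-z)) applied to z = a1*x1 + ... + a6*x6.
def text_similarity(features, weights):
    # features: [a1, ..., a6]; weights: [x1, ..., x6]
    z = sum(a * x for a, x in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))
```

With z = 0 the sigmoid gives exactly 0.5, and larger weighted sums push the similarity toward 1, which is why heavier feature overlap yields a higher text-similarity score.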
  11. The electronic device according to claim 8, wherein the semantic similarity between the consulting question and a corresponding candidate question is calculated by:
    representing each term of the segmented consulting question as a word vector with the word2vec algorithm, and averaging the word vectors to obtain a sentence vector of the consulting question;
    representing each term of the segmented candidate question as a word vector with the word2vec algorithm, and averaging the word vectors to obtain a sentence vector of the candidate question;
    calculating the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question to obtain the semantic similarity;
    the topic similarity between the consulting question and the corresponding candidate question is calculated by:
    constructing a topic vector for the consulting question and a topic vector for the candidate question with an LDA topic representation method;
    calculating the cosine similarity between the two topic vectors to obtain the topic similarity;
    and the syntactic similarity between the consulting question and the corresponding candidate question is calculated by:
    analyzing the syntax of the consulting question and of the candidate question with the LTP language technology platform to obtain a syntax vector for each;
    calculating the cosine similarity between the two syntax vectors to obtain the syntactic similarity.
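The semantic-similarity computation in this claim — average the word vectors of each question into a sentence vector, then take the cosine of the two sentence vectors — can be sketched with toy embeddings. The three-dimensional vectors in `EMBED` are placeholder assumptions standing in for vectors a trained word2vec model would supply.

```python
import math

# Toy embedding table; a real system would look these up in a word2vec model.
EMBED = {
    "refund": [0.9, 0.1, 0.0],
    "policy": [0.1, 0.8, 0.1],
    "return": [0.8, 0.2, 0.1],
}

def sentence_vector(terms):
    # Average the word vectors of the known terms, dimension by dimension.
    vecs = [EMBED[t] for t in terms if t in EMBED]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

sim = cosine(sentence_vector(["refund", "policy"]),
             sentence_vector(["return", "policy"]))
```

The same cosine helper serves the topic and syntactic similarities of this claim once the topic vectors (LDA) and syntax vectors (LTP) are in hand; only the vector construction differs.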
  12. The electronic device according to claim 8, wherein the preprocessing comprises: segmenting the consulting question into terms; treating terms whose length after segmentation exceeds a first threshold as long-cut words; performing part-of-speech tagging on the long-cut words; performing named entity recognition on the long-cut words with a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
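The edit-distance component of the error-correction step named in this claim can be sketched as the classic Levenshtein dynamic programme: pick, from a known vocabulary, the term closest to a possibly misspelled input term. The vocabulary here is an illustrative placeholder, and a real system would let the N-gram language model rank candidates in context rather than choose purely by distance.

```python
# Levenshtein edit distance between two strings, row by row.
def edit_distance(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def correct(term, vocabulary):
    # Return the vocabulary entry nearest to the input term.
    return min(vocabulary, key=lambda w: edit_distance(term, w))
```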
  13. The electronic device according to claim 8, wherein the preprocessing further comprises:
    comparing each term obtained by segmenting the consulting question against a preset positive vocabulary library and a preset negative vocabulary library, to determine whether the consulting question contains positive words or negative words;
    and wherein, before the target answer is output, the steps further comprise:
    if the consulting question contains only positive words, obtaining the preset greeting corresponding to positive words and combining it with the target answer;
    if the consulting question contains only negative words, obtaining the preset greeting corresponding to negative words and combining it with the target answer;
    if the consulting question contains both positive and negative words, or contains neither, obtaining the preset greeting corresponding to neutral words and combining it with the target answer.
  14. The electronic device according to any one of claims 9 to 12, wherein the preprocessing further comprises:
    comparing each term obtained by segmenting the consulting question against a preset positive vocabulary library and a preset negative vocabulary library, to determine whether the consulting question contains positive words or negative words;
    and wherein, before the target answer is output, the steps further comprise:
    if the consulting question contains only positive words, obtaining the preset greeting corresponding to positive words and combining it with the target answer;
    if the consulting question contains only negative words, obtaining the preset greeting corresponding to negative words and combining it with the target answer;
    if the consulting question contains both positive and negative words, or contains neither, obtaining the preset greeting corresponding to neutral words and combining it with the target answer.
  15. A computer-readable storage medium, wherein the storage medium stores an intelligent response program which, when executed by a processor, implements the following steps:
    an obtaining step: obtaining an input consulting question and preprocessing it, the preprocessing comprising segmenting the question into terms, performing part-of-speech tagging and named entity recognition on each term, extracting keywords from the terms, and performing sentence error correction on the consulting question;
    a building step: applying the preprocessing to every question and answer in a question-and-answer knowledge base and mapping each preprocessed question and answer into an inverted record table, thereby building an inverted index for the knowledge base; then retrieving, by querying the inverted index, a set of candidate questions related to the consulting question, the knowledge base comprising a plurality of pre-compiled questions and one or more answers associated with each question;
    a calculating step: for each candidate question in the set, calculating the question similarity between the consulting question and that candidate question, the question similarity being a linear weighted combination of the text similarity, semantic similarity, topic similarity and syntactic similarity between the consulting question and the candidate question, wherein the weights of the text similarity and the semantic similarity are both greater than the weights of the topic similarity and the syntactic similarity;
    a selecting step: selecting the candidate question with the highest calculated question similarity, querying the knowledge base for the one or more answers associated with the selected candidate question, and outputting, as the target answer, the associated answer that has been output most frequently within a preset time period.
  16. The computer-readable storage medium according to claim 15, wherein the text similarity between the consulting question and a corresponding candidate question is calculated by:
    counting a plurality of specified features between the consulting question and the candidate question, and performing a linear weighted calculation on the specified features to obtain the text similarity;
    wherein the plurality of specified features comprise:
    the number of keywords shared by the consulting question and the candidate question, a1;
    the length of the keywords shared by the consulting question and the candidate question, a2;
    the number of terms shared by the consulting question and the candidate question, a3;
    the length of the terms shared by the consulting question and the candidate question, a4;
    the length of the consulting question, a5;
    the length of the candidate question, a6.
  17. The computer-readable storage medium according to claim 16, wherein performing the linear weighted calculation on the plurality of specified features to obtain the text similarity between the consulting question and the corresponding candidate question comprises:
    calculating the weight of each specified feature with an inverse document frequency algorithm, and performing a weighted regression fit on the specified features with a multinomial logistic regression model to obtain the text similarity g(z) between the consulting question and the candidate question, according to:
    g(z) = 1/(1 + e^(-z)), where e is the natural constant;
    z = a1*x1 + a2*x2 + a3*x3 + a4*x4 + a5*x5 + a6*x6;
    where x1, x2, ..., x6 are the weights of a1, a2, ..., a6 respectively.
  18. The computer-readable storage medium according to claim 15, wherein the semantic similarity between the consulting question and a corresponding candidate question is calculated by:
    representing each term of the segmented consulting question as a word vector with the word2vec algorithm, and averaging the word vectors to obtain a sentence vector of the consulting question;
    representing each term of the segmented candidate question as a word vector with the word2vec algorithm, and averaging the word vectors to obtain a sentence vector of the candidate question;
    calculating the cosine similarity between the sentence vector of the consulting question and the sentence vector of the candidate question to obtain the semantic similarity;
    the topic similarity between the consulting question and the corresponding candidate question is calculated by:
    constructing a topic vector for the consulting question and a topic vector for the candidate question with an LDA topic representation method;
    calculating the cosine similarity between the two topic vectors to obtain the topic similarity;
    and the syntactic similarity between the consulting question and the corresponding candidate question is calculated by:
    analyzing the syntax of the consulting question and of the candidate question with the LTP language technology platform to obtain a syntax vector for each;
    calculating the cosine similarity between the two syntax vectors to obtain the syntactic similarity.
  19. The computer-readable storage medium according to claim 15, wherein the preprocessing comprises: segmenting the consulting question into terms; treating terms whose length after segmentation exceeds a first threshold as long-cut words; performing part-of-speech tagging on the long-cut words; performing named entity recognition on the long-cut words with a hidden Markov model to identify proper nouns; extracting keywords from the long-cut words with the TF-IDF algorithm; and performing sentence error correction on the consulting question with an N-gram language model and edit distance.
  20. The computer-readable storage medium according to claim 15, wherein the preprocessing further comprises:
    comparing each term obtained by segmenting the consulting question against a preset positive vocabulary library and a preset negative vocabulary library, to determine whether the consulting question contains positive words or negative words;
    and wherein, before the target answer is output, the steps further comprise:
    if the consulting question contains only positive words, obtaining the preset greeting corresponding to positive words and combining it with the target answer;
    if the consulting question contains only negative words, obtaining the preset greeting corresponding to negative words and combining it with the target answer;
    if the consulting question contains both positive and negative words, or contains neither, obtaining the preset greeting corresponding to neutral words and combining it with the target answer.
PCT/CN2018/089882 2018-02-09 2018-06-05 Intelligent response method, electronic device and storage medium WO2019153607A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810134579.9 2018-02-09
CN201810134579.9A CN108345672A (en) 2018-02-09 2018-02-09 Intelligent response method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2019153607A1 true WO2019153607A1 (en) 2019-08-15

Family

ID=62959188

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089882 WO2019153607A1 (en) 2018-02-09 2018-06-05 Intelligent response method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN108345672A (en)
WO (1) WO2019153607A1 (en)

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033156B (en) * 2018-06-13 2021-06-15 腾讯科技(深圳)有限公司 Information processing method and device and terminal
CN109271524B (en) * 2018-08-02 2021-10-15 中国科学院计算技术研究所 Entity linking method in knowledge base question-answering system
CN109040481A (en) * 2018-08-09 2018-12-18 武汉优品楚鼎科技有限公司 The automatic error-correcting smart phone inquiry method, system and device of field of securities
CN110837586B (en) * 2018-08-15 2023-05-26 阿里巴巴集团控股有限公司 Question-answer matching method, system, server and storage medium
CN111046147A (en) * 2018-10-11 2020-04-21 马上消费金融股份有限公司 Question answering method and device and terminal equipment
CN109522394A (en) * 2018-10-12 2019-03-26 北京奔影网络科技有限公司 Knowledge base question and answer system and method for building up
MY195969A (en) * 2018-10-24 2023-02-27 Advanced New Technologies Co Ltd Intelligent Customer Services Based on a Vector Propagation on a Click Graph Model
CN109657232A (en) * 2018-11-16 2019-04-19 北京九狐时代智能科技有限公司 A kind of intension recognizing method
CN109857841A (en) * 2018-12-05 2019-06-07 厦门快商通信息技术有限公司 A kind of FAQ question sentence Text similarity computing method and system
CN109800284B (en) * 2018-12-19 2021-02-05 中国电子科技集团公司第二十八研究所 Task-oriented unstructured information intelligent question-answering system construction method
CN109670029B (en) * 2018-12-28 2021-09-07 百度在线网络技术(北京)有限公司 Method, apparatus, computer device and storage medium for determining answers to questions
CN109920415A (en) * 2019-01-17 2019-06-21 平安城市建设科技(深圳)有限公司 Nan-machine interrogation's method, apparatus, equipment and storage medium based on speech recognition
CN109857850A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Counsel requests processing method, device, computer equipment and storage medium
CN109710749A (en) * 2019-01-22 2019-05-03 深圳追一科技有限公司 A kind of customer service auxiliary device and method
CN109885657B (en) * 2019-02-18 2021-04-27 武汉瓯越网视有限公司 Text similarity calculation method and device and storage medium
CN109902163B (en) * 2019-02-28 2022-03-01 百度在线网络技术(北京)有限公司 Intelligent response method, device, equipment and storage medium
CN110096567B (en) * 2019-03-14 2020-12-25 中国科学院自动化研究所 QA knowledge base reasoning-based multi-round dialogue reply selection method and system
CN110134777B (en) * 2019-05-29 2021-11-26 腾讯科技(深圳)有限公司 Question duplication eliminating method and device, electronic equipment and computer readable storage medium
CN110459210A (en) * 2019-07-30 2019-11-15 平安科技(深圳)有限公司 Answering method, device, equipment and storage medium based on speech analysis
CN110543553B (en) * 2019-07-31 2024-06-14 平安科技(深圳)有限公司 Problem generation method, device, computer equipment and storage medium
CN110543544A (en) * 2019-09-04 2019-12-06 北京羽扇智信息科技有限公司 Text processing method, storage medium and electronic device
CN110795942B (en) * 2019-09-18 2022-10-14 平安科技(深圳)有限公司 Keyword determination method and device based on semantic recognition and storage medium
CN110750629A (en) * 2019-09-18 2020-02-04 平安科技(深圳)有限公司 Robot dialogue generation method and device, readable storage medium and robot
CN110765244B (en) * 2019-09-18 2023-06-06 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for obtaining answering operation
CN110619038A (en) * 2019-09-20 2019-12-27 上海氦豚机器人科技有限公司 Method, system and electronic equipment for vertically guiding professional consultation
CN110825859A (en) * 2019-10-21 2020-02-21 拉扎斯网络科技(上海)有限公司 Retrieval method, retrieval device, readable storage medium and electronic equipment
CN110867255A (en) * 2019-10-24 2020-03-06 开望(杭州)科技有限公司 Intelligent mother and infant knowledge service method and system
CN110990528A (en) * 2019-11-27 2020-04-10 出门问问(苏州)信息科技有限公司 Question answering method and device and electronic equipment
CN110990538B (en) * 2019-12-20 2022-04-01 深圳前海黑顿科技有限公司 Semantic fuzzy search method based on sentence-level deep learning language model
CN113051390B (en) * 2019-12-26 2023-09-26 百度在线网络技术(北京)有限公司 Knowledge base construction method, knowledge base construction device, electronic equipment and medium
CN111241848B (en) * 2020-01-15 2020-12-01 江苏联著实业股份有限公司 Article reading comprehension answer retrieval method and device based on machine learning
CN111259647A (en) * 2020-01-16 2020-06-09 泰康保险集团股份有限公司 Question and answer text matching method, device, medium and electronic equipment based on artificial intelligence
CN113268572A (en) * 2020-02-14 2021-08-17 华为技术有限公司 Question answering method and device
CN111353290B (en) * 2020-02-28 2023-07-14 支付宝(杭州)信息技术有限公司 Method and system for automatically responding to user inquiry
CN111382255B (en) * 2020-03-17 2023-08-01 北京百度网讯科技有限公司 Method, apparatus, device and medium for question-answering processing
CN111382256B (en) * 2020-03-20 2024-04-09 北京百度网讯科技有限公司 Information recommendation method and device
CN111460081B (en) * 2020-03-30 2023-04-07 招商局金融科技有限公司 Answer generation method based on deep learning, electronic device and readable storage medium
CN111782762A (en) * 2020-05-12 2020-10-16 北京三快在线科技有限公司 Method and device for determining similar questions in question answering application and electronic equipment
CN111553140B (en) * 2020-05-13 2024-03-19 金蝶软件(中国)有限公司 Data processing method, data processing apparatus, and computer storage medium
CN111488448B (en) * 2020-05-27 2023-06-20 支付宝(杭州)信息技术有限公司 Method and device for generating machine reading annotation data
CN113779973A (en) * 2020-06-09 2021-12-10 杭州晨熹多媒体科技有限公司 Text data processing method and device
CN111881672A (en) * 2020-06-18 2020-11-03 升智信息科技(南京)有限公司 Intention identification method
CN111797214A (en) * 2020-06-24 2020-10-20 深圳壹账通智能科技有限公司 FAQ database-based problem screening method and device, computer equipment and medium
CN111782759B (en) * 2020-06-29 2024-04-19 数网金融有限公司 Question-answering processing method and device and computer readable storage medium
CN111949755B (en) * 2020-07-01 2023-09-22 新疆中顺鑫和供应链管理股份有限公司 Information query method and device for hazardous chemicals, electronic equipment and medium
CN111861201A (en) * 2020-07-17 2020-10-30 南京汇宁桀信息科技有限公司 Intelligent government affair order dispatching method based on big data classification algorithm
CN111984774B (en) * 2020-08-11 2024-02-27 北京百度网讯科技有限公司 Searching method, searching device, searching equipment and storage medium
CN112287091A (en) * 2020-11-30 2021-01-29 珠海采筑电子商务有限公司 Intelligent question-answering method and related products
CN112445904A (en) * 2020-12-15 2021-03-05 税友软件集团股份有限公司 Knowledge retrieval method, knowledge retrieval device, knowledge retrieval equipment and computer readable storage medium
CN112527995A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Question feedback processing method, device and equipment and readable storage medium
CN112667809A (en) * 2020-12-25 2021-04-16 平安科技(深圳)有限公司 Text processing method and device, electronic equipment and storage medium
CN112905764A (en) * 2021-02-07 2021-06-04 深圳万海思数字医疗有限公司 Epidemic disease consultation prevention and training system construction method and system
CN112948553B (en) * 2021-02-26 2023-06-20 平安国际智慧城市科技股份有限公司 Legal intelligent question-answering method and device, electronic equipment and storage medium
CN112992367B (en) * 2021-03-23 2021-09-28 微脉技术有限公司 Smart medical interaction method based on big data and smart medical cloud computing system
CN112906402B (en) * 2021-03-24 2024-02-27 平安科技(深圳)有限公司 Music response data generation method, device, equipment and storage medium
CN112906377A (en) * 2021-03-25 2021-06-04 平安科技(深圳)有限公司 Question answering method and device based on entity limitation, electronic equipment and storage medium
CN113095061B (en) * 2021-03-31 2023-08-29 京华信息科技股份有限公司 Method, system, device and storage medium for extracting document header
CN113076423A (en) * 2021-04-22 2021-07-06 支付宝(杭州)信息技术有限公司 Data processing method and device and data query method and device
CN113342958B (en) * 2021-07-02 2023-06-16 马上消费金融股份有限公司 Question-answer matching method, text matching model training method and related equipment
CN113342924A (en) * 2021-07-05 2021-09-03 北京读我网络技术有限公司 Answer retrieval method and device, storage medium and electronic equipment
CN117972065A (en) * 2021-07-30 2024-05-03 北京壹心壹翼科技有限公司 Question retrieval method, device, equipment and medium applied to intelligent question-answering system
CN113742469B (en) * 2021-09-03 2023-12-15 科讯嘉联信息技术有限公司 Method for constructing question-answering system based on Pipeline processing and ES storage
CN115510203B (en) * 2022-09-27 2023-09-22 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for determining answers to questions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425635A (en) * 2012-05-15 2013-12-04 北京百度网讯科技有限公司 Method and device for recommending answers
CN104636334A (en) * 2013-11-06 2015-05-20 阿里巴巴集团控股有限公司 Keyword recommending method and device
CN107273350A (en) * 2017-05-16 2017-10-20 广东电网有限责任公司江门供电局 Information processing method and device for realizing intelligent question answering
CN107436864A (en) * 2017-08-04 2017-12-05 逸途(北京)科技有限公司 Word2Vec-based Chinese question-answer semantic similarity calculation method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2595541A1 (en) * 2007-07-26 2009-01-26 Hamid Htami-Hanza Assisted knowledge discovery and publication system and method
CN101286161B (en) * 2008-05-28 2010-10-06 华中科技大学 Concept-based intelligent Chinese question-answering system
US7996369B2 (en) * 2008-11-14 2011-08-09 The Regents Of The University Of California Method and apparatus for improving performance of approximate string queries using variable length high-quality grams
CN102866990B (en) * 2012-08-20 2016-08-03 北京搜狗信息服务有限公司 Theme dialogue method and device

Also Published As

Publication number Publication date
CN108345672A (en) 2018-07-31

Similar Documents

Publication Publication Date Title
WO2019153607A1 (en) Intelligent response method, electronic device and storage medium
CN108491433B (en) Chat response method, electronic device and storage medium
US20220214775A1 (en) Method for extracting salient dialog usage from live data
CN110765244B (en) Method, device, computer equipment and storage medium for obtaining answering operation
US10657332B2 (en) Language-agnostic understanding
WO2019153612A1 (en) Question and answer data processing method, electronic device and storage medium
US20210157984A1 (en) Intelligent system that dynamically improves its knowledge and code-base for natural language understanding
US8073877B2 (en) Scalable semi-structured named entity detection
US11573954B1 (en) Systems and methods for processing natural language queries for healthcare data
US10061843B2 (en) Translating natural language utterances to keyword search queries
EP2570974B1 (en) Automatic crowd sourcing for machine learning in information extraction
US20190220486A1 (en) Method and apparatus for mining general tag, server, and medium
US20100312782A1 (en) Presenting search results according to query domains
US10108698B2 (en) Common data repository for improving transactional efficiencies of user interactions with a computing device
US20230169134A1 (en) Annotation and retrieval of personal bookmarks
US20210110111A1 (en) Methods and systems for providing universal portability in machine learning
US20170060834A1 (en) Natural Language Determiner
JP5020352B2 (en) Named entity marking device, named entity marking method and computer-readable medium thereof
CN113254588A (en) Data searching method and system
CN116796730A (en) Text error correction method, device, equipment and storage medium based on artificial intelligence
CN109783612B (en) Report data positioning method and device, storage medium and terminal
CN109684357B (en) Information processing method and device, storage medium and terminal
US20160203177A1 (en) Answering Requests Related to Places of Interest
CN116186198A (en) Information retrieval method, information retrieval device, computer equipment and storage medium
CN117033601A (en) Intelligent question-answering method, device, equipment and medium based on network system

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/11/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18904616

Country of ref document: EP

Kind code of ref document: A1